Tessl Tile for pypi/google-cloud-language@2.17.0

# Text Classification

Categorizes text documents into predefined classification categories, enabling automated content organization and filtering based on subject matter and themes. The classification system can identify topics, genres, and content types to help with content management, routing, and analysis at scale.

## Capabilities

### Classify Text

Analyzes the provided text and assigns it to relevant predefined categories with confidence scores.

```python { .api }
def classify_text(
    self,
    request: Optional[Union[ClassifyTextRequest, dict]] = None,
    *,
    document: Optional[Document] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = ()
) -> ClassifyTextResponse:
    """
    Classifies a document into categories.

    Args:
        request: The request object containing document and options
        document: Input document for classification
        retry: Retry configuration for the request
        timeout: Request timeout in seconds
        metadata: Additional metadata to send with the request

    Returns:
        ClassifyTextResponse containing classification categories
    """
```

#### Usage Example

```python
from google.cloud import language

# Initialize client
client = language.LanguageServiceClient()

# Create document
document = language.Document(
    content="""
    The latest advancements in artificial intelligence and machine learning 
    are revolutionizing how we approach data analysis and predictive modeling. 
    Neural networks and deep learning algorithms are becoming increasingly 
    sophisticated, enabling more accurate predictions and insights from 
    complex datasets.
    """,
    type_=language.Document.Type.PLAIN_TEXT
)

# Classify text
response = client.classify_text(
    request={"document": document}
)

# Process classification results
print("Classification Results:")
for category in response.categories:
    print(f"Category: {category.name}")
    print(f"Confidence: {category.confidence:.3f}")
    print()
```

## Request and Response Types

### ClassifyTextRequest

```python { .api }
class ClassifyTextRequest:
    document: Document
    classification_model_options: ClassificationModelOptions  # v1/v1beta2 only
```

### ClassifyTextResponse

```python { .api }
class ClassifyTextResponse:
    categories: MutableSequence[ClassificationCategory]
```

## Supporting Types

### ClassificationCategory

Represents a classification category with confidence score.

```python { .api }
class ClassificationCategory:
    name: str         # Category name (hierarchical path)
    confidence: float # Confidence score [0.0, 1.0]
```

### ClassificationModelOptions (v1/v1beta2 only)

Configuration options for the classification model.

```python { .api }
class ClassificationModelOptions:
    class V1Model(proto.Message):
        pass

    class V2Model(proto.Message):
        pass

    v1_model: V1Model  # Use V1 classification model
    v2_model: V2Model  # Use V2 classification model
```

## Category Hierarchy

Category names are hierarchical paths whose levels are separated by forward slashes. Each additional level names a more specific subcategory of the level before it.

### Common Top-Level Categories

- `/Arts & Entertainment`
- `/Autos & Vehicles`
- `/Beauty & Fitness`
- `/Books & Literature`
- `/Business & Industrial`
- `/Computers & Electronics`
- `/Finance`
- `/Food & Drink`
- `/Games`
- `/Health`
- `/Hobbies & Leisure`
- `/Home & Garden`
- `/Internet & Telecom`
- `/Jobs & Education`
- `/Law & Government`
- `/News`
- `/Online Communities`
- `/People & Society`
- `/Pets & Animals`
- `/Real Estate`
- `/Reference`
- `/Science`
- `/Shopping`
- `/Sports`
- `/Travel`

### Hierarchical Examples

- `/Computers & Electronics/Software`
- `/Computers & Electronics/Software/Business Software`
- `/Arts & Entertainment/Movies`
- `/Arts & Entertainment/Music & Audio`
- `/Science/Computer Science`
- `/Business & Industrial/Advertising & Marketing`

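Because category names are slash-separated paths, they can be split into levels with plain string handling. The following is a minimal sketch of that idea; `split_category_path` and `ancestor_paths` are hypothetical helper names used for illustration and are not part of the client library.

```python
def split_category_path(name):
    """Split a category name such as '/A/B' into its levels, top level first."""
    # The leading slash yields an empty first element, which is dropped here.
    return [part for part in name.split("/") if part]


def ancestor_paths(name):
    """Return every prefix path of a category, from the top level to the full name."""
    levels = split_category_path(name)
    return ["/" + "/".join(levels[: i + 1]) for i in range(len(levels))]


print(split_category_path("/Computers & Electronics/Software/Business Software"))
print(ancestor_paths("/Science/Computer Science"))
```

Helpers like these may be useful when a result needs to be matched against a parent category rather than against its full path.
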
## Advanced Usage

### Multi-Category Classification

```python
from google.cloud import language

def classify_and_rank_categories(client, text, min_confidence=0.1):
    """Classify text and rank all categories above threshold."""
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT
    )

    response = client.classify_text(
        request={"document": document}
    )

    # Filter and sort categories
    filtered_categories = [
        cat for cat in response.categories 
        if cat.confidence >= min_confidence
    ]

    sorted_categories = sorted(
        filtered_categories, 
        key=lambda x: x.confidence, 
        reverse=True
    )

    return sorted_categories

# Usage (reuses `client` from the usage example above)
text = """
Machine learning algorithms are transforming healthcare by enabling 
early disease detection through medical imaging analysis. Artificial 
intelligence systems can now identify patterns in X-rays, MRIs, and 
CT scans that might be missed by human radiologists.
"""

categories = classify_and_rank_categories(client, text, min_confidence=0.1)

print("All Categories (above 10% confidence):")
for cat in categories:
    print(f"{cat.name}: {cat.confidence:.3f}")
```

### Batch Classification

```python
from google.cloud import language

def classify_multiple_documents(client, documents):
    """Classify multiple documents and return aggregated results."""
    results = []

    for i, doc_text in enumerate(documents):
        document = language.Document(
            content=doc_text,
            type_=language.Document.Type.PLAIN_TEXT
        )
        # Truncated preview shared by the success and error records
        text_preview = doc_text[:100] + "..." if len(doc_text) > 100 else doc_text

        try:
            response = client.classify_text(
                request={"document": document}
            )

            doc_categories = []
            for category in response.categories:
                doc_categories.append({
                    'name': category.name,
                    'confidence': category.confidence
                })

            results.append({
                'document_index': i,
                'text_preview': text_preview,
                'categories': doc_categories
            })

        except Exception as e:
            # Record the failure and continue with the remaining documents
            results.append({
                'document_index': i,
                'text_preview': text_preview,
                'error': str(e),
                'categories': []
            })

    return results

# Usage (reuses `client` from the usage example above)
documents = [
    "Stock market analysis and investment strategies for portfolio management.",
    "Latest updates in artificial intelligence and machine learning research.",
    "Healthy cooking recipes for vegetarian and vegan diets.",
    "Professional basketball game highlights and player statistics."
]

batch_results = classify_multiple_documents(client, documents)

for result in batch_results:
    print(f"Document {result['document_index']}: {result['text_preview']}")
    if 'error' in result:
        print(f"  Error: {result['error']}")
    else:
        for cat in result['categories']:
            print(f"  {cat['name']}: {cat['confidence']:.3f}")
    print()
```

### Category Filtering and Grouping

```python
def group_by_top_level_category(categories):
    """Group categories by their top-level parent."""
    grouped = {}

    for category in categories:
        # Extract top-level category
        parts = category.name.split('/')
        top_level = '/' + parts[1] if len(parts) > 1 else category.name

        if top_level not in grouped:
            grouped[top_level] = []

        grouped[top_level].append(category)

    return grouped

def get_most_specific_categories(categories, max_categories=3):
    """Get the most specific (deepest) categories with highest confidence."""
    # Sort by depth (number of slashes) and confidence
    sorted_cats = sorted(
        categories,
        key=lambda x: (x.name.count('/'), x.confidence),
        reverse=True
    )

    return sorted_cats[:max_categories]

# Usage
response = client.classify_text(request={"document": document})

# Group by top-level category
grouped_categories = group_by_top_level_category(response.categories)

print("Categories grouped by top-level:")
for top_level, cats in grouped_categories.items():
    print(f"{top_level}:")
    for cat in cats:
        print(f"  {cat.name}: {cat.confidence:.3f}")
    print()

# Get most specific categories
specific_categories = get_most_specific_categories(response.categories)

print("Most specific categories:")
for cat in specific_categories:
    depth = cat.name.count('/')
    print(f"{cat.name} (depth: {depth}): {cat.confidence:.3f}")
```

### Content Organization System

```python
from google.cloud import language

class ContentOrganizer:
    def __init__(self, client):
        self.client = client
        # Maps each bucket to the top-level category prefixes it covers
        self.category_mapping = {
            'technology': ['/Computers & Electronics', '/Science'],
            'business': ['/Business & Industrial', '/Finance'],
            'entertainment': ['/Arts & Entertainment', '/Games'],
            'health': ['/Health', '/Beauty & Fitness'],
            'lifestyle': ['/Home & Garden', '/Food & Drink', '/Hobbies & Leisure'],
            'news': ['/News', '/Law & Government'],
            'education': ['/Jobs & Education', '/Reference', '/Books & Literature'],
            'travel': ['/Travel'],
            'sports': ['/Sports'],
            'other': []  # Catch-all for unmatched categories
        }

    def organize_content(self, text):
        """Organize content into predefined buckets."""
        document = language.Document(
            content=text,
            type_=language.Document.Type.PLAIN_TEXT
        )

        response = self.client.classify_text(
            request={"document": document}
        )

        if not response.categories:
            return 'other', []

        # Find best matching bucket
        best_bucket = 'other'
        best_confidence = 0
        matched_categories = []

        for category in response.categories:
            for bucket, prefixes in self.category_mapping.items():
                for prefix in prefixes:
                    if category.name.startswith(prefix):
                        if category.confidence > best_confidence:
                            best_bucket = bucket
                            best_confidence = category.confidence
                        matched_categories.append({
                            'bucket': bucket,
                            'category': category.name,
                            'confidence': category.confidence
                        })
                        break

        return best_bucket, matched_categories

    def get_bucket_statistics(self, texts):
        """Get distribution of texts across buckets."""
        bucket_counts = {bucket: 0 for bucket in self.category_mapping.keys()}
        bucket_examples = {bucket: [] for bucket in self.category_mapping.keys()}

        for text in texts:
            bucket, categories = self.organize_content(text)
            bucket_counts[bucket] += 1

            if len(bucket_examples[bucket]) < 3:  # Store up to 3 examples
                bucket_examples[bucket].append({
                    'text': text[:50] + "..." if len(text) > 50 else text,
                    'categories': categories
                })

        return bucket_counts, bucket_examples

# Usage (reuses `client` from the usage example above)
organizer = ContentOrganizer(client)

sample_texts = [
    "Latest developments in quantum computing and artificial intelligence.",
    "Investment strategies for stock market volatility and portfolio management.",
    "Delicious pasta recipes with organic ingredients and wine pairings.",
    "Professional soccer match analysis and player performance statistics.",
    "Breaking news about government policy changes and legal implications."
]

bucket_counts, bucket_examples = organizer.get_bucket_statistics(sample_texts)

print("Content Distribution:")
for bucket, count in bucket_counts.items():
    if count > 0:
        print(f"{bucket}: {count} documents")
        for example in bucket_examples[bucket]:
            print(f"  - {example['text']}")
```

### Model Selection (v1/v1beta2 only)

```python
from google.cloud import language_v1

def classify_with_specific_model(client, text, model_version='v2'):
    """Classify text using a specific model version."""
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT
    )

    # Configure model options
    if model_version == 'v1':
        model_options = language_v1.ClassificationModelOptions(
            v1_model=language_v1.ClassificationModelOptions.V1Model()
        )
    else:  # v2
        model_options = language_v1.ClassificationModelOptions(
            v2_model=language_v1.ClassificationModelOptions.V2Model()
        )

    response = client.classify_text(
        request={
            "document": document,
            "classification_model_options": model_options
        }
    )

    return response.categories

# Usage (only with v1/v1beta2 clients)
# client = language_v1.LanguageServiceClient()
# v1_categories = classify_with_specific_model(client, text, 'v1')
# v2_categories = classify_with_specific_model(client, text, 'v2')
```

### Confidence Threshold Analysis

```python
from google.cloud import language

def analyze_classification_confidence(client, texts, thresholds=(0.1, 0.3, 0.5, 0.7)):
    """Analyze how classification results vary with different confidence thresholds."""
    results = {}

    for threshold in thresholds:
        results[threshold] = {
            'classified_count': 0,
            'unclassified_count': 0,
            'avg_categories_per_doc': 0,
            'total_categories': 0
        }

    for text in texts:
        document = language.Document(
            content=text,
            type_=language.Document.Type.PLAIN_TEXT
        )

        try:
            response = client.classify_text(
                request={"document": document}
            )

            for threshold in thresholds:
                filtered_categories = [
                    cat for cat in response.categories 
                    if cat.confidence >= threshold
                ]

                if filtered_categories:
                    results[threshold]['classified_count'] += 1
                    results[threshold]['total_categories'] += len(filtered_categories)
                else:
                    results[threshold]['unclassified_count'] += 1

        except Exception:
            # A failed request counts as unclassified at every threshold
            for threshold in thresholds:
                results[threshold]['unclassified_count'] += 1

    # Calculate averages
    for threshold in thresholds:
        classified = results[threshold]['classified_count']
        if classified > 0:
            results[threshold]['avg_categories_per_doc'] = (
                results[threshold]['total_categories'] / classified
            )

    return results

# Usage (reuses `client` from the usage example above)
texts = [
    "Advanced machine learning techniques for predictive analytics.",
    "Gourmet cooking with seasonal vegetables and herbs.",
    "Financial planning strategies for retirement savings.",
    "Professional basketball playoffs and championship predictions."
]

confidence_analysis = analyze_classification_confidence(client, texts)

print("Classification Analysis by Confidence Threshold:")
for threshold, stats in confidence_analysis.items():
    print(f"Threshold {threshold}:")
    print(f"  Classified: {stats['classified_count']}")
    print(f"  Unclassified: {stats['unclassified_count']}")
    print(f"  Avg categories per doc: {stats['avg_categories_per_doc']:.2f}")
    print()
```

## Error Handling

```python
from google.api_core import exceptions

response = None  # Stays None if the request fails
try:
    response = client.classify_text(
        request={"document": document},
        timeout=15.0
    )
except exceptions.InvalidArgument as e:
    print(f"Invalid document: {e}")
    # Common causes: empty document, unsupported language, insufficient content
except exceptions.ResourceExhausted:
    print("API quota exceeded")
except exceptions.DeadlineExceeded:
    print("Request timed out")
except exceptions.GoogleAPIError as e:
    print(f"API error: {e}")

# Handle no classification results (only meaningful when the request succeeded)
if response is not None and not response.categories:
    print("No classification categories found - document may be too short or ambiguous")
```

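Quota and timeout errors are often transient, so one possible follow-up is to retry with exponential backoff. The sketch below uses only the standard library; `call_with_backoff` and `QuotaError` are hypothetical names, with `QuotaError` standing in for `exceptions.ResourceExhausted` so the example runs without API access. The `retry` argument of `classify_text` is the library's own mechanism and may be preferable in practice.

```python
import time


def call_with_backoff(func, retry_on, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on a retry_on exception wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


class QuotaError(Exception):
    """Hypothetical placeholder for exceptions.ResourceExhausted."""


attempts = {"count": 0}


def flaky_call():
    """Stand-in for an API call that fails twice and then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise QuotaError("quota exceeded")
    return "ok"


# Record the delays instead of sleeping so the demonstration finishes instantly.
delays = []
result = call_with_backoff(flaky_call, QuotaError, sleep=delays.append)
print(result, delays)
```

With a real client, `func` could be a lambda wrapping `client.classify_text(...)` and `retry_on` a tuple such as `(exceptions.ResourceExhausted, exceptions.DeadlineExceeded)`.
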
## Performance Considerations

- **Text Length**: Requires sufficient text (typically 20+ words) for accurate classification
- **Content Quality**: Better results with well-written, focused content
- **Language Support**: Optimized for English, with varying support for other languages
- **Caching**: Results can be cached for static content
- **Batch Processing**: Use async client for large document sets

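The text-length and caching points above can be combined in a thin wrapper around whatever function performs the API call. This is a sketch under stated assumptions: `CachedClassifier`, `classify_fn`, and `fake_classify` are hypothetical names, and `fake_classify` replaces the real request so the example runs offline.

```python
import hashlib


class CachedClassifier:
    """Skips texts that are too short and caches results by content hash."""

    def __init__(self, classify_fn, min_words=20):
        self.classify_fn = classify_fn  # Callable that takes text and returns categories
        self.min_words = min_words
        self._cache = {}
        self.api_calls = 0  # Counts how often classify_fn was actually invoked

    def classify(self, text):
        # Avoid spending a request on text that is unlikely to classify well.
        if len(text.split()) < self.min_words:
            return []
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.api_calls += 1
            self._cache[key] = self.classify_fn(text)
        return self._cache[key]


def fake_classify(text):
    """Hypothetical stand-in for a function that calls client.classify_text."""
    return [("/Science", 0.9)]


classifier = CachedClassifier(fake_classify, min_words=5)
long_text = "Neural networks learn patterns from large labelled datasets"

first = classifier.classify(long_text)
second = classifier.classify(long_text)  # Served from the cache
short = classifier.classify("too short")  # Rejected by the length check
print(first, classifier.api_calls, short)
```

In real use, `classify_fn` might build a `language.Document`, call `client.classify_text`, and return `(name, confidence)` pairs; an in-memory dictionary suits static content only within a single process.
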
## Use Cases

- **Content Management**: Automatically organize articles, documents, and web content
- **Email Routing**: Route support emails to appropriate departments
- **News Categorization**: Classify news articles by topic and theme
- **Product Categorization**: Organize product descriptions and reviews
- **Social Media Monitoring**: Categorize social media posts and comments
- **Document Archival**: Organize large document repositories
- **Content Recommendation**: Suggest related content based on categories
- **Compliance Filtering**: Filter content for regulatory compliance
