
# Text Classification

Classifies text documents into predefined categories, enabling automated content organization and filtering by subject matter and theme. The classifier identifies topics, genres, and content types to support content management, routing, and analysis at scale.

## Capabilities

### Classify Text

Analyzes the provided text and assigns it to relevant predefined categories, each with a confidence score.

```python { .api }
def classify_text(
    self,
    request: Optional[Union[ClassifyTextRequest, dict]] = None,
    *,
    document: Optional[Document] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = ()
) -> ClassifyTextResponse:
    """
    Classifies a document into categories.

    Args:
        request: The request object containing document and options
        document: Input document for classification
        retry: Retry configuration for the request
        timeout: Request timeout in seconds
        metadata: Additional metadata to send with the request

    Returns:
        ClassifyTextResponse containing classification categories
    """
```

#### Usage Example

```python
from google.cloud import language

# Initialize client
client = language.LanguageServiceClient()

# Create document
document = language.Document(
    content="""
    The latest advancements in artificial intelligence and machine learning
    are revolutionizing how we approach data analysis and predictive modeling.
    Neural networks and deep learning algorithms are becoming increasingly
    sophisticated, enabling more accurate predictions and insights from
    complex datasets.
    """,
    type_=language.Document.Type.PLAIN_TEXT
)

# Classify text
response = client.classify_text(
    request={"document": document}
)

# Process classification results
print("Classification Results:")
for category in response.categories:
    print(f"Category: {category.name}")
    print(f"Confidence: {category.confidence:.3f}")
    print()
```

## Request and Response Types

### ClassifyTextRequest

```python { .api }
class ClassifyTextRequest:
    document: Document
    classification_model_options: ClassificationModelOptions  # v1/v1beta2 only
```

### ClassifyTextResponse

```python { .api }
class ClassifyTextResponse:
    categories: MutableSequence[ClassificationCategory]
```

## Supporting Types

### ClassificationCategory

Represents a classification category with a confidence score.

```python { .api }
class ClassificationCategory:
    name: str  # Category name (hierarchical path)
    confidence: float  # Confidence score [0.0, 1.0]
```
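
Code that consumes classification results only needs the two fields above, so it can be exercised without calling the API. The sketch below is illustrative only: `FakeCategory` is a hypothetical test double (not a library type) and `top_category` is an assumed helper name.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FakeCategory:
    """Hypothetical stand-in mirroring ClassificationCategory's two fields."""
    name: str
    confidence: float


def top_category(categories: Sequence[FakeCategory]) -> Optional[FakeCategory]:
    """Return the highest-confidence category, or None if there are none."""
    return max(categories, key=lambda c: c.confidence, default=None)


sample = [
    FakeCategory("/Science/Computer Science", 0.62),
    FakeCategory("/Health", 0.91),
]
best = top_category(sample)
print(best.name)  # /Health
```

Returning `None` for an empty sequence keeps the caller's "no categories" case explicit instead of raising from `max`.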

### ClassificationModelOptions (v1/v1beta2 only)

Configuration options for the classification model.

```python { .api }
class ClassificationModelOptions:
    class V1Model(proto.Message):
        pass

    class V2Model(proto.Message):
        pass

    v1_model: V1Model  # Use V1 classification model
    v2_model: V2Model  # Use V2 classification model
```

## Category Hierarchy

Category names are hierarchical paths whose levels are separated by forward slashes.

### Common Top-Level Categories

- `/Arts & Entertainment`
- `/Autos & Vehicles`
- `/Beauty & Fitness`
- `/Books & Literature`
- `/Business & Industrial`
- `/Computers & Electronics`
- `/Finance`
- `/Food & Drink`
- `/Games`
- `/Health`
- `/Hobbies & Leisure`
- `/Home & Garden`
- `/Internet & Telecom`
- `/Jobs & Education`
- `/Law & Government`
- `/News`
- `/Online Communities`
- `/People & Society`
- `/Pets & Animals`
- `/Real Estate`
- `/Reference`
- `/Science`
- `/Shopping`
- `/Sports`
- `/Travel`

### Hierarchical Examples

- `/Computers & Electronics/Software`
- `/Computers & Electronics/Software/Business Software`
- `/Arts & Entertainment/Movies`
- `/Arts & Entertainment/Music & Audio`
- `/Science/Computer Science`
- `/Business & Industrial/Advertising & Marketing`
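
Since each name is a slash-separated path, the parent categories of any result can be derived with plain string handling. A minimal sketch, assuming names always start with `/` and use `/` only as the level separator; `category_ancestors` is a hypothetical helper, not part of the client library.

```python
from typing import List


def category_ancestors(name: str) -> List[str]:
    """Return every path from the top level down to `name` itself."""
    # Ignore the empty string produced by the leading slash.
    parts = [part for part in name.split("/") if part]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


for path in category_ancestors("/Computers & Electronics/Software/Business Software"):
    print(path)
```

This prints the top-level category first and the full path last, which is convenient when rolling detailed results up into broader groups.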

## Advanced Usage

### Multi-Category Classification

```python
def classify_and_rank_categories(client, text, min_confidence=0.1):
    """Classify text and rank all categories above threshold."""
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT
    )

    response = client.classify_text(
        request={"document": document}
    )

    # Filter and sort categories
    filtered_categories = [
        cat for cat in response.categories
        if cat.confidence >= min_confidence
    ]

    sorted_categories = sorted(
        filtered_categories,
        key=lambda x: x.confidence,
        reverse=True
    )

    return sorted_categories

# Usage
text = """
Machine learning algorithms are transforming healthcare by enabling
early disease detection through medical imaging analysis. Artificial
intelligence systems can now identify patterns in X-rays, MRIs, and
CT scans that might be missed by human radiologists.
"""

categories = classify_and_rank_categories(client, text, min_confidence=0.1)

print("All Categories (above 10% confidence):")
for cat in categories:
    print(f"{cat.name}: {cat.confidence:.3f}")
```

### Batch Classification

```python
def classify_multiple_documents(client, documents):
    """Classify multiple documents and return aggregated results."""
    results = []

    for i, doc_text in enumerate(documents):
        document = language.Document(
            content=doc_text,
            type_=language.Document.Type.PLAIN_TEXT
        )

        try:
            response = client.classify_text(
                request={"document": document}
            )

            doc_categories = []
            for category in response.categories:
                doc_categories.append({
                    'name': category.name,
                    'confidence': category.confidence
                })

            results.append({
                'document_index': i,
                'text_preview': doc_text[:100] + "..." if len(doc_text) > 100 else doc_text,
                'categories': doc_categories
            })

        except Exception as e:
            results.append({
                'document_index': i,
                'text_preview': doc_text[:100] + "..." if len(doc_text) > 100 else doc_text,
                'error': str(e),
                'categories': []
            })

    return results

# Usage
documents = [
    "Stock market analysis and investment strategies for portfolio management.",
    "Latest updates in artificial intelligence and machine learning research.",
    "Healthy cooking recipes for vegetarian and vegan diets.",
    "Professional basketball game highlights and player statistics."
]

batch_results = classify_multiple_documents(client, documents)

for result in batch_results:
    print(f"Document {result['document_index']}: {result['text_preview']}")
    if 'error' in result:
        print(f" Error: {result['error']}")
    else:
        for cat in result['categories']:
            print(f" {cat['name']}: {cat['confidence']:.3f}")
    print()
```

### Category Filtering and Grouping

```python
def group_by_top_level_category(categories):
    """Group categories by their top-level parent."""
    grouped = {}

    for category in categories:
        # Extract top-level category
        parts = category.name.split('/')
        top_level = '/' + parts[1] if len(parts) > 1 else category.name

        if top_level not in grouped:
            grouped[top_level] = []

        grouped[top_level].append(category)

    return grouped

def get_most_specific_categories(categories, max_categories=3):
    """Get the most specific (deepest) categories with highest confidence."""
    # Sort by depth (number of slashes) and confidence
    sorted_cats = sorted(
        categories,
        key=lambda x: (x.name.count('/'), x.confidence),
        reverse=True
    )

    return sorted_cats[:max_categories]

# Usage
response = client.classify_text(request={"document": document})

# Group by top-level category
grouped_categories = group_by_top_level_category(response.categories)

print("Categories grouped by top-level:")
for top_level, cats in grouped_categories.items():
    print(f"{top_level}:")
    for cat in cats:
        print(f" {cat.name}: {cat.confidence:.3f}")
    print()

# Get most specific categories
specific_categories = get_most_specific_categories(response.categories)

print("Most specific categories:")
for cat in specific_categories:
    depth = cat.name.count('/')
    print(f"{cat.name} (depth: {depth}): {cat.confidence:.3f}")
```

### Content Organization System

```python
class ContentOrganizer:
    def __init__(self, client):
        self.client = client
        self.category_mapping = {
            'technology': ['/Computers & Electronics', '/Science'],
            'business': ['/Business & Industrial', '/Finance'],
            'entertainment': ['/Arts & Entertainment', '/Games'],
            'health': ['/Health', '/Beauty & Fitness'],
            'lifestyle': ['/Home & Garden', '/Food & Drink', '/Hobbies & Leisure'],
            'news': ['/News', '/Law & Government'],
            'education': ['/Jobs & Education', '/Reference', '/Books & Literature'],
            'travel': ['/Travel'],
            'sports': ['/Sports'],
            'other': []  # Catch-all for unmatched categories
        }

    def organize_content(self, text):
        """Organize content into predefined buckets."""
        document = language.Document(
            content=text,
            type_=language.Document.Type.PLAIN_TEXT
        )

        response = self.client.classify_text(
            request={"document": document}
        )

        if not response.categories:
            return 'other', []

        # Find best matching bucket
        best_bucket = 'other'
        best_confidence = 0
        matched_categories = []

        for category in response.categories:
            for bucket, prefixes in self.category_mapping.items():
                for prefix in prefixes:
                    if category.name.startswith(prefix):
                        if category.confidence > best_confidence:
                            best_bucket = bucket
                            best_confidence = category.confidence
                        matched_categories.append({
                            'bucket': bucket,
                            'category': category.name,
                            'confidence': category.confidence
                        })
                        break

        return best_bucket, matched_categories

    def get_bucket_statistics(self, texts):
        """Get distribution of texts across buckets."""
        bucket_counts = {bucket: 0 for bucket in self.category_mapping.keys()}
        bucket_examples = {bucket: [] for bucket in self.category_mapping.keys()}

        for text in texts:
            bucket, categories = self.organize_content(text)
            bucket_counts[bucket] += 1

            if len(bucket_examples[bucket]) < 3:  # Store up to 3 examples
                bucket_examples[bucket].append({
                    'text': text[:50] + "..." if len(text) > 50 else text,
                    'categories': categories
                })

        return bucket_counts, bucket_examples

# Usage
organizer = ContentOrganizer(client)

sample_texts = [
    "Latest developments in quantum computing and artificial intelligence.",
    "Investment strategies for stock market volatility and portfolio management.",
    "Delicious pasta recipes with organic ingredients and wine pairings.",
    "Professional soccer match analysis and player performance statistics.",
    "Breaking news about government policy changes and legal implications."
]

bucket_counts, bucket_examples = organizer.get_bucket_statistics(sample_texts)

print("Content Distribution:")
for bucket, count in bucket_counts.items():
    if count > 0:
        print(f"{bucket}: {count} documents")
        for example in bucket_examples[bucket]:
            print(f" - {example['text']}")
```
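
One subtlety in the prefix check above: `str.startswith` compares characters rather than path levels, so a prefix such as `/Science` would also match a hypothetical sibling like `/Science Fiction`. A minimal sketch of a level-aware check, assuming `/` is the only level separator; `is_under` is an illustrative helper, not part of the library.

```python
def is_under(category_name: str, prefix: str) -> bool:
    """True if category_name equals prefix or sits below it in the hierarchy."""
    prefix = prefix.rstrip("/")
    # Requiring a trailing slash makes the match stop at a level boundary.
    return category_name == prefix or category_name.startswith(prefix + "/")


print(is_under("/Science/Computer Science", "/Science"))  # True
print(is_under("/Science Fiction", "/Science"))           # False
```

Swapping this in for the `startswith` call would leave the rest of the bucket logic unchanged.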

### Model Selection (v1/v1beta2 only)

```python
from google.cloud import language_v1

def classify_with_specific_model(client, text, model_version='v2'):
    """Classify text using a specific model version."""
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT
    )

    # Configure model options
    if model_version == 'v1':
        model_options = language_v1.ClassificationModelOptions(
            v1_model=language_v1.ClassificationModelOptions.V1Model()
        )
    else:  # v2
        model_options = language_v1.ClassificationModelOptions(
            v2_model=language_v1.ClassificationModelOptions.V2Model()
        )

    response = client.classify_text(
        request={
            "document": document,
            "classification_model_options": model_options
        }
    )

    return response.categories

# Usage (only with v1/v1beta2 clients)
# v1_categories = classify_with_specific_model(client, text, 'v1')
# v2_categories = classify_with_specific_model(client, text, 'v2')
```

### Confidence Threshold Analysis

```python
def analyze_classification_confidence(client, texts, thresholds=(0.1, 0.3, 0.5, 0.7)):
    """Analyze how classification results vary with different confidence thresholds."""
    results = {}

    for threshold in thresholds:
        results[threshold] = {
            'classified_count': 0,
            'unclassified_count': 0,
            'avg_categories_per_doc': 0,
            'total_categories': 0
        }

    for text in texts:
        document = language.Document(
            content=text,
            type_=language.Document.Type.PLAIN_TEXT
        )

        try:
            response = client.classify_text(
                request={"document": document}
            )

            for threshold in thresholds:
                filtered_categories = [
                    cat for cat in response.categories
                    if cat.confidence >= threshold
                ]

                if filtered_categories:
                    results[threshold]['classified_count'] += 1
                    results[threshold]['total_categories'] += len(filtered_categories)
                else:
                    results[threshold]['unclassified_count'] += 1

        except Exception:
            # A failed request counts as unclassified at every threshold
            for threshold in thresholds:
                results[threshold]['unclassified_count'] += 1

    # Calculate averages
    for threshold in thresholds:
        classified = results[threshold]['classified_count']
        if classified > 0:
            results[threshold]['avg_categories_per_doc'] = (
                results[threshold]['total_categories'] / classified
            )

    return results

# Usage
texts = [
    "Advanced machine learning techniques for predictive analytics.",
    "Gourmet cooking with seasonal vegetables and herbs.",
    "Financial planning strategies for retirement savings.",
    "Professional basketball playoffs and championship predictions."
]

confidence_analysis = analyze_classification_confidence(client, texts)

print("Classification Analysis by Confidence Threshold:")
for threshold, stats in confidence_analysis.items():
    print(f"Threshold {threshold}:")
    print(f" Classified: {stats['classified_count']}")
    print(f" Unclassified: {stats['unclassified_count']}")
    print(f" Avg categories per doc: {stats['avg_categories_per_doc']:.2f}")
    print()
```

## Error Handling

```python
from google.api_core import exceptions

try:
    response = client.classify_text(
        request={"document": document},
        timeout=15.0
    )
except exceptions.InvalidArgument as e:
    print(f"Invalid document: {e}")
    # Common causes: empty document, unsupported language, insufficient content
except exceptions.ResourceExhausted:
    print("API quota exceeded")
except exceptions.DeadlineExceeded:
    print("Request timed out")
except exceptions.GoogleAPIError as e:
    print(f"API error: {e}")
else:
    # Handle no classification results (reached only when the call succeeded,
    # so `response` is guaranteed to be defined here)
    if not response.categories:
        print("No classification categories found - document may be too short or ambiguous")
```
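
Transient failures such as quota or deadline errors are often worth retrying rather than just reporting. The following is a generic sketch of exponential backoff in plain Python, not the client library's own retry mechanism (the `retry` parameter of `classify_text` is the built-in alternative). `call_with_backoff` and its parameters are hypothetical names, and the demo uses a stand-in function instead of a real API call.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exception types with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * (2 ** attempt))  # 1x, 2x, 4x, ... base_delay
    raise ValueError("max_attempts must be at least 1")


# Demo with a stand-in function that fails twice before succeeding.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


delays = []  # Record the delays instead of actually sleeping.
result = call_with_backoff(flaky, retry_on=(TimeoutError,), sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

With the real client, `func` could be something like `lambda: client.classify_text(request={"document": document})` and `retry_on` could be `(exceptions.ResourceExhausted, exceptions.DeadlineExceeded)`. Errors such as `InvalidArgument` should not be retried, since the same request would fail again.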

## Performance Considerations

- **Text Length**: Requires sufficient text (typically 20+ words) for accurate classification
- **Content Quality**: Better results with well-written, focused content
- **Language Support**: Optimized for English, with varying support for other languages
- **Caching**: Results can be cached for static content
- **Batch Processing**: Use the async client (`LanguageServiceAsyncClient`) for large document sets
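
Two of the points above, minimum text length and caching, can be handled before any request is sent. The sketch below rests on several assumptions: `classify_fn` stands in for whatever function actually calls the API, the 20-word cutoff uses a simple whitespace split rather than the service's own tokenization, and `CachedClassifier` is a hypothetical name.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Result = List[Tuple[str, float]]  # (category name, confidence) pairs


class CachedClassifier:
    """Skip too-short texts and memoize results by content hash."""

    def __init__(self, classify_fn: Callable[[str], Result], min_words: int = 20):
        self._classify_fn = classify_fn
        self._min_words = min_words
        self._cache: Dict[str, Result] = {}

    def classify(self, text: str) -> Result:
        # Rough length check; whitespace splitting only approximates tokens.
        if len(text.split()) < self._min_words:
            return []
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._classify_fn(text)
        return self._cache[key]


# Demo with a stand-in classifier that records how often it is called.
calls = []


def fake_classify(text: str) -> Result:
    calls.append(text)
    return [("/Science", 0.9)]


classifier = CachedClassifier(fake_classify)
long_text = " ".join(["word"] * 25)
classifier.classify(long_text)
classifier.classify(long_text)    # Served from the cache
classifier.classify("too short")  # Skipped before any call
print(len(calls))  # 1
```

An in-memory dictionary is only suitable for a single process; for static content shared across workers, the same keying scheme could back an external cache.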

## Use Cases

- **Content Management**: Automatically organize articles, documents, and web content
- **Email Routing**: Route support emails to appropriate departments
- **News Categorization**: Classify news articles by topic and theme
- **Product Categorization**: Organize product descriptions and reviews
- **Social Media Monitoring**: Categorize social media posts and comments
- **Document Archival**: Organize large document repositories
- **Content Recommendation**: Suggest related content based on categories
- **Compliance Filtering**: Filter content for regulatory compliance