# Entity Analysis

Identifies and extracts named entities (people, places, organizations, events, etc.) from text, providing detailed information about each entity, including type classification, salience scores, and mention locations within the text. Entity analysis is essential for information extraction, content understanding, and knowledge graph construction.

## Capabilities

### Analyze Entities

Identifies named entities in the provided text and returns detailed information about each entity found.
```python { .api }
def analyze_entities(
    self,
    request: Optional[Union[AnalyzeEntitiesRequest, dict]] = None,
    *,
    document: Optional[Document] = None,
    encoding_type: Optional[EncodingType] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> AnalyzeEntitiesResponse:
    """
    Finds named entities in the text and returns information about them.

    Args:
        request: The request object containing the document and options.
        document: Input document for analysis.
        encoding_type: Text encoding type used for offset calculations.
        retry: Retry configuration for the request.
        timeout: Request timeout in seconds.
        metadata: Additional metadata to send with the request.

    Returns:
        AnalyzeEntitiesResponse containing the found entities and metadata.
    """
```
#### Usage Example

```python
from google.cloud import language

# Initialize the client
client = language.LanguageServiceClient()

# Create a document from plain text
document = language.Document(
    content="Google was founded by Larry Page and Sergey Brin in Mountain View, California.",
    type_=language.Document.Type.PLAIN_TEXT,
)

# Analyze entities
response = client.analyze_entities(request={"document": document})

# Process the entities found
for entity in response.entities:
    print(f"Entity: {entity.name}")
    print(f"Type: {entity.type_.name}")
    print(f"Salience: {entity.salience}")
    print(f"Metadata: {dict(entity.metadata)}")

    # Print each mention of the entity
    for mention in entity.mentions:
        print(f"  Mention: '{mention.text.content}' ({mention.type_.name})")
    print()
```
## Request and Response Types

### AnalyzeEntitiesRequest

```python { .api }
class AnalyzeEntitiesRequest:
    document: Document
    encoding_type: EncodingType
```
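
The request can also be built explicitly with the typed classes exported by `google.cloud.language`; a minimal sketch (the sample text is illustrative) showing how to set `encoding_type`, which controls how mention offsets are calculated:

```python
from google.cloud import language

request = language.AnalyzeEntitiesRequest(
    document=language.Document(
        content="Tokyo is the capital of Japan.",
        type_=language.Document.Type.PLAIN_TEXT,
    ),
    # UTF8 yields byte offsets; UTF16/UTF32 yield code-unit/code-point offsets
    encoding_type=language.EncodingType.UTF8,
)
response = client.analyze_entities(request=request)
```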
### AnalyzeEntitiesResponse

```python { .api }
class AnalyzeEntitiesResponse:
    entities: MutableSequence[Entity]
    language: str
```
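
Besides the entity list, the response's `language` field reports the language the API detected (or the one specified on the document), which is useful for validating multilingual input:

```python
response = client.analyze_entities(request={"document": document})
print(f"Found {len(response.entities)} entities")
print(f"Detected language: {response.language}")
```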
## Supporting Types

### Entity

Represents a named entity found in the text with comprehensive metadata.

```python { .api }
class Entity:
    class Type(proto.Enum):
        UNKNOWN = 0
        PERSON = 1
        LOCATION = 2
        ORGANIZATION = 3
        EVENT = 4
        WORK_OF_ART = 5
        CONSUMER_GOOD = 6
        OTHER = 7
        PHONE_NUMBER = 9
        ADDRESS = 10
        DATE = 11
        NUMBER = 12
        PRICE = 13

    name: str                                 # Canonical name of the entity
    type_: Type                               # Entity type classification
    metadata: MutableMapping[str, str]        # Additional metadata (Wikipedia URL, etc.)
    salience: float                           # Salience/importance score in [0.0, 1.0]
    mentions: MutableSequence[EntityMention]  # All mentions of this entity in the text
    sentiment: Sentiment                      # Overall sentiment for this entity (v1/v1beta2 only)
```
**Entity Types:**

- **PERSON**: People, including fictional characters
- **LOCATION**: Physical locations (cities, countries, landmarks)
- **ORGANIZATION**: Companies, institutions, teams, groups
- **EVENT**: Named events (conferences, wars, sports events)
- **WORK_OF_ART**: Titles of books, movies, songs, etc.
- **CONSUMER_GOOD**: Products, brands, models
- **OTHER**: Entities that don't fit other categories
- **PHONE_NUMBER**: Phone numbers
- **ADDRESS**: Physical addresses
- **DATE**: Absolute or relative dates
- **NUMBER**: Numeric values
- **PRICE**: Monetary amounts
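Because `type_` is an enum, grouping entities by type is straightforward. A small sketch using only the standard library, continuing from the earlier `response`:

```python
from collections import Counter

# Tally how many entities of each type were found
type_counts = Counter(entity.type_.name for entity in response.entities)
for type_name, count in type_counts.most_common():
    print(f"{type_name}: {count}")
```
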
### EntityMention

Represents a specific mention of an entity within the text.

```python { .api }
class EntityMention:
    class Type(proto.Enum):
        TYPE_UNKNOWN = 0
        PROPER = 1  # Proper noun (e.g., "Google")
        COMMON = 2  # Common noun (e.g., "company")

    text: TextSpan        # The mention text with position
    type_: Type           # Mention type (proper/common noun)
    sentiment: Sentiment  # Sentiment associated with this mention (v1/v1beta2 only)
    probability: float    # Confidence score for this mention [0.0, 1.0]
```
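
The `begin_offset` inside each mention's `TextSpan` is interpreted according to the request's `encoding_type`. A minimal sketch converting mentions to `(start, end)` spans, assuming the request used `EncodingType.UTF32` so offsets align with Python string indexing:

```python
def mention_spans(entity):
    """Return (start, end) spans for each mention of an entity.

    Assumes the request was made with encoding_type=EncodingType.UTF32,
    so begin_offset counts Unicode code points like Python str indices.
    """
    spans = []
    for mention in entity.mentions:
        start = mention.text.begin_offset
        spans.append((start, start + len(mention.text.content)))
    return spans
```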

## Advanced Usage

### Entity Metadata Processing

```python
def extract_entity_metadata(entities):
    """Process entity metadata for additional information."""
    processed = []

    for entity in entities:
        entity_info = {
            'name': entity.name,
            'type': entity.type_.name,
            'salience': entity.salience,
            'mentions_count': len(entity.mentions),
        }

        # Extract the Wikipedia URL if available
        if 'wikipedia_url' in entity.metadata:
            entity_info['wikipedia'] = entity.metadata['wikipedia_url']

        # Extract the Knowledge Graph ID (MID) if available
        if 'mid' in entity.metadata:
            entity_info['knowledge_graph_id'] = entity.metadata['mid']

        processed.append(entity_info)

    return processed

# Usage
response = client.analyze_entities(request={"document": document})
metadata = extract_entity_metadata(response.entities)

for info in metadata:
    print(f"Entity: {info['name']} ({info['type']})")
    if 'wikipedia' in info:
        print(f"  Wikipedia: {info['wikipedia']}")
    print(f"  Salience: {info['salience']}")
```

### Filter Entities by Type

```python
from google.cloud.language import Entity

def filter_entities_by_type(entities, target_types):
    """Filter entities by specific types."""
    # Convert string type names to enum values
    type_mapping = {
        'PERSON': Entity.Type.PERSON,
        'LOCATION': Entity.Type.LOCATION,
        'ORGANIZATION': Entity.Type.ORGANIZATION,
        'EVENT': Entity.Type.EVENT,
        'WORK_OF_ART': Entity.Type.WORK_OF_ART,
        'CONSUMER_GOOD': Entity.Type.CONSUMER_GOOD,
        'PHONE_NUMBER': Entity.Type.PHONE_NUMBER,
        'ADDRESS': Entity.Type.ADDRESS,
        'DATE': Entity.Type.DATE,
        'NUMBER': Entity.Type.NUMBER,
        'PRICE': Entity.Type.PRICE,
    }

    target_enum_types = {type_mapping[t] for t in target_types if t in type_mapping}

    return [entity for entity in entities if entity.type_ in target_enum_types]

# Usage - extract only people and organizations
response = client.analyze_entities(request={"document": document})
people_and_orgs = filter_entities_by_type(
    response.entities,
    ['PERSON', 'ORGANIZATION'],
)

for entity in people_and_orgs:
    print(f"{entity.name} ({entity.type_.name})")
```

### Salience-Based Ranking

```python
def get_most_salient_entities(entities, limit=5):
    """Get the most important entities by salience score."""
    sorted_entities = sorted(entities, key=lambda e: e.salience, reverse=True)
    return sorted_entities[:limit]

# Usage
response = client.analyze_entities(request={"document": document})
top_entities = get_most_salient_entities(response.entities, limit=3)

print("Most salient entities:")
for entity in top_entities:
    print(f"  {entity.name}: {entity.salience:.3f}")
```

### Mention Analysis

```python
from google.cloud import language

def analyze_entity_mentions(entity):
    """Analyze mentions of a specific entity."""
    print(f"Entity: {entity.name}")
    print(f"Total mentions: {len(entity.mentions)}")

    proper_mentions = [m for m in entity.mentions if m.type_ == language.EntityMention.Type.PROPER]
    common_mentions = [m for m in entity.mentions if m.type_ == language.EntityMention.Type.COMMON]

    print(f"Proper noun mentions: {len(proper_mentions)}")
    print(f"Common noun mentions: {len(common_mentions)}")

    print("Mention details:")
    for i, mention in enumerate(entity.mentions):
        print(f"  {i+1}. '{mention.text.content}' ({mention.type_.name})")
        print(f"     Position: {mention.text.begin_offset}")
        print(f"     Confidence: {mention.probability:.3f}")

# Usage
response = client.analyze_entities(request={"document": document})
if response.entities:
    analyze_entity_mentions(response.entities[0])
```

### Process Large Documents

```python
import textwrap

from google.cloud import language

def process_long_document(client, long_text, chunk_size=4000):
    """Process a long document by splitting it into chunks."""
    # Split into whitespace-delimited chunks of at most chunk_size characters
    chunks = textwrap.wrap(long_text, chunk_size, break_long_words=False)
    all_entities = []

    for i, chunk in enumerate(chunks):
        print(f"Processing chunk {i+1}/{len(chunks)}")

        document = language.Document(
            content=chunk,
            type_=language.Document.Type.PLAIN_TEXT,
        )

        response = client.analyze_entities(request={"document": document})
        all_entities.extend(response.entities)

    # Deduplicate entities by name, keeping the occurrence with the
    # highest salience score
    unique_entities = {}
    for entity in all_entities:
        if (entity.name not in unique_entities
                or entity.salience > unique_entities[entity.name].salience):
            unique_entities[entity.name] = entity

    return list(unique_entities.values())
```

Note that salience is computed per request, so scores from different chunks measure importance within each chunk rather than within the document as a whole.

### HTML Content Processing

```python
# Process HTML content to extract entities
html_content = """
<html>
<body>
<h1>Tech News</h1>
<p>Apple announced new features at their conference in Cupertino.
CEO Tim Cook presented the latest innovations to the audience.</p>
</body>
</html>
"""

document = language.Document(
    content=html_content,
    type_=language.Document.Type.HTML,
)

response = client.analyze_entities(request={"document": document})

for entity in response.entities:
    print(f"Entity: {entity.name} ({entity.type_.name})")
```

## Error Handling

```python
from google.api_core import exceptions

try:
    response = client.analyze_entities(
        request={"document": document},
        timeout=15.0,
    )
except exceptions.InvalidArgument as e:
    print(f"Invalid document format: {e}")
except exceptions.ResourceExhausted:
    print("API quota exceeded")
except exceptions.GoogleAPIError as e:
    print(f"API error: {e}")
```
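
The method's `retry` parameter accepts a `google.api_core.retry.Retry` object for transient failures. A sketch with illustrative backoff values (tune them for your workload, not as recommendations):

```python
from google.api_core import exceptions, retry

# Retry only on transient server-side failures
custom_retry = retry.Retry(
    predicate=retry.if_exception_type(
        exceptions.ServiceUnavailable,
        exceptions.DeadlineExceeded,
    ),
    initial=1.0,     # first delay in seconds
    maximum=30.0,    # cap on the delay between attempts
    multiplier=2.0,  # exponential growth factor
    deadline=120.0,  # give up after two minutes overall
)

response = client.analyze_entities(
    request={"document": document},
    retry=custom_retry,
)
```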

## Performance Considerations

- **Text Length**: Works best on documents well under 1 MB (the API's per-document size limit)
- **Entity Density**: Response latency can grow with high entity density
- **Language**: Accuracy is highest for officially supported languages
- **Caching**: Entity results can be cached for static content
- **Batch Processing**: Use the async client for multiple documents (see the sketch below)

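A minimal batch-processing sketch using `LanguageServiceAsyncClient`; the document texts and the `asyncio.gather` concurrency pattern here are illustrative:

```python
import asyncio

from google.cloud import language

async def analyze_documents(texts):
    """Analyze several documents concurrently with the async client."""
    client = language.LanguageServiceAsyncClient()
    documents = [
        language.Document(content=text, type_=language.Document.Type.PLAIN_TEXT)
        for text in texts
    ]
    # Fire all requests concurrently and wait for every response
    return await asyncio.gather(
        *(client.analyze_entities(request={"document": doc}) for doc in documents)
    )

# Usage
# responses = asyncio.run(analyze_documents(["First text...", "Second text..."]))
```
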
## Use Cases

- **Information Extraction**: Extract people, places, and organizations from news articles
- **Content Categorization**: Classify content based on entity types
- **Knowledge Graph Construction**: Build relationships between entities
- **Search Enhancement**: Improve search with entity-aware indexing
- **Content Recommendation**: Recommend content based on entity similarity
- **Data Privacy**: Identify personal information (names, addresses, phone numbers)