# Entity Analysis

Identifies and extracts named entities (people, places, organizations, events, etc.) from text, providing detailed information about each entity, including type classification, salience scores, and mention locations within the text. Entity analysis is essential for information extraction, content understanding, and knowledge graph construction.

## Capabilities

### Analyze Entities

Identifies named entities in the provided text and returns detailed information about each entity found.
```python { .api }
def analyze_entities(
    self,
    request: Optional[Union[AnalyzeEntitiesRequest, dict]] = None,
    *,
    document: Optional[Document] = None,
    encoding_type: Optional[EncodingType] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> AnalyzeEntitiesResponse:
    """
    Finds named entities in the text and returns information about them.

    Args:
        request: The request object containing the document and options.
        document: Input document for analysis.
        encoding_type: Text encoding type used for offset calculations.
        retry: Retry configuration for the request.
        timeout: Request timeout in seconds.
        metadata: Additional metadata to send with the request.

    Returns:
        AnalyzeEntitiesResponse containing the found entities and metadata.
    """
```
#### Usage Example

```python
from google.cloud import language

# Initialize the client
client = language.LanguageServiceClient()

# Create a document from plain text
document = language.Document(
    content="Google was founded by Larry Page and Sergey Brin in Mountain View, California.",
    type_=language.Document.Type.PLAIN_TEXT,
)

# Analyze entities
response = client.analyze_entities(request={"document": document})

# Process the entities found
for entity in response.entities:
    print(f"Entity: {entity.name}")
    print(f"Type: {entity.type_.name}")
    print(f"Salience: {entity.salience}")
    print(f"Metadata: {dict(entity.metadata)}")

    # Print each mention of the entity
    for mention in entity.mentions:
        print(f"  Mention: '{mention.text.content}' ({mention.type_.name})")
    print()
```
## Request and Response Types

### AnalyzeEntitiesRequest

```python { .api }
class AnalyzeEntitiesRequest:
    document: Document
    encoding_type: EncodingType
```
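
The request can also be built explicitly with the typed classes exported by `google.cloud.language`; a minimal sketch (the sample text is illustrative) showing how to set `encoding_type`, which controls how mention offsets are calculated:

```python
from google.cloud import language

request = language.AnalyzeEntitiesRequest(
    document=language.Document(
        content="Tokyo is the capital of Japan.",
        type_=language.Document.Type.PLAIN_TEXT,
    ),
    # UTF8 yields byte offsets; UTF16/UTF32 yield code-unit/code-point offsets
    encoding_type=language.EncodingType.UTF8,
)
response = client.analyze_entities(request=request)
```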
### AnalyzeEntitiesResponse

```python { .api }
class AnalyzeEntitiesResponse:
    entities: MutableSequence[Entity]
    language: str
```
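
Besides the entity list, the response's `language` field reports the language the API detected (or the one specified on the document), which is useful for validating multilingual input:

```python
response = client.analyze_entities(request={"document": document})
print(f"Found {len(response.entities)} entities")
print(f"Detected language: {response.language}")
```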
## Supporting Types

### Entity

Represents a named entity found in the text with comprehensive metadata.

```python { .api }
class Entity:
    class Type(proto.Enum):
        UNKNOWN = 0
        PERSON = 1
        LOCATION = 2
        ORGANIZATION = 3
        EVENT = 4
        WORK_OF_ART = 5
        CONSUMER_GOOD = 6
        OTHER = 7
        PHONE_NUMBER = 9
        ADDRESS = 10
        DATE = 11
        NUMBER = 12
        PRICE = 13

    name: str                                 # Canonical name of the entity
    type_: Type                               # Entity type classification
    metadata: MutableMapping[str, str]        # Additional metadata (Wikipedia URL, etc.)
    salience: float                           # Salience/importance score in [0.0, 1.0]
    mentions: MutableSequence[EntityMention]  # All mentions of this entity in the text
    sentiment: Sentiment                      # Overall sentiment for this entity (v1/v1beta2 only)
```
**Entity Types:**

- **PERSON**: People, including fictional characters
- **LOCATION**: Physical locations (cities, countries, landmarks)
- **ORGANIZATION**: Companies, institutions, teams, groups
- **EVENT**: Named events (conferences, wars, sports events)
- **WORK_OF_ART**: Titles of books, movies, songs, etc.
- **CONSUMER_GOOD**: Products, brands, models
- **OTHER**: Entities that don't fit other categories
- **PHONE_NUMBER**: Phone numbers
- **ADDRESS**: Physical addresses
- **DATE**: Absolute or relative dates
- **NUMBER**: Numeric values
- **PRICE**: Monetary amounts
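Because `type_` is an enum, grouping entities by type is straightforward. A small sketch using only the standard library, continuing from the earlier `response`:

```python
from collections import Counter

# Tally how many entities of each type were found
type_counts = Counter(entity.type_.name for entity in response.entities)
for type_name, count in type_counts.most_common():
    print(f"{type_name}: {count}")
```
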
### EntityMention

Represents a specific mention of an entity within the text.

```python { .api }
class EntityMention:
    class Type(proto.Enum):
        TYPE_UNKNOWN = 0
        PROPER = 1  # Proper noun (e.g., "Google")
        COMMON = 2  # Common noun (e.g., "company")

    text: TextSpan        # The mention text with position
    type_: Type           # Mention type (proper/common noun)
    sentiment: Sentiment  # Sentiment associated with this mention (v1/v1beta2 only)
    probability: float    # Confidence score for this mention [0.0, 1.0]
```
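
The `begin_offset` inside each mention's `TextSpan` is interpreted according to the request's `encoding_type`. A minimal sketch converting mentions to `(start, end)` spans, assuming the request used `EncodingType.UTF32` so offsets align with Python string indexing:

```python
def mention_spans(entity):
    """Return (start, end) spans for each mention of an entity.

    Assumes the request was made with encoding_type=EncodingType.UTF32,
    so begin_offset counts Unicode code points like Python str indices.
    """
    spans = []
    for mention in entity.mentions:
        start = mention.text.begin_offset
        spans.append((start, start + len(mention.text.content)))
    return spans
```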

## Advanced Usage

### Entity Metadata Processing

```python
def extract_entity_metadata(entities):
    """Process entity metadata for additional information."""
    processed = []

    for entity in entities:
        entity_info = {
            'name': entity.name,
            'type': entity.type_.name,
            'salience': entity.salience,
            'mentions_count': len(entity.mentions),
        }

        # Extract the Wikipedia URL if available
        if 'wikipedia_url' in entity.metadata:
            entity_info['wikipedia'] = entity.metadata['wikipedia_url']

        # Extract the Knowledge Graph ID (MID) if available
        if 'mid' in entity.metadata:
            entity_info['knowledge_graph_id'] = entity.metadata['mid']

        processed.append(entity_info)

    return processed

# Usage
response = client.analyze_entities(request={"document": document})
metadata = extract_entity_metadata(response.entities)

for info in metadata:
    print(f"Entity: {info['name']} ({info['type']})")
    if 'wikipedia' in info:
        print(f"  Wikipedia: {info['wikipedia']}")
    print(f"  Salience: {info['salience']}")
```

### Filter Entities by Type

```python
from google.cloud.language import Entity

def filter_entities_by_type(entities, target_types):
    """Filter entities by specific types."""
    # Convert string type names to enum values
    type_mapping = {
        'PERSON': Entity.Type.PERSON,
        'LOCATION': Entity.Type.LOCATION,
        'ORGANIZATION': Entity.Type.ORGANIZATION,
        'EVENT': Entity.Type.EVENT,
        'WORK_OF_ART': Entity.Type.WORK_OF_ART,
        'CONSUMER_GOOD': Entity.Type.CONSUMER_GOOD,
        'PHONE_NUMBER': Entity.Type.PHONE_NUMBER,
        'ADDRESS': Entity.Type.ADDRESS,
        'DATE': Entity.Type.DATE,
        'NUMBER': Entity.Type.NUMBER,
        'PRICE': Entity.Type.PRICE,
    }

    target_enum_types = {type_mapping[t] for t in target_types if t in type_mapping}

    return [entity for entity in entities if entity.type_ in target_enum_types]

# Usage - extract only people and organizations
response = client.analyze_entities(request={"document": document})
people_and_orgs = filter_entities_by_type(
    response.entities,
    ['PERSON', 'ORGANIZATION'],
)

for entity in people_and_orgs:
    print(f"{entity.name} ({entity.type_.name})")
```

### Salience-Based Ranking

```python
def get_most_salient_entities(entities, limit=5):
    """Get the most important entities by salience score."""
    sorted_entities = sorted(entities, key=lambda e: e.salience, reverse=True)
    return sorted_entities[:limit]

# Usage
response = client.analyze_entities(request={"document": document})
top_entities = get_most_salient_entities(response.entities, limit=3)

print("Most salient entities:")
for entity in top_entities:
    print(f"  {entity.name}: {entity.salience:.3f}")
```

### Mention Analysis

```python
from google.cloud import language

def analyze_entity_mentions(entity):
    """Analyze mentions of a specific entity."""
    print(f"Entity: {entity.name}")
    print(f"Total mentions: {len(entity.mentions)}")

    proper_mentions = [m for m in entity.mentions if m.type_ == language.EntityMention.Type.PROPER]
    common_mentions = [m for m in entity.mentions if m.type_ == language.EntityMention.Type.COMMON]

    print(f"Proper noun mentions: {len(proper_mentions)}")
    print(f"Common noun mentions: {len(common_mentions)}")

    print("Mention details:")
    for i, mention in enumerate(entity.mentions):
        print(f"  {i+1}. '{mention.text.content}' ({mention.type_.name})")
        print(f"     Position: {mention.text.begin_offset}")
        print(f"     Confidence: {mention.probability:.3f}")

# Usage
response = client.analyze_entities(request={"document": document})
if response.entities:
    analyze_entity_mentions(response.entities[0])
```

### Process Large Documents

```python
import textwrap

from google.cloud import language

def process_long_document(client, long_text, chunk_size=4000):
    """Process a long document by splitting it into chunks."""
    # Split into whitespace-delimited chunks of at most chunk_size characters
    chunks = textwrap.wrap(long_text, chunk_size, break_long_words=False)
    all_entities = []

    for i, chunk in enumerate(chunks):
        print(f"Processing chunk {i+1}/{len(chunks)}")

        document = language.Document(
            content=chunk,
            type_=language.Document.Type.PLAIN_TEXT,
        )

        response = client.analyze_entities(request={"document": document})
        all_entities.extend(response.entities)

    # Deduplicate entities by name, keeping the occurrence with the
    # highest salience score
    unique_entities = {}
    for entity in all_entities:
        if (entity.name not in unique_entities
                or entity.salience > unique_entities[entity.name].salience):
            unique_entities[entity.name] = entity

    return list(unique_entities.values())
```

Note that salience is computed per request, so scores from different chunks measure importance within each chunk rather than within the document as a whole.

### HTML Content Processing

```python
# Process HTML content to extract entities
html_content = """
<html>
<body>
<h1>Tech News</h1>
<p>Apple announced new features at their conference in Cupertino.
CEO Tim Cook presented the latest innovations to the audience.</p>
</body>
</html>
"""

document = language.Document(
    content=html_content,
    type_=language.Document.Type.HTML,
)

response = client.analyze_entities(request={"document": document})

for entity in response.entities:
    print(f"Entity: {entity.name} ({entity.type_.name})")
```

## Error Handling

```python
from google.api_core import exceptions

try:
    response = client.analyze_entities(
        request={"document": document},
        timeout=15.0,
    )
except exceptions.InvalidArgument as e:
    print(f"Invalid document format: {e}")
except exceptions.ResourceExhausted:
    print("API quota exceeded")
except exceptions.GoogleAPIError as e:
    print(f"API error: {e}")
```
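
The method's `retry` parameter accepts a `google.api_core.retry.Retry` object for transient failures. A sketch with illustrative backoff values (tune them for your workload, not as recommendations):

```python
from google.api_core import exceptions, retry

# Retry only on transient server-side failures
custom_retry = retry.Retry(
    predicate=retry.if_exception_type(
        exceptions.ServiceUnavailable,
        exceptions.DeadlineExceeded,
    ),
    initial=1.0,     # first delay in seconds
    maximum=30.0,    # cap on the delay between attempts
    multiplier=2.0,  # exponential growth factor
    deadline=120.0,  # give up after two minutes overall
)

response = client.analyze_entities(
    request={"document": document},
    retry=custom_retry,
)
```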

## Performance Considerations

- **Text Length**: Works best on documents well under 1 MB (the API's per-document size limit)
- **Entity Density**: Response latency can grow with high entity density
- **Language**: Accuracy is highest for officially supported languages
- **Caching**: Entity results can be cached for static content
- **Batch Processing**: Use the async client for multiple documents (see the sketch below)

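A minimal batch-processing sketch using `LanguageServiceAsyncClient`; the document texts and the `asyncio.gather` concurrency pattern here are illustrative:

```python
import asyncio

from google.cloud import language

async def analyze_documents(texts):
    """Analyze several documents concurrently with the async client."""
    client = language.LanguageServiceAsyncClient()
    documents = [
        language.Document(content=text, type_=language.Document.Type.PLAIN_TEXT)
        for text in texts
    ]
    # Fire all requests concurrently and wait for every response
    return await asyncio.gather(
        *(client.analyze_entities(request={"document": doc}) for doc in documents)
    )

# Usage
# responses = asyncio.run(analyze_documents(["First text...", "Second text..."]))
```
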
## Use Cases

- **Information Extraction**: Extract people, places, and organizations from news articles
- **Content Categorization**: Classify content based on entity types
- **Knowledge Graph Construction**: Build relationships between entities
- **Search Enhancement**: Improve search with entity-aware indexing
- **Content Recommendation**: Recommend content based on entity similarity
- **Data Privacy**: Identify personal information (names, addresses, phone numbers)