
# Entity Analysis

Identifies and extracts named entities (people, places, organizations, events, etc.) from text, providing detailed information about each entity: its type classification, a salience score, and the location of every mention. Entity analysis is essential for information extraction, content understanding, and knowledge graph construction.

## Capabilities

### Analyze Entities

Identifies named entities in the provided text and returns detailed information about each entity found.

```python { .api }
def analyze_entities(
    self,
    request: Optional[Union[AnalyzeEntitiesRequest, dict]] = None,
    *,
    document: Optional[Document] = None,
    encoding_type: Optional[EncodingType] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = ()
) -> AnalyzeEntitiesResponse:
    """
    Finds named entities in the text and returns information about them.

    Args:
        request: The request object containing document and options
        document: Input document for analysis
        encoding_type: Text encoding type for offset calculations
        retry: Retry configuration for the request
        timeout: Request timeout in seconds
        metadata: Additional metadata to send with the request

    Returns:
        AnalyzeEntitiesResponse containing found entities and metadata
    """
```

#### Usage Example

```python
from google.cloud import language

# Initialize client
client = language.LanguageServiceClient()

# Create document
document = language.Document(
    content="Google was founded by Larry Page and Sergey Brin in Mountain View, California.",
    type_=language.Document.Type.PLAIN_TEXT
)

# Analyze entities
response = client.analyze_entities(
    request={"document": document}
)

# Process entities
for entity in response.entities:
    print(f"Entity: {entity.name}")
    print(f"Type: {entity.type_.name}")
    print(f"Salience: {entity.salience}")
    print(f"Metadata: {dict(entity.metadata)}")

    # Print mentions
    for mention in entity.mentions:
        print(f"  Mention: '{mention.text.content}' ({mention.type_.name})")
    print()
```

## Request and Response Types

### AnalyzeEntitiesRequest

```python { .api }
class AnalyzeEntitiesRequest:
    document: Document
    encoding_type: EncodingType
```
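
For explicit control over these fields, the request can be built as a typed object instead of a dict. A minimal sketch (the sample text is illustrative); setting `encoding_type` is what makes mention offsets meaningful:

```python
from google.cloud import language

client = language.LanguageServiceClient()

document = language.Document(
    content="Sundar Pichai announced the results in Mountain View.",
    type_=language.Document.Type.PLAIN_TEXT,
)

# Typed request; encoding_type enables character offsets in mentions
request = language.AnalyzeEntitiesRequest(
    document=document,
    encoding_type=language.EncodingType.UTF8,
)

response = client.analyze_entities(request=request)
print(f"Detected language: {response.language}")
```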

### AnalyzeEntitiesResponse

```python { .api }
class AnalyzeEntitiesResponse:
    entities: MutableSequence[Entity]
    language: str
```

## Supporting Types

### Entity

Represents a named entity found in the text with comprehensive metadata.

```python { .api }
class Entity:
    class Type(proto.Enum):
        UNKNOWN = 0
        PERSON = 1
        LOCATION = 2
        ORGANIZATION = 3
        EVENT = 4
        WORK_OF_ART = 5
        CONSUMER_GOOD = 6
        OTHER = 7
        PHONE_NUMBER = 9
        ADDRESS = 10
        DATE = 11
        NUMBER = 12
        PRICE = 13

    name: str                                 # Canonical name of the entity
    type_: Type                               # Entity type classification
    metadata: MutableMapping[str, str]        # Additional metadata (Wikipedia URL, etc.)
    salience: float                           # Salience/importance score [0.0, 1.0]
    mentions: MutableSequence[EntityMention]  # All mentions of this entity in text
    sentiment: Sentiment                      # Overall sentiment for this entity (v1/v1beta2 only)
```

**Entity Types:**

- **PERSON**: People, including fictional characters
- **LOCATION**: Physical locations (cities, countries, landmarks)
- **ORGANIZATION**: Companies, institutions, teams, groups
- **EVENT**: Named events (conferences, wars, sports events)
- **WORK_OF_ART**: Titles of books, movies, songs, etc.
- **CONSUMER_GOOD**: Products, brands, models
- **OTHER**: Entities that don't fit other categories
- **PHONE_NUMBER**: Phone numbers
- **ADDRESS**: Physical addresses
- **DATE**: Absolute or relative dates
- **NUMBER**: Numeric values
- **PRICE**: Monetary amounts

### EntityMention

Represents a specific mention of an entity within the text.

```python { .api }
class EntityMention:
    class Type(proto.Enum):
        TYPE_UNKNOWN = 0
        PROPER = 1  # Proper noun (e.g., "Google")
        COMMON = 2  # Common noun (e.g., "company")

    text: TextSpan        # The mention text with position
    type_: Type           # Mention type (proper/common noun)
    sentiment: Sentiment  # Sentiment associated with this mention (v1/v1beta2 only)
    probability: float    # Confidence score for this mention [0.0, 1.0]
```

## Advanced Usage

### Entity Metadata Processing

```python
def extract_entity_metadata(entities):
    """Process entity metadata for additional information."""
    processed = []

    for entity in entities:
        entity_info = {
            'name': entity.name,
            'type': entity.type_.name,
            'salience': entity.salience,
            'mentions_count': len(entity.mentions)
        }

        # Extract Wikipedia URL if available
        if 'wikipedia_url' in entity.metadata:
            entity_info['wikipedia'] = entity.metadata['wikipedia_url']

        # Extract Knowledge Graph ID if available
        if 'mid' in entity.metadata:
            entity_info['knowledge_graph_id'] = entity.metadata['mid']

        processed.append(entity_info)

    return processed

# Usage
response = client.analyze_entities(request={"document": document})
metadata = extract_entity_metadata(response.entities)

for info in metadata:
    print(f"Entity: {info['name']} ({info['type']})")
    if 'wikipedia' in info:
        print(f"  Wikipedia: {info['wikipedia']}")
    print(f"  Salience: {info['salience']}")
```

### Filter Entities by Type

```python
def filter_entities_by_type(entities, target_types):
    """Filter entities by specific types."""
    from google.cloud.language import Entity

    # Convert string types to enum values
    type_mapping = {
        'PERSON': Entity.Type.PERSON,
        'LOCATION': Entity.Type.LOCATION,
        'ORGANIZATION': Entity.Type.ORGANIZATION,
        'EVENT': Entity.Type.EVENT,
        'WORK_OF_ART': Entity.Type.WORK_OF_ART,
        'CONSUMER_GOOD': Entity.Type.CONSUMER_GOOD,
        'PHONE_NUMBER': Entity.Type.PHONE_NUMBER,
        'ADDRESS': Entity.Type.ADDRESS,
        'DATE': Entity.Type.DATE,
        'NUMBER': Entity.Type.NUMBER,
        'PRICE': Entity.Type.PRICE,
    }

    target_enum_types = [type_mapping[t] for t in target_types if t in type_mapping]

    return [entity for entity in entities if entity.type_ in target_enum_types]

# Usage - extract only people and organizations
response = client.analyze_entities(request={"document": document})
people_and_orgs = filter_entities_by_type(
    response.entities,
    ['PERSON', 'ORGANIZATION']
)

for entity in people_and_orgs:
    print(f"{entity.name} ({entity.type_.name})")
```
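
Since `Entity.Type` behaves like a standard Python `IntEnum`, the hand-written mapping above can arguably be replaced with name-based lookup; a small sketch of that variant:

```python
from google.cloud.language import Entity

def filter_entities_by_type(entities, target_types):
    """Same filter, using the enum's name-based lookup instead of a dict."""
    targets = {Entity.Type[t] for t in target_types if t in Entity.Type.__members__}
    return [entity for entity in entities if entity.type_ in targets]
```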

### Salience-Based Ranking

```python
def get_most_salient_entities(entities, limit=5):
    """Get the most important entities by salience score."""
    sorted_entities = sorted(entities, key=lambda e: e.salience, reverse=True)
    return sorted_entities[:limit]

# Usage
response = client.analyze_entities(request={"document": document})
top_entities = get_most_salient_entities(response.entities, limit=3)

print("Most salient entities:")
for entity in top_entities:
    print(f"  {entity.name}: {entity.salience:.3f}")
```

### Mention Analysis

```python
def analyze_entity_mentions(entity):
    """Analyze mentions of a specific entity."""
    print(f"Entity: {entity.name}")
    print(f"Total mentions: {len(entity.mentions)}")

    proper_mentions = [m for m in entity.mentions if m.type_ == language.EntityMention.Type.PROPER]
    common_mentions = [m for m in entity.mentions if m.type_ == language.EntityMention.Type.COMMON]

    print(f"Proper noun mentions: {len(proper_mentions)}")
    print(f"Common noun mentions: {len(common_mentions)}")

    print("Mention details:")
    for i, mention in enumerate(entity.mentions):
        print(f"  {i+1}. '{mention.text.content}' ({mention.type_.name})")
        print(f"     Position: {mention.text.begin_offset}")
        print(f"     Confidence: {mention.probability:.3f}")

# Usage
response = client.analyze_entities(request={"document": document})
if response.entities:
    analyze_entity_mentions(response.entities[0])
```
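
Note that `begin_offset` is only populated when the request specifies an `encoding_type` (as in the typed-request sketch earlier); otherwise the API reports offsets as -1.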

### Process Large Documents

```python
def process_long_document(client, long_text, chunk_size=4000):
    """Process long documents by chunking."""
    import textwrap

    # Split into chunks
    chunks = textwrap.wrap(long_text, chunk_size, break_long_words=False)
    all_entities = []

    for i, chunk in enumerate(chunks):
        print(f"Processing chunk {i+1}/{len(chunks)}")

        document = language.Document(
            content=chunk,
            type_=language.Document.Type.PLAIN_TEXT
        )

        response = client.analyze_entities(request={"document": document})
        all_entities.extend(response.entities)

    # Deduplicate entities by name
    unique_entities = {}
    for entity in all_entities:
        if entity.name not in unique_entities:
            unique_entities[entity.name] = entity
        else:
            # Merge salience scores (take maximum)
            if entity.salience > unique_entities[entity.name].salience:
                unique_entities[entity.name] = entity

    return list(unique_entities.values())
```
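
Two caveats on chunking worth keeping in mind: salience is computed relative to each chunk rather than the whole document, and mention offsets are chunk-relative. A usage sketch (the file name is illustrative):

```python
# Usage (illustrative file name)
with open("article.txt") as f:
    entities = process_long_document(client, f.read())

print(f"Found {len(entities)} unique entities")
```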

### HTML Content Processing

```python
# Process HTML content to extract entities
html_content = """
<html>
<body>
<h1>Tech News</h1>
<p>Apple announced new features at their conference in Cupertino.
CEO Tim Cook presented the latest innovations to the audience.</p>
</body>
</html>
"""

document = language.Document(
    content=html_content,
    type_=language.Document.Type.HTML
)

response = client.analyze_entities(request={"document": document})

for entity in response.entities:
    print(f"Entity: {entity.name} ({entity.type_.name})")
```

## Error Handling

```python
from google.api_core import exceptions

try:
    response = client.analyze_entities(
        request={"document": document},
        timeout=15.0
    )
except exceptions.InvalidArgument as e:
    print(f"Invalid document format: {e}")
except exceptions.ResourceExhausted:
    print("API quota exceeded")
except exceptions.GoogleAPIError as e:
    print(f"API error: {e}")
```
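
The `retry` parameter shown in the method signature accepts a `google.api_core.retry.Retry` object. A minimal sketch of a custom retry policy (the timing values are illustrative assumptions):

```python
from google.api_core import exceptions, retry

# Retry transient failures with exponential backoff (illustrative values)
custom_retry = retry.Retry(
    initial=0.5,      # first backoff delay, in seconds
    maximum=10.0,     # cap on any single backoff delay
    multiplier=2.0,   # backoff growth factor
    timeout=60.0,     # give up after this many seconds overall
    predicate=retry.if_exception_type(
        exceptions.ServiceUnavailable,
        exceptions.DeadlineExceeded,
    ),
)

response = client.analyze_entities(
    request={"document": document},
    retry=custom_retry,
)
```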

## Performance Considerations

- **Text Length**: Optimal for documents under 1 MB
- **Entity Density**: Performance may vary for text with high entity density
- **Language**: Accuracy is higher for officially supported languages
- **Caching**: Entity results can be cached for static content
- **Batch Processing**: Use the async client for multiple documents (see the sketch below)
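
For the batch-processing point above, a minimal sketch of concurrent analysis with `LanguageServiceAsyncClient` (the sample texts are illustrative):

```python
import asyncio
from google.cloud import language

async def analyze_many(texts):
    client = language.LanguageServiceAsyncClient()
    documents = [
        language.Document(content=t, type_=language.Document.Type.PLAIN_TEXT)
        for t in texts
    ]
    # Issue all requests concurrently; gather preserves input order
    return await asyncio.gather(
        *(client.analyze_entities(request={"document": d}) for d in documents)
    )

texts = ["Larry Page co-founded Google.", "The Eiffel Tower is in Paris."]
for text, resp in zip(texts, asyncio.run(analyze_many(texts))):
    print(text, "->", [e.name for e in resp.entities])
```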

## Use Cases

- **Information Extraction**: Extract people, places, and organizations from news articles
- **Content Categorization**: Classify content based on entity types
- **Knowledge Graph Construction**: Build relationships between entities
- **Search Enhancement**: Improve search with entity-aware indexing
- **Content Recommendation**: Recommend content based on entity similarity
- **Data Privacy**: Identify personal information such as names, addresses, and phone numbers (see the sketch below)
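
As a concrete sketch of the data-privacy use case, the `filter_entities_by_type` helper from Advanced Usage can flag potential PII (assuming that helper and a `response` are in scope):

```python
pii_entities = filter_entities_by_type(
    response.entities,
    ['PERSON', 'ADDRESS', 'PHONE_NUMBER']
)

for entity in pii_entities:
    print(f"Possible PII: {entity.name} ({entity.type_.name})")
```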