# Syntax Analysis (v1/v1beta2 only)

Provides comprehensive linguistic analysis, including part-of-speech tagging, dependency parsing, morphological analysis, and token-level information, to reveal the grammatical structure and linguistic properties of text. Essential for applications requiring deep language understanding, grammar checking, and linguistic research.

**Note**: This feature is only available in API versions v1 and v1beta2. It is not included in the simplified v2 API.

## Capabilities

### Analyze Syntax

Performs detailed syntactic analysis of the provided text, returning information about sentences, tokens, part-of-speech tags, and dependency relationships.
```python { .api }
def analyze_syntax(
    self,
    request: Optional[Union[AnalyzeSyntaxRequest, dict]] = None,
    *,
    document: Optional[Document] = None,
    encoding_type: Optional[EncodingType] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = ()
) -> AnalyzeSyntaxResponse:
    """
    Analyzes the syntax of the text and provides part-of-speech tagging,
    dependency parsing, and other linguistic information.

    Args:
        request: The request object containing document and options
        document: Input document for analysis
        encoding_type: Text encoding type for offset calculations
        retry: Retry configuration for the request
        timeout: Request timeout in seconds
        metadata: Additional metadata to send with the request

    Returns:
        AnalyzeSyntaxResponse containing linguistic analysis results
    """
```
#### Usage Example

```python
from google.cloud import language_v1  # Use v1 or v1beta2

# Initialize client (must use v1 or v1beta2)
client = language_v1.LanguageServiceClient()

# Create document
document = language_v1.Document(
    content="The quick brown fox jumps over the lazy dog.",
    type_=language_v1.Document.Type.PLAIN_TEXT
)

# Analyze syntax
response = client.analyze_syntax(
    request={"document": document}
)

# Process sentences
print("Sentences:")
for i, sentence in enumerate(response.sentences):
    print(f"{i+1}. {sentence.text.content}")

print("\nTokens with POS tags:")
for token in response.tokens:
    print(f"'{token.text.content}' - {token.part_of_speech.tag.name}")

print("\nDependency relationships:")
for i, token in enumerate(response.tokens):
    if token.dependency_edge.head_token_index != i:  # Not the root
        head_token = response.tokens[token.dependency_edge.head_token_index]
        print(f"'{token.text.content}' --{token.dependency_edge.label.name}--> '{head_token.text.content}'")
```
## Request and Response Types

### AnalyzeSyntaxRequest

```python { .api }
class AnalyzeSyntaxRequest:
    document: Document
    encoding_type: EncodingType
```

### AnalyzeSyntaxResponse

```python { .api }
class AnalyzeSyntaxResponse:
    sentences: MutableSequence[Sentence]
    tokens: MutableSequence[Token]
    language: str
```
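The request can also be built explicitly. Setting `encoding_type` is recommended whenever you plan to use token or sentence offsets, since the API calculates `begin_offset` values according to the requested encoding. A minimal sketch (the sample text and variable names are illustrative):

```python
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

request = language_v1.AnalyzeSyntaxRequest(
    document=language_v1.Document(
        content="The quick brown fox jumps over the lazy dog.",
        type_=language_v1.Document.Type.PLAIN_TEXT,
    ),
    # UTF8 / UTF16 / UTF32 controls how begin_offset values are computed
    encoding_type=language_v1.EncodingType.UTF8,
)

response = client.analyze_syntax(request=request)

# The response reports the language that was detected (or supplied)
print(f"Language: {response.language}")
print(f"Sentences: {len(response.sentences)}, tokens: {len(response.tokens)}")
```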
## Supporting Types

### Token

Represents a linguistic token with comprehensive morphological and syntactic information.

```python { .api }
class Token:
    text: TextSpan                    # Token text and position
    part_of_speech: PartOfSpeech      # Part-of-speech information
    dependency_edge: DependencyEdge   # Dependency relationship
    lemma: str                        # Canonical form of the token
```
### PartOfSpeech

Comprehensive part-of-speech and morphological information.

```python { .api }
class PartOfSpeech:
    class Tag(proto.Enum):
        UNKNOWN = 0
        ADJ = 1      # Adjective
        ADP = 2      # Adposition (preposition/postposition)
        ADV = 3      # Adverb
        CONJ = 4     # Conjunction
        DET = 5      # Determiner
        NOUN = 6     # Noun
        NUM = 7      # Numeral
        PRON = 8     # Pronoun
        PRT = 9      # Particle
        PUNCT = 10   # Punctuation
        VERB = 11    # Verb
        X = 12       # Other/Unknown
        AFFIX = 13   # Affix

    class Aspect(proto.Enum):
        ASPECT_UNKNOWN = 0
        PERFECTIVE = 1
        IMPERFECTIVE = 2
        PROGRESSIVE = 3

    class Case(proto.Enum):
        CASE_UNKNOWN = 0
        ACCUSATIVE = 1
        ADVERBIAL = 2
        COMPLEMENTIVE = 3
        DATIVE = 4
        GENITIVE = 5
        INSTRUMENTAL = 6
        LOCATIVE = 7
        NOMINATIVE = 8
        OBLIQUE = 9
        PARTITIVE = 10
        PREPOSITIONAL = 11
        REFLEXIVE_CASE = 12
        RELATIVE_CASE = 13
        VOCATIVE = 14

    # Additional enums for Form, Gender, Mood, Number, Person, Proper, Reciprocity, Tense, Voice

    tag: Tag                   # Main part-of-speech tag
    aspect: Aspect             # Verbal aspect
    case: Case                 # Grammatical case
    form: Form                 # Word form
    gender: Gender             # Grammatical gender
    mood: Mood                 # Grammatical mood
    number: Number             # Grammatical number
    person: Person             # Grammatical person
    proper: Proper             # Proper noun indicator
    reciprocity: Reciprocity   # Reciprocity
    tense: Tense               # Grammatical tense
    voice: Voice               # Grammatical voice
```
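The per-feature enums make it straightforward to filter tokens by a specific grammatical property. A small sketch that lists plural nouns, assuming the `client` and `response` objects from the usage example above:

```python
# Plural nouns: combine the main Tag with the Number morphology enum
plural_nouns = [
    token.text.content
    for token in response.tokens
    if token.part_of_speech.tag == language_v1.PartOfSpeech.Tag.NOUN
    and token.part_of_speech.number == language_v1.PartOfSpeech.Number.PLURAL
]
print("Plural nouns:", plural_nouns)
```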
### DependencyEdge

Represents a dependency relationship between tokens in the parse tree.

```python { .api }
class DependencyEdge:
    class Label(proto.Enum):
        UNKNOWN = 0
        ABBREV = 1         # Abbreviation modifier
        ACOMP = 2          # Adjectival complement
        ADVCL = 3          # Adverbial clause modifier
        ADVMOD = 4         # Adverbial modifier
        AMOD = 5           # Adjectival modifier
        APPOS = 6          # Appositional modifier
        ATTR = 7           # Attribute
        AUX = 8            # Auxiliary
        AUXPASS = 9        # Passive auxiliary
        CC = 10            # Coordinating conjunction
        CCOMP = 11         # Clausal complement
        CONJ = 12          # Conjunct
        CSUBJ = 13         # Clausal subject
        CSUBJPASS = 14     # Clausal passive subject
        DEP = 15           # Dependent
        DET = 16           # Determiner
        DISCOURSE = 17     # Discourse element
        DOBJ = 18          # Direct object
        EXPL = 19          # Expletive
        GOESWITH = 20      # Goes with
        IOBJ = 21          # Indirect object
        MARK = 22          # Marker
        MWE = 23           # Multi-word expression
        MWV = 24           # Multi-word verbal expression
        NEG = 25           # Negation modifier
        NN = 26            # Noun compound modifier
        NPADVMOD = 27      # Noun phrase adverbial modifier
        NSUBJ = 28         # Nominal subject
        NSUBJPASS = 29     # Passive nominal subject
        NUM = 30           # Numeric modifier
        NUMBER = 31        # Element of compound number
        P = 32             # Punctuation mark
        PARATAXIS = 33     # Parataxis
        PARTMOD = 34       # Participial modifier
        PCOMP = 35         # Prepositional complement
        POBJ = 36          # Object of preposition
        POSS = 37          # Possession modifier
        POSTNEG = 38       # Postverbal negative particle
        PRECOMP = 39       # Predicate complement
        PRECONJ = 40       # Preconjunct
        PREDET = 41        # Predeterminer
        PREF = 42          # Prefix
        PREP = 43          # Prepositional modifier
        PRONL = 44         # Pronominal locative
        PRT = 45           # Particle
        PS = 46            # Possessive ending
        QUANTMOD = 47      # Quantifier phrase modifier
        RCMOD = 48         # Relative clause modifier
        RCMODREL = 49      # Complementizer in relative clause
        RDROP = 50         # Ellipsis without a preceding predicate
        REF = 51           # Referent
        REMNANT = 52       # Remnant
        REPARANDUM = 53    # Reparandum
        ROOT = 54          # Root
        SNUM = 55          # Suffix specifying a unit of number
        SUFF = 56          # Suffix
        TMOD = 57          # Temporal modifier
        TOPIC = 58         # Topic marker
        VMOD = 59          # Verbal modifier
        VOCATIVE = 60      # Vocative
        XCOMP = 61         # Open clausal complement
        SUFFIX = 62        # Suffix
        TITLE = 63         # Title
        ADVPHMOD = 64      # Adverbial phrase modifier
        AUXCAUS = 65       # Causative auxiliary
        AUXVV = 66         # Helper auxiliary
        DTMOD = 67         # Rentaishi (prenominal modifier)
        FOREIGN = 68       # Foreign words
        KW = 69            # Keyword
        LIST = 70          # List for chains of comparable items
        NOMC = 71          # Nominalized clause
        NOMCSUBJ = 72      # Nominalized clausal subject
        NOMCSUBJPASS = 73  # Nominalized clausal passive
        NUMC = 74          # Compound of numeric modifier
        COP = 75           # Copula
        DISLOCATED = 76    # Dislocated relation
        ASP = 77           # Aspect marker
        GMOD = 78          # Genitive modifier
        GOBJ = 79          # Genitive object
        INFMOD = 80        # Infinitival modifier
        MES = 81           # Measure
        NCOMP = 82         # Nominal complement of a noun

    head_token_index: int   # Index of the head token
    label: Label            # Dependency relationship label
```
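Because `head_token_index` points back into the same `tokens` list, and the root token is its own head, you can trace any token's chain of heads up to the sentence root. A minimal sketch (the `path_to_root` helper and the token index used are illustrative, assuming the `client` from the examples above):

```python
def path_to_root(tokens, index):
    """Follow head_token_index links from one token up to the ROOT token."""
    path = [tokens[index].text.content]
    # The root token is its own head, so stop when the index no longer changes
    while tokens[index].dependency_edge.head_token_index != index:
        index = tokens[index].dependency_edge.head_token_index
        path.append(tokens[index].text.content)
    return path

document = language_v1.Document(
    content="The quick brown fox jumps over the lazy dog.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)
response = client.analyze_syntax(request={"document": document})

# e.g. might print: lazy -> dog -> over -> jumps
print(" -> ".join(path_to_root(response.tokens, 7)))
```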
## Advanced Usage

### Part-of-Speech Analysis

```python
def analyze_pos_distribution(client, text):
    """Analyze the distribution of parts of speech in text."""
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT
    )

    response = client.analyze_syntax(
        request={"document": document}
    )

    pos_counts = {}
    total_tokens = len(response.tokens)

    for token in response.tokens:
        pos_tag = token.part_of_speech.tag.name
        pos_counts[pos_tag] = pos_counts.get(pos_tag, 0) + 1

    print("Part-of-Speech Distribution:")
    for pos, count in sorted(pos_counts.items(), key=lambda x: x[1], reverse=True):
        percentage = (count / total_tokens) * 100
        print(f"{pos}: {count} ({percentage:.1f}%)")

    return pos_counts

# Usage
text = "The quick brown fox jumps gracefully over the very lazy dog near the old oak tree."
pos_distribution = analyze_pos_distribution(client, text)
```
### Dependency Tree Visualization

```python
def visualize_dependency_tree(client, text):
    """Create a simple text representation of the dependency tree."""
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT
    )

    response = client.analyze_syntax(
        request={
            "document": document,
            # Request an encoding type so text.begin_offset values are populated
            "encoding_type": language_v1.EncodingType.UTF8,
        }
    )

    # Find the root token
    root_index = None
    for i, token in enumerate(response.tokens):
        if token.dependency_edge.label == language_v1.DependencyEdge.Label.ROOT:
            root_index = i
            break

    if root_index is not None:
        print(f"Dependency Tree (root: '{response.tokens[root_index].text.content}'):")
        print_dependency_subtree(response.tokens, root_index, 0)

    return response.tokens

def print_dependency_subtree(tokens, head_index, depth):
    """Recursively print a dependency subtree."""
    head_token = tokens[head_index]
    indent = " " * depth
    pos_tag = head_token.part_of_speech.tag.name
    print(f"{indent}{head_token.text.content} ({pos_tag})")

    # Find children
    children = []
    for i, token in enumerate(tokens):
        if token.dependency_edge.head_token_index == head_index and i != head_index:
            children.append((i, token.dependency_edge.label.name))

    # Sort children by position in sentence
    children.sort(key=lambda x: tokens[x[0]].text.begin_offset)

    for child_index, relation in children:
        child_indent = " " * (depth + 1)
        print(f"{child_indent}--{relation}-->")
        print_dependency_subtree(tokens, child_index, depth + 2)

# Usage
text = "The cat sat on the mat."
visualize_dependency_tree(client, text)
```
### Lemmatization

```python
def extract_lemmas(client, text):
    """Extract lemmatized forms of words."""
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT
    )

    response = client.analyze_syntax(
        request={"document": document}
    )

    lemmas = []
    print("Word -> Lemma:")
    for token in response.tokens:
        word = token.text.content
        lemma = token.lemma
        pos = token.part_of_speech.tag.name

        if word != lemma:
            print(f"{word} -> {lemma} ({pos})")

        lemmas.append(lemma)

    return lemmas

# Usage
text = "The children were running quickly through the trees and jumped over the fallen logs."
lemmas = extract_lemmas(client, text)
print(f"\nLemmatized text: {' '.join(lemmas)}")
```
### Subject-Verb-Object Extraction

```python
def extract_svo_triples(client, text):
    """Extract Subject-Verb-Object triples from text."""
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT
    )

    response = client.analyze_syntax(
        request={"document": document}
    )

    triples = []

    # Find verbs
    for i, token in enumerate(response.tokens):
        if token.part_of_speech.tag == language_v1.PartOfSpeech.Tag.VERB:
            verb = token.text.content
            subject = None
            obj = None

            # Find the subject and object attached to this verb
            for dependent in response.tokens:
                if dependent.dependency_edge.head_token_index == i:
                    if dependent.dependency_edge.label == language_v1.DependencyEdge.Label.NSUBJ:
                        subject = dependent.text.content
                    elif dependent.dependency_edge.label == language_v1.DependencyEdge.Label.DOBJ:
                        obj = dependent.text.content

            if subject and obj:
                triples.append((subject, verb, obj))

    return triples

# Usage
text = "The dog chased the cat. Mary loves books. John ate an apple."
svo_triples = extract_svo_triples(client, text)

print("Subject-Verb-Object triples:")
for subject, verb, obj in svo_triples:
    print(f"{subject} -> {verb} -> {obj}")
```
### Morphological Analysis

```python
def analyze_morphology(client, text):
    """Analyze morphological features of words."""
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT
    )

    response = client.analyze_syntax(
        request={"document": document}
    )

    print("Morphological Analysis:")
    for token in response.tokens:
        word = token.text.content
        pos_info = token.part_of_speech

        features = []

        # Collect non-unknown morphological features
        if pos_info.aspect != language_v1.PartOfSpeech.Aspect.ASPECT_UNKNOWN:
            features.append(f"Aspect: {pos_info.aspect.name}")
        if pos_info.case != language_v1.PartOfSpeech.Case.CASE_UNKNOWN:
            features.append(f"Case: {pos_info.case.name}")
        if pos_info.gender != language_v1.PartOfSpeech.Gender.GENDER_UNKNOWN:
            features.append(f"Gender: {pos_info.gender.name}")
        if pos_info.mood != language_v1.PartOfSpeech.Mood.MOOD_UNKNOWN:
            features.append(f"Mood: {pos_info.mood.name}")
        if pos_info.number != language_v1.PartOfSpeech.Number.NUMBER_UNKNOWN:
            features.append(f"Number: {pos_info.number.name}")
        if pos_info.person != language_v1.PartOfSpeech.Person.PERSON_UNKNOWN:
            features.append(f"Person: {pos_info.person.name}")
        if pos_info.tense != language_v1.PartOfSpeech.Tense.TENSE_UNKNOWN:
            features.append(f"Tense: {pos_info.tense.name}")
        if pos_info.voice != language_v1.PartOfSpeech.Voice.VOICE_UNKNOWN:
            features.append(f"Voice: {pos_info.voice.name}")

        if features:
            print(f"{word} ({pos_info.tag.name}): {', '.join(features)}")
        else:
            print(f"{word} ({pos_info.tag.name})")

# Usage
text = "The cats were sleeping peacefully in their beds."
analyze_morphology(client, text)
```
### Sentence Complexity Analysis

```python
def analyze_sentence_complexity(client, text):
    """Analyze grammatical complexity of sentences."""
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT
    )

    response = client.analyze_syntax(
        request={
            "document": document,
            # An encoding type is needed so sentence and token begin_offset values are populated
            "encoding_type": language_v1.EncodingType.UTF8,
        }
    )

    sentence_stats = []

    for sentence in response.sentences:
        # Find tokens in this sentence
        sentence_tokens = [
            token for token in response.tokens
            if (token.text.begin_offset >= sentence.text.begin_offset and
                token.text.begin_offset < sentence.text.begin_offset + len(sentence.text.content))
        ]

        # Count different types of dependencies
        clause_count = 0
        modifier_count = 0

        for token in sentence_tokens:
            label = token.dependency_edge.label
            if label in [language_v1.DependencyEdge.Label.CCOMP,
                         language_v1.DependencyEdge.Label.ADVCL,
                         language_v1.DependencyEdge.Label.RCMOD]:
                clause_count += 1
            elif label in [language_v1.DependencyEdge.Label.AMOD,
                           language_v1.DependencyEdge.Label.ADVMOD,
                           language_v1.DependencyEdge.Label.PREP]:
                modifier_count += 1

        stats = {
            'sentence': sentence.text.content,
            'token_count': len(sentence_tokens),
            'clause_count': clause_count,
            'modifier_count': modifier_count,
            'complexity_score': len(sentence_tokens) + clause_count * 2 + modifier_count
        }

        sentence_stats.append(stats)

    return sentence_stats

# Usage
text = """
The cat sat.
The big fluffy cat that we adopted last year sat quietly on the comfortable wooden chair
that my grandmother gave me when I moved into my first apartment.
"""

complexity_stats = analyze_sentence_complexity(client, text)

print("Sentence Complexity Analysis:")
for i, stats in enumerate(complexity_stats, 1):
    print(f"Sentence {i}: {stats['sentence'][:50]}...")
    print(f" Tokens: {stats['token_count']}")
    print(f" Clauses: {stats['clause_count']}")
    print(f" Modifiers: {stats['modifier_count']}")
    print(f" Complexity Score: {stats['complexity_score']}")
    print()
```
## Error Handling

```python
from google.api_core import exceptions

try:
    response = client.analyze_syntax(
        request={"document": document},
        timeout=25.0
    )
except exceptions.InvalidArgument as e:
    print(f"Invalid request: {e}")
except exceptions.DeadlineExceeded:
    print("Request timed out")
except exceptions.FailedPrecondition as e:
    print(f"API version error: {e}")
    print("Note: Syntax analysis requires v1 or v1beta2")
except exceptions.GoogleAPIError as e:
    print(f"API error: {e}")
```
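The `retry` parameter of `analyze_syntax` accepts a `google.api_core.retry.Retry` object when the default policy is not appropriate. A minimal sketch; the backoff values are illustrative, and older `google-api-core` releases use `deadline` rather than `timeout` for the overall retry budget:

```python
from google.api_core import exceptions, retry

# Retry transient failures with exponential backoff
custom_retry = retry.Retry(
    predicate=retry.if_exception_type(
        exceptions.ServiceUnavailable,
        exceptions.DeadlineExceeded,
    ),
    initial=0.5,      # first delay, in seconds
    maximum=8.0,      # cap on any single delay
    multiplier=2.0,   # exponential growth factor
    timeout=30.0,     # overall time budget for retries
)

response = client.analyze_syntax(
    request={"document": document},
    retry=custom_retry,
    timeout=10.0,  # per-attempt timeout in seconds
)
```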
## Performance Considerations

- **Text Length**: Optimal for documents under 1 MB
- **Computation**: The most computationally intensive of the analysis methods
- **Language Support**: Best results with well-supported languages
- **Caching**: Results for static text can be cached and reused (see the sketch below)
- **API Version**: Only available in v1 and v1beta2
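Because the analysis of a fixed document does not change between calls, static text only needs to be sent to the API once. A minimal in-memory caching sketch; the `cached_analyze_syntax` helper and hashing scheme are illustrative, not part of the client library:

```python
import hashlib

_syntax_cache = {}

def cached_analyze_syntax(client, text):
    """Return a cached AnalyzeSyntaxResponse, calling the API only on a cache miss."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _syntax_cache:
        document = language_v1.Document(
            content=text,
            type_=language_v1.Document.Type.PLAIN_TEXT
        )
        _syntax_cache[key] = client.analyze_syntax(request={"document": document})
    return _syntax_cache[key]

# The second call with identical text hits the cache instead of the API
response = cached_analyze_syntax(client, "The quick brown fox jumps over the lazy dog.")
response = cached_analyze_syntax(client, "The quick brown fox jumps over the lazy dog.")
```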
## Use Cases

- **Grammar Checking**: Identify grammatical errors and suggest corrections
- **Text Simplification**: Analyze and simplify complex sentence structures
- **Information Extraction**: Extract structured information using syntactic patterns
- **Language Learning**: Provide detailed grammatical analysis for educational purposes
- **Machine Translation**: Use syntactic information to improve translation quality
- **Content Analysis**: Analyze writing style and complexity
- **Search Enhancement**: Use syntactic features for better search understanding
- **Question Answering**: Use dependency parsing to understand question structure