# Results and Data Types

Structured data types for representing video analysis results. These classes contain annotations for detected objects, faces, text, speech, and other content, with timestamps and confidence scores.

## Capabilities

### Core Response Types

Main response objects returned from video analysis operations.

```python { .api }
class AnnotateVideoResponse:
    """
    Video annotation response. Contains annotation results for one or more videos.

    Attributes:
        annotation_results: Annotation results for all videos specified in AnnotateVideoRequest
    """

    annotation_results: MutableSequence[VideoAnnotationResults]

class VideoAnnotationResults:
    """
    Annotation results for a single video.

    Attributes:
        input_uri: Video file location in Google Cloud Storage
        segment_label_annotations: Label annotations on video level or user-specified segment level
        shot_label_annotations: Label annotations on shot level
        frame_label_annotations: Label annotations on frame level
        face_annotations: Face detection annotations (deprecated)
        face_detection_annotations: Face detection annotations with tracks and attributes
        shot_annotations: Shot annotations, represented as a list of shots
        explicit_annotation: Explicit content annotation
        speech_transcriptions: Speech transcriptions
        text_annotations: OCR text detection and tracking
        object_annotations: Annotations for objects detected and tracked in the video
        logo_recognition_annotations: Annotations for logos detected, tracked, and recognized in the video
        person_detection_annotations: Person detection annotations
        celebrity_recognition_annotations: Celebrity recognition annotations (available in v1p3beta1)
        error: If processing failed, this field contains the details of the failure
    """

    input_uri: str
    segment_label_annotations: MutableSequence[LabelAnnotation]
    shot_label_annotations: MutableSequence[LabelAnnotation]
    frame_label_annotations: MutableSequence[LabelAnnotation]
    face_annotations: MutableSequence[FaceAnnotation]
    face_detection_annotations: MutableSequence[FaceDetectionAnnotation]
    shot_annotations: MutableSequence[VideoSegment]
    explicit_annotation: ExplicitContentAnnotation
    speech_transcriptions: MutableSequence[SpeechTranscription]
    text_annotations: MutableSequence[TextAnnotation]
    object_annotations: MutableSequence[ObjectTrackingAnnotation]
    logo_recognition_annotations: MutableSequence[LogoRecognitionAnnotation]
    person_detection_annotations: MutableSequence[PersonDetectionAnnotation]
    celebrity_recognition_annotations: CelebrityRecognitionAnnotation
    error: status_pb2.Status
```

### Label Detection Results

Results from label detection analysis, including detected objects, activities, and concepts.

```python { .api }
class LabelAnnotation:
    """
    Label annotation.

    Attributes:
        entity: Detected entity from Video Intelligence API
        category_entities: Common categories for the detected entity
        segments: All video segments where a label was detected
        frames: All video frames where a label was detected
    """

    entity: Entity
    category_entities: MutableSequence[Entity]
    segments: MutableSequence[LabelSegment]
    frames: MutableSequence[LabelFrame]

class LabelSegment:
    """
    Video segment level annotation results for label detection.

    Attributes:
        segment: Video segment where a label was detected
        confidence: Confidence that the label is accurate (0.0 to 1.0)
    """

    segment: VideoSegment
    confidence: float

class LabelFrame:
    """
    Video frame level annotation results for label detection.

    Attributes:
        time_offset: Time-offset, relative to the beginning of the video, corresponding to the video frame for this location
        confidence: Confidence that the label is accurate (0.0 to 1.0)
    """

    time_offset: duration_pb2.Duration
    confidence: float

class Entity:
    """
    Detected entity from Video Intelligence API.

    Attributes:
        entity_id: Opaque entity ID. Some IDs may be available in Google Knowledge Graph Search API
        description: Textual description, e.g., "Fixed-gear bicycle"
        language_code: Language code for description in BCP-47 format
    """

    entity_id: str
    description: str
    language_code: str
```

### Face Detection Results

Results from face detection and tracking analysis.

```python { .api }
class FaceDetectionAnnotation:
    """
    Face detection annotation.

    Attributes:
        version: Feature version
        tracks: The face tracks with attributes
        thumbnail: The thumbnail of a person's face
    """

    version: str
    tracks: MutableSequence[Track]
    thumbnail: bytes

class FaceAnnotation:
    """
    Deprecated. No effect.

    Attributes:
        thumbnail: The thumbnail of a person's face
        segments: All video segments where a face was detected
    """

    thumbnail: bytes
    segments: MutableSequence[FaceSegment]

class FaceSegment:
    """
    Video segment level annotation results for face detection.

    Attributes:
        segment: Video segment where a face was detected
    """

    segment: VideoSegment

class FaceFrame:
    """
    Deprecated. No effect.

    Attributes:
        normalized_bounding_boxes: Normalized bounding boxes in a frame
        time_offset: Time-offset, relative to the beginning of the video, corresponding to the video frame for this location
    """

    normalized_bounding_boxes: MutableSequence[NormalizedBoundingBox]
    time_offset: duration_pb2.Duration

class Track:
    """
    A track of an object instance.

    Attributes:
        segment: Video segment of a track
        timestamped_objects: The object with timestamp and attributes per frame in the track
        attributes: Optional. Attributes in the track level
        confidence: Optional. The confidence score of the tracked object
    """

    segment: VideoSegment
    timestamped_objects: MutableSequence[TimestampedObject]
    attributes: MutableSequence[DetectedAttribute]
    confidence: float

class TimestampedObject:
    """
    For tracking the object throughout the video.

    Attributes:
        normalized_bounding_box: Normalized bounding box location of this object track for the frame
        time_offset: Time-offset, relative to the beginning of the video, corresponding to the video frame for this location
        attributes: Optional. The attributes of the object in the bounding box
        landmarks: Optional. The detected landmarks
    """

    normalized_bounding_box: NormalizedBoundingBox
    time_offset: duration_pb2.Duration
    attributes: MutableSequence[DetectedAttribute]
    landmarks: MutableSequence[DetectedLandmark]

class DetectedAttribute:
    """
    A generic detected attribute represented by name in string format.

    Attributes:
        name: The name of the attribute, for example, glasses, dark_glasses, mouth_open
        confidence: Detected attribute confidence (0.0 to 1.0)
        value: Text value of the detection result
    """

    name: str
    confidence: float
    value: str

class DetectedLandmark:
    """
    A generic detected landmark represented by name in string format and a 2D location.

    Attributes:
        name: The name of this landmark, for example, left_hand, right_shoulder
        point: The 2D point of the detected landmark using the normalized image coordinate system
        confidence: The confidence score of the detected landmark (0.0 to 1.0)
    """

    name: str
    point: NormalizedVertex
    confidence: float
```
231
232
### Object Tracking Results
233
234
Results from object detection and tracking analysis.
235
236
```python { .api }
237
class ObjectTrackingAnnotation:
238
"""
239
Annotations corresponding to one tracked object.
240
241
Attributes:
242
entity: Entity to specify the object category that this track is labeled as
243
confidence: Object category's labeling confidence of this track
244
frames: Information corresponding to all frames where this object track appears
245
segment: Non-streaming batch mode ONLY. Each object track corresponds to one video segment where it appears
246
track_id: Streaming mode ONLY. In streaming mode, we do not know the end time of a tracked object before it is completed
247
version: Feature version
248
"""
249
250
entity: Entity
251
confidence: float
252
frames: MutableSequence[ObjectTrackingFrame]
253
segment: VideoSegment
254
track_id: int
255
version: str
256
257
class ObjectTrackingFrame:
258
"""
259
Video frame level annotations for object detection and tracking.
260
261
Attributes:
262
normalized_bounding_box: The normalized bounding box location of this object track for the frame
263
time_offset: The timestamp of the frame in microseconds
264
"""
265
266
normalized_bounding_box: NormalizedBoundingBox
267
time_offset: duration_pb2.Duration
268
```

### Text Detection Results

Results from optical character recognition (OCR) analysis.

```python { .api }
class TextAnnotation:
    """
    Annotations related to one detected OCR text snippet.

    Attributes:
        text: The detected text
        segments: All video segments where OCR detected text appears
        version: Feature version
    """

    text: str
    segments: MutableSequence[TextSegment]
    version: str

class TextSegment:
    """
    Video segment level annotation results for text detection.

    Attributes:
        segment: Video segment where a text snippet was detected
        confidence: Confidence for the track of detected text
        frames: Information related to the frames where OCR detected text appears
    """

    segment: VideoSegment
    confidence: float
    frames: MutableSequence[TextFrame]

class TextFrame:
    """
    Video frame level annotation results for text annotation (OCR).

    Attributes:
        rotated_bounding_box: Bounding polygon of the detected text for this frame
        time_offset: Timestamp of this frame
    """

    rotated_bounding_box: NormalizedBoundingPoly
    time_offset: duration_pb2.Duration
```

### Speech Transcription Results

Results from speech-to-text analysis.

```python { .api }
class SpeechTranscription:
    """
    A speech recognition result corresponding to a portion of the audio.

    Attributes:
        alternatives: May contain one or more recognition hypotheses
        language_code: Output only. The BCP-47 language tag of the language in this result
    """

    alternatives: MutableSequence[SpeechRecognitionAlternative]
    language_code: str

class SpeechRecognitionAlternative:
    """
    Alternative hypotheses (a.k.a. n-best list).

    Attributes:
        transcript: Transcript text representing the words that the user spoke
        confidence: Output only. The confidence estimate between 0.0 and 1.0
        words: Output only. A list of word-specific information for each recognized word
    """

    transcript: str
    confidence: float
    words: MutableSequence[WordInfo]

class WordInfo:
    """
    Word-specific information for recognized words. Word-specific information is only populated if the client requests it.

    Attributes:
        start_time: Time offset relative to the beginning of the audio, corresponding to the start of the spoken word
        end_time: Time offset relative to the beginning of the audio, corresponding to the end of the spoken word
        word: The word corresponding to this set of information
        confidence: Output only. The confidence estimate between 0.0 and 1.0
        speaker_tag: Output only. A distinct integer value assigned to every speaker within the audio
    """

    start_time: duration_pb2.Duration
    end_time: duration_pb2.Duration
    word: str
    confidence: float
    speaker_tag: int
```

### Explicit Content Detection Results

Results from explicit content detection analysis.

```python { .api }
class ExplicitContentAnnotation:
    """
    Explicit content annotation (based on per-frame visual signals only).

    Attributes:
        frames: All video frames where explicit content was detected
        version: Feature version
    """

    frames: MutableSequence[ExplicitContentFrame]
    version: str

class ExplicitContentFrame:
    """
    Video frame level annotation results for explicit content.

    Attributes:
        time_offset: Time-offset, relative to the beginning of the video, corresponding to the video frame for this location
        pornography_likelihood: Likelihood of the pornography content
    """

    time_offset: duration_pb2.Duration
    pornography_likelihood: Likelihood
```

### Person Detection Results

Results from person detection analysis.

```python { .api }
class PersonDetectionAnnotation:
    """
    Person detection annotation per video.

    Attributes:
        tracks: The detected tracks of a person
        version: Feature version
    """

    tracks: MutableSequence[Track]
    version: str
```

### Logo Recognition Results

Results from logo detection and recognition analysis.

```python { .api }
class LogoRecognitionAnnotation:
    """
    Annotation corresponding to one detected, tracked and recognized logo class.

    Attributes:
        entity: Entity category information to specify the logo class that all the logo tracks within this LogoRecognitionAnnotation are recognized as
        tracks: All logo tracks where the recognized logo appears
        segments: All video segments where the recognized logo appears
    """

    entity: Entity
    tracks: MutableSequence[Track]
    segments: MutableSequence[VideoSegment]
```

### Celebrity Recognition Results (Beta)

Results from celebrity recognition analysis, available in v1p3beta1.

```python { .api }
class CelebrityRecognitionAnnotation:
    """
    Celebrity recognition annotation per video.

    Attributes:
        celebrity_tracks: The tracks detected from the input video, including recognized celebrities and other detected faces
    """

    celebrity_tracks: MutableSequence[CelebrityTrack]

class CelebrityTrack:
    """
    The annotation result of a celebrity face track.

    Attributes:
        celebrities: Top N match of the celebrities for the face in this track
        face_track: A track of a person's face
    """

    celebrities: MutableSequence[RecognizedCelebrity]
    face_track: Track

class RecognizedCelebrity:
    """
    The recognized celebrity with confidence score.

    Attributes:
        celebrity: The recognized celebrity
        confidence: Recognition confidence (0.0 to 1.0)
    """

    celebrity: Celebrity
    confidence: float

class Celebrity:
    """
    Celebrity definition.

    Attributes:
        name: The resource name of the celebrity (format: video-intelligence/kg-mid)
        display_name: The celebrity name
        description: Textual description of additional information about the celebrity
    """

    name: str
    display_name: str
    description: str
```

### Geometric Data Types

Data types for representing spatial information in videos.

```python { .api }
class NormalizedBoundingBox:
    """
    Normalized bounding box. The normalized vertex coordinates are relative to the original image. Range: [0, 1].

    Attributes:
        left: Left X coordinate
        top: Top Y coordinate
        right: Right X coordinate
        bottom: Bottom Y coordinate
    """

    left: float
    top: float
    right: float
    bottom: float

class NormalizedBoundingPoly:
    """
    Normalized bounding polygon for text (that might not be axis-aligned).

    Attributes:
        vertices: The bounding polygon vertices
    """

    vertices: MutableSequence[NormalizedVertex]

class NormalizedVertex:
    """
    A vertex represents a 2D point in the image. Coordinates are normalized, relative to the original image, and range from 0 to 1.

    Attributes:
        x: X coordinate
        y: Y coordinate
    """

    x: float
    y: float
```

### Progress and Status Types

Types for tracking operation progress and handling errors.

```python { .api }
class AnnotateVideoProgress:
    """
    Video annotation progress. Included in the metadata field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

    Attributes:
        annotation_progress: Progress metadata for all videos specified in AnnotateVideoRequest
    """

    annotation_progress: MutableSequence[VideoAnnotationProgress]

class VideoAnnotationProgress:
    """
    Annotation progress for a single video.

    Attributes:
        input_uri: Video file location in Google Cloud Storage
        progress_percent: Approximate percentage processed thus far (0-100)
        start_time: Time when the request was received
        update_time: Time of the most recent update
        feature: Specifies which feature is being tracked if the request contains more than one feature
        segment: Specifies which segment is being tracked if the request contains more than one segment
    """

    input_uri: str
    progress_percent: int
    start_time: timestamp_pb2.Timestamp
    update_time: timestamp_pb2.Timestamp
    feature: Feature
    segment: VideoSegment
```

## Usage Examples

### Processing Label Detection Results

```python
from google.cloud import videointelligence

# Perform label detection
client = videointelligence.VideoIntelligenceServiceClient()
operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.LABEL_DETECTION],
        "input_uri": "gs://your-bucket/your-video.mp4",
    }
)
result = operation.result(timeout=300)

# Process results
for annotation_result in result.annotation_results:
    print(f"Processing video: {annotation_result.input_uri}")

    # Segment-level labels
    for label in annotation_result.segment_label_annotations:
        print(f"\nLabel: {label.entity.description}")
        for segment in label.segments:
            start_time = segment.segment.start_time_offset.total_seconds()
            end_time = segment.segment.end_time_offset.total_seconds()
            print(f"  Segment: {start_time:.1f}s to {end_time:.1f}s (confidence: {segment.confidence:.2f})")

    # Frame-level labels (populated only when frame-mode label detection
    # was requested via the video context)
    for label in annotation_result.frame_label_annotations:
        print(f"\nFrame-level label: {label.entity.description}")
        for frame in label.frames[:5]:  # Show first 5 frames
            time_offset = frame.time_offset.total_seconds()
            print(f"  Frame at {time_offset:.1f}s (confidence: {frame.confidence:.2f})")
```
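
### Processing Shot Change Results

Shot boundaries arrive as plain `VideoSegment` values in `shot_annotations`. A minimal sketch, reusing the `result` object from the example above and assuming `SHOT_CHANGE_DETECTION` was among the requested features:

```python
# Process shot change results (assumes SHOT_CHANGE_DETECTION was requested)
for annotation_result in result.annotation_results:
    for i, shot in enumerate(annotation_result.shot_annotations):
        start_time = shot.start_time_offset.total_seconds()
        end_time = shot.end_time_offset.total_seconds()
        print(f"Shot {i}: {start_time:.1f}s to {end_time:.1f}s")
```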

### Processing Face Detection Results

```python
# Process face detection results (assumes FACE_DETECTION was requested)
for annotation_result in result.annotation_results:
    # Legacy format; FaceAnnotation is deprecated and normally empty
    for face_annotation in annotation_result.face_annotations:
        print("Face detected:")
        for segment in face_annotation.segments:
            start_time = segment.segment.start_time_offset.total_seconds()
            end_time = segment.segment.end_time_offset.total_seconds()
            print(f"  Time: {start_time:.1f}s to {end_time:.1f}s")

    # Newer face detection format
    for face_detection in annotation_result.face_detection_annotations:
        for track in face_detection.tracks:
            print(f"Face track (confidence: {track.confidence:.2f}):")
            for timestamped_object in track.timestamped_objects:
                time_offset = timestamped_object.time_offset.total_seconds()
                bbox = timestamped_object.normalized_bounding_box
                print(f"  {time_offset:.1f}s: bbox({bbox.left:.3f}, {bbox.top:.3f}, {bbox.right:.3f}, {bbox.bottom:.3f})")
```
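
### Processing Person Detection Results

Person detection reuses the same `Track` and `TimestampedObject` types as face detection. A minimal sketch, reusing `result` from above and assuming `PERSON_DETECTION` was requested; attributes and landmarks are only populated when enabled in the request's `PersonDetectionConfig`:

```python
# Process person detection results
for annotation_result in result.annotation_results:
    for person_annotation in annotation_result.person_detection_annotations:
        for track in person_annotation.tracks:
            start_time = track.segment.start_time_offset.total_seconds()
            end_time = track.segment.end_time_offset.total_seconds()
            print(f"Person track: {start_time:.1f}s to {end_time:.1f}s (confidence: {track.confidence:.2f})")

            # Look at the first timestamped object in the track
            for timestamped_object in track.timestamped_objects[:1]:
                bbox = timestamped_object.normalized_bounding_box
                print(f"  bbox({bbox.left:.3f}, {bbox.top:.3f}, {bbox.right:.3f}, {bbox.bottom:.3f})")

                # Only populated when enabled in PersonDetectionConfig
                for attribute in timestamped_object.attributes:
                    print(f"  Attribute: {attribute.name} = {attribute.value} ({attribute.confidence:.2f})")
                for landmark in timestamped_object.landmarks:
                    print(f"  Landmark: {landmark.name} at ({landmark.point.x:.3f}, {landmark.point.y:.3f})")
```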

### Processing Speech Transcription Results

```python
# Process speech transcription results
for annotation_result in result.annotation_results:
    for transcription in annotation_result.speech_transcriptions:
        print(f"Language: {transcription.language_code}")

        for alternative in transcription.alternatives:
            print(f"Transcript: {alternative.transcript}")
            print(f"Confidence: {alternative.confidence:.2f}")

            # Word-level information
            for word_info in alternative.words:
                start_time = word_info.start_time.total_seconds()
                end_time = word_info.end_time.total_seconds()
                print(f"  {word_info.word}: {start_time:.1f}s-{end_time:.1f}s (speaker: {word_info.speaker_tag})")
```
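
### Processing Logo Recognition Results

A minimal sketch for logo results, reusing `result` from above and assuming `LOGO_RECOGNITION` was among the requested features:

```python
# Process logo recognition results
for annotation_result in result.annotation_results:
    for logo_annotation in annotation_result.logo_recognition_annotations:
        print(f"Logo: {logo_annotation.entity.description}")

        # All video segments where the recognized logo appears
        for segment in logo_annotation.segments:
            start_time = segment.start_time_offset.total_seconds()
            end_time = segment.end_time_offset.total_seconds()
            print(f"  Appears: {start_time:.1f}s to {end_time:.1f}s")

        # Individual tracks of the logo with confidence scores
        for track in logo_annotation.tracks:
            print(f"  Track confidence: {track.confidence:.2f}")
```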

### Processing Object Tracking Results

```python
# Process object tracking results
for annotation_result in result.annotation_results:
    for object_annotation in annotation_result.object_annotations:
        print(f"Object: {object_annotation.entity.description}")
        print(f"Confidence: {object_annotation.confidence:.2f}")
        # track_id is only meaningful in streaming mode; in batch mode use `segment`
        print(f"Track ID: {object_annotation.track_id}")

        # Show first few frames
        for frame in object_annotation.frames[:10]:
            time_offset = frame.time_offset.total_seconds()
            bbox = frame.normalized_bounding_box
            print(f"  {time_offset:.1f}s: ({bbox.left:.3f}, {bbox.top:.3f}) to ({bbox.right:.3f}, {bbox.bottom:.3f})")
```
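
### Converting Normalized Coordinates to Pixels

All bounding boxes and vertices are normalized to the [0, 1] range relative to the original frame, so drawing them requires scaling by the frame resolution. A small sketch; the width and height below are assumptions, so read the real values from your video (for example with OpenCV or ffprobe):

```python
# Scale a normalized bounding box to pixel coordinates
FRAME_WIDTH, FRAME_HEIGHT = 1920, 1080  # assumed resolution; read from the video in practice

for annotation_result in result.annotation_results:
    for object_annotation in annotation_result.object_annotations:
        for frame in object_annotation.frames[:1]:
            box = frame.normalized_bounding_box
            left_px = int(box.left * FRAME_WIDTH)
            top_px = int(box.top * FRAME_HEIGHT)
            right_px = int(box.right * FRAME_WIDTH)
            bottom_px = int(box.bottom * FRAME_HEIGHT)
            print(f"{object_annotation.entity.description}: ({left_px}, {top_px}) to ({right_px}, {bottom_px}) px")
```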

### Processing Text Detection Results

```python
# Process text detection results
for annotation_result in result.annotation_results:
    for text_annotation in annotation_result.text_annotations:
        print(f"Detected text: {text_annotation.text}")

        for segment in text_annotation.segments:
            start_time = segment.segment.start_time_offset.total_seconds()
            end_time = segment.segment.end_time_offset.total_seconds()
            print(f"  Time: {start_time:.1f}s to {end_time:.1f}s (confidence: {segment.confidence:.2f})")

            # Frame-level information
            for frame in segment.frames:
                time_offset = frame.time_offset.total_seconds()
                print(f"    Frame at {time_offset:.1f}s")
```
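
### Processing Explicit Content Results

Explicit content results are frame-level only. A minimal sketch, reusing `result` from above and assuming `EXPLICIT_CONTENT_DETECTION` was among the requested features:

```python
# Process explicit content detection results
for annotation_result in result.annotation_results:
    for frame in annotation_result.explicit_annotation.frames:
        time_offset = frame.time_offset.total_seconds()
        likelihood = videointelligence.Likelihood(frame.pornography_likelihood)
        print(f"{time_offset:.1f}s: pornography likelihood = {likelihood.name}")
```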

### Processing Celebrity Recognition Results (Beta)

```python
from google.cloud import videointelligence_v1p3beta1

# Celebrity recognition is only available in v1p3beta1: the request must be made
# with videointelligence_v1p3beta1.VideoIntelligenceServiceClient and the
# CELEBRITY_RECOGNITION feature. Note that celebrity_recognition_annotations is
# a single message, not a list.
for annotation_result in result.annotation_results:
    if hasattr(annotation_result, "celebrity_recognition_annotations"):
        celebrity_annotation = annotation_result.celebrity_recognition_annotations
        print("Celebrity Recognition Results:")

        for celebrity_track in celebrity_annotation.celebrity_tracks:
            print("  Face track detected:")

            # Recognized celebrities for this track
            for recognized_celebrity in celebrity_track.celebrities:
                celebrity = recognized_celebrity.celebrity
                confidence = recognized_celebrity.confidence
                print(f"    Celebrity: {celebrity.display_name}")
                print(f"    Confidence: {confidence:.2f}")
                print(f"    Description: {celebrity.description}")
                print(f"    Resource Name: {celebrity.name}")

            # Face track information
            face_track = celebrity_track.face_track
            if face_track.segment:
                start_time = face_track.segment.start_time_offset.total_seconds()
                end_time = face_track.segment.end_time_offset.total_seconds()
                print(f"    Track Duration: {start_time:.1f}s to {end_time:.1f}s")

            # Show first few timestamped objects
            for timestamped_obj in face_track.timestamped_objects[:5]:
                time_offset = timestamped_obj.time_offset.total_seconds()
                bbox = timestamped_obj.normalized_bounding_box
                print(f"    {time_offset:.1f}s: bbox({bbox.left:.3f}, {bbox.top:.3f}, {bbox.right:.3f}, {bbox.bottom:.3f})")
```
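
### Monitoring Progress and Checking Errors

`annotate_video` returns a long-running operation whose metadata carries `AnnotateVideoProgress`, and per-video failures are reported through the `error` field rather than raised. A minimal sketch of both; the polling interval is chosen arbitrarily:

```python
import time

from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()
operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.LABEL_DETECTION],
        "input_uri": "gs://your-bucket/your-video.mp4",
    }
)

# Poll until done, reporting per-video progress from the operation metadata
while not operation.done():
    if operation.metadata is not None:
        for progress in operation.metadata.annotation_progress:
            print(f"{progress.input_uri}: {progress.progress_percent}% complete")
    time.sleep(15)  # arbitrary polling interval

result = operation.result()

# Check per-video errors before reading annotations (google.rpc code 0 == OK)
for annotation_result in result.annotation_results:
    if annotation_result.error.code != 0:
        print(f"{annotation_result.input_uri} failed: {annotation_result.error.message}")
```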