# Results and Data Types

Structured data types for representing video analysis results. These classes contain annotations for detected objects, faces, text, speech, and other content, with timestamps and confidence scores.

## Capabilities

### Core Response Types

Main response objects returned from video analysis operations.

```python { .api }
class AnnotateVideoResponse:
    """
    Video annotation response. Contains annotation results for one or more videos.

    Attributes:
        annotation_results: Annotation results for all videos specified in AnnotateVideoRequest
    """

    annotation_results: MutableSequence[VideoAnnotationResults]

class VideoAnnotationResults:
    """
    Annotation results for a single video.

    Attributes:
        input_uri: Video file location in Google Cloud Storage
        segment_label_annotations: Label annotations on video level or user-specified segment level
        shot_label_annotations: Label annotations on shot level
        frame_label_annotations: Label annotations on frame level
        face_annotations: Face detection annotations (deprecated)
        face_detection_annotations: Face detection annotations with tracks and attributes
        shot_annotations: Shot annotations, represented as a list of shots
        explicit_annotation: Explicit content annotation
        speech_transcriptions: Speech transcriptions
        text_annotations: OCR text detection and tracking
        object_annotations: Annotations for objects detected and tracked in the video
        logo_recognition_annotations: Annotations for logos detected, tracked, and recognized in the video
        person_detection_annotations: Person detection annotations
        celebrity_recognition_annotations: Celebrity recognition annotations (available in v1p3beta1)
        error: If processing failed, this field contains the details of the failure
    """

    input_uri: str
    segment_label_annotations: MutableSequence[LabelAnnotation]
    shot_label_annotations: MutableSequence[LabelAnnotation]
    frame_label_annotations: MutableSequence[LabelAnnotation]
    face_annotations: MutableSequence[FaceAnnotation]
    face_detection_annotations: MutableSequence[FaceDetectionAnnotation]
    shot_annotations: MutableSequence[VideoSegment]
    explicit_annotation: ExplicitContentAnnotation
    speech_transcriptions: MutableSequence[SpeechTranscription]
    text_annotations: MutableSequence[TextAnnotation]
    object_annotations: MutableSequence[ObjectTrackingAnnotation]
    logo_recognition_annotations: MutableSequence[LogoRecognitionAnnotation]
    person_detection_annotations: MutableSequence[PersonDetectionAnnotation]
    celebrity_recognition_annotations: CelebrityRecognitionAnnotation
    error: status_pb2.Status
```

### Label Detection Results

Results from label detection analysis, including detected objects, activities, and concepts.

```python { .api }
class LabelAnnotation:
    """
    Label annotation.

    Attributes:
        entity: Detected entity from Video Intelligence API
        category_entities: Common categories for the detected entity
        segments: All video segments where a label was detected
        frames: All video frames where a label was detected
    """

    entity: Entity
    category_entities: MutableSequence[Entity]
    segments: MutableSequence[LabelSegment]
    frames: MutableSequence[LabelFrame]

class LabelSegment:
    """
    Video segment level annotation results for label detection.

    Attributes:
        segment: Video segment where a label was detected
        confidence: Confidence that the label is accurate (0.0 to 1.0)
    """

    segment: VideoSegment
    confidence: float

class LabelFrame:
    """
    Video frame level annotation results for label detection.

    Attributes:
        time_offset: Time-offset, relative to the beginning of the video, corresponding to the video frame for this location
        confidence: Confidence that the label is accurate (0.0 to 1.0)
    """

    time_offset: duration_pb2.Duration
    confidence: float

class Entity:
    """
    Detected entity from Video Intelligence API.

    Attributes:
        entity_id: Opaque entity ID. Some IDs may be available in Google Knowledge Graph Search API
        description: Textual description, e.g., "Fixed-gear bicycle"
        language_code: Language code for description in BCP-47 format
    """

    entity_id: str
    description: str
    language_code: str
```

### Face Detection Results

Results from face detection and tracking analysis.

```python { .api }
class FaceDetectionAnnotation:
    """
    Face detection annotation.

    Attributes:
        version: Feature version
        tracks: The face tracks with attributes
        thumbnail: The thumbnail of a person's face
    """

    version: str
    tracks: MutableSequence[Track]
    thumbnail: bytes

class FaceAnnotation:
    """
    Deprecated. No effect.

    Attributes:
        thumbnail: The thumbnail of a person's face
        segments: All video segments where a face was detected
    """

    thumbnail: bytes
    segments: MutableSequence[FaceSegment]

class FaceSegment:
    """
    Video segment level annotation results for face detection.

    Attributes:
        segment: Video segment where a face was detected
    """

    segment: VideoSegment

class FaceFrame:
    """
    Deprecated. No effect.

    Attributes:
        normalized_bounding_boxes: Normalized bounding boxes in a frame
        time_offset: Time-offset, relative to the beginning of the video, corresponding to the video frame for this location
    """

    normalized_bounding_boxes: MutableSequence[NormalizedBoundingBox]
    time_offset: duration_pb2.Duration

class Track:
    """
    A track of an object instance.

    Attributes:
        segment: Video segment of a track
        timestamped_objects: The object with timestamp and attributes per frame in the track
        attributes: Optional. Attributes in the track level
        confidence: Optional. The confidence score of the tracked object
    """

    segment: VideoSegment
    timestamped_objects: MutableSequence[TimestampedObject]
    attributes: MutableSequence[DetectedAttribute]
    confidence: float

class TimestampedObject:
    """
    For tracking the object throughout the video.

    Attributes:
        normalized_bounding_box: Normalized bounding box location of this object track for the frame
        time_offset: Time-offset, relative to the beginning of the video, corresponding to the video frame for this location
        attributes: Optional. The attributes of the object in the bounding box
        landmarks: Optional. The detected landmarks
    """

    normalized_bounding_box: NormalizedBoundingBox
    time_offset: duration_pb2.Duration
    attributes: MutableSequence[DetectedAttribute]
    landmarks: MutableSequence[DetectedLandmark]

class DetectedAttribute:
    """
    A generic detected attribute represented by name in string format.

    Attributes:
        name: The name of the attribute, for example, glasses, dark_glasses, mouth_open
        confidence: Detected attribute confidence (0.0 to 1.0)
        value: Text value of the detection result
    """

    name: str
    confidence: float
    value: str

class DetectedLandmark:
    """
    A generic detected landmark represented by name in string format and a 2D location.

    Attributes:
        name: The name of this landmark, for example, left_hand, right_shoulder
        point: The 2D point of the detected landmark using the normalized image coordinate system
        confidence: The confidence score of the detected landmark (0.0 to 1.0)
    """

    name: str
    point: NormalizedVertex
    confidence: float
```
231
232
### Object Tracking Results
233
234
Results from object detection and tracking analysis.
235
236
```python { .api }
237
class ObjectTrackingAnnotation:
238
"""
239
Annotations corresponding to one tracked object.
240
241
Attributes:
242
entity: Entity to specify the object category that this track is labeled as
243
confidence: Object category's labeling confidence of this track
244
frames: Information corresponding to all frames where this object track appears
245
segment: Non-streaming batch mode ONLY. Each object track corresponds to one video segment where it appears
246
track_id: Streaming mode ONLY. In streaming mode, we do not know the end time of a tracked object before it is completed
247
version: Feature version
248
"""
249
250
entity: Entity
251
confidence: float
252
frames: MutableSequence[ObjectTrackingFrame]
253
segment: VideoSegment
254
track_id: int
255
version: str
256
257
class ObjectTrackingFrame:
258
"""
259
Video frame level annotations for object detection and tracking.
260
261
Attributes:
262
normalized_bounding_box: The normalized bounding box location of this object track for the frame
263
time_offset: The timestamp of the frame in microseconds
264
"""
265
266
normalized_bounding_box: NormalizedBoundingBox
267
time_offset: duration_pb2.Duration
268
```

### Text Detection Results

Results from optical character recognition (OCR) analysis.

```python { .api }
class TextAnnotation:
    """
    Annotations related to one detected OCR text snippet.

    Attributes:
        text: The detected text
        segments: All video segments where OCR detected text appears
        version: Feature version
    """

    text: str
    segments: MutableSequence[TextSegment]
    version: str

class TextSegment:
    """
    Video segment level annotation results for text detection.

    Attributes:
        segment: Video segment where a text snippet was detected
        confidence: Confidence for the track of detected text
        frames: Information related to the frames where OCR detected text appears
    """

    segment: VideoSegment
    confidence: float
    frames: MutableSequence[TextFrame]

class TextFrame:
    """
    Video frame level annotation results for text annotation (OCR).

    Attributes:
        rotated_bounding_box: Bounding polygon of the detected text for this frame
        time_offset: Timestamp of this frame
    """

    rotated_bounding_box: NormalizedBoundingPoly
    time_offset: duration_pb2.Duration
```

### Speech Transcription Results

Results from speech-to-text analysis.

```python { .api }
class SpeechTranscription:
    """
    A speech recognition result corresponding to a portion of the audio.

    Attributes:
        alternatives: May contain one or more recognition hypotheses
        language_code: Output only. The BCP-47 language tag of the language in this result
    """

    alternatives: MutableSequence[SpeechRecognitionAlternative]
    language_code: str

class SpeechRecognitionAlternative:
    """
    Alternative hypotheses (a.k.a. n-best list).

    Attributes:
        transcript: Transcript text representing the words that the user spoke
        confidence: Output only. The confidence estimate between 0.0 and 1.0
        words: Output only. A list of word-specific information for each recognized word
    """

    transcript: str
    confidence: float
    words: MutableSequence[WordInfo]

class WordInfo:
    """
    Word-specific information for recognized words. Word-specific information is only populated if the client requests it.

    Attributes:
        start_time: Time offset relative to the beginning of the audio, corresponding to the start of the spoken word
        end_time: Time offset relative to the beginning of the audio, corresponding to the end of the spoken word
        word: The word corresponding to this set of information
        confidence: Output only. The confidence estimate between 0.0 and 1.0
        speaker_tag: Output only. A distinct integer value assigned to every speaker within the audio
    """

    start_time: duration_pb2.Duration
    end_time: duration_pb2.Duration
    word: str
    confidence: float
    speaker_tag: int
```

### Explicit Content Detection Results

Results from explicit content detection analysis.

```python { .api }
class ExplicitContentAnnotation:
    """
    Explicit content annotation (based on per-frame visual signals only).

    Attributes:
        frames: All video frames where explicit content was detected
        version: Feature version
    """

    frames: MutableSequence[ExplicitContentFrame]
    version: str

class ExplicitContentFrame:
    """
    Video frame level annotation results for explicit content.

    Attributes:
        time_offset: Time-offset, relative to the beginning of the video, corresponding to the video frame for this location
        pornography_likelihood: Likelihood of the pornography content
    """

    time_offset: duration_pb2.Duration
    pornography_likelihood: Likelihood
```

### Person Detection Results

Results from person detection analysis.

```python { .api }
class PersonDetectionAnnotation:
    """
    Person detection annotation per video.

    Attributes:
        tracks: The detected tracks of a person
        version: Feature version
    """

    tracks: MutableSequence[Track]
    version: str
```

### Logo Recognition Results

Results from logo detection and recognition analysis.

```python { .api }
class LogoRecognitionAnnotation:
    """
    Annotation corresponding to one detected, tracked and recognized logo class.

    Attributes:
        entity: Entity category information to specify the logo class that all the logo tracks within this LogoRecognitionAnnotation are recognized as
        tracks: All logo tracks where the recognized logo appears
        segments: All video segments where the recognized logo appears
    """

    entity: Entity
    tracks: MutableSequence[Track]
    segments: MutableSequence[VideoSegment]
```

### Celebrity Recognition Results (Beta)

Results from celebrity recognition analysis, available in v1p3beta1.

```python { .api }
class CelebrityRecognitionAnnotation:
    """
    Celebrity recognition annotation per video.

    Attributes:
        celebrity_tracks: The tracks detected from the input video, including recognized celebrities and other detected faces
    """

    celebrity_tracks: MutableSequence[CelebrityTrack]

class CelebrityTrack:
    """
    The annotation result of a celebrity face track.

    Attributes:
        celebrities: Top N match of the celebrities for the face in this track
        face_track: A track of a person's face
    """

    celebrities: MutableSequence[RecognizedCelebrity]
    face_track: Track

class RecognizedCelebrity:
    """
    The recognized celebrity with confidence score.

    Attributes:
        celebrity: The recognized celebrity
        confidence: Recognition confidence (0.0 to 1.0)
    """

    celebrity: Celebrity
    confidence: float

class Celebrity:
    """
    Celebrity definition.

    Attributes:
        name: The resource name of the celebrity (format: video-intelligence/kg-mid)
        display_name: The celebrity name
        description: Textual description of additional information about the celebrity
    """

    name: str
    display_name: str
    description: str
```

### Geometric Data Types

Data types for representing spatial information in videos.

```python { .api }
class NormalizedBoundingBox:
    """
    Normalized bounding box. The normalized vertex coordinates are relative to the original image. Range: [0, 1].

    Attributes:
        left: Left X coordinate
        top: Top Y coordinate
        right: Right X coordinate
        bottom: Bottom Y coordinate
    """

    left: float
    top: float
    right: float
    bottom: float

class NormalizedBoundingPoly:
    """
    Normalized bounding polygon for text (that might not be axis-aligned).

    Attributes:
        vertices: The bounding polygon vertices
    """

    vertices: MutableSequence[NormalizedVertex]

class NormalizedVertex:
    """
    A vertex represents a 2D point in the image. Coordinates are normalized, relative to the original image, and range from 0 to 1.

    Attributes:
        x: X coordinate
        y: Y coordinate
    """

    x: float
    y: float
```

### Progress and Status Types

Types for tracking operation progress and handling errors.

```python { .api }
class AnnotateVideoProgress:
    """
    Video annotation progress. Included in the metadata field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

    Attributes:
        annotation_progress: Progress metadata for all videos specified in AnnotateVideoRequest
    """

    annotation_progress: MutableSequence[VideoAnnotationProgress]

class VideoAnnotationProgress:
    """
    Annotation progress for a single video.

    Attributes:
        input_uri: Video file location in Google Cloud Storage
        progress_percent: Approximate percentage processed thus far (0-100)
        start_time: Time when the request was received
        update_time: Time of the most recent update
        feature: Specifies which feature is being tracked if the request contains more than one feature
        segment: Specifies which segment is being tracked if the request contains more than one segment
    """

    input_uri: str
    progress_percent: int
    start_time: timestamp_pb2.Timestamp
    update_time: timestamp_pb2.Timestamp
    feature: Feature
    segment: VideoSegment
```

## Usage Examples

### Processing Label Detection Results

```python
from google.cloud import videointelligence

# Perform label detection
client = videointelligence.VideoIntelligenceServiceClient()
operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.LABEL_DETECTION],
        "input_uri": "gs://your-bucket/your-video.mp4",
    }
)
result = operation.result(timeout=300)

# Process results
for annotation_result in result.annotation_results:
    print(f"Processing video: {annotation_result.input_uri}")

    # Segment-level labels
    for label in annotation_result.segment_label_annotations:
        print(f"\nLabel: {label.entity.description}")
        for segment in label.segments:
            start_time = segment.segment.start_time_offset.total_seconds()
            end_time = segment.segment.end_time_offset.total_seconds()
            print(f"  Segment: {start_time:.1f}s to {end_time:.1f}s (confidence: {segment.confidence:.2f})")

    # Frame-level labels (populated only when frame-mode label detection
    # was requested via the video context)
    for label in annotation_result.frame_label_annotations:
        print(f"\nFrame-level label: {label.entity.description}")
        for frame in label.frames[:5]:  # Show first 5 frames
            time_offset = frame.time_offset.total_seconds()
            print(f"  Frame at {time_offset:.1f}s (confidence: {frame.confidence:.2f})")
```
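
### Processing Shot Change Results

Shot boundaries arrive as plain `VideoSegment` values in `shot_annotations`. A minimal sketch, reusing the `result` object from the example above and assuming `SHOT_CHANGE_DETECTION` was among the requested features:

```python
# Process shot change results (assumes SHOT_CHANGE_DETECTION was requested)
for annotation_result in result.annotation_results:
    for i, shot in enumerate(annotation_result.shot_annotations):
        start_time = shot.start_time_offset.total_seconds()
        end_time = shot.end_time_offset.total_seconds()
        print(f"Shot {i}: {start_time:.1f}s to {end_time:.1f}s")
```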

### Processing Face Detection Results

```python
# Process face detection results (assumes FACE_DETECTION was requested)
for annotation_result in result.annotation_results:
    # Legacy format; FaceAnnotation is deprecated and normally empty
    for face_annotation in annotation_result.face_annotations:
        print("Face detected:")
        for segment in face_annotation.segments:
            start_time = segment.segment.start_time_offset.total_seconds()
            end_time = segment.segment.end_time_offset.total_seconds()
            print(f"  Time: {start_time:.1f}s to {end_time:.1f}s")

    # Newer face detection format
    for face_detection in annotation_result.face_detection_annotations:
        for track in face_detection.tracks:
            print(f"Face track (confidence: {track.confidence:.2f}):")
            for timestamped_object in track.timestamped_objects:
                time_offset = timestamped_object.time_offset.total_seconds()
                bbox = timestamped_object.normalized_bounding_box
                print(f"  {time_offset:.1f}s: bbox({bbox.left:.3f}, {bbox.top:.3f}, {bbox.right:.3f}, {bbox.bottom:.3f})")
```
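
### Processing Person Detection Results

Person detection reuses the same `Track` and `TimestampedObject` types as face detection. A minimal sketch, reusing `result` from above and assuming `PERSON_DETECTION` was requested; attributes and landmarks are only populated when enabled in the request's `PersonDetectionConfig`:

```python
# Process person detection results
for annotation_result in result.annotation_results:
    for person_annotation in annotation_result.person_detection_annotations:
        for track in person_annotation.tracks:
            start_time = track.segment.start_time_offset.total_seconds()
            end_time = track.segment.end_time_offset.total_seconds()
            print(f"Person track: {start_time:.1f}s to {end_time:.1f}s (confidence: {track.confidence:.2f})")

            # Look at the first timestamped object in the track
            for timestamped_object in track.timestamped_objects[:1]:
                bbox = timestamped_object.normalized_bounding_box
                print(f"  bbox({bbox.left:.3f}, {bbox.top:.3f}, {bbox.right:.3f}, {bbox.bottom:.3f})")

                # Only populated when enabled in PersonDetectionConfig
                for attribute in timestamped_object.attributes:
                    print(f"  Attribute: {attribute.name} = {attribute.value} ({attribute.confidence:.2f})")
                for landmark in timestamped_object.landmarks:
                    print(f"  Landmark: {landmark.name} at ({landmark.point.x:.3f}, {landmark.point.y:.3f})")
```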

### Processing Speech Transcription Results

```python
# Process speech transcription results
for annotation_result in result.annotation_results:
    for transcription in annotation_result.speech_transcriptions:
        print(f"Language: {transcription.language_code}")

        for alternative in transcription.alternatives:
            print(f"Transcript: {alternative.transcript}")
            print(f"Confidence: {alternative.confidence:.2f}")

            # Word-level information
            for word_info in alternative.words:
                start_time = word_info.start_time.total_seconds()
                end_time = word_info.end_time.total_seconds()
                print(f"  {word_info.word}: {start_time:.1f}s-{end_time:.1f}s (speaker: {word_info.speaker_tag})")
```
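
### Processing Logo Recognition Results

A minimal sketch for logo results, reusing `result` from above and assuming `LOGO_RECOGNITION` was among the requested features:

```python
# Process logo recognition results
for annotation_result in result.annotation_results:
    for logo_annotation in annotation_result.logo_recognition_annotations:
        print(f"Logo: {logo_annotation.entity.description}")

        # All video segments where the recognized logo appears
        for segment in logo_annotation.segments:
            start_time = segment.start_time_offset.total_seconds()
            end_time = segment.end_time_offset.total_seconds()
            print(f"  Appears: {start_time:.1f}s to {end_time:.1f}s")

        # Individual tracks of the logo with confidence scores
        for track in logo_annotation.tracks:
            print(f"  Track confidence: {track.confidence:.2f}")
```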

### Processing Object Tracking Results

```python
# Process object tracking results
for annotation_result in result.annotation_results:
    for object_annotation in annotation_result.object_annotations:
        print(f"Object: {object_annotation.entity.description}")
        print(f"Confidence: {object_annotation.confidence:.2f}")
        # track_id is only meaningful in streaming mode; in batch mode use `segment`
        print(f"Track ID: {object_annotation.track_id}")

        # Show first few frames
        for frame in object_annotation.frames[:10]:
            time_offset = frame.time_offset.total_seconds()
            bbox = frame.normalized_bounding_box
            print(f"  {time_offset:.1f}s: ({bbox.left:.3f}, {bbox.top:.3f}) to ({bbox.right:.3f}, {bbox.bottom:.3f})")
```
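
### Converting Normalized Coordinates to Pixels

All bounding boxes and vertices are normalized to the [0, 1] range relative to the original frame, so drawing them requires scaling by the frame resolution. A small sketch; the width and height below are assumptions, so read the real values from your video (for example with OpenCV or ffprobe):

```python
# Scale a normalized bounding box to pixel coordinates
FRAME_WIDTH, FRAME_HEIGHT = 1920, 1080  # assumed resolution; read from the video in practice

for annotation_result in result.annotation_results:
    for object_annotation in annotation_result.object_annotations:
        for frame in object_annotation.frames[:1]:
            box = frame.normalized_bounding_box
            left_px = int(box.left * FRAME_WIDTH)
            top_px = int(box.top * FRAME_HEIGHT)
            right_px = int(box.right * FRAME_WIDTH)
            bottom_px = int(box.bottom * FRAME_HEIGHT)
            print(f"{object_annotation.entity.description}: ({left_px}, {top_px}) to ({right_px}, {bottom_px}) px")
```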

### Processing Text Detection Results

```python
# Process text detection results
for annotation_result in result.annotation_results:
    for text_annotation in annotation_result.text_annotations:
        print(f"Detected text: {text_annotation.text}")

        for segment in text_annotation.segments:
            start_time = segment.segment.start_time_offset.total_seconds()
            end_time = segment.segment.end_time_offset.total_seconds()
            print(f"  Time: {start_time:.1f}s to {end_time:.1f}s (confidence: {segment.confidence:.2f})")

            # Frame-level information
            for frame in segment.frames:
                time_offset = frame.time_offset.total_seconds()
                print(f"    Frame at {time_offset:.1f}s")
```
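
### Processing Explicit Content Results

Explicit content results are frame-level only. A minimal sketch, reusing `result` from above and assuming `EXPLICIT_CONTENT_DETECTION` was among the requested features:

```python
# Process explicit content detection results
for annotation_result in result.annotation_results:
    for frame in annotation_result.explicit_annotation.frames:
        time_offset = frame.time_offset.total_seconds()
        likelihood = videointelligence.Likelihood(frame.pornography_likelihood)
        print(f"{time_offset:.1f}s: pornography likelihood = {likelihood.name}")
```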

### Processing Celebrity Recognition Results (Beta)

```python
from google.cloud import videointelligence_v1p3beta1

# Celebrity recognition is only available in v1p3beta1: the request must be made
# with videointelligence_v1p3beta1.VideoIntelligenceServiceClient and the
# CELEBRITY_RECOGNITION feature. Note that celebrity_recognition_annotations is
# a single message, not a list.
for annotation_result in result.annotation_results:
    if hasattr(annotation_result, "celebrity_recognition_annotations"):
        celebrity_annotation = annotation_result.celebrity_recognition_annotations
        print("Celebrity Recognition Results:")

        for celebrity_track in celebrity_annotation.celebrity_tracks:
            print("  Face track detected:")

            # Recognized celebrities for this track
            for recognized_celebrity in celebrity_track.celebrities:
                celebrity = recognized_celebrity.celebrity
                confidence = recognized_celebrity.confidence
                print(f"    Celebrity: {celebrity.display_name}")
                print(f"    Confidence: {confidence:.2f}")
                print(f"    Description: {celebrity.description}")
                print(f"    Resource Name: {celebrity.name}")

            # Face track information
            face_track = celebrity_track.face_track
            if face_track.segment:
                start_time = face_track.segment.start_time_offset.total_seconds()
                end_time = face_track.segment.end_time_offset.total_seconds()
                print(f"    Track Duration: {start_time:.1f}s to {end_time:.1f}s")

            # Show first few timestamped objects
            for timestamped_obj in face_track.timestamped_objects[:5]:
                time_offset = timestamped_obj.time_offset.total_seconds()
                bbox = timestamped_obj.normalized_bounding_box
                print(f"    {time_offset:.1f}s: bbox({bbox.left:.3f}, {bbox.top:.3f}, {bbox.right:.3f}, {bbox.bottom:.3f})")
```
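
### Monitoring Progress and Checking Errors

`annotate_video` returns a long-running operation whose metadata carries `AnnotateVideoProgress`, and per-video failures are reported through the `error` field rather than raised. A minimal sketch of both; the polling interval is chosen arbitrarily:

```python
import time

from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()
operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.LABEL_DETECTION],
        "input_uri": "gs://your-bucket/your-video.mp4",
    }
)

# Poll until done, reporting per-video progress from the operation metadata
while not operation.done():
    if operation.metadata is not None:
        for progress in operation.metadata.annotation_progress:
            print(f"{progress.input_uri}: {progress.progress_percent}% complete")
    time.sleep(15)  # arbitrary polling interval

result = operation.result()

# Check per-video errors before reading annotations (google.rpc code 0 == OK)
for annotation_result in result.annotation_results:
    if annotation_result.error.code != 0:
        print(f"{annotation_result.input_uri} failed: {annotation_result.error.message}")
```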