
# Results and Data Types


Structured data types for representing video analysis results. These classes contain annotations for detected objects, faces, text, speech, and other content with timestamps and confidence scores.

## Capabilities

### Core Response Types

Main response objects returned from video analysis operations.

```python { .api }
class AnnotateVideoResponse:
    """
    Video annotation response. Contains annotation results for one or more videos.

    Attributes:
        annotation_results: Annotation results for all videos specified in AnnotateVideoRequest
    """

    annotation_results: MutableSequence[VideoAnnotationResults]


class VideoAnnotationResults:
    """
    Annotation results for a single video.

    Attributes:
        input_uri: Video file location in Google Cloud Storage
        segment_label_annotations: Label annotations on video level or user-specified segment level
        shot_label_annotations: Label annotations on shot level
        frame_label_annotations: Label annotations on frame level
        face_annotations: Face detection annotations
        shot_annotations: Shot annotations, represented as a list of shots
        explicit_annotation: Explicit content annotation
        speech_transcriptions: Speech transcriptions
        text_annotations: OCR text detection and tracking
        object_annotations: Annotations for objects detected and tracked in the video
        logo_recognition_annotations: Annotations for logos detected, tracked, and recognized in the video
        person_detection_annotations: Person detection annotations
        celebrity_recognition_annotations: Celebrity recognition annotations (available in v1p3beta1)
        error: If processing failed, this field contains the details of the failure
    """

    input_uri: str
    segment_label_annotations: MutableSequence[LabelAnnotation]
    shot_label_annotations: MutableSequence[LabelAnnotation]
    frame_label_annotations: MutableSequence[LabelAnnotation]
    face_annotations: MutableSequence[FaceAnnotation]
    shot_annotations: MutableSequence[VideoSegment]
    explicit_annotation: ExplicitContentAnnotation
    speech_transcriptions: MutableSequence[SpeechTranscription]
    text_annotations: MutableSequence[TextAnnotation]
    object_annotations: MutableSequence[ObjectTrackingAnnotation]
    logo_recognition_annotations: MutableSequence[LogoRecognitionAnnotation]
    person_detection_annotations: MutableSequence[PersonDetectionAnnotation]
    celebrity_recognition_annotations: MutableSequence[CelebrityRecognitionAnnotation]
    error: status_pb2.Status
```
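
Since each video in a request is processed independently, it is worth checking the per-video `error` field before reading annotations. A minimal sketch, assuming `result` is an `AnnotateVideoResponse` obtained from a completed operation:

```python
# Skip videos whose processing failed; error is a google.rpc.Status.
for annotation_result in result.annotation_results:
    if annotation_result.error.code != 0:  # a non-zero code signals failure
        print(f"{annotation_result.input_uri} failed: {annotation_result.error.message}")
        continue
    print(f"{annotation_result.input_uri}: {len(annotation_result.shot_annotations)} shots")
```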

### Label Detection Results

Results from label detection analysis, including detected objects, activities, and concepts.

```python { .api }
class LabelAnnotation:
    """
    Label annotation.

    Attributes:
        entity: Detected entity from Video Intelligence API
        category_entities: Common categories for the detected entity
        segments: All video segments where a label was detected
        frames: All video frames where a label was detected
    """

    entity: Entity
    category_entities: MutableSequence[Entity]
    segments: MutableSequence[LabelSegment]
    frames: MutableSequence[LabelFrame]


class LabelSegment:
    """
    Video segment level annotation results for label detection.

    Attributes:
        segment: Video segment where a label was detected
        confidence: Confidence that the label is accurate (0.0 to 1.0)
    """

    segment: VideoSegment
    confidence: float


class LabelFrame:
    """
    Video frame level annotation results for label detection.

    Attributes:
        time_offset: Time-offset, relative to the beginning of the video, corresponding to the video frame for this location
        confidence: Confidence that the label is accurate (0.0 to 1.0)
    """

    time_offset: duration_pb2.Duration
    confidence: float


class Entity:
    """
    Detected entity from Video Intelligence API.

    Attributes:
        entity_id: Opaque entity ID. Some IDs may be available in Google Knowledge Graph Search API
        description: Textual description, e.g., "Fixed-gear bicycle"
        language_code: Language code for description in BCP-47 format
    """

    entity_id: str
    description: str
    language_code: str
```

### Face Detection Results

Results from face detection and tracking analysis.

```python { .api }
class FaceDetectionAnnotation:
    """
    Face detection annotation.

    Attributes:
        version: Feature version
        tracks: The face tracks with attributes
        thumbnail: The thumbnail of a person's face
    """

    version: str
    tracks: MutableSequence[Track]
    thumbnail: bytes


class FaceAnnotation:
    """
    Deprecated. No effect.

    Attributes:
        thumbnail: The thumbnail of a person's face
        segments: All video segments where a face was detected
    """

    thumbnail: bytes
    segments: MutableSequence[FaceSegment]


class FaceSegment:
    """
    Video segment level annotation results for face detection.

    Attributes:
        segment: Video segment where a face was detected
    """

    segment: VideoSegment


class FaceFrame:
    """
    Deprecated. No effect.

    Attributes:
        normalized_bounding_boxes: Normalized bounding boxes in a frame
        time_offset: Time-offset, relative to the beginning of the video, corresponding to the video frame for this location
    """

    normalized_bounding_boxes: MutableSequence[NormalizedBoundingBox]
    time_offset: duration_pb2.Duration


class Track:
    """
    A track of an object instance.

    Attributes:
        segment: Video segment of a track
        timestamped_objects: The object with timestamp and attributes per frame in the track
        attributes: Optional. Attributes at the track level
        confidence: Optional. The confidence score of the tracked object
    """

    segment: VideoSegment
    timestamped_objects: MutableSequence[TimestampedObject]
    attributes: MutableSequence[DetectedAttribute]
    confidence: float


class TimestampedObject:
    """
    For tracking the object throughout the video.

    Attributes:
        normalized_bounding_box: Normalized bounding box location of this object track for the frame
        time_offset: Time-offset, relative to the beginning of the video, corresponding to the video frame for this location
        attributes: Optional. The attributes of the object in the bounding box
        landmarks: Optional. The detected landmarks
    """

    normalized_bounding_box: NormalizedBoundingBox
    time_offset: duration_pb2.Duration
    attributes: MutableSequence[DetectedAttribute]
    landmarks: MutableSequence[DetectedLandmark]


class DetectedAttribute:
    """
    A generic detected attribute represented by name in string format.

    Attributes:
        name: The name of the attribute, for example, glasses, dark_glasses, mouth_open
        confidence: Detected attribute confidence (0.0 to 1.0)
        value: Text value of the detection result
    """

    name: str
    confidence: float
    value: str


class DetectedLandmark:
    """
    A generic detected landmark represented by name in string format and a 2D location.

    Attributes:
        name: The name of this landmark, for example, left_hand, right_shoulder
        point: The 2D point of the detected landmark using the normalized image coordinate system
        confidence: The confidence score of the detected landmark (0.0 to 1.0)
    """

    name: str
    point: NormalizedVertex
    confidence: float
```

### Object Tracking Results

Results from object detection and tracking analysis.

```python { .api }
class ObjectTrackingAnnotation:
    """
    Annotations corresponding to one tracked object.

    Attributes:
        entity: Entity to specify the object category that this track is labeled as
        confidence: Object category's labeling confidence of this track
        frames: Information corresponding to all frames where this object track appears
        segment: Non-streaming batch mode ONLY. Each object track corresponds to one video segment where it appears
        track_id: Streaming mode ONLY. In streaming mode, we do not know the end time of a tracked object before it is completed
        version: Feature version
    """

    entity: Entity
    confidence: float
    frames: MutableSequence[ObjectTrackingFrame]
    segment: VideoSegment
    track_id: int
    version: str


class ObjectTrackingFrame:
    """
    Video frame level annotations for object detection and tracking.

    Attributes:
        normalized_bounding_box: The normalized bounding box location of this object track for the frame
        time_offset: The timestamp of the frame in microseconds
    """

    normalized_bounding_box: NormalizedBoundingBox
    time_offset: duration_pb2.Duration
```

### Text Detection Results

Results from optical character recognition (OCR) analysis.

```python { .api }
class TextAnnotation:
    """
    Annotations related to one detected OCR text snippet.

    Attributes:
        text: The detected text
        segments: All video segments where OCR detected text appears
        version: Feature version
    """

    text: str
    segments: MutableSequence[TextSegment]
    version: str


class TextSegment:
    """
    Video segment level annotation results for text detection.

    Attributes:
        segment: Video segment where a text snippet was detected
        confidence: Confidence for the track of detected text
        frames: Information related to the frames where OCR detected text appears
    """

    segment: VideoSegment
    confidence: float
    frames: MutableSequence[TextFrame]


class TextFrame:
    """
    Video frame level annotation results for text annotation (OCR).

    Attributes:
        rotated_bounding_box: Bounding polygon of the detected text for this frame
        time_offset: Timestamp of this frame
    """

    rotated_bounding_box: NormalizedBoundingPoly
    time_offset: duration_pb2.Duration
```

### Speech Transcription Results

Results from speech-to-text analysis.

```python { .api }
class SpeechTranscription:
    """
    A speech recognition result corresponding to a portion of the audio.

    Attributes:
        alternatives: May contain one or more recognition hypotheses
        language_code: Output only. The BCP-47 language tag of the language in this result
    """

    alternatives: MutableSequence[SpeechRecognitionAlternative]
    language_code: str


class SpeechRecognitionAlternative:
    """
    Alternative hypotheses (a.k.a. n-best list).

    Attributes:
        transcript: Transcript text representing the words that the user spoke
        confidence: Output only. The confidence estimate between 0.0 and 1.0
        words: Output only. A list of word-specific information for each recognized word
    """

    transcript: str
    confidence: float
    words: MutableSequence[WordInfo]


class WordInfo:
    """
    Word-specific information for recognized words. Word-specific information is only populated if the client requests it.

    Attributes:
        start_time: Time offset relative to the beginning of the audio, corresponding to the start of the spoken word
        end_time: Time offset relative to the beginning of the audio, corresponding to the end of the spoken word
        word: The word corresponding to this set of information
        confidence: Output only. The confidence estimate between 0.0 and 1.0
        speaker_tag: Output only. A distinct integer value is assigned for every speaker within the audio
    """

    start_time: duration_pb2.Duration
    end_time: duration_pb2.Duration
    word: str
    confidence: float
    speaker_tag: int
```

### Explicit Content Detection Results

Results from explicit content detection analysis.

```python { .api }
class ExplicitContentAnnotation:
    """
    Explicit content annotation (based on per-frame visual signals only).

    Attributes:
        frames: All video frames where explicit content was detected
        version: Feature version
    """

    frames: MutableSequence[ExplicitContentFrame]
    version: str


class ExplicitContentFrame:
    """
    Video frame level annotation results for explicit content.

    Attributes:
        time_offset: Time-offset, relative to the beginning of the video, corresponding to the video frame for this location
        pornography_likelihood: Likelihood of the pornography content
    """

    time_offset: duration_pb2.Duration
    pornography_likelihood: Likelihood
```
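
A short sketch of reading these frames, assuming `result` is a completed response and explicit content detection was requested; `pornography_likelihood` is a `Likelihood` enum, so `.name` yields its label:

```python
# Report the likelihood label for each analyzed frame.
for annotation_result in result.annotation_results:
    for frame in annotation_result.explicit_annotation.frames:
        time_offset = frame.time_offset.total_seconds()
        # Likelihood values range from VERY_UNLIKELY to VERY_LIKELY.
        print(f"{time_offset:.1f}s: {frame.pornography_likelihood.name}")
```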

### Person Detection Results

Results from person detection analysis.

```python { .api }
class PersonDetectionAnnotation:
    """
    Person detection annotation per video.

    Attributes:
        tracks: The detected tracks of a person
        version: Feature version
    """

    tracks: MutableSequence[Track]
    version: str
```
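
A minimal sketch of walking person tracks, assuming `result` is a completed response; note that attributes and landmarks are only populated when enabled in the request's person detection configuration:

```python
# Walk person tracks and their per-frame attributes and landmarks.
for annotation_result in result.annotation_results:
    for person in annotation_result.person_detection_annotations:
        for track in person.tracks:
            start = track.segment.start_time_offset.total_seconds()
            end = track.segment.end_time_offset.total_seconds()
            print(f"Person: {start:.1f}s to {end:.1f}s (confidence: {track.confidence:.2f})")
            for obj in track.timestamped_objects:
                for attribute in obj.attributes:  # populated only if requested
                    print(f"  {attribute.name}: {attribute.value} ({attribute.confidence:.2f})")
                for landmark in obj.landmarks:  # e.g., left_shoulder
                    print(f"  {landmark.name}: ({landmark.point.x:.3f}, {landmark.point.y:.3f})")
```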

### Logo Recognition Results

Results from logo detection and recognition analysis.

```python { .api }
class LogoRecognitionAnnotation:
    """
    Annotation corresponding to one detected, tracked and recognized logo class.

    Attributes:
        entity: Entity category information to specify the logo class that all the logo tracks within this LogoRecognitionAnnotation are recognized as
        tracks: All logo tracks where the recognized logo appears
        segments: All video segments where the recognized logo appears
    """

    entity: Entity
    tracks: MutableSequence[Track]
    segments: MutableSequence[VideoSegment]
```
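
A brief sketch, assuming `result` is a completed response; note that `segments` here are plain `VideoSegment` values, so the offsets are read directly rather than through a wrapper:

```python
# List each recognized logo and where it appears.
for annotation_result in result.annotation_results:
    for logo in annotation_result.logo_recognition_annotations:
        print(f"Logo: {logo.entity.description}")
        for segment in logo.segments:  # VideoSegment, not a wrapper type
            start = segment.start_time_offset.total_seconds()
            end = segment.end_time_offset.total_seconds()
            print(f"  Appears: {start:.1f}s to {end:.1f}s")
        for track in logo.tracks:
            print(f"  Track confidence: {track.confidence:.2f}")
```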

### Celebrity Recognition Results (Beta)

Results from celebrity recognition analysis available in v1p3beta1.

```python { .api }
class CelebrityRecognitionAnnotation:
    """
    Celebrity recognition annotation per video.

    Attributes:
        celebrity_tracks: The tracks detected from the input video, including recognized celebrities and other detected faces
    """

    celebrity_tracks: MutableSequence[CelebrityTrack]


class CelebrityTrack:
    """
    The annotation result of a celebrity face track.

    Attributes:
        celebrities: Top N match of the celebrities for the face in this track
        face_track: A track of a person's face
    """

    celebrities: MutableSequence[RecognizedCelebrity]
    face_track: Track


class RecognizedCelebrity:
    """
    The recognized celebrity with confidence score.

    Attributes:
        celebrity: The recognized celebrity
        confidence: Recognition confidence (0.0 to 1.0)
    """

    celebrity: Celebrity
    confidence: float


class Celebrity:
    """
    Celebrity definition.

    Attributes:
        name: The resource name of the celebrity (format: video-intelligence/kg-mid)
        display_name: The celebrity name
        description: Textual description of additional information about the celebrity
    """

    name: str
    display_name: str
    description: str
```

### Geometric Data Types

Data types for representing spatial information in videos.

```python { .api }
class NormalizedBoundingBox:
    """
    Normalized bounding box. The normalized vertex coordinates are relative to the original image. Range: [0, 1].

    Attributes:
        left: Left X coordinate
        top: Top Y coordinate
        right: Right X coordinate
        bottom: Bottom Y coordinate
    """

    left: float
    top: float
    right: float
    bottom: float


class NormalizedBoundingPoly:
    """
    Normalized bounding polygon for text (that might not be axis-aligned).

    Attributes:
        vertices: The bounding polygon vertices
    """

    vertices: MutableSequence[NormalizedVertex]


class NormalizedVertex:
    """
    A vertex represents a 2D point in the image. Coordinates are normalized relative to the original image and range from 0 to 1.

    Attributes:
        x: X coordinate
        y: Y coordinate
    """

    x: float
    y: float
```
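
Because these coordinates are normalized, they must be scaled by the frame dimensions before drawing or cropping. A small helper sketch; the helper name and the 1920x1080 frame size are assumed examples, not something the API provides:

```python
def to_pixels(bbox, frame_width, frame_height):
    """Convert a NormalizedBoundingBox to (x1, y1, x2, y2) pixel coordinates."""
    return (
        int(bbox.left * frame_width),
        int(bbox.top * frame_height),
        int(bbox.right * frame_width),
        int(bbox.bottom * frame_height),
    )

# bbox taken from any timestamped object; frame size assumed to be 1920x1080:
# x1, y1, x2, y2 = to_pixels(timestamped_object.normalized_bounding_box, 1920, 1080)
```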

### Progress and Status Types

Types for tracking operation progress and handling errors.

```python { .api }
class AnnotateVideoProgress:
    """
    Video annotation progress. Included in the metadata field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

    Attributes:
        annotation_progress: Progress metadata for all videos specified in AnnotateVideoRequest
    """

    annotation_progress: MutableSequence[VideoAnnotationProgress]


class VideoAnnotationProgress:
    """
    Annotation progress for a single video.

    Attributes:
        input_uri: Video file location in Google Cloud Storage
        progress_percent: Approximate percentage processed thus far (0-100)
        start_time: Time when the request was received
        update_time: Time of the most recent update
        feature: Specifies which feature is being tracked if the request contains more than one feature
        segment: Specifies which segment is being tracked if the request contains more than one segment
    """

    input_uri: str
    progress_percent: int
    start_time: timestamp_pb2.Timestamp
    update_time: timestamp_pb2.Timestamp
    feature: Feature
    segment: VideoSegment
```
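
Progress can be read from the long-running operation while it runs. A hedged sketch, assuming `operation` came from `annotate_video` and that its metadata deserializes to `AnnotateVideoProgress`; the 10-second polling interval is arbitrary:

```python
import time

# Poll the operation and report per-video progress until it completes.
while not operation.done():
    metadata = operation.metadata  # AnnotateVideoProgress
    for progress in metadata.annotation_progress:
        print(f"{progress.input_uri}: {progress.progress_percent}% complete")
    time.sleep(10)  # arbitrary polling interval

result = operation.result()
```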

## Usage Examples

### Processing Label Detection Results

```python
from google.cloud import videointelligence

# Perform label detection
client = videointelligence.VideoIntelligenceServiceClient()
operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.LABEL_DETECTION],
        "input_uri": "gs://your-bucket/your-video.mp4",
    }
)
result = operation.result(timeout=300)

# Process results
for annotation_result in result.annotation_results:
    print(f"Processing video: {annotation_result.input_uri}")

    # Segment-level labels
    for label in annotation_result.segment_label_annotations:
        print(f"\nLabel: {label.entity.description}")
        for segment in label.segments:
            start_time = segment.segment.start_time_offset.total_seconds()
            end_time = segment.segment.end_time_offset.total_seconds()
            print(f"  Segment: {start_time:.1f}s to {end_time:.1f}s (confidence: {segment.confidence:.2f})")

    # Frame-level labels
    for label in annotation_result.frame_label_annotations:
        print(f"\nFrame-level label: {label.entity.description}")
        for frame in label.frames[:5]:  # Show first 5 frames
            time_offset = frame.time_offset.total_seconds()
            print(f"  Frame at {time_offset:.1f}s (confidence: {frame.confidence:.2f})")
```

### Processing Face Detection Results

```python
# Process face detection results (legacy face_annotations format)
for annotation_result in result.annotation_results:
    for face_annotation in annotation_result.face_annotations:
        print("Face detected:")
        for segment in face_annotation.segments:
            start_time = segment.segment.start_time_offset.total_seconds()
            end_time = segment.segment.end_time_offset.total_seconds()
            print(f"  Time: {start_time:.1f}s to {end_time:.1f}s")

    # Newer face detection format
    for face_detection in annotation_result.face_detection_annotations:
        for track in face_detection.tracks:
            print(f"Face track (confidence: {track.confidence:.2f}):")
            for timestamped_object in track.timestamped_objects:
                time_offset = timestamped_object.time_offset.total_seconds()
                bbox = timestamped_object.normalized_bounding_box
                print(f"  {time_offset:.1f}s: bbox({bbox.left:.3f}, {bbox.top:.3f}, {bbox.right:.3f}, {bbox.bottom:.3f})")
```

### Processing Speech Transcription Results

```python
# Process speech transcription results
for annotation_result in result.annotation_results:
    for transcription in annotation_result.speech_transcriptions:
        print(f"Language: {transcription.language_code}")

        for alternative in transcription.alternatives:
            print(f"Transcript: {alternative.transcript}")
            print(f"Confidence: {alternative.confidence:.2f}")

            # Word-level information
            for word_info in alternative.words:
                start_time = word_info.start_time.total_seconds()
                end_time = word_info.end_time.total_seconds()
                print(f"  {word_info.word}: {start_time:.1f}s-{end_time:.1f}s (speaker: {word_info.speaker_tag})")
```

### Processing Object Tracking Results

```python
# Process object tracking results
for annotation_result in result.annotation_results:
    for object_annotation in annotation_result.object_annotations:
        print(f"Object: {object_annotation.entity.description}")
        print(f"Confidence: {object_annotation.confidence:.2f}")
        # track_id is populated in streaming mode only; in batch mode use segment
        print(f"Track ID: {object_annotation.track_id}")

        # Show first few frames
        for frame in object_annotation.frames[:10]:
            time_offset = frame.time_offset.total_seconds()
            bbox = frame.normalized_bounding_box
            print(f"  {time_offset:.1f}s: ({bbox.left:.3f}, {bbox.top:.3f}) to ({bbox.right:.3f}, {bbox.bottom:.3f})")
```

### Processing Text Detection Results

```python
# Process text detection results
for annotation_result in result.annotation_results:
    for text_annotation in annotation_result.text_annotations:
        print(f"Detected text: {text_annotation.text}")

        for segment in text_annotation.segments:
            start_time = segment.segment.start_time_offset.total_seconds()
            end_time = segment.segment.end_time_offset.total_seconds()
            print(f"  Time: {start_time:.1f}s to {end_time:.1f}s (confidence: {segment.confidence:.2f})")

            # Frame-level information
            for frame in segment.frames:
                time_offset = frame.time_offset.total_seconds()
                print(f"    Frame at {time_offset:.1f}s")
```

### Processing Celebrity Recognition Results (Beta)

```python
# Celebrity recognition requires the v1p3beta1 client
from google.cloud import videointelligence_v1p3beta1

# Process celebrity recognition results (available in v1p3beta1)
for annotation_result in result.annotation_results:
    if hasattr(annotation_result, 'celebrity_recognition_annotations'):
        for celebrity_annotation in annotation_result.celebrity_recognition_annotations:
            print("Celebrity Recognition Results:")

            for celebrity_track in celebrity_annotation.celebrity_tracks:
                print("  Face track detected:")

                # Process recognized celebrities for this track
                for recognized_celebrity in celebrity_track.celebrities:
                    celebrity = recognized_celebrity.celebrity
                    confidence = recognized_celebrity.confidence
                    print(f"    Celebrity: {celebrity.display_name}")
                    print(f"    Confidence: {confidence:.2f}")
                    print(f"    Description: {celebrity.description}")
                    print(f"    Resource Name: {celebrity.name}")

                # Process face track information
                face_track = celebrity_track.face_track
                if face_track.segment:
                    start_time = face_track.segment.start_time_offset.total_seconds()
                    end_time = face_track.segment.end_time_offset.total_seconds()
                    print(f"    Track Duration: {start_time:.1f}s to {end_time:.1f}s")

                # Show first few timestamped objects
                for timestamped_obj in face_track.timestamped_objects[:5]:
                    time_offset = timestamped_obj.time_offset.total_seconds()
                    bbox = timestamped_obj.normalized_bounding_box
                    print(f"    {time_offset:.1f}s: bbox({bbox.left:.3f}, {bbox.top:.3f}, {bbox.right:.3f}, {bbox.bottom:.3f})")
```