# Features and Configuration

Comprehensive configuration options for different AI detection capabilities. Each feature can be fine-tuned with specific parameters and thresholds to optimize results for different use cases.

## Capabilities

### Video Analysis Features

Core features available for video analysis, each providing different types of AI-powered insights.

```python { .api }
class Feature(Enum):
    """
    Video annotation feature.

    Values:
        FEATURE_UNSPECIFIED: Unspecified feature
        LABEL_DETECTION: Label detection - detect objects, such as dog or flower
        SHOT_CHANGE_DETECTION: Shot change detection
        EXPLICIT_CONTENT_DETECTION: Explicit content detection
        FACE_DETECTION: Human face detection
        SPEECH_TRANSCRIPTION: Speech transcription
        TEXT_DETECTION: OCR text detection and tracking
        OBJECT_TRACKING: Object detection and tracking
        LOGO_RECOGNITION: Logo detection, tracking, and recognition
        PERSON_DETECTION: Person detection
    """

    FEATURE_UNSPECIFIED = 0
    LABEL_DETECTION = 1
    SHOT_CHANGE_DETECTION = 2
    EXPLICIT_CONTENT_DETECTION = 3
    FACE_DETECTION = 4
    SPEECH_TRANSCRIPTION = 6
    TEXT_DETECTION = 7
    OBJECT_TRACKING = 9
    LOGO_RECOGNITION = 12
    PERSON_DETECTION = 14
```
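
For orientation, here is a minimal sketch of requesting two of these features in a single `annotate_video` call; the bucket and file names are placeholders:

```python
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

# Request several features in one pass; the results for each feature
# come back together in the same annotation results.
operation = client.annotate_video(
    request={
        "features": [
            videointelligence.Feature.LABEL_DETECTION,
            videointelligence.Feature.SHOT_CHANGE_DETECTION,
        ],
        "input_uri": "gs://your-bucket/your-video.mp4",
    }
)
result = operation.result(timeout=300)
```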

### Video Context Configuration

Main configuration object that allows fine-tuning of different analysis features.

```python { .api }
class VideoContext:
    """
    Video context and/or feature-specific parameters.

    Attributes:
        segments: Video segments to annotate. If unspecified, each video is treated as a single segment
        label_detection_config: Config for LABEL_DETECTION
        shot_change_detection_config: Config for SHOT_CHANGE_DETECTION
        explicit_content_detection_config: Config for EXPLICIT_CONTENT_DETECTION
        face_detection_config: Config for FACE_DETECTION
        speech_transcription_config: Config for SPEECH_TRANSCRIPTION
        text_detection_config: Config for TEXT_DETECTION
        object_tracking_config: Config for OBJECT_TRACKING
        person_detection_config: Config for PERSON_DETECTION
    """

    segments: MutableSequence[VideoSegment]
    label_detection_config: LabelDetectionConfig
    shot_change_detection_config: ShotChangeDetectionConfig
    explicit_content_detection_config: ExplicitContentDetectionConfig
    face_detection_config: FaceDetectionConfig
    speech_transcription_config: SpeechTranscriptionConfig
    text_detection_config: TextDetectionConfig
    object_tracking_config: ObjectTrackingConfig
    person_detection_config: PersonDetectionConfig
```
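
A minimal sketch of a `VideoContext` that limits analysis to a single segment; the time offsets accept plain dicts, as in the usage examples below:

```python
from google.cloud import videointelligence

# Analyze only the portion between 0:10 and 0:50; any feature configs
# attached to the same VideoContext apply within these segments.
context = videointelligence.VideoContext(
    segments=[
        videointelligence.VideoSegment(
            start_time_offset={"seconds": 10},
            end_time_offset={"seconds": 50},
        )
    ]
)
```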

### Label Detection Configuration

Configure how labels (objects, activities, concepts) are detected in videos.

```python { .api }
class LabelDetectionConfig:
    """
    Config for LABEL_DETECTION.

    Attributes:
        label_detection_mode: What labels should be detected with LABEL_DETECTION, in addition to video-level labels or segment-level labels
        stationary_camera: Whether the video has been shot from a stationary (non-moving) camera
        model: Model to use for label detection. Supported values: "builtin/stable", "builtin/latest"
        frame_confidence_threshold: The confidence threshold for frame-level label detection (0.0-1.0)
        video_confidence_threshold: The confidence threshold for video-level label detection (0.0-1.0)
    """

    label_detection_mode: LabelDetectionMode
    stationary_camera: bool
    model: str
    frame_confidence_threshold: float
    video_confidence_threshold: float

class LabelDetectionMode(Enum):
    """
    Label detection mode.

    Values:
        LABEL_DETECTION_MODE_UNSPECIFIED: Unspecified
        SHOT_MODE: Detect shot-level labels
        FRAME_MODE: Detect frame-level labels
        SHOT_AND_FRAME_MODE: Detect both shot-level and frame-level labels
    """

    LABEL_DETECTION_MODE_UNSPECIFIED = 0
    SHOT_MODE = 1
    FRAME_MODE = 2
    SHOT_AND_FRAME_MODE = 3
```
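
A sketch of a label detection request using this config, and of reading back the shot-level labels (the threshold value is illustrative; field names follow the v1 responses):

```python
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

config = videointelligence.LabelDetectionConfig(
    label_detection_mode=videointelligence.LabelDetectionMode.SHOT_AND_FRAME_MODE,
    frame_confidence_threshold=0.6,
)
context = videointelligence.VideoContext(label_detection_config=config)

operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.LABEL_DETECTION],
        "input_uri": "gs://your-bucket/your-video.mp4",
        "video_context": context,
    }
)
result = operation.result(timeout=300)

# Shot-level labels; frame_label_annotations holds the frame-level ones.
for label in result.annotation_results[0].shot_label_annotations:
    for segment in label.segments:
        print(f"{label.entity.description}: {segment.confidence:.2f}")
```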

### Face Detection Configuration

Configure detection and tracking of human faces in videos.

```python { .api }
class FaceDetectionConfig:
    """
    Config for FACE_DETECTION.

    Attributes:
        model: Model to use for face detection. Supported values: "builtin/stable", "builtin/latest"
        include_bounding_boxes: Whether bounding boxes are included in the face annotation output
        include_attributes: Whether to enable face attributes detection, such as glasses, dark_glasses, mouth_open, etc.
    """

    model: str
    include_bounding_boxes: bool
    include_attributes: bool
```
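
A sketch of reading face tracks out of the results; it assumes `result` came from a FACE_DETECTION request made with `include_bounding_boxes=True` and `include_attributes=True`:

```python
# Each face annotation holds one or more tracks of timestamped objects.
for face in result.annotation_results[0].face_detection_annotations:
    for track in face.tracks:
        start = track.segment.start_time_offset.total_seconds()
        print(f"Face track at {start:.1f}s, confidence {track.confidence:.2f}")
        for obj in track.timestamped_objects:
            box = obj.normalized_bounding_box
            # Attributes such as "glasses" appear when include_attributes=True.
            names = [a.name for a in obj.attributes]
            print(f"  box=({box.left:.2f}, {box.top:.2f}) attributes={names}")
```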

### Object Tracking Configuration

Configure detection and tracking of objects throughout the video.

```python { .api }
class ObjectTrackingConfig:
    """
    Config for OBJECT_TRACKING.

    Attributes:
        model: Model to use for object tracking. Supported values: "builtin/stable", "builtin/latest"
    """

    model: str
```
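
A sketch of an object tracking request with this config, reading back each tracked entity with its confidence and time span:

```python
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.OBJECT_TRACKING],
        "input_uri": "gs://your-bucket/your-video.mp4",
        "video_context": videointelligence.VideoContext(
            object_tracking_config=videointelligence.ObjectTrackingConfig(
                model="builtin/latest"
            )
        ),
    }
)
result = operation.result(timeout=300)

# Each annotation is one tracked object with per-frame bounding boxes.
for obj in result.annotation_results[0].object_annotations:
    seg = obj.segment
    print(
        f"{obj.entity.description} ({obj.confidence:.2f}): "
        f"{seg.start_time_offset.total_seconds():.1f}s-"
        f"{seg.end_time_offset.total_seconds():.1f}s, "
        f"{len(obj.frames)} frames"
    )
```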

### Explicit Content Detection Configuration

Configure detection of explicit or inappropriate content.

```python { .api }
class ExplicitContentDetectionConfig:
    """
    Config for EXPLICIT_CONTENT_DETECTION.

    Attributes:
        model: Model to use for explicit content detection. Supported values: "builtin/stable", "builtin/latest"
    """

    model: str
```

### Speech Transcription Configuration

Configure speech-to-text transcription with language and context options.

```python { .api }
class SpeechTranscriptionConfig:
    """
    Config for SPEECH_TRANSCRIPTION.

    Attributes:
        language_code: Required. BCP-47 language tag of the language spoken in the audio (e.g., "en-US")
        max_alternatives: Maximum number of recognition hypotheses to be returned
        filter_profanity: If set to true, the server will attempt to filter out profanities
        speech_contexts: A means to provide context to assist the speech recognition
        enable_automatic_punctuation: If set to true, adds punctuation to recognition result hypotheses
        audio_tracks: For file formats that contain multiple audio tracks, this field controls which track should be transcribed
        enable_speaker_diarization: If true, enable speaker detection for each recognized word
        diarization_speaker_count: If speaker_diarization is enabled, set this field to specify the number of speakers
        enable_word_confidence: If true, the top result includes a list of words and the confidence for those words
    """

    language_code: str
    max_alternatives: int
    filter_profanity: bool
    speech_contexts: MutableSequence[SpeechContext]
    enable_automatic_punctuation: bool
    audio_tracks: MutableSequence[int]
    enable_speaker_diarization: bool
    diarization_speaker_count: int
    enable_word_confidence: bool

class SpeechContext:
    """
    Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

    Attributes:
        phrases: A list of strings containing words and phrases "hints" so that the speech recognition is more likely to recognize them
    """

    phrases: MutableSequence[str]
```
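
A sketch of a transcription config that uses `SpeechContext` phrase hints to bias recognition toward domain vocabulary (the phrases are illustrative):

```python
from google.cloud import videointelligence

speech_config = videointelligence.SpeechTranscriptionConfig(
    language_code="en-US",
    max_alternatives=2,
    enable_automatic_punctuation=True,
    # Bias recognition toward terms that might otherwise be misheard.
    speech_contexts=[
        videointelligence.SpeechContext(
            phrases=["Kubernetes", "Bigtable", "Cloud Spanner"]
        )
    ],
)
context = videointelligence.VideoContext(
    speech_transcription_config=speech_config
)
```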

### Text Detection Configuration

Configure optical character recognition (OCR) for detecting text in videos.

```python { .api }
class TextDetectionConfig:
    """
    Config for TEXT_DETECTION.

    Attributes:
        language_hints: Language hints can be specified if the language of the text in the video is known a priori
        model: Model to use for text detection. Supported values: "builtin/stable", "builtin/latest"
    """

    language_hints: MutableSequence[str]
    model: str
```

### Person Detection Configuration

Configure detection and tracking of people in videos.

```python { .api }
class PersonDetectionConfig:
    """
    Config for PERSON_DETECTION.

    Attributes:
        include_bounding_boxes: Whether bounding boxes are included in the person detection annotation output
        include_pose_landmarks: Whether to enable pose landmarks detection
        include_attributes: Whether to enable person attributes detection, such as cloth color
    """

    include_bounding_boxes: bool
    include_pose_landmarks: bool
    include_attributes: bool
```

### Shot Change Detection Configuration

Configure detection of shot boundaries and scene changes.

```python { .api }
class ShotChangeDetectionConfig:
    """
    Config for SHOT_CHANGE_DETECTION.

    Attributes:
        model: Model to use for shot change detection. Supported values: "builtin/stable", "builtin/latest"
    """

    model: str
```
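
A sketch of a shot change request and of listing the detected shot boundaries, which come back as `VideoSegment` entries in `shot_annotations`:

```python
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.SHOT_CHANGE_DETECTION],
        "input_uri": "gs://your-bucket/your-video.mp4",
    }
)
result = operation.result(timeout=300)

# Each entry is a VideoSegment covering one shot.
for i, shot in enumerate(result.annotation_results[0].shot_annotations):
    start = shot.start_time_offset.total_seconds()
    end = shot.end_time_offset.total_seconds()
    print(f"Shot {i}: {start:.1f}s to {end:.1f}s")
```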

### Common Enums and Utilities

```python { .api }
class Likelihood(Enum):
    """
    Bucketized representation of likelihood.

    Values:
        LIKELIHOOD_UNSPECIFIED: Unspecified likelihood
        VERY_UNLIKELY: Very unlikely
        UNLIKELY: Unlikely
        POSSIBLE: Possible
        LIKELY: Likely
        VERY_LIKELY: Very likely
    """

    LIKELIHOOD_UNSPECIFIED = 0
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5

class VideoSegment:
    """
    Video segment.

    Attributes:
        start_time_offset: Time-offset, relative to the beginning of the video, corresponding to the start of the segment
        end_time_offset: Time-offset, relative to the beginning of the video, corresponding to the end of the segment
    """

    start_time_offset: duration_pb2.Duration
    end_time_offset: duration_pb2.Duration
```
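
As a sketch of how these types are used together: segment offsets accept plain dicts, and the `Likelihood` values are ordered integers, so threshold checks can use comparison (this assumes the generated enum behaves as an IntEnum, which proto-plus enums do):

```python
from google.cloud import videointelligence

# Segments accept {"seconds": ..., "nanos": ...} dicts for the offsets.
segment = videointelligence.VideoSegment(
    start_time_offset={"seconds": 0},
    end_time_offset={"seconds": 30},
)

# Likelihood is an ordered enum, so thresholding reads naturally.
def is_flagged(likelihood: videointelligence.Likelihood) -> bool:
    return likelihood >= videointelligence.Likelihood.LIKELY
```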

## Usage Examples

### Multi-Feature Analysis with Custom Configuration

```python
from google.cloud import videointelligence

# Create client
client = videointelligence.VideoIntelligenceServiceClient()

# Configure multiple features with custom settings
video_context = videointelligence.VideoContext(
    segments=[
        videointelligence.VideoSegment(
            start_time_offset={"seconds": 10},
            end_time_offset={"seconds": 50},
        )
    ],
    label_detection_config=videointelligence.LabelDetectionConfig(
        label_detection_mode=videointelligence.LabelDetectionMode.SHOT_AND_FRAME_MODE,
        stationary_camera=True,
        model="builtin/latest",
        frame_confidence_threshold=0.7,
        video_confidence_threshold=0.8,
    ),
    face_detection_config=videointelligence.FaceDetectionConfig(
        model="builtin/latest",
        include_bounding_boxes=True,
        include_attributes=True,
    ),
    speech_transcription_config=videointelligence.SpeechTranscriptionConfig(
        language_code="en-US",
        enable_automatic_punctuation=True,
        enable_speaker_diarization=True,
        diarization_speaker_count=2,
        enable_word_confidence=True,
    ),
)

# Annotate video with custom configuration
operation = client.annotate_video(
    request={
        "features": [
            videointelligence.Feature.LABEL_DETECTION,
            videointelligence.Feature.FACE_DETECTION,
            videointelligence.Feature.SPEECH_TRANSCRIPTION,
        ],
        "input_uri": "gs://your-bucket/your-video.mp4",
        "video_context": video_context,
    }
)

result = operation.result(timeout=600)
```
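
A sketch of walking the combined results from the request above; there is one entry in `annotation_results` per input video:

```python
for annotation_result in result.annotation_results:
    # Labels detected at the segment level.
    for label in annotation_result.segment_label_annotations:
        print(f"Label: {label.entity.description}")

    # Best transcription hypothesis for each utterance.
    for transcription in annotation_result.speech_transcriptions:
        for alternative in transcription.alternatives:
            print(f"Transcript: {alternative.transcript} "
                  f"(confidence: {alternative.confidence:.2f})")
```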

### Text Detection with Language Hints

```python
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

# Configure text detection for multiple languages
text_config = videointelligence.TextDetectionConfig(
    language_hints=["en", "fr", "es"],  # English, French, Spanish
    model="builtin/latest",
)

video_context = videointelligence.VideoContext(
    text_detection_config=text_config
)

operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.TEXT_DETECTION],
        "input_uri": "gs://your-bucket/multilingual-video.mp4",
        "video_context": video_context,
    }
)

result = operation.result(timeout=300)
```
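
The detected text can then be read from `text_annotations`; a sketch:

```python
for annotation_result in result.annotation_results:
    for text in annotation_result.text_annotations:
        # Each segment is one appearance of the text with its own confidence.
        for segment in text.segments:
            start = segment.segment.start_time_offset.total_seconds()
            print(f"'{text.text}' at {start:.1f}s "
                  f"(confidence: {segment.confidence:.2f})")
```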
378
379
### Person Detection with Pose Landmarks
380
381
```python
382
from google.cloud import videointelligence
383
384
client = videointelligence.VideoIntelligenceServiceClient()
385
386
# Configure person detection with all features enabled
387
person_config = videointelligence.PersonDetectionConfig(
388
include_bounding_boxes=True,
389
include_pose_landmarks=True,
390
include_attributes=True
391
)
392
393
video_context = videointelligence.VideoContext(
394
person_detection_config=person_config
395
)
396
397
operation = client.annotate_video(
398
request={
399
"features": [videointelligence.Feature.PERSON_DETECTION],
400
"input_uri": "gs://your-bucket/sports-video.mp4",
401
"video_context": video_context
402
}
403
)
404
405
result = operation.result(timeout=400)
406
```
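
A sketch of reading the person tracks back, including the pose landmarks requested above (landmark naming follows the v1 `DetectedLandmark` messages):

```python
for annotation_result in result.annotation_results:
    for person in annotation_result.person_detection_annotations:
        for track in person.tracks:
            for obj in track.timestamped_objects:
                t = obj.time_offset.total_seconds()
                # Pose landmarks are present because
                # include_pose_landmarks=True was set in the config.
                names = [lm.name for lm in obj.landmarks]
                print(f"{t:.1f}s: {len(names)} landmarks, e.g. {names[:3]}")
```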
407
408
### Explicit Content Detection
409
410
```python
411
from google.cloud import videointelligence
412
413
client = videointelligence.VideoIntelligenceServiceClient()
414
415
# Configure explicit content detection
416
explicit_config = videointelligence.ExplicitContentDetectionConfig(
417
model="builtin/latest"
418
)
419
420
video_context = videointelligence.VideoContext(
421
explicit_content_detection_config=explicit_config
422
)
423
424
operation = client.annotate_video(
425
request={
426
"features": [videointelligence.Feature.EXPLICIT_CONTENT_DETECTION],
427
"input_uri": "gs://your-bucket/content-to-moderate.mp4",
428
"video_context": video_context
429
}
430
)
431
432
result = operation.result(timeout=300)
433
434
# Check explicit content results
435
for annotation_result in result.annotation_results:
436
explicit_annotation = annotation_result.explicit_annotation
437
for frame in explicit_annotation.frames:
438
likelihood = frame.pornography_likelihood
439
time_offset = frame.time_offset.total_seconds()
440
print(f"Frame at {time_offset}s: {likelihood.name}")
441
```