# Features and Configuration

Comprehensive configuration options for the AI-powered video detection capabilities. Each feature can be fine-tuned with specific parameters and thresholds to optimize results for a given use case.

## Capabilities

### Video Analysis Features

Core features available for video analysis, each providing different types of AI-powered insights.

```python { .api }
class Feature(Enum):
    """
    Video annotation feature.

    Values:
        FEATURE_UNSPECIFIED: Unspecified feature
        LABEL_DETECTION: Label detection - detect objects, such as dog or flower
        SHOT_CHANGE_DETECTION: Shot change detection
        EXPLICIT_CONTENT_DETECTION: Explicit content detection
        FACE_DETECTION: Human face detection
        SPEECH_TRANSCRIPTION: Speech transcription
        TEXT_DETECTION: OCR text detection and tracking
        OBJECT_TRACKING: Object detection and tracking
        LOGO_RECOGNITION: Logo detection, tracking, and recognition
        PERSON_DETECTION: Person detection
    """

    FEATURE_UNSPECIFIED = 0
    LABEL_DETECTION = 1
    SHOT_CHANGE_DETECTION = 2
    EXPLICIT_CONTENT_DETECTION = 3
    FACE_DETECTION = 4
    SPEECH_TRANSCRIPTION = 6
    TEXT_DETECTION = 7
    OBJECT_TRACKING = 9
    LOGO_RECOGNITION = 12
    PERSON_DETECTION = 14
```
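
Feature values go in the `features` list of an `annotate_video` request, and multiple features can be combined in a single call. A minimal sketch (the bucket URI is a placeholder):

```python
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

# Request two features in one call; the input URI is a placeholder.
operation = client.annotate_video(
    request={
        "features": [
            videointelligence.Feature.LABEL_DETECTION,
            videointelligence.Feature.SHOT_CHANGE_DETECTION,
        ],
        "input_uri": "gs://your-bucket/your-video.mp4",
    }
)
result = operation.result(timeout=300)
```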

### Video Context Configuration

The main configuration object for fine-tuning individual analysis features.

```python { .api }
class VideoContext:
    """
    Video context and/or feature-specific parameters.

    Attributes:
        segments: Video segments to annotate. If unspecified, each video is treated as a single segment
        label_detection_config: Config for LABEL_DETECTION
        shot_change_detection_config: Config for SHOT_CHANGE_DETECTION
        explicit_content_detection_config: Config for EXPLICIT_CONTENT_DETECTION
        face_detection_config: Config for FACE_DETECTION
        speech_transcription_config: Config for SPEECH_TRANSCRIPTION
        text_detection_config: Config for TEXT_DETECTION
        object_tracking_config: Config for OBJECT_TRACKING
        person_detection_config: Config for PERSON_DETECTION
    """

    segments: MutableSequence[VideoSegment]
    label_detection_config: LabelDetectionConfig
    shot_change_detection_config: ShotChangeDetectionConfig
    explicit_content_detection_config: ExplicitContentDetectionConfig
    face_detection_config: FaceDetectionConfig
    speech_transcription_config: SpeechTranscriptionConfig
    text_detection_config: TextDetectionConfig
    object_tracking_config: ObjectTrackingConfig
    person_detection_config: PersonDetectionConfig
```
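
A `VideoContext` is passed as the `video_context` field of the request; configs set for features that were not requested have no effect. A minimal sketch that restricts analysis to the first minute of the video (the time range is illustrative):

```python
from google.cloud import videointelligence

# Analyze only the first 60 seconds; the time range is illustrative.
video_context = videointelligence.VideoContext(
    segments=[
        videointelligence.VideoSegment(
            start_time_offset={"seconds": 0},
            end_time_offset={"seconds": 60},
        )
    ]
)
```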

### Label Detection Configuration

Configure how labels (objects, activities, concepts) are detected in videos.

```python { .api }
class LabelDetectionConfig:
    """
    Config for LABEL_DETECTION.

    Attributes:
        label_detection_mode: What labels should be detected with LABEL_DETECTION, in addition to video-level labels or segment-level labels
        stationary_camera: Whether the video has been shot from a stationary (non-moving) camera
        model: Model to use for label detection. Supported values: "builtin/stable", "builtin/latest"
        frame_confidence_threshold: The confidence threshold for frame-level label detection (0.0-1.0)
        video_confidence_threshold: The confidence threshold for video-level label detection (0.0-1.0)
    """

    label_detection_mode: LabelDetectionMode
    stationary_camera: bool
    model: str
    frame_confidence_threshold: float
    video_confidence_threshold: float


class LabelDetectionMode(Enum):
    """
    Label detection mode.

    Values:
        LABEL_DETECTION_MODE_UNSPECIFIED: Unspecified
        SHOT_MODE: Detect shot-level labels
        FRAME_MODE: Detect frame-level labels
        SHOT_AND_FRAME_MODE: Detect both shot-level and frame-level labels
    """

    LABEL_DETECTION_MODE_UNSPECIFIED = 0
    SHOT_MODE = 1
    FRAME_MODE = 2
    SHOT_AND_FRAME_MODE = 3
```
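
Higher confidence thresholds trade recall for precision: labels scoring below the threshold are dropped from the results. A sketch with illustrative threshold values:

```python
from google.cloud import videointelligence

# Request both shot-level and frame-level labels, keeping only
# reasonably confident results; the thresholds are illustrative.
label_config = videointelligence.LabelDetectionConfig(
    label_detection_mode=videointelligence.LabelDetectionMode.SHOT_AND_FRAME_MODE,
    stationary_camera=False,
    model="builtin/stable",
    frame_confidence_threshold=0.6,
    video_confidence_threshold=0.7,
)
video_context = videointelligence.VideoContext(
    label_detection_config=label_config
)
```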

### Face Detection Configuration

Configure detection and tracking of human faces in videos.

```python { .api }
class FaceDetectionConfig:
    """
    Config for FACE_DETECTION.

    Attributes:
        model: Model to use for face detection. Supported values: "builtin/stable", "builtin/latest"
        include_bounding_boxes: Whether bounding boxes are included in the face annotation output
        include_attributes: Whether to enable face attributes detection, such as glasses, dark_glasses, mouth_open etc
    """

    model: str
    include_bounding_boxes: bool
    include_attributes: bool
```
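
When `include_attributes` is enabled, each tracked face in the results carries detected attributes with confidences. A hedged sketch of reading them back, assuming the face annotation result shape described in the results documentation:

```python
# Assumes `result` is the response of an annotate_video call that
# requested FACE_DETECTION with include_attributes=True.
for annotation in result.annotation_results[0].face_detection_annotations:
    for track in annotation.tracks:
        # Attributes (e.g. glasses, mouth_open) hang off each
        # timestamped object in the track.
        for attribute in track.timestamped_objects[0].attributes:
            print(f"{attribute.name}: {attribute.confidence:.2f}")
```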

### Object Tracking Configuration

Configure detection and tracking of objects throughout the video.

```python { .api }
class ObjectTrackingConfig:
    """
    Config for OBJECT_TRACKING.

    Attributes:
        model: Model to use for object tracking. Supported values: "builtin/stable", "builtin/latest"
    """

    model: str
```
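
Object tracking has a single `model` knob, so a complete request stays short. A sketch that runs it and prints each tracked entity, assuming the `object_annotations` result field and a placeholder bucket URI:

```python
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

video_context = videointelligence.VideoContext(
    object_tracking_config=videointelligence.ObjectTrackingConfig(
        model="builtin/stable"
    )
)

operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.OBJECT_TRACKING],
        "input_uri": "gs://your-bucket/your-video.mp4",
        "video_context": video_context,
    }
)
result = operation.result(timeout=600)

# Each object annotation carries the recognized entity and a confidence.
for obj in result.annotation_results[0].object_annotations:
    print(f"{obj.entity.description}: {obj.confidence:.2f}")
```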

### Explicit Content Detection Configuration

Configure detection of explicit or inappropriate content.

```python { .api }
class ExplicitContentDetectionConfig:
    """
    Config for EXPLICIT_CONTENT_DETECTION.

    Attributes:
        model: Model to use for explicit content detection. Supported values: "builtin/stable", "builtin/latest"
    """

    model: str
```

### Speech Transcription Configuration

Configure speech-to-text transcription with language and context options.

```python { .api }
class SpeechTranscriptionConfig:
    """
    Config for SPEECH_TRANSCRIPTION.

    Attributes:
        language_code: Required. BCP-47 language tag of the language spoken in the audio (e.g., "en-US")
        max_alternatives: Maximum number of recognition hypotheses to be returned
        filter_profanity: If set to true, the server will attempt to filter out profanities
        speech_contexts: A means to provide context to assist the speech recognition
        enable_automatic_punctuation: If set to true, adds punctuation to recognition result hypotheses
        audio_tracks: For file formats that contain multiple audio tracks, this field controls which track should be transcribed
        enable_speaker_diarization: If true, enable speaker detection for each recognized word
        diarization_speaker_count: If speaker_diarization is enabled, set this field to specify the number of speakers
        enable_word_confidence: If true, the top result includes a list of words and the confidence for those words
    """

    language_code: str
    max_alternatives: int
    filter_profanity: bool
    speech_contexts: MutableSequence[SpeechContext]
    enable_automatic_punctuation: bool
    audio_tracks: MutableSequence[int]
    enable_speaker_diarization: bool
    diarization_speaker_count: int
    enable_word_confidence: bool


class SpeechContext:
    """
    Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

    Attributes:
        phrases: A list of strings containing words and phrases "hints" so that the speech recognition is more likely to recognize them
    """

    phrases: MutableSequence[str]
```
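
Phrase hints bias the recognizer toward vocabulary it would otherwise be unlikely to produce, such as product names or domain jargon. A short sketch with illustrative phrases:

```python
from google.cloud import videointelligence

# Bias recognition toward domain vocabulary; the phrases are illustrative.
speech_config = videointelligence.SpeechTranscriptionConfig(
    language_code="en-US",
    speech_contexts=[
        videointelligence.SpeechContext(
            phrases=["Cloud Video Intelligence", "annotate video"]
        )
    ],
    enable_automatic_punctuation=True,
)
video_context = videointelligence.VideoContext(
    speech_transcription_config=speech_config
)
```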

### Text Detection Configuration

Configure optical character recognition (OCR) for detecting text in videos.

```python { .api }
class TextDetectionConfig:
    """
    Config for TEXT_DETECTION.

    Attributes:
        language_hints: Language hint can be specified if the language of the text to be detected is known a priori
        model: Model to use for text detection. Supported values: "builtin/stable", "builtin/latest"
    """

    language_hints: MutableSequence[str]
    model: str
```

### Person Detection Configuration

Configure detection and tracking of people in videos.

```python { .api }
class PersonDetectionConfig:
    """
    Config for PERSON_DETECTION.

    Attributes:
        include_bounding_boxes: Whether bounding boxes are included in the person detection annotation output
        include_pose_landmarks: Whether to enable pose landmarks detection
        include_attributes: Whether to enable person attributes detection, such as cloth color
    """

    include_bounding_boxes: bool
    include_pose_landmarks: bool
    include_attributes: bool
```

### Shot Change Detection Configuration

Configure detection of shot boundaries and scene changes.

```python { .api }
class ShotChangeDetectionConfig:
    """
    Config for SHOT_CHANGE_DETECTION.

    Attributes:
        model: Model to use for shot change detection. Supported values: "builtin/stable", "builtin/latest"
    """

    model: str
```
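
Shot change detection only exposes a `model` choice; the interesting part is the returned shot list. A sketch that requests the feature and prints shot boundaries, assuming the `shot_annotations` result field and a placeholder bucket URI:

```python
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

video_context = videointelligence.VideoContext(
    shot_change_detection_config=videointelligence.ShotChangeDetectionConfig(
        model="builtin/stable"
    )
)

operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.SHOT_CHANGE_DETECTION],
        "input_uri": "gs://your-bucket/your-video.mp4",
        "video_context": video_context,
    }
)
result = operation.result(timeout=300)

# Each detected shot is a VideoSegment with start/end offsets.
for shot in result.annotation_results[0].shot_annotations:
    start = shot.start_time_offset.total_seconds()
    end = shot.end_time_offset.total_seconds()
    print(f"Shot from {start:.1f}s to {end:.1f}s")
```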

### Common Enums and Utilities

```python { .api }
class Likelihood(Enum):
    """
    Bucketized representation of likelihood.

    Values:
        LIKELIHOOD_UNSPECIFIED: Unspecified likelihood
        VERY_UNLIKELY: Very unlikely
        UNLIKELY: Unlikely
        POSSIBLE: Possible
        LIKELY: Likely
        VERY_LIKELY: Very likely
    """

    LIKELIHOOD_UNSPECIFIED = 0
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


class VideoSegment:
    """
    Video segment.

    Attributes:
        start_time_offset: Time-offset, relative to the beginning of the video, corresponding to the start of the segment
        end_time_offset: Time-offset, relative to the beginning of the video, corresponding to the end of the segment
    """

    start_time_offset: duration_pb2.Duration
    end_time_offset: duration_pb2.Duration
```
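
Two practical notes, sketched below: `Likelihood` buckets are ordered, so their integer values can be compared to flag frames at or above a chosen severity, and because proto-plus marshals protobuf `Duration` fields, `VideoSegment` offsets can also be built from `datetime.timedelta` values (the cutoff and times below are illustrative):

```python
import datetime

from google.cloud import videointelligence

# VideoSegment offsets accept datetime.timedelta values
# (proto-plus Duration marshalling); the times are illustrative.
segment = videointelligence.VideoSegment(
    start_time_offset=datetime.timedelta(seconds=0),
    end_time_offset=datetime.timedelta(seconds=30),
)

# Likelihood buckets are ordered, so a numeric comparison flags
# frames at or above a chosen severity; POSSIBLE is an illustrative cutoff.
def at_least_possible(likelihood: videointelligence.Likelihood) -> bool:
    return likelihood.value >= videointelligence.Likelihood.POSSIBLE.value
```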

## Usage Examples

### Multi-Feature Analysis with Custom Configuration

```python
from google.cloud import videointelligence

# Create client
client = videointelligence.VideoIntelligenceServiceClient()

# Configure multiple features with custom settings
video_context = videointelligence.VideoContext(
    segments=[
        videointelligence.VideoSegment(
            start_time_offset={"seconds": 10},
            end_time_offset={"seconds": 50}
        )
    ],
    label_detection_config=videointelligence.LabelDetectionConfig(
        label_detection_mode=videointelligence.LabelDetectionMode.SHOT_AND_FRAME_MODE,
        stationary_camera=True,
        model="builtin/latest",
        frame_confidence_threshold=0.7,
        video_confidence_threshold=0.8
    ),
    face_detection_config=videointelligence.FaceDetectionConfig(
        model="builtin/latest",
        include_bounding_boxes=True,
        include_attributes=True
    ),
    speech_transcription_config=videointelligence.SpeechTranscriptionConfig(
        language_code="en-US",
        enable_automatic_punctuation=True,
        enable_speaker_diarization=True,
        diarization_speaker_count=2,
        enable_word_confidence=True
    )
)

# Annotate video with custom configuration
operation = client.annotate_video(
    request={
        "features": [
            videointelligence.Feature.LABEL_DETECTION,
            videointelligence.Feature.FACE_DETECTION,
            videointelligence.Feature.SPEECH_TRANSCRIPTION
        ],
        "input_uri": "gs://your-bucket/your-video.mp4",
        "video_context": video_context
    }
)

result = operation.result(timeout=600)
```

### Text Detection with Language Hints

```python
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

# Configure text detection for multiple languages
text_config = videointelligence.TextDetectionConfig(
    language_hints=["en", "fr", "es"],  # English, French, Spanish
    model="builtin/latest"
)

video_context = videointelligence.VideoContext(
    text_detection_config=text_config
)

operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.TEXT_DETECTION],
        "input_uri": "gs://your-bucket/multilingual-video.mp4",
        "video_context": video_context
    }
)

result = operation.result(timeout=300)
```

### Person Detection with Pose Landmarks

```python
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

# Configure person detection with all features enabled
person_config = videointelligence.PersonDetectionConfig(
    include_bounding_boxes=True,
    include_pose_landmarks=True,
    include_attributes=True
)

video_context = videointelligence.VideoContext(
    person_detection_config=person_config
)

operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.PERSON_DETECTION],
        "input_uri": "gs://your-bucket/sports-video.mp4",
        "video_context": video_context
    }
)

result = operation.result(timeout=400)
```

### Explicit Content Detection

```python
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

# Configure explicit content detection
explicit_config = videointelligence.ExplicitContentDetectionConfig(
    model="builtin/latest"
)

video_context = videointelligence.VideoContext(
    explicit_content_detection_config=explicit_config
)

operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.EXPLICIT_CONTENT_DETECTION],
        "input_uri": "gs://your-bucket/content-to-moderate.mp4",
        "video_context": video_context
    }
)

result = operation.result(timeout=300)

# Check explicit content results
for annotation_result in result.annotation_results:
    explicit_annotation = annotation_result.explicit_annotation
    for frame in explicit_annotation.frames:
        likelihood = frame.pornography_likelihood
        time_offset = frame.time_offset.total_seconds()
        print(f"Frame at {time_offset}s: {likelihood.name}")
```