# Output Formatters

Classes for converting transcript data into various output formats. Supports JSON, plain text, SRT subtitles, WebVTT, and pretty-printed formats for different use cases.

## Capabilities

### Base Formatter Class

Abstract base class defining the formatter interface. All concrete formatters inherit from this class.

```python { .api }
class Formatter:
    def format_transcript(self, transcript, **kwargs):
        """
        Format a single transcript.

        Args:
            transcript (FetchedTranscript): Transcript to format
            **kwargs: Formatter-specific options

        Returns:
            str: Formatted transcript string

        Raises:
            NotImplementedError: Must be implemented by subclasses
        """

    def format_transcripts(self, transcripts, **kwargs):
        """
        Format multiple transcripts.

        Args:
            transcripts (List[FetchedTranscript]): Transcripts to format
            **kwargs: Formatter-specific options

        Returns:
            str: Formatted transcripts string

        Raises:
            NotImplementedError: Must be implemented by subclasses
        """
```
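
As a sketch of how this interface might be extended, the following defines a hypothetical `TimestampedTextFormatter`. To stay runnable offline it uses a stand-in `Formatter` base and a stand-in `Snippet` dataclass instead of the library's classes. The `text` and `start` attribute names are assumptions about what a transcript snippet exposes, not something this page confirms.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Snippet:
    """Stand-in for a transcript snippet (attribute names are assumptions)."""
    text: str
    start: float  # seconds from the beginning of the video


class Formatter:
    """Stand-in that mirrors the two-method base interface."""

    def format_transcript(self, transcript, **kwargs):
        raise NotImplementedError("Must be implemented by subclasses")

    def format_transcripts(self, transcripts, **kwargs):
        raise NotImplementedError("Must be implemented by subclasses")


class TimestampedTextFormatter(Formatter):
    """Hypothetical formatter: one '[MM:SS] text' line per snippet."""

    def format_transcript(self, transcript: Iterable[Snippet], **kwargs) -> str:
        lines = []
        for snippet in transcript:
            # Truncate to whole seconds, then split into minutes and seconds.
            minutes, seconds = divmod(int(snippet.start), 60)
            lines.append(f"[{minutes:02d}:{seconds:02d}] {snippet.text}")
        return "\n".join(lines)

    def format_transcripts(self, transcripts: List[Iterable[Snippet]], **kwargs) -> str:
        # Separate individual transcripts with one blank line.
        return "\n\n".join(self.format_transcript(t, **kwargs) for t in transcripts)


demo = [Snippet("Hello", 0.0), Snippet("world", 65.4)]
formatted = TimestampedTextFormatter().format_transcript(demo)
print(formatted)
```

With the real library, the subclass would presumably inherit from `youtube_transcript_api.formatters.Formatter` and receive a `FetchedTranscript` instead of a list of stand-in snippets.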

### JSON Formatter

Converts transcript data to JSON format for programmatic processing and data interchange.

```python { .api }
class JSONFormatter(Formatter):
    def format_transcript(self, transcript, **kwargs):
        """
        Convert transcript to JSON string.

        Args:
            transcript (FetchedTranscript): Transcript to format
            **kwargs: Passed to json.dumps() (indent, ensure_ascii, etc.)

        Returns:
            str: JSON representation of transcript data
        """

    def format_transcripts(self, transcripts, **kwargs):
        """
        Convert multiple transcripts to JSON array string.

        Args:
            transcripts (List[FetchedTranscript]): Transcripts to format
            **kwargs: Passed to json.dumps()

        Returns:
            str: JSON array of transcript data
        """
```

### Text Formatter

Converts transcripts to plain text with no timestamps. Useful for text analysis and content extraction.

```python { .api }
class TextFormatter(Formatter):
    def format_transcript(self, transcript, **kwargs):
        """
        Convert transcript to plain text (no timestamps).

        Args:
            transcript (FetchedTranscript): Transcript to format
            **kwargs: Unused

        Returns:
            str: Plain text with lines separated by newlines
        """

    def format_transcripts(self, transcripts, **kwargs):
        """
        Convert multiple transcripts to plain text.

        Args:
            transcripts (List[FetchedTranscript]): Transcripts to format
            **kwargs: Unused

        Returns:
            str: Plain text with transcripts separated by triple newlines
        """
```
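
The separators named in the docstrings (a single newline between lines, three newlines between transcripts) can be illustrated with plain string joins. This is a sketch with made-up data and hypothetical helper names, not the library's implementation.

```python
LINE_SEP = "\n"            # between lines of one transcript
TRANSCRIPT_SEP = "\n\n\n"  # between transcripts


def join_transcripts(transcripts):
    """Join lists of lines using the separators described above."""
    return TRANSCRIPT_SEP.join(LINE_SEP.join(lines) for lines in transcripts)


def split_transcripts(combined):
    """Reverse join_transcripts for text that contains no blank lines."""
    return [chunk.split(LINE_SEP) for chunk in combined.split(TRANSCRIPT_SEP)]


first = ["hello", "world"]
second = ["second video", "more text"]

combined = join_transcripts([first, second])
recovered = split_transcripts(combined)
print(repr(combined))
```

Splitting on the triple newline only recovers the original transcripts when no snippet text itself contains consecutive newlines.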

### Pretty Print Formatter

Human-readable formatted output using Python's pprint module for debugging and inspection.

```python { .api }
class PrettyPrintFormatter(Formatter):
    def format_transcript(self, transcript, **kwargs):
        """
        Pretty print transcript data.

        Args:
            transcript (FetchedTranscript): Transcript to format
            **kwargs: Passed to pprint.pformat()

        Returns:
            str: Pretty formatted transcript representation
        """

    def format_transcripts(self, transcripts, **kwargs):
        """
        Pretty print multiple transcripts.

        Args:
            transcripts (List[FetchedTranscript]): Transcripts to format
            **kwargs: Passed to pprint.pformat()

        Returns:
            str: Pretty formatted list of transcripts
        """
```
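
Because keyword arguments are forwarded to `pprint.pformat()`, its standard options such as `width` and `sort_dicts` should apply. The sketch below calls `pprint.pformat()` directly on hand-written data. The list-of-dicts shape with `text`, `start`, and `duration` keys is an assumption about the underlying transcript data.

```python
import pprint

# Hypothetical raw transcript data (shape is an assumption).
raw_data = [
    {"text": "Hello", "start": 0.0, "duration": 1.5},
    {"text": "world", "start": 1.5, "duration": 2.0},
]

# A wide limit keeps everything on one line; keys are sorted by default.
wide = pprint.pformat(raw_data, width=200)

# A narrow limit forces pprint to wrap the output across lines.
narrow = pprint.pformat(raw_data, width=40)

# sort_dicts=False (Python 3.8+) preserves insertion order of the keys.
unsorted = pprint.pformat(raw_data, width=200, sort_dicts=False)

print(narrow)
```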

### SRT Formatter

Creates SRT (SubRip) subtitle files compatible with video players and subtitle software.

```python { .api }
class SRTFormatter(Formatter):
    def format_transcript(self, transcript, **kwargs):
        """
        Convert transcript to SRT subtitle format.

        Args:
            transcript (FetchedTranscript): Transcript to format
            **kwargs: Unused

        Returns:
            str: SRT formatted subtitles with sequence numbers and timestamps
        """

    def format_transcripts(self, transcripts, **kwargs):
        """
        Convert multiple transcripts to SRT format.

        Args:
            transcripts (List[FetchedTranscript]): Transcripts to format
            **kwargs: Unused

        Returns:
            str: Combined SRT formatted subtitles
        """
```
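
An SRT cue consists of a sequence number, a `start --> end` timing line with `HH:MM:SS,mmm` timestamps, and the caption text, with blank lines between cues. The sketch below illustrates that layout using hypothetical helpers (`srt_timestamp`, `srt_cue`) and made-up cue data. It is not the library's implementation.

```python
def srt_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one cue: sequence number, timing line, caption text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}"


# (start, end, text) tuples; values are invented for illustration.
cues = [(0.0, 1.5, "Hello"), (1.5, 3725.25, "world")]

srt_text = "\n\n".join(
    srt_cue(i, start, end, text) for i, (start, end, text) in enumerate(cues, start=1)
) + "\n"
print(srt_text)
```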

### WebVTT Formatter

Creates WebVTT subtitle files for web video players and HTML5 video elements.

```python { .api }
class WebVTTFormatter(Formatter):
    def format_transcript(self, transcript, **kwargs):
        """
        Convert transcript to WebVTT subtitle format.

        Args:
            transcript (FetchedTranscript): Transcript to format
            **kwargs: Unused

        Returns:
            str: WebVTT formatted subtitles with WEBVTT header
        """

    def format_transcripts(self, transcripts, **kwargs):
        """
        Convert multiple transcripts to WebVTT format.

        Args:
            transcripts (List[FetchedTranscript]): Transcripts to format
            **kwargs: Unused

        Returns:
            str: Combined WebVTT formatted subtitles
        """
```
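
A quick sanity check on generated WebVTT content can confirm the `WEBVTT` header and count the cue timing lines. The sketch below uses a hypothetical `check_webvtt` helper and a hand-written sample string rather than real library output. Its pattern assumes hour-based `HH:MM:SS.mmm` timestamps, although WebVTT also permits the shorter `MM:SS.mmm` form.

```python
import re

# One cue timing line, e.g. "00:00:00.000 --> 00:00:01.500".
CUE_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})$",
    re.MULTILINE,
)


def check_webvtt(content: str) -> int:
    """Return the number of cue timing lines; raise if the header is missing."""
    if not content.startswith("WEBVTT"):
        raise ValueError("missing WEBVTT header")
    return len(CUE_TIMING.findall(content))


# Hand-written sample, not produced by the library.
sample = (
    "WEBVTT\n\n"
    "00:00:00.000 --> 00:00:01.500\nHello\n\n"
    "00:00:01.500 --> 00:00:03.000\nworld\n"
)

cue_count = check_webvtt(sample)
print(cue_count)
```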

### Formatter Loader

Utility class for loading formatters by type string. Provides a convenient interface for dynamic formatter selection.

```python { .api }
class FormatterLoader:
    TYPES = {
        "json": JSONFormatter,
        "pretty": PrettyPrintFormatter,
        "text": TextFormatter,
        "webvtt": WebVTTFormatter,
        "srt": SRTFormatter,
    }

    def load(self, formatter_type="pretty"):
        """
        Load formatter by type string.

        Args:
            formatter_type (str): Formatter type name. Defaults to "pretty"

        Returns:
            Formatter: Formatter instance

        Raises:
            UnknownFormatterType: Invalid formatter type
        """

    class UnknownFormatterType(Exception):
        def __init__(self, formatter_type):
            """
            Exception for invalid formatter types.

            Args:
                formatter_type (str): The invalid formatter type
            """
```
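
The lookup-and-raise behaviour described above can be sketched as a small registry with a fallback for unknown type strings. All names here (`DemoLoader`, `UnknownDemoType`, `load_or_default`) are hypothetical stand-ins so the example runs without the library. They illustrate the pattern, not the library's code.

```python
class DemoFormatter:
    """Hypothetical stand-in for a formatter class."""
    name = "base"


class DemoJSONFormatter(DemoFormatter):
    name = "json"


class DemoTextFormatter(DemoFormatter):
    name = "text"


class UnknownDemoType(Exception):
    """Raised when a type string is not in the registry."""

    def __init__(self, formatter_type):
        super().__init__(f"Unknown formatter type: {formatter_type!r}")
        self.formatter_type = formatter_type


class DemoLoader:
    # Registry mapping type strings to formatter classes.
    TYPES = {"json": DemoJSONFormatter, "text": DemoTextFormatter}

    def load(self, formatter_type="text"):
        try:
            formatter_cls = self.TYPES[formatter_type]
        except KeyError:
            raise UnknownDemoType(formatter_type) from None
        return formatter_cls()


def load_or_default(loader, requested, default="text"):
    """Return the requested formatter, or the default if the type is unknown."""
    try:
        return loader.load(requested)
    except UnknownDemoType:
        return loader.load(default)


loader = DemoLoader()
chosen = load_or_default(loader, "yaml")  # "yaml" is not registered
print(chosen.name)
```

The same fallback shape could wrap a real `FormatterLoader().load(...)` call when the type string comes from user input such as a command-line flag.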

## Usage Examples

### Basic Formatting

```python
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import JSONFormatter, TextFormatter

api = YouTubeTranscriptApi()
transcript = api.fetch('dQw4w9WgXcQ')

# JSON format
json_formatter = JSONFormatter()
json_output = json_formatter.format_transcript(transcript)
print(json_output)

# Plain text format
text_formatter = TextFormatter()
text_output = text_formatter.format_transcript(transcript)
print(text_output)
```

### Subtitle File Creation

```python
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import SRTFormatter, WebVTTFormatter

api = YouTubeTranscriptApi()
transcript = api.fetch('dQw4w9WgXcQ')

# Create SRT subtitle file
srt_formatter = SRTFormatter()
srt_content = srt_formatter.format_transcript(transcript)

with open('subtitles.srt', 'w', encoding='utf-8') as f:
    f.write(srt_content)

# Create WebVTT subtitle file
webvtt_formatter = WebVTTFormatter()
webvtt_content = webvtt_formatter.format_transcript(transcript)

with open('subtitles.vtt', 'w', encoding='utf-8') as f:
    f.write(webvtt_content)
```

### Using FormatterLoader

```python
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import FormatterLoader

api = YouTubeTranscriptApi()
transcript = api.fetch('dQw4w9WgXcQ')

loader = FormatterLoader()

# Load different formatters dynamically
for format_type in ['json', 'text', 'srt', 'webvtt', 'pretty']:
    formatter = loader.load(format_type)
    output = formatter.format_transcript(transcript)
    print(f"=== {format_type.upper()} ===")
    # Show only the first 200 characters of long outputs
    print((output[:200] + "...") if len(output) > 200 else output)
    print()
```

### JSON Formatting with Options

```python
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import JSONFormatter

api = YouTubeTranscriptApi()
transcript = api.fetch('dQw4w9WgXcQ')

json_formatter = JSONFormatter()

# Pretty printed JSON
pretty_json = json_formatter.format_transcript(transcript, indent=2, ensure_ascii=False)
print(pretty_json)

# Compact JSON
compact_json = json_formatter.format_transcript(transcript, separators=(',', ':'))
print(compact_json)
```

### Multiple Transcripts

```python
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter

api = YouTubeTranscriptApi()

# Fetch transcripts for several videos
video_ids = ['dQw4w9WgXcQ', 'jNQXAC9IVRw']
transcripts = []

for video_id in video_ids:
    try:
        transcript = api.fetch(video_id)
        transcripts.append(transcript)
    except Exception as e:
        print(f"Failed to fetch {video_id}: {e}")

# Format all transcripts together
if transcripts:
    text_formatter = TextFormatter()
    combined_text = text_formatter.format_transcripts(transcripts)
    print(combined_text)
```

## Types

```python { .api }
from typing import List
from youtube_transcript_api._transcripts import FetchedTranscript

# Formatter interface types
FormatterType = str  # One of: "json", "text", "pretty", "srt", "webvtt"
FormatterKwargs = dict  # Formatter-specific keyword arguments
```