# Output Formatters

Classes for converting transcript data into various output formats. Supports JSON, plain text, SRT subtitles, WebVTT, and pretty-printed formats for different use cases.

## Capabilities

### Base Formatter Class

Abstract base class defining the formatter interface. All concrete formatters inherit from this class.

```python { .api }
class Formatter:
    def format_transcript(self, transcript, **kwargs):
        """
        Format a single transcript.

        Args:
            transcript (FetchedTranscript): Transcript to format
            **kwargs: Formatter-specific options

        Returns:
            str: Formatted transcript string

        Raises:
            NotImplementedError: Must be implemented by subclasses
        """

    def format_transcripts(self, transcripts, **kwargs):
        """
        Format multiple transcripts.

        Args:
            transcripts (List[FetchedTranscript]): Transcripts to format
            **kwargs: Formatter-specific options

        Returns:
            str: Formatted transcripts string

        Raises:
            NotImplementedError: Must be implemented by subclasses
        """
```
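
The interface above suggests how a custom formatter could be written. The following is a minimal sketch, not the library's code: `Formatter` here is a local stand-in for the base class, `CSVFormatter` is a hypothetical subclass, and snippets are modeled as plain dictionaries with assumed `text`, `start`, and `duration` keys.

```python
import csv
import io


class Formatter:
    """Local stand-in for the base class described above."""

    def format_transcript(self, transcript, **kwargs):
        raise NotImplementedError

    def format_transcripts(self, transcripts, **kwargs):
        raise NotImplementedError


class CSVFormatter(Formatter):
    """Hypothetical formatter that emits one CSV row per snippet."""

    def format_transcript(self, transcript, **kwargs):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["start", "duration", "text"])
        for snippet in transcript:
            writer.writerow([snippet["start"], snippet["duration"], snippet["text"]])
        return buffer.getvalue()

    def format_transcripts(self, transcripts, **kwargs):
        # Each transcript already ends with a newline, so this leaves one blank line between them.
        return "\n".join(self.format_transcript(t, **kwargs) for t in transcripts)


# Assumed snippet shape, for illustration only.
sample = [
    {"text": "Hello, world", "start": 0.0, "duration": 1.5},
    {"text": "Second line", "start": 1.5, "duration": 2.0},
]
csv_output = CSVFormatter().format_transcript(sample)
print(csv_output)
```

Overriding both methods keeps the subclass usable wherever the base interface is expected. Calling either method on the stand-in base class raises `NotImplementedError`.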

### JSON Formatter

Converts transcript data to JSON format for programmatic processing and data interchange.

```python { .api }
class JSONFormatter(Formatter):
    def format_transcript(self, transcript, **kwargs):
        """
        Convert transcript to JSON string.

        Args:
            transcript (FetchedTranscript): Transcript to format
            **kwargs: Passed to json.dumps() (indent, ensure_ascii, etc.)

        Returns:
            str: JSON representation of transcript data
        """

    def format_transcripts(self, transcripts, **kwargs):
        """
        Convert multiple transcripts to JSON array string.

        Args:
            transcripts (List[FetchedTranscript]): Transcripts to format
            **kwargs: Passed to json.dumps()

        Returns:
            str: JSON array of transcript data
        """
```

### Text Formatter

Converts transcripts to plain text with no timestamps. Useful for text analysis and content extraction.

```python { .api }
class TextFormatter(Formatter):
    def format_transcript(self, transcript, **kwargs):
        """
        Convert transcript to plain text (no timestamps).

        Args:
            transcript (FetchedTranscript): Transcript to format
            **kwargs: Unused

        Returns:
            str: Plain text with lines separated by newlines
        """

    def format_transcripts(self, transcripts, **kwargs):
        """
        Convert multiple transcripts to plain text.

        Args:
            transcripts (List[FetchedTranscript]): Transcripts to format
            **kwargs: Unused

        Returns:
            str: Plain text with transcripts separated by triple newlines
        """
```
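
To make the separators concrete, here is a small sketch of the joining behavior described in the docstrings above. It is an illustration rather than the library's implementation: the helper names are hypothetical, and snippets are modeled as dictionaries with an assumed `text` key.

```python
# Hypothetical snippet dictionaries; only the "text" field matters here.
first = [{"text": "Hello"}, {"text": "world"}]
second = [{"text": "Another"}, {"text": "video"}]


def to_plain_text(transcript):
    """Join snippet texts with single newlines."""
    return "\n".join(snippet["text"] for snippet in transcript)


def to_plain_text_many(transcripts):
    """Join whole transcripts with triple newlines."""
    return "\n\n\n".join(to_plain_text(t) for t in transcripts)


combined = to_plain_text_many([first, second])
print(repr(combined))  # 'Hello\nworld\n\n\nAnother\nvideo'
```

The triple newline leaves two blank lines between transcripts, so the output can be split back into per-video text with `combined.split("\n\n\n")`, provided no transcript itself contains that sequence.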

### Pretty Print Formatter

Human-readable formatted output using Python's pprint module for debugging and inspection.

```python { .api }
class PrettyPrintFormatter(Formatter):
    def format_transcript(self, transcript, **kwargs):
        """
        Pretty print transcript data.

        Args:
            transcript (FetchedTranscript): Transcript to format
            **kwargs: Passed to pprint.pformat()

        Returns:
            str: Pretty formatted transcript representation
        """

    def format_transcripts(self, transcripts, **kwargs):
        """
        Pretty print multiple transcripts.

        Args:
            transcripts (List[FetchedTranscript]): Transcripts to format
            **kwargs: Passed to pprint.pformat()

        Returns:
            str: Pretty formatted list of transcripts
        """
```
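
Since keyword arguments are forwarded to `pprint.pformat()`, its standard options such as `width` and `indent` shape the output. The sketch below calls `pprint.pformat()` directly on sample data to show the effect; the snippet dictionaries and their keys are assumptions for illustration.

```python
import ast
import pprint

# Assumed snippet shape, for illustration only.
snippets = [
    {"text": "First sample line", "start": 0.0, "duration": 2.5},
    {"text": "Second sample line", "start": 2.5, "duration": 2.5},
]

# Default layout: width=80, indent=1.
default_view = pprint.pformat(snippets)

# A narrower width forces each dictionary to wrap across several lines.
narrow_view = pprint.pformat(snippets, width=40, indent=2)

print(default_view)
print(narrow_view)
```

Both strings are valid Python literals, so for data like this they can be read back with `ast.literal_eval()` when inspecting saved debug output.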

### SRT Formatter

Creates SRT (SubRip) subtitle files compatible with video players and subtitle software.

```python { .api }
class SRTFormatter(Formatter):
    def format_transcript(self, transcript, **kwargs):
        """
        Convert transcript to SRT subtitle format.

        Args:
            transcript (FetchedTranscript): Transcript to format
            **kwargs: Unused

        Returns:
            str: SRT formatted subtitles with sequence numbers and timestamps
        """

    def format_transcripts(self, transcripts, **kwargs):
        """
        Convert multiple transcripts to SRT format.

        Args:
            transcripts (List[FetchedTranscript]): Transcripts to format
            **kwargs: Unused

        Returns:
            str: Combined SRT formatted subtitles
        """
```
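
For readers unfamiliar with the SRT layout, the sketch below builds it by hand: a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the cue text, and a blank line between cues. The helper names and the dictionary snippet shape are assumptions, and this simplified version may differ in details from what `SRTFormatter` produces.

```python
def srt_timestamp(seconds):
    """Render seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(snippets):
    """Build numbered SRT cues from dicts with text, start and duration keys."""
    cues = []
    for index, snippet in enumerate(snippets, start=1):
        start = srt_timestamp(snippet["start"])
        end = srt_timestamp(snippet["start"] + snippet["duration"])
        cues.append(f"{index}\n{start} --> {end}\n{snippet['text']}")
    return "\n\n".join(cues) + "\n"


# Assumed snippet shape, for illustration only.
sample = [
    {"text": "Hello", "start": 0.0, "duration": 1.5},
    {"text": "world", "start": 3661.25, "duration": 2.0},
]
srt_output = to_srt(sample)
print(srt_output)
```

Working in integer milliseconds avoids floating-point drift when splitting the value into hours, minutes, and seconds.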

### WebVTT Formatter

Creates WebVTT subtitle files for web video players and HTML5 video elements.

```python { .api }
class WebVTTFormatter(Formatter):
    def format_transcript(self, transcript, **kwargs):
        """
        Convert transcript to WebVTT subtitle format.

        Args:
            transcript (FetchedTranscript): Transcript to format
            **kwargs: Unused

        Returns:
            str: WebVTT formatted subtitles with WEBVTT header
        """

    def format_transcripts(self, transcripts, **kwargs):
        """
        Convert multiple transcripts to WebVTT format.

        Args:
            transcripts (List[FetchedTranscript]): Transcripts to format
            **kwargs: Unused

        Returns:
            str: Combined WebVTT formatted subtitles
        """
```
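
WebVTT differs from SRT in two visible ways: the file starts with a `WEBVTT` header line, and timestamps use a period rather than a comma before the milliseconds. The sketch below illustrates that layout under the same assumptions as before (hypothetical helper names, snippets as dictionaries); it is not the library's implementation.

```python
def vtt_timestamp(seconds):
    """Render seconds as HH:MM:SS.mmm (WebVTT uses a period before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(snippets):
    """Build a WebVTT document: header, blank line, then one cue per snippet."""
    cues = []
    for snippet in snippets:
        start = vtt_timestamp(snippet["start"])
        end = vtt_timestamp(snippet["start"] + snippet["duration"])
        cues.append(f"{start} --> {end}\n{snippet['text']}")
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"


# Assumed snippet shape, for illustration only.
sample = [
    {"text": "Hello", "start": 0.0, "duration": 1.5},
    {"text": "world", "start": 1.5, "duration": 1.5},
]
vtt_output = to_webvtt(sample)
print(vtt_output)
```

Cue identifiers are optional in WebVTT, so this sketch omits the sequence numbers that SRT requires.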

### Formatter Loader

Utility class for loading formatters by type string. Provides a convenient interface for dynamic formatter selection.

```python { .api }
class FormatterLoader:
    TYPES = {
        "json": JSONFormatter,
        "pretty": PrettyPrintFormatter,
        "text": TextFormatter,
        "webvtt": WebVTTFormatter,
        "srt": SRTFormatter,
    }

    def load(self, formatter_type="pretty"):
        """
        Load formatter by type string.

        Args:
            formatter_type (str): Formatter type name. Defaults to "pretty"

        Returns:
            Formatter: Formatter instance

        Raises:
            UnknownFormatterType: Invalid formatter type
        """

class UnknownFormatterType(Exception):
    def __init__(self, formatter_type):
        """
        Exception for invalid formatter types.

        Args:
            formatter_type (str): The invalid formatter type
        """
```
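
The loader follows a common registry pattern: a dictionary maps type strings to classes, and `load()` instantiates the match or raises for an unknown key. The sketch below shows that pattern in isolation. Every name in it (`Loader`, `PlainFormatter`, `ShoutingFormatter`, `UnknownFormatName`) is hypothetical and stands in for the classes above, and transcripts are modeled as lists of strings.

```python
class UnknownFormatName(Exception):
    """Hypothetical counterpart to UnknownFormatterType."""

    def __init__(self, name, known):
        choices = ", ".join(sorted(known))
        super().__init__(f"Unknown format {name!r}; choose one of: {choices}")


class PlainFormatter:
    def format_transcript(self, transcript, **kwargs):
        return "\n".join(transcript)


class ShoutingFormatter:
    def format_transcript(self, transcript, **kwargs):
        return "\n".join(line.upper() for line in transcript)


class Loader:
    # Registry mapping type strings to formatter classes.
    TYPES = {"plain": PlainFormatter, "shout": ShoutingFormatter}

    def load(self, formatter_type="plain"):
        if formatter_type not in self.TYPES:
            raise UnknownFormatName(formatter_type, self.TYPES)
        # Return a fresh instance, not the class itself.
        return self.TYPES[formatter_type]()


loader = Loader()
print(loader.load("shout").format_transcript(["hello", "world"]))

try:
    loader.load("xml")
except UnknownFormatName as error:
    print(error)
```

Raising a dedicated exception instead of letting a `KeyError` escape gives callers, such as a command-line front end, a specific error to catch and report.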

## Usage Examples

### Basic Formatting

```python
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import JSONFormatter, TextFormatter

api = YouTubeTranscriptApi()
transcript = api.fetch('dQw4w9WgXcQ')

# JSON format
json_formatter = JSONFormatter()
json_output = json_formatter.format_transcript(transcript)
print(json_output)

# Plain text format
text_formatter = TextFormatter()
text_output = text_formatter.format_transcript(transcript)
print(text_output)
```

### Subtitle File Creation

```python
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import SRTFormatter, WebVTTFormatter

api = YouTubeTranscriptApi()
transcript = api.fetch('dQw4w9WgXcQ')

# Create SRT subtitle file
srt_formatter = SRTFormatter()
srt_content = srt_formatter.format_transcript(transcript)

with open('subtitles.srt', 'w', encoding='utf-8') as f:
    f.write(srt_content)

# Create WebVTT subtitle file
webvtt_formatter = WebVTTFormatter()
webvtt_content = webvtt_formatter.format_transcript(transcript)

with open('subtitles.vtt', 'w', encoding='utf-8') as f:
    f.write(webvtt_content)
```

### Using FormatterLoader

```python
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import FormatterLoader

api = YouTubeTranscriptApi()
transcript = api.fetch('dQw4w9WgXcQ')

loader = FormatterLoader()

# Load different formatters dynamically
for format_type in ['json', 'text', 'srt', 'webvtt', 'pretty']:
    formatter = loader.load(format_type)
    output = formatter.format_transcript(transcript)
    print(f"=== {format_type.upper()} ===")
    # Show only the first 200 characters of long outputs
    print((output[:200] + "...") if len(output) > 200 else output)
    print()
```

### JSON Formatting with Options

```python
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import JSONFormatter

api = YouTubeTranscriptApi()
transcript = api.fetch('dQw4w9WgXcQ')

json_formatter = JSONFormatter()

# Pretty printed JSON (keyword arguments are passed to json.dumps())
pretty_json = json_formatter.format_transcript(transcript, indent=2, ensure_ascii=False)
print(pretty_json)

# Compact JSON
compact_json = json_formatter.format_transcript(transcript, separators=(',', ':'))
print(compact_json)
```

### Multiple Transcripts

```python
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter

api = YouTubeTranscriptApi()

# Fetch transcripts for several videos
video_ids = ['dQw4w9WgXcQ', 'jNQXAC9IVRw']
transcripts = []

for video_id in video_ids:
    try:
        transcript = api.fetch(video_id)
        transcripts.append(transcript)
    except Exception as e:
        print(f"Failed to fetch {video_id}: {e}")

# Format all transcripts together
if transcripts:
    text_formatter = TextFormatter()
    combined_text = text_formatter.format_transcripts(transcripts)
    print(combined_text)
```

## Types

```python { .api }
from typing import List
from youtube_transcript_api._transcripts import FetchedTranscript

# Formatter interface types
FormatterType = str  # One of: "json", "text", "pretty", "srt", "webvtt"
FormatterKwargs = dict  # Formatter-specific keyword arguments
```