Tessl Tile for pypi/pytube@15.0.0

# Caption and Subtitle Support

Extract caption tracks from YouTube videos and convert them to SRT (.srt) format, with support for multiple languages and for automatically generated subtitle tracks.

## Capabilities

### Caption Class

Represents an individual caption track with language-specific subtitle data and conversion capabilities.

```python { .api }
class Caption:
    def __init__(self, caption_track: Dict):
        """
        Initialize a Caption object.

        Args:
            caption_track (dict): Caption track metadata dictionary
        """
```

### Caption Properties

Access caption track information and content.

```python { .api }
@property
def url(self) -> str:
    """Get the URL for downloading the caption track."""

@property
def name(self) -> str:
    """Get the human-readable name of the caption track (e.g., 'English', 'Spanish')."""

@property
def code(self) -> str:
    """Get the language code for the caption track (e.g., 'en', 'es', 'fr')."""

@property
def xml_captions(self) -> str:
    """Get the raw XML caption data from YouTube."""

@property
def json_captions(self) -> dict:
    """Get the parsed JSON caption data."""
```

### Caption Conversion

Convert caption data between formats.

```python { .api }
def generate_srt_captions(self) -> str:
    """
    Convert the caption track to SRT (SubRip) format.

    Returns:
        str: Caption content in SRT format with timestamps and text
    """
```

### Caption Download

Download caption files with various format options.

```python { .api }
def download(
    self,
    title: str,
    srt: bool = True,
    output_path: Optional[str] = None,
    filename_prefix: Optional[str] = None
) -> str:
    """
    Download the caption track to a file.

    Args:
        title (str): Base filename for the caption file
        srt (bool): Convert to SRT format (default: True)
        output_path (str, optional): Directory to save the file
        filename_prefix (str, optional): Prefix to add to filename

    Returns:
        str: Path to the downloaded caption file
    """
```

### Static Caption Utilities

Utility methods for caption format conversion.

```python { .api }
@staticmethod
def float_to_srt_time_format(d: float) -> str:
    """
    Convert a float timestamp to SRT time format.

    Args:
        d (float): Time in seconds as a float

    Returns:
        str: Time in SRT format (HH:MM:SS,mmm)
    """

@staticmethod
def xml_caption_to_srt(xml_captions: str) -> str:
    """
    Convert XML caption data to SRT format.

    Args:
        xml_captions (str): Raw XML caption content

    Returns:
        str: Caption content converted to SRT format
    """
```
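The two conversions above can be approximated with the standard library alone. The sketch below is not pytube's implementation: `srt_timestamp` and `xml_to_srt` are hypothetical names, and the `<transcript>`/`<text start dur>` layout of the sample is an assumed, simplified caption format that may not match what YouTube currently returns.

```python
import xml.etree.ElementTree as ET
from html import unescape


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    # Work in whole milliseconds to avoid float formatting surprises.
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def xml_to_srt(xml_text: str) -> str:
    """Turn <text start=".." dur=".."> elements into numbered SRT cues."""
    root = ET.fromstring(xml_text)
    cues = []
    for number, node in enumerate(root.iter("text"), start=1):
        start = float(node.attrib["start"])
        end = start + float(node.attrib.get("dur", 0.0))
        # Entities that survive XML parsing (e.g. "&#39;") are unescaped here.
        text = unescape((node.text or "").replace("\n", " "))
        cues.append(f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}")
    # SRT separates cues with a blank line.
    return "\n\n".join(cues)


# Assumed sample input, written by hand for illustration.
SAMPLE_XML = (
    "<transcript>"
    '<text start="0.5" dur="1.5">It&amp;#39;s a test</text>'
    '<text start="2.0" dur="2.25">Second line</text>'
    "</transcript>"
)

srt_text = xml_to_srt(SAMPLE_XML)
print(srt_text)
```

Computing the timestamp from integer milliseconds keeps the millisecond field at exactly three digits, which is what the `HH:MM:SS,mmm` format requires.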

### CaptionQuery Class

Query interface for caption collections providing dictionary-like access to caption tracks by language code.

```python { .api }
class CaptionQuery:
    def __init__(self, captions: List[Caption]):
        """
        Initialize CaptionQuery with a list of caption tracks.

        Args:
            captions (List[Caption]): List of available caption tracks
        """
```
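To illustrate what "dictionary-like access by language code" can look like, here is a minimal standalone sketch. `Track` and `TrackQuery` are hypothetical stand-ins, not pytube classes. The sketch indexes tracks by their code and, as in the usage examples in this document, yields track objects when iterated.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class Track:
    """Hypothetical stand-in for a caption track."""
    code: str
    name: str


class TrackQuery:
    """Hypothetical read-only collection keyed by language code."""

    def __init__(self, tracks: List[Track]):
        self._by_code: Dict[str, Track] = {t.code: t for t in tracks}

    def __getitem__(self, lang_code: str) -> Track:
        # Raises KeyError when the language code is unknown.
        return self._by_code[lang_code]

    def __contains__(self, lang_code: object) -> bool:
        return lang_code in self._by_code

    def __len__(self) -> int:
        return len(self._by_code)

    def __iter__(self) -> Iterator[Track]:
        # Iteration yields track objects rather than language codes.
        return iter(self._by_code.values())


query = TrackQuery([Track("en", "English"), Track("es", "Spanish")])
names = [track.name for track in query]
print(query["en"].name, len(query), "fr" in query, names)
```

Keeping a plain dict behind the interface makes lookups and membership tests constant-time, at the cost of keeping only the last track if two tracks share a code.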

### Caption Access

Access caption tracks by language code and iterate through available captions.

```python { .api }
def __getitem__(self, lang_code: str) -> Caption:
    """
    Get caption track by language code.

    Args:
        lang_code (str): Language code (e.g., 'en', 'es', 'fr')

    Returns:
        Caption: Caption track for the specified language

    Raises:
        KeyError: If language code is not found
    """

def __len__(self) -> int:
    """
    Get the number of available caption tracks.

    Returns:
        int: Number of caption tracks
    """

def __iter__(self) -> Iterator[Caption]:
    """
    Iterate through all available caption tracks.

    Returns:
        Iterator[Caption]: Iterator over caption tracks
    """
```

### Deprecated Methods

Legacy methods maintained for backward compatibility.

```python { .api }
def get_by_language_code(self, lang_code: str) -> Optional[Caption]:
    """
    Get caption track by language code.

    **DEPRECATED**: Use dictionary-style access with captions[lang_code] instead.

    Args:
        lang_code (str): Language code (e.g., 'en', 'es')

    Returns:
        Caption or None: Caption track for the specified language
    """

def all(self) -> List[Caption]:
    """
    Get all the results represented by this query as a list.

    **DEPRECATED**: CaptionQuery can be treated as a dictionary/iterable directly.

    Returns:
        List[Caption]: All caption tracks
    """
```

## Usage Examples

### Basic Caption Download

```python
from pytube import YouTube

# Get video with captions
yt = YouTube('https://www.youtube.com/watch?v=9bZkp7q19f0')

# Check available caption tracks
print("Available captions:")
for caption in yt.captions:
    print(f"- {caption.name} ({caption.code})")

# Download English captions
if 'en' in yt.captions:
    caption = yt.captions['en']
    caption.download(title=yt.title)
    print(f"Downloaded captions: {caption.name}")
```

### SRT Format Conversion

```python
from pytube import YouTube

yt = YouTube('https://www.youtube.com/watch?v=9bZkp7q19f0')

# Get English captions and convert to SRT
if 'en' in yt.captions:
    caption = yt.captions['en']

    # Generate SRT content
    srt_content = caption.generate_srt_captions()

    # Save to custom file
    with open('custom_captions.srt', 'w', encoding='utf-8') as f:
        f.write(srt_content)

    print("SRT file created: custom_captions.srt")
```

### Multiple Language Downloads

```python
from pytube import YouTube
import os

yt = YouTube('https://www.youtube.com/watch?v=9bZkp7q19f0')

# Create captions directory
captions_dir = "captions"
os.makedirs(captions_dir, exist_ok=True)

# Download all available caption tracks
for caption in yt.captions:
    try:
        file_path = caption.download(
            title=yt.title,
            output_path=captions_dir,
            filename_prefix=f"{caption.code}_"
        )
        print(f"Downloaded {caption.name}: {file_path}")
    except Exception as e:
        print(f"Failed to download {caption.name}: {e}")
```

### Caption Content Analysis

```python
from pytube import YouTube

yt = YouTube('https://www.youtube.com/watch?v=9bZkp7q19f0')

if 'en' in yt.captions:
    caption = yt.captions['en']

    # Get raw caption data
    xml_data = caption.xml_captions
    json_data = caption.json_captions

    print(f"XML data length: {len(xml_data)} characters")
    print(f"JSON entries: {len(json_data.get('events', []))}")

    # Convert to SRT and analyze
    srt_content = caption.generate_srt_captions()
    srt_lines = srt_content.split('\n')
    subtitle_count = srt_content.count('\n\n') + 1

    print(f"SRT content: {len(srt_lines)} lines")
    print(f"Number of subtitles: {subtitle_count}")
```

### Custom SRT Processing

```python
from pytube import YouTube
import re

yt = YouTube('https://www.youtube.com/watch?v=9bZkp7q19f0')

if 'en' in yt.captions:
    caption = yt.captions['en']
    srt_content = caption.generate_srt_captions()

    # Extract all subtitle text (remove timestamps and numbering)
    subtitle_pattern = r'\d+\n\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\n(.+?)(?=\n\n|\n\d+\n|\Z)'
    matches = re.findall(subtitle_pattern, srt_content, re.DOTALL)

    all_text = ' '.join(match.replace('\n', ' ') for match in matches)
    print(f"Full transcript: {all_text[:200]}...")
```
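As an alternative to a single regular expression, SRT text can be split into blocks on blank lines and each block parsed into a structured cue. The following is a sketch under that assumption. `Cue` and `parse_srt` are hypothetical helpers, not part of pytube, and the sample string is hand-written so the example runs offline.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start: str
    end: str
    text: str


def parse_srt(srt_text: str) -> List[Cue]:
    """Split SRT text into cues; blocks that do not look like cues are skipped."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or " --> " not in lines[1]:
            continue
        if not lines[0].strip().isdigit():
            continue
        start, end = (part.strip() for part in lines[1].split(" --> ", 1))
        # Multi-line cue text is joined into a single line.
        text = " ".join(lines[2:]).strip()
        cues.append(Cue(int(lines[0]), start, end, text))
    return cues


SAMPLE_SRT = (
    "1\n00:00:00,500 --> 00:00:02,000\nHello there\n\n"
    "2\n00:00:02,000 --> 00:00:04,250\nSecond cue,\nsecond line\n"
)

cues = parse_srt(SAMPLE_SRT)
transcript = " ".join(cue.text for cue in cues)
print(transcript)
```

With a real track, the output of `caption.generate_srt_captions()` would take the place of `SAMPLE_SRT`. Keeping the timestamps alongside the text makes later filtering by time possible without re-parsing.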

### Error Handling

```python
from pytube import YouTube

yt = YouTube('https://www.youtube.com/watch?v=9bZkp7q19f0')

# Check if captions are available
if len(yt.captions) == 0:
    print("No captions available for this video")
else:
    print(f"Found {len(yt.captions)} caption tracks")

    # Try to get specific language with fallback
    preferred_languages = ['en', 'en-US', 'en-GB']

    selected_caption = None
    for lang in preferred_languages:
        if lang in yt.captions:
            selected_caption = yt.captions[lang]
            break

    if selected_caption:
        try:
            selected_caption.download(title=yt.title)
            print(f"Downloaded captions: {selected_caption.name}")
        except Exception as e:
            print(f"Download failed: {e}")
    else:
        # Fall back to first available caption
        first_caption = next(iter(yt.captions))
        print(f"Using fallback caption: {first_caption.name}")
        first_caption.download(title=yt.title)
```
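The fallback logic above can be factored into a small pure function that works on language codes only, which makes it easy to test without network access. This is a sketch, and `pick_language` is a hypothetical helper rather than a pytube API. It tries exact matches first, then any track sharing the same base language, then the first available track.

```python
from typing import Iterable, Optional, Sequence


def pick_language(available: Iterable[str], preferred: Sequence[str]) -> Optional[str]:
    """Return the best matching code from `available`, or None if it is empty."""
    codes = list(available)
    # 1. Exact match, in order of preference.
    for wanted in preferred:
        if wanted in codes:
            return wanted
    # 2. Same base language, e.g. 'en' matches 'en-GB'.
    for wanted in preferred:
        base = wanted.split("-")[0]
        for code in codes:
            if code.split("-")[0] == base:
                return code
    # 3. Fall back to the first available track, if any.
    return codes[0] if codes else None


print(pick_language(["es", "en-GB"], ["en", "en-US"]))
print(pick_language(["fr", "de"], ["en"]))
```

With pytube, the available codes could be collected as `[c.code for c in yt.captions]` and the chosen code passed back to `yt.captions[...]`.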

### Time-based Caption Extraction

```python
from pytube import YouTube

def extract_captions_for_timerange(caption, start_seconds, end_seconds):
    """Extract captions for a specific time range."""
    json_data = caption.json_captions
    events = json_data.get('events', [])

    selected_captions = []
    for event in events:
        if 'tStartMs' in event and 'dDurationMs' in event:
            start_ms = event['tStartMs']
            duration_ms = event['dDurationMs']
            start_time = start_ms / 1000
            end_time = (start_ms + duration_ms) / 1000

            # Check if this caption overlaps with our time range
            if start_time < end_seconds and end_time > start_seconds:
                if 'segs' in event:
                    text = ''.join(seg.get('utf8', '') for seg in event['segs'])
                    selected_captions.append({
                        'start': start_time,
                        'end': end_time,
                        'text': text.strip()
                    })

    return selected_captions

# Usage
yt = YouTube('https://www.youtube.com/watch?v=9bZkp7q19f0')
if 'en' in yt.captions:
    caption = yt.captions['en']

    # Get captions for first 60 seconds
    timerange_captions = extract_captions_for_timerange(caption, 0, 60)

    for cap in timerange_captions:
        print(f"{cap['start']:.1f}s - {cap['end']:.1f}s: {cap['text']}")
```
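To see how the overlap test behaves at the boundaries, the same filter can be run on hand-written data. The payload below is an assumed, minimal shape built only from the keys used in the example above (`events`, `tStartMs`, `dDurationMs`, `segs`, `utf8`). Real caption JSON may carry additional fields. `overlapping_text` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Assumed sample payload, written by hand for illustration.
SAMPLE_JSON: Dict[str, Any] = {
    "events": [
        {"tStartMs": 0, "dDurationMs": 2000,
         "segs": [{"utf8": "Hello "}, {"utf8": "world"}]},
        {"tStartMs": 2000, "dDurationMs": 1500},  # no text segments
        {"tStartMs": 59000, "dDurationMs": 3000,
         "segs": [{"utf8": "Crosses the boundary"}]},
        {"tStartMs": 60000, "dDurationMs": 1000,
         "segs": [{"utf8": "Starts too late"}]},
    ]
}


def overlapping_text(events: List[Dict[str, Any]],
                     start_seconds: float, end_seconds: float) -> List[str]:
    """Return the text of every event that overlaps [start_seconds, end_seconds)."""
    texts = []
    for event in events:
        if not all(key in event for key in ("tStartMs", "dDurationMs", "segs")):
            continue
        start = event["tStartMs"] / 1000
        end = (event["tStartMs"] + event["dDurationMs"]) / 1000
        # Two intervals overlap when each one starts before the other ends.
        if start < end_seconds and end > start_seconds:
            texts.append("".join(seg.get("utf8", "") for seg in event["segs"]).strip())
    return texts


first_minute = overlapping_text(SAMPLE_JSON["events"], 0, 60)
print(first_minute)
```

An event that begins before the end of the range is kept even if it finishes after it, while an event that starts exactly at `end_seconds` is excluded.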

## Types

```python { .api }
from typing import Any, Dict, List, Optional, Iterator

# Caption track metadata structure
CaptionTrackDict = Dict[str, Any]

# JSON caption event structure
CaptionEvent = Dict[str, Any]
```
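For stricter type checking, the loosely typed event dictionary could be narrowed with `TypedDict`. The sketch below is an assumption, not a pytube type: it names only the keys that appear in this document's examples, and `CaptionSegmentSketch` and `CaptionEventSketch` are hypothetical names.

```python
from typing import List, TypedDict


class CaptionSegmentSketch(TypedDict, total=False):
    utf8: str  # text of one segment


class CaptionEventSketch(TypedDict, total=False):
    tStartMs: int      # start time in milliseconds
    dDurationMs: int   # duration in milliseconds
    segs: List[CaptionSegmentSketch]


event: CaptionEventSketch = {
    "tStartMs": 1500,
    "dDurationMs": 2000,
    "segs": [{"utf8": "Hi"}],
}
end_ms = event["tStartMs"] + event["dDurationMs"]
print(end_ms)
```

Declaring the classes with `total=False` reflects that any of these keys may be absent from a given event, which is why the examples check for them before use.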
