Python API for retrieving YouTube video transcripts and subtitles without browser automation

`npx @tessl/cli install tessl/pypi-youtube-transcript-api@1.2.00`

# YouTube Transcript API

A Python API for retrieving YouTube video transcripts and subtitles without requiring browser automation. Supports manually created and automatically generated subtitles, transcript translation, multiple output formats, and proxy configuration for working around IP restrictions.

## Package Information

- **Package Name**: youtube-transcript-api
- **Language**: Python
- **Installation**: `pip install youtube-transcript-api`

## Core Imports

```python
from youtube_transcript_api import YouTubeTranscriptApi
```

For the transcript data classes and the base exception:

```python
from youtube_transcript_api import (
    YouTubeTranscriptApi,
    TranscriptList,
    Transcript,
    FetchedTranscript,
    FetchedTranscriptSnippet,
    YouTubeTranscriptApiException,
)
```

For formatters:

```python
from youtube_transcript_api.formatters import (
    JSONFormatter,
    TextFormatter,
    SRTFormatter,
    WebVTTFormatter,
    PrettyPrintFormatter,
    FormatterLoader,
)
```

For proxy configuration:

```python
from youtube_transcript_api.proxies import (
    GenericProxyConfig,
    WebshareProxyConfig,
)
```

## Basic Usage

```python
from youtube_transcript_api import YouTubeTranscriptApi

api = YouTubeTranscriptApi()

# Simple fetch: pass the video ID (not the full URL); English ("en") is the default language
fetched_transcript = api.fetch('video_id')

# Each snippet carries its text plus its start time and duration in seconds
for snippet in fetched_transcript:
    print(f"{snippet.start}: {snippet.text}")

# List every transcript available for the video
transcript_list = api.list('video_id')
for t in transcript_list:
    print(f"{t.language_code}: {t.language} ({'generated' if t.is_generated else 'manual'})")

# Select by language priority: Spanish first, then English as a fallback
transcript = transcript_list.find_transcript(['es', 'en'])
fetched = transcript.fetch()

# Translate the selected Transcript (not the FetchedTranscript) into French
translated = transcript.translate('fr')
french_transcript = translated.fetch()
```
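
The snippet loop above can be extended into small post-processing helpers. As a minimal sketch, the code below finds the snippets that mention a keyword and formats each start time as `m:ss`. `Snippet`, `format_timestamp`, and `find_mentions` are hypothetical names. `Snippet` is an offline stand-in with the same `text`, `start`, and `duration` fields as `FetchedTranscriptSnippet`, so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Snippet:
    """Hypothetical stand-in for FetchedTranscriptSnippet (text, start, duration)."""
    text: str
    start: float     # seconds from the start of the video
    duration: float  # seconds


def format_timestamp(seconds: float) -> str:
    """Format a start offset in seconds as m:ss (minutes are not capped at 59)."""
    total = int(seconds)  # drop fractional seconds
    minutes, secs = divmod(total, 60)
    return f"{minutes}:{secs:02d}"


def find_mentions(snippets: Iterable[Snippet], keyword: str) -> List[Tuple[str, str]]:
    """Return (timestamp, text) pairs for snippets containing keyword, case-insensitively."""
    needle = keyword.lower()
    return [
        (format_timestamp(s.start), s.text)
        for s in snippets
        if needle in s.text.lower()
    ]
```

A `FetchedTranscript` is iterable over its snippets, so `find_mentions(api.fetch('video_id'), 'keyword')` should behave the same way with the real library.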

## Architecture

The library organizes transcript access as a hierarchy of objects:

- **YouTubeTranscriptApi**: Main entry point; fetches a transcript directly or lists the transcripts available for a video
- **TranscriptList**: Container for all transcripts available for one video, with lookup by language code
- **Transcript**: Metadata for one transcript (language, language code, whether it was auto-generated), plus methods to fetch or translate it
- **FetchedTranscript**: The downloaded transcript content, an iterable sequence of snippets
- **FetchedTranscriptSnippet**: One text segment with its start time and duration

This design separates discovery from download. You can list transcripts and choose a language, with fallbacks, before any transcript content is requested. The content is downloaded only when a transcript is fetched.
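
To make the split between discovery and download concrete, here is a toy model of the hierarchy. The class names `ToyTranscript` and `ToyTranscriptList` are hypothetical. The `loader` callable stands in for the HTTP request the real library would make. The model shows only that listing and selecting a transcript does not load its content until `fetch()` is called. It is not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyTranscript:
    """Transcript metadata; the text itself is loaded only on fetch()."""
    language_code: str
    is_generated: bool
    loader: Callable[[str], List[str]]  # hypothetical stand-in for the HTTP request

    def fetch(self) -> List[str]:
        # Content is requested here, not when the transcript is listed.
        return self.loader(self.language_code)


@dataclass
class ToyTranscriptList:
    """Every transcript known for one video, keyed by language code."""
    transcripts: Dict[str, ToyTranscript] = field(default_factory=dict)

    def get(self, language_code: str) -> ToyTranscript:
        # Pure metadata lookup: no content is loaded here.
        if language_code not in self.transcripts:
            raise LookupError(f"no transcript for {language_code!r}")
        return self.transcripts[language_code]
```

Counting calls to `loader` confirms that `get()` is a metadata-only operation and that only `fetch()` triggers a load.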

## Capabilities

### Core API Functions

Main API class for retrieving transcripts, with support for language selection, proxy configuration, and custom HTTP clients. `fetch` returns a `FetchedTranscript` in the first language from `languages` that is available for the video. `list` returns a `TranscriptList` describing every transcript for the video.

```python { .api }
class YouTubeTranscriptApi:
    def __init__(self, proxy_config=None, http_client=None): ...
    def fetch(self, video_id, languages=("en",), preserve_formatting=False): ...
    def list(self, video_id): ...
```

[Core API](./core-api.md)
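
`fetch` and `list` take a video ID rather than a URL, so URL input needs to be reduced to the ID first. The sketch below does this with the standard library's `urllib.parse`. `extract_video_id` is a hypothetical helper, not part of the library. It handles only the common `watch?v=`, `youtu.be/`, `/shorts/`, and `/embed/` forms, and it expects URLs to include a scheme such as `https://`.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(value: str) -> str:
    """Return the video ID from a YouTube URL, or the input unchanged if it is not a URL.

    Hypothetical helper: covers only a few common URL shapes and requires a scheme.
    """
    parsed = urlparse(value)
    if not parsed.netloc:
        return value  # no host part: assume it is already a bare video ID
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path_parts = [p for p in parsed.path.split("/") if p]
    if host == "youtu.be" and path_parts:
        return path_parts[0]  # https://youtu.be/<id>
    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v")  # https://www.youtube.com/watch?v=<id>
            if ids:
                return ids[0]
        if len(path_parts) >= 2 and path_parts[0] in ("shorts", "embed"):
            return path_parts[1]  # /shorts/<id> or /embed/<id>
    raise ValueError(f"unrecognized YouTube URL: {value!r}")
```

Usage would then look like `api.fetch(extract_video_id(url))`. A bare ID passes through unchanged.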

### Transcript Data Structures

Data classes for representing transcript lists, individual transcripts, and fetched content with timing information.

```python { .api }
class TranscriptList:
    def find_transcript(self, language_codes): ...
    def find_generated_transcript(self, language_codes): ...
    def find_manually_created_transcript(self, language_codes): ...

class Transcript:
    def fetch(self, preserve_formatting=False): ...
    def translate(self, language_code): ...

class FetchedTranscript:
    def to_raw_data(self): ...
```

[Data Structures](./data-structures.md)
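
The `find_*` methods take language codes in priority order. `find_transcript` is documented to prefer a manually created transcript over a generated one when both exist for the same language. The sketch below models that selection rule over plain tuples. It assumes the order is language priority first, then manual before generated within each language. `pick_transcript` and the `(language_code, is_generated)` representation are hypothetical; this is not the library's code. Where the real methods raise `NoTranscriptFound`, the sketch raises `LookupError`.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical representation of one available transcript: (language_code, is_generated)
TranscriptInfo = Tuple[str, bool]


def pick_transcript(
    available: Iterable[TranscriptInfo],
    language_codes: Sequence[str],
    kinds: Sequence[bool] = (False, True),  # False = manual, True = generated
) -> TranscriptInfo:
    """Return the first match, trying languages in priority order.

    Within one language, kinds are tried in order (manual before generated by default).
    kinds=(True,) or kinds=(False,) mimics the generated-only / manual-only lookups.
    """
    pool = set(available)
    for code in language_codes:
        for is_generated in kinds:
            if (code, is_generated) in pool:
                return (code, is_generated)
    raise LookupError(f"no transcript for any of {list(language_codes)}")
```

Under this model, language priority wins over transcript kind. A generated German transcript is chosen over a manual English one when `'de'` is listed first.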

### Output Formatters

Classes for converting transcript data into various output formats, including JSON, plain text, SRT subtitles, and WebVTT.

```python { .api }
class JSONFormatter:
    def format_transcript(self, transcript, **kwargs): ...
    def format_transcripts(self, transcripts, **kwargs): ...

class SRTFormatter:
    def format_transcript(self, transcript, **kwargs): ...

class WebVTTFormatter:
    def format_transcript(self, transcript, **kwargs): ...
```

[Formatters](./formatters.md)
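
SRT and WebVTT differ mainly in cue layout and timestamp syntax. SRT numbers each cue and writes timestamps as `HH:MM:SS,mmm`. WebVTT starts with a `WEBVTT` header and uses a period before the milliseconds. To show what such a conversion involves, the stand-alone sketch below renders snippet-like `(text, start, duration)` tuples as SRT. `srt_timestamp` and `to_srt` are hypothetical helpers. The library's `SRTFormatter` may handle details such as overlapping cues differently.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)  # integer milliseconds avoid float drift
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(snippets: Iterable[Tuple[str, float, float]]) -> str:
    """Render (text, start, duration) tuples as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (text, start, duration) in enumerate(snippets, start=1):
        end = start + duration  # naive end time: start plus duration
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)
```

Swapping the comma for a period in `srt_timestamp`, dropping the cue numbers, and prepending a `WEBVTT` header line would give the corresponding minimal WebVTT layout.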

### Proxy Configuration

Classes for configuring HTTP proxies to work around IP blocking, including generic proxy support and specialized Webshare residential proxy integration.

```python { .api }
class GenericProxyConfig:
    def __init__(self, http_url=None, https_url=None): ...

class WebshareProxyConfig:
    def __init__(self, proxy_username, proxy_password, **kwargs): ...
```

[Proxy Configuration](./proxy-config.md)
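
An HTTP and an HTTPS proxy URL typically end up as a scheme-to-URL mapping of the kind `requests` accepts through its `proxies` argument. The sketch below builds such a mapping from an `http_url`/`https_url` pair. Two behaviors are assumptions chosen to mirror `GenericProxyConfig`'s two optional parameters: reusing whichever URL is set for both schemes, and rejecting a config with neither. `build_proxies` is hypothetical and not part of the library.

```python
from typing import Dict, Optional


def build_proxies(http_url: Optional[str] = None, https_url: Optional[str] = None) -> Dict[str, str]:
    """Build a requests-style proxies mapping from optional HTTP and HTTPS proxy URLs.

    Hypothetical helper: if only one URL is given, it is reused for both schemes.
    """
    if http_url is None and https_url is None:
        raise ValueError("at least one of http_url or https_url is required")
    return {
        "http": http_url or https_url,
        "https": https_url or http_url,
    }
```

With this library you would not build the mapping yourself. Instead, pass `GenericProxyConfig(http_url=..., https_url=...)` as `proxy_config` when constructing `YouTubeTranscriptApi`.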

### Error Handling

Exception hierarchy covering the main failure cases, including unavailable videos, IP blocking, missing transcripts, and translation errors. All library exceptions derive from `YouTubeTranscriptApiException`, so catching it handles any of them.

```python { .api }
class YouTubeTranscriptApiException(Exception): ...
class CouldNotRetrieveTranscript(YouTubeTranscriptApiException): ...
class VideoUnavailable(CouldNotRetrieveTranscript): ...
class TranscriptsDisabled(CouldNotRetrieveTranscript): ...
class NoTranscriptFound(CouldNotRetrieveTranscript): ...
# Abridged: other subclasses, such as those for IP blocking and translation errors, are not shown
```

[Error Handling](./error-handling.md)
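
Python runs the first matching `except` clause. Because these exceptions form a hierarchy, list specific subclasses before `CouldNotRetrieveTranscript` or `YouTubeTranscriptApiException`; otherwise the broader clause catches everything first. The sketch below demonstrates that ordering. It uses local stand-in classes that mirror the names shown above, so it runs without the library installed; in real code, import the exceptions from `youtube_transcript_api` instead. `describe_failure` is a hypothetical helper.

```python
from typing import Callable


# Local stand-ins mirroring the documented hierarchy (not the library's classes).
class YouTubeTranscriptApiException(Exception):
    pass


class CouldNotRetrieveTranscript(YouTubeTranscriptApiException):
    pass


class VideoUnavailable(CouldNotRetrieveTranscript):
    pass


class TranscriptsDisabled(CouldNotRetrieveTranscript):
    pass


class NoTranscriptFound(CouldNotRetrieveTranscript):
    pass


def describe_failure(action: Callable[[], object]) -> str:
    """Run action() and map known failures to a short message, most specific first."""
    try:
        action()
    except VideoUnavailable:
        return "video is unavailable"
    except TranscriptsDisabled:
        return "transcripts are disabled for this video"
    except NoTranscriptFound:
        return "no transcript in the requested languages"
    except CouldNotRetrieveTranscript:
        return "transcript could not be retrieved"  # any other retrieval subclass lands here
    except YouTubeTranscriptApiException:
        return "other library error"
    return "ok"
```

Subclasses not listed explicitly, such as the ones for IP blocking, fall through to the `CouldNotRetrieveTranscript` clause.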