
# YouTube Transcript API


A Python API for retrieving YouTube video transcripts and subtitles without requiring browser automation. Supports manually created and automatically generated subtitles, transcript translation, multiple output formats, and proxy configuration for working around IP restrictions.

## Package Information

- **Package Name**: youtube-transcript-api
- **Language**: Python
- **Installation**: `pip install youtube-transcript-api`

## Core Imports

```python
from youtube_transcript_api import YouTubeTranscriptApi
```

For specific functionality:

```python
from youtube_transcript_api import (
    YouTubeTranscriptApi,
    TranscriptList,
    Transcript,
    FetchedTranscript,
    FetchedTranscriptSnippet,
    YouTubeTranscriptApiException
)
```

For formatters:

```python
from youtube_transcript_api.formatters import (
    JSONFormatter,
    TextFormatter,
    SRTFormatter,
    WebVTTFormatter,
    PrettyPrintFormatter,
    FormatterLoader
)
```

For proxy configuration:

```python
from youtube_transcript_api.proxies import (
    GenericProxyConfig,
    WebshareProxyConfig
)
```

## Basic Usage

```python
from youtube_transcript_api import YouTubeTranscriptApi

# Simple transcript fetch
api = YouTubeTranscriptApi()
transcript = api.fetch('video_id')

# Process transcript data
for snippet in transcript:
    print(f"{snippet.start}: {snippet.text}")

# Get list of available transcripts
transcript_list = api.list('video_id')
for t in transcript_list:
    print(f"{t.language_code}: {t.language} ({'generated' if t.is_generated else 'manual'})")

# Fetch specific language with fallback
transcript = transcript_list.find_transcript(['es', 'en'])
fetched = transcript.fetch()

# Translate transcript
translated = transcript.translate('fr')
french_transcript = translated.fetch()
```

## Architecture

The library uses a hierarchical structure for transcript management:

- **YouTubeTranscriptApi**: Main entry point for all operations
- **TranscriptList**: Container for all available transcripts for a video
- **Transcript**: Individual transcript metadata and fetching capabilities
- **FetchedTranscript**: Actual transcript content with timing information
- **FetchedTranscriptSnippet**: Individual text segments with timestamps

This design enables efficient discovery of available transcripts, flexible language selection with fallbacks, and lazy loading of transcript content only when needed.
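The top two levels of this hierarchy can be walked without downloading any transcript content. The following is a minimal sketch rather than part of the library: `summarize_transcripts` is a hypothetical helper, and its `api` parameter is assumed to be a `YouTubeTranscriptApi` instance (or any object with a compatible `list` method). It reads only the metadata attributes already used in Basic Usage.

```python
def summarize_transcripts(api, video_id):
    """Return one metadata row per available transcript.

    Hypothetical helper: `api` is expected to behave like a
    YouTubeTranscriptApi instance. Only Transcript metadata is read,
    so no transcript content is fetched here.
    """
    rows = []
    for transcript in api.list(video_id):
        rows.append(
            {
                "language_code": transcript.language_code,
                "language": transcript.language,
                "kind": "generated" if transcript.is_generated else "manual",
            }
        )
    return rows
```

A call such as `summarize_transcripts(YouTubeTranscriptApi(), 'video_id')` performs only the listing step; content is loaded later, when `Transcript.fetch()` is called on a chosen entry.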

## Capabilities

### Core API Functions

Main API class for retrieving transcripts with support for language selection, proxy configuration, and custom HTTP clients.

```python { .api }
class YouTubeTranscriptApi:
    def __init__(self, proxy_config=None, http_client=None): ...
    def fetch(self, video_id, languages=("en",), preserve_formatting=False): ...
    def list(self, video_id): ...
```

[Core API](./core-api.md)
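Both `fetch` and `list` are declared with a `video_id` parameter, so callers that start from a full URL need to extract the ID first. The sketch below shows one way to do that with the standard library. `extract_video_id` and `fetch_preferred` are hypothetical helpers, the set of URL shapes handled is an assumption, and `languages` is assumed to be tried in the order given.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url_or_id):
    """Best-effort extraction of a video ID from common YouTube URL shapes.

    Hypothetical helper: handles watch, youtu.be, embed, shorts and live
    URLs, and returns the input unchanged when it has no URL scheme.
    """
    parsed = urlparse(url_or_id)
    if not parsed.scheme:
        return url_or_id  # assume a bare video ID was passed
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0]
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v", [])
            if ids:
                return ids[0]
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in {"embed", "shorts", "live"}:
            return parts[1]
    raise ValueError(f"Could not find a video ID in {url_or_id!r}")


def fetch_preferred(api, url_or_id, languages=("de", "en")):
    """Fetch a transcript, passing the language codes in priority order.

    `api` is expected to be a YouTubeTranscriptApi instance.
    """
    return api.fetch(extract_video_id(url_or_id), languages=languages)
```

Passing the `api` object in, rather than constructing it inside the helper, keeps a single configured client (for example one created with a `proxy_config`) reusable across calls.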

### Transcript Data Structures

Data classes for representing transcript lists, individual transcripts, and fetched content with timing information.

```python { .api }
class TranscriptList:
    def find_transcript(self, language_codes): ...
    def find_generated_transcript(self, language_codes): ...
    def find_manually_created_transcript(self, language_codes): ...

class Transcript:
    def fetch(self, preserve_formatting=False): ...
    def translate(self, language_code): ...

class FetchedTranscript:
    def to_raw_data(self): ...
```

[Data Structures](./data-structures.md)
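Once content has been fetched, `to_raw_data()` gives plain Python data that is easy to post-process. The sketch below assumes that it returns a list of dictionaries with `text`, `start`, and `duration` keys (times in seconds). `group_by_window` is a hypothetical helper that merges snippets into fixed-length time windows, which can be handy for building paragraph-sized chunks.

```python
def group_by_window(raw_snippets, window_seconds=30.0):
    """Merge snippets whose start times fall into the same time window.

    Assumes each item is a dict with "text" and "start" keys, as
    expected from FetchedTranscript.to_raw_data().
    """
    chunks = {}
    for item in raw_snippets:
        index = int(item["start"] // window_seconds)
        chunks.setdefault(index, []).append(item["text"])
    return [
        {"start": index * window_seconds, "text": " ".join(texts)}
        for index, texts in sorted(chunks.items())
    ]
```

With a fetched transcript in hand, this would be called as `group_by_window(fetched.to_raw_data(), window_seconds=60)`.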

### Output Formatters

Classes for converting transcript data into various output formats including JSON, plain text, SRT subtitles, and WebVTT.

```python { .api }
class JSONFormatter:
    def format_transcript(self, transcript, **kwargs): ...
    def format_transcripts(self, transcripts, **kwargs): ...

class SRTFormatter:
    def format_transcript(self, transcript, **kwargs): ...

class WebVTTFormatter:
    def format_transcript(self, transcript, **kwargs): ...
```

[Formatters](./formatters.md)
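Because the formatters listed above share the same `format_transcript` method, writing a transcript to disk can be done with one small function regardless of the target format. This is a sketch under that assumption. `export_transcript` is a hypothetical helper, and `formatter` is expected to be an instance of one of the formatter classes.

```python
from pathlib import Path


def export_transcript(fetched, formatter, path, **kwargs):
    """Format `fetched` with `formatter` and write the result as UTF-8.

    Extra keyword arguments are passed through to `format_transcript`.
    Returns the number of characters written.
    """
    text = formatter.format_transcript(fetched, **kwargs)
    Path(path).write_text(text, encoding="utf-8")
    return len(text)
```

For example, `export_transcript(api.fetch('video_id'), SRTFormatter(), 'captions.srt')` would produce a subtitle file, while a call with `JSONFormatter()` can pass formatting options such as `indent=2`.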

### Proxy Configuration

Classes for configuring HTTP proxies to work around IP blocking, including generic proxy support and specialized Webshare residential proxy integration.

```python { .api }
class GenericProxyConfig:
    def __init__(self, http_url=None, https_url=None): ...

class WebshareProxyConfig:
    def __init__(self, proxy_username, proxy_password, **kwargs): ...
```

[Proxy Configuration](./proxy-config.md)
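A proxy configuration is passed to the client through the `proxy_config` argument of `YouTubeTranscriptApi`. The sketch below shows one possible way to wire this up from the environment. The helper names and the use of the `HTTP_PROXY` and `HTTPS_PROXY` variables are assumptions made for illustration, not something the library prescribes.

```python
import os


def proxy_urls_from_env(environ=None):
    """Collect proxy URLs as keyword arguments for GenericProxyConfig.

    Assumption: the conventional HTTP_PROXY / HTTPS_PROXY variables are
    used; adjust the names to match your deployment.
    """
    environ = os.environ if environ is None else environ
    urls = {
        "http_url": environ.get("HTTP_PROXY"),
        "https_url": environ.get("HTTPS_PROXY"),
    }
    # Drop unset or empty entries so an empty dict means "no proxy".
    return {key: value for key, value in urls.items() if value}


def build_api(environ=None):
    """Create a client that uses a proxy only when one is configured."""
    # Imported here so proxy_urls_from_env works without the library installed.
    from youtube_transcript_api import YouTubeTranscriptApi
    from youtube_transcript_api.proxies import GenericProxyConfig

    urls = proxy_urls_from_env(environ)
    proxy_config = GenericProxyConfig(**urls) if urls else None
    return YouTubeTranscriptApi(proxy_config=proxy_config)
```

Keeping the environment lookup separate from client construction makes the lookup easy to test and leaves room to swap in `WebshareProxyConfig` where that service is used.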

### Error Handling

Comprehensive exception hierarchy for handling all error scenarios including video unavailability, IP blocking, missing transcripts, and translation errors.

```python { .api }
class YouTubeTranscriptApiException(Exception): ...
class CouldNotRetrieveTranscript(YouTubeTranscriptApiException): ...
class VideoUnavailable(CouldNotRetrieveTranscript): ...
class TranscriptsDisabled(CouldNotRetrieveTranscript): ...
class NoTranscriptFound(CouldNotRetrieveTranscript): ...
```

[Error Handling](./error-handling.md)
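Since the exceptions form a hierarchy, the order of `except` clauses matters: specific subclasses go first and the base class last. The following sketch illustrates that ordering with the classes listed above. `fetch_or_reason` is a hypothetical helper, the reason strings are placeholders, and the exception classes are assumed to be importable from the package root.

```python
def fetch_or_reason(api, video_id, languages=("en",)):
    """Return (transcript, None) on success or (None, reason) on failure.

    `api` is expected to be a YouTubeTranscriptApi instance.
    """
    # Imported lazily so the library is only required when this is called.
    from youtube_transcript_api import (
        NoTranscriptFound,
        TranscriptsDisabled,
        VideoUnavailable,
        YouTubeTranscriptApiException,
    )

    try:
        return api.fetch(video_id, languages=languages), None
    except VideoUnavailable:
        return None, "video unavailable"
    except TranscriptsDisabled:
        return None, "transcripts disabled"
    except NoTranscriptFound:
        return None, "no transcript in requested languages"
    except YouTubeTranscriptApiException as exc:
        # Base class last: catches any other error raised by the library.
        return None, f"other library error: {type(exc).__name__}"
```

Errors that do not derive from `YouTubeTranscriptApiException` are deliberately left to propagate, so programming mistakes are not silently reported as transcript failures.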