Tessl Tile for pypi/youtube-dl@2021.12.0

# Information Extractors

Information extractors are site-specific modules that handle video metadata extraction from over 1000 supported sites. Each extractor understands the specific URL patterns, API interfaces, and data structures for its target site.

## Capabilities

### Extractor Management

Functions for discovering, listing, and managing available extractors.

```python { .api }
def gen_extractors():
    """
    Return a list of instances of every supported extractor.
    The order matters; the first extractor matched handles the URL.

    Returns:
    list: List of extractor instances
    """

def gen_extractor_classes():
    """
    Return a list of supported extractor classes.
    The order matters; the first extractor matched handles the URL.

    Returns:
    list: List of extractor classes
    """

def list_extractors(age_limit):
    """
    Return a list of extractors suitable for the given age limit,
    sorted by extractor ID.

    Parameters:
    - age_limit (int): Age limit for content filtering

    Returns:
    list: List of suitable extractor instances
    """

def get_info_extractor(ie_name):
    """
    Returns the info extractor class with the given name.

    Parameters:
    - ie_name (str): Extractor name (without 'IE' suffix)

    Returns:
    class: Extractor class
    """
```
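
Because the first matching extractor handles the URL, the order of the list decides which extractor wins. The sketch below models that rule with two hypothetical classes (`ExampleVideoIE`, `CatchAllIE`) and a hypothetical helper `first_suitable`; none of these names come from youtube-dl, and the `_VALID_URL` attribute is only assumed here as a convenient way to express a URL pattern.

```python
import re


class ExampleVideoIE:
    """Hypothetical extractor for one specific site."""
    _VALID_URL = r'https?://video\.example/watch/(?P<id>\w+)'

    @classmethod
    def suitable(cls, url):
        return re.match(cls._VALID_URL, url) is not None


class CatchAllIE:
    """Hypothetical fallback that accepts any http(s) URL."""
    _VALID_URL = r'https?://.+'

    @classmethod
    def suitable(cls, url):
        return re.match(cls._VALID_URL, url) is not None


# Specific extractors come first; the catch-all goes last.
EXTRACTOR_CLASSES = [ExampleVideoIE, CatchAllIE]


def first_suitable(url, classes=EXTRACTOR_CLASSES):
    """Return the first class whose suitable() accepts the URL, else None."""
    for ie_class in classes:
        if ie_class.suitable(url):
            return ie_class
    return None


print(first_suitable('https://video.example/watch/abc123').__name__)
print(first_suitable('https://other.example/page').__name__)
```

Reversing the list would let the catch-all claim every URL, which is why a broad fallback belongs at the end.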

### Base InfoExtractor Class

Base class that all site-specific extractors inherit from, providing common functionality and interfaces.

```python { .api }
class InfoExtractor:
    def __init__(self, downloader=None):
        """
        Base class for information extractors.

        Parameters:
        - downloader: YoutubeDL instance
        """

    @classmethod
    def suitable(cls, url):
        """
        Check if the extractor is suitable for the given URL.

        Parameters:
        - url (str): URL to check

        Returns:
        bool: True if suitable, False otherwise
        """

    def extract(self, url):
        """
        Extract information from the given URL.

        Parameters:
        - url (str): URL to extract from

        Returns:
        dict: Extracted information dictionary
        """

    def _real_extract(self, url):
        """
        Actual extraction logic (implemented by subclasses).

        Parameters:
        - url (str): URL to extract from

        Returns:
        dict: Extracted information dictionary
        """
```
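
The split between `extract` and `_real_extract` can be pictured as a public entry point that delegates to a subclass hook. The following is a simplified model under that assumption, not the library's implementation: `SketchInfoExtractor` and `ExampleClipIE` are hypothetical stand-ins, and the URLs are placeholders.

```python
import re


class SketchInfoExtractor:
    """Toy stand-in for the base class; not the library's implementation."""
    _VALID_URL = None

    @classmethod
    def suitable(cls, url):
        return re.match(cls._VALID_URL, url) is not None

    def extract(self, url):
        # Public entry point: delegate to the subclass hook.
        return self._real_extract(url)

    def _real_extract(self, url):
        raise NotImplementedError('subclasses must implement _real_extract')


class ExampleClipIE(SketchInfoExtractor):
    """Hypothetical site-specific extractor."""
    _VALID_URL = r'https?://clips\.example/(?P<id>\d+)'

    def _real_extract(self, url):
        video_id = re.match(self._VALID_URL, url).group('id')
        # Return a minimal information dictionary for a single video.
        return {
            'id': video_id,
            'title': 'Clip %s' % video_id,
            'url': 'https://cdn.clips.example/%s.mp4' % video_id,
            'ext': 'mp4',
        }


ie = ExampleClipIE()
url = 'https://clips.example/42'
if ie.suitable(url):
    print(ie.extract(url))
```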

### Common Extractor Methods

Utility methods available to all extractors for common operations.

```python { .api }
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None, data=None, headers={}, query={}):
    """
    Download webpage content.

    Parameters:
    - url_or_request: URL string or Request object
    - video_id (str): Video identifier for error reporting
    - note (str): Progress note to display
    - errnote (str): Error note for failures
    - fatal (bool): Whether to raise error on failure
    - tries (int): Number of download attempts
    - timeout (float): Seconds to wait between attempts
    - encoding (str): Character encoding
    - data: POST data
    - headers (dict): HTTP headers
    - query (dict): URL query parameters

    Returns:
    str: Webpage content (False if the download fails and fatal is False)
    """

def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata', errnote='Unable to download JSON metadata', transform_source=None, fatal=True, encoding=None, data=None, headers={}, query={}):
    """
    Download and parse JSON data.

    Parameters:
    - url_or_request: URL string or Request object
    - video_id (str): Video identifier
    - note (str): Progress note
    - errnote (str): Error note
    - transform_source (callable): Function applied to the raw text before JSON parsing
    - fatal (bool): Whether to raise error on failure
    - encoding (str): Character encoding
    - data: POST data
    - headers (dict): HTTP headers
    - query (dict): URL query parameters

    Returns:
    dict: Parsed JSON data (a list if the top-level JSON value is an array)
    """

def _html_search_regex(self, pattern, string, name, default=NO_DEFAULT, fatal=True, flags=0, group=None):
    """
    Search for a regex pattern in an HTML string and clean up the match.

    Parameters:
    - pattern (str): Regex pattern
    - string (str): HTML string to search
    - name (str): Description for error messages
    - default: Value returned if not found (NO_DEFAULT means no fallback)
    - fatal (bool): Whether to raise error if not found
    - flags (int): Regex flags
    - group (int/str): Capture group to return

    Returns:
    str: Matched text with HTML tags stripped and entities unescaped
    """
```
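
To make the interplay of `default`, `fatal`, and `group` concrete, here is a standalone approximation of the search-then-clean behaviour using only the standard library. It is a sketch under assumptions, not the library's code: `html_search_sketch` and `_NO_DEFAULT` are hypothetical names, tag stripping is done with a simple regex, and a plain `ValueError` stands in for whatever error the real method raises.

```python
import html
import re

_NO_DEFAULT = object()  # local sentinel meaning "no fallback supplied"


def html_search_sketch(pattern, string, name, default=_NO_DEFAULT,
                       fatal=True, flags=0, group=None):
    """Search string for pattern and return the cleaned-up match."""
    match = re.search(pattern, string, flags)
    if match:
        if group is None:
            # Use the first capture group that matched, else the whole match.
            raw = next((g for g in match.groups() if g is not None),
                       match.group(0))
        else:
            raw = match.group(group)
        text = re.sub(r'<[^>]+>', '', raw)   # drop HTML tags
        return html.unescape(text).strip()   # decode entities such as &amp;
    if default is not _NO_DEFAULT:
        return default
    if fatal:
        raise ValueError('Unable to extract %s' % name)
    return None


page = '<h1 class="title">Tom &amp; <b>Jerry</b></h1>'
print(html_search_sketch(r'<h1[^>]*>(.+?)</h1>', page, 'title'))
print(html_search_sketch(r'<h2>(.+?)</h2>', page, 'subtitle', default='n/a'))
```

Supplying `default` takes precedence over `fatal` in this sketch, so a missing optional field never raises.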

### Extractor Information Dictionary

Standard format for information returned by extractors.

```python { .api }
InfoDict = {
    'id': str,                    # Video identifier
    'title': str,                 # Video title
    'url': str,                   # Video URL (for single videos)
    'ext': str,                   # File extension
    'format': str,                # Format description
    'format_id': str,             # Format identifier
    'uploader': str,              # Video uploader name
    'uploader_id': str,           # Uploader identifier
    'uploader_url': str,          # Uploader profile URL
    'upload_date': str,           # Upload date (YYYYMMDD format)
    'timestamp': int,             # Upload timestamp (Unix)
    'duration': int,              # Duration in seconds
    'view_count': int,            # View count
    'like_count': int,            # Like count
    'dislike_count': int,         # Dislike count
    'description': str,           # Video description
    'tags': list,                 # List of tags
    'thumbnail': str,             # Thumbnail URL
    'thumbnails': list,           # List of thumbnail dictionaries
    'subtitles': dict,            # Subtitle tracks
    'automatic_captions': dict,   # Auto-generated captions
    'formats': list,              # List of available formats
    'playlist': str,              # Playlist title (for playlist entries)
    'playlist_id': str,           # Playlist identifier
    'playlist_index': int,        # Position in playlist
    'webpage_url': str,           # Original webpage URL
    'webpage_url_basename': str,  # Basename of webpage URL
    'extractor': str,             # Extractor name
    'extractor_key': str,         # Extractor key
}
```
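
The `upload_date` and `timestamp` fields describe the same moment in two encodings. As a small illustration of the `YYYYMMDD` format, the sketch below fills a missing `upload_date` from `timestamp`. The helper `with_upload_date` is hypothetical, the sample values are made up, and interpreting the timestamp as UTC is an assumption of this example.

```python
from datetime import datetime, timezone


def with_upload_date(info):
    """Return a copy of info with 'upload_date' derived from 'timestamp' when missing."""
    result = dict(info)
    if result.get('upload_date') is None and result.get('timestamp') is not None:
        # Assumption: the Unix timestamp is interpreted in UTC.
        moment = datetime.fromtimestamp(result['timestamp'], tz=timezone.utc)
        result['upload_date'] = moment.strftime('%Y%m%d')
    return result


sample = {'id': 'abc123', 'title': 'Sample clip', 'timestamp': 1609459200}
print(with_upload_date(sample)['upload_date'])
```

Working on a copy keeps the original dictionary untouched, which is convenient when the same info dictionary is passed to several consumers.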

### Format Dictionary

Structure for individual video/audio format information.

```python { .api }
FormatDict = {
    'format_id': str,             # Unique format identifier
    'url': str,                   # Direct media URL
    'ext': str,                   # File extension
    'width': int,                 # Video width
    'height': int,                # Video height
    'resolution': str,            # Resolution string
    'fps': float,                 # Frames per second
    'vcodec': str,                # Video codec
    'vbr': float,                 # Video bitrate
    'acodec': str,                # Audio codec
    'abr': float,                 # Audio bitrate
    'asr': int,                   # Audio sample rate
    'filesize': int,              # File size in bytes
    'tbr': float,                 # Total bitrate
    'protocol': str,              # Download protocol
    'preference': int,            # Format preference (higher is preferred; -1 is the default)
    'quality': int,               # Quality metric
    'format_note': str,           # Additional format info
    'language': str,              # Language code
    'http_headers': dict,         # Required HTTP headers
}
```
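
A common use of these fields is choosing one format from the list. The sketch below picks the tallest video format and breaks ties by total bitrate. It is one possible selection rule, not the library's own format sorting: `pick_best_video` is a hypothetical helper, the sample data is invented, and treating `vcodec == 'none'` as "audio only" is an assumption of this example.

```python
def pick_best_video(formats):
    """Return the video format with the greatest height, breaking ties by total bitrate."""
    video_formats = [
        f for f in formats
        # Assumption: audio-only entries carry vcodec 'none' or have no height.
        if f.get('vcodec') != 'none' and f.get('height')
    ]
    if not video_formats:
        return None
    return max(video_formats, key=lambda f: (f['height'], f.get('tbr') or 0))


sample_formats = [
    {'format_id': 'audio', 'ext': 'm4a', 'vcodec': 'none', 'abr': 128},
    {'format_id': '360p', 'ext': 'mp4', 'height': 360, 'tbr': 700},
    {'format_id': '720p-lo', 'ext': 'mp4', 'height': 720, 'tbr': 1500},
    {'format_id': '720p-hi', 'ext': 'mp4', 'height': 720, 'tbr': 2500},
]
best = pick_best_video(sample_formats)
print(best['format_id'])
```

Using `.get()` for optional keys matters here, since individual formats may omit fields such as `height` or `tbr`.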

## Site-Specific Extractors

youtube-dl includes extractors for over 1000 sites. Some notable ones include:

### Video Platforms
- **YoutubeIE**: YouTube videos, playlists, channels, live streams
- **VimeoIE**: Vimeo videos and channels
- **DailymotionIE**: Dailymotion videos and playlists
- **TwitchIE**: Twitch streams and VODs
- **FacebookIE**: Facebook videos

### News and Media
- **BBCIE**: BBC iPlayer content
- **CNNIE**: CNN video content
- **NBCIE**: NBC video content
- **CBSIE**: CBS video content

### Social Media
- **TwitterIE**: Twitter videos
- **InstagramIE**: Instagram videos and stories
- **TikTokIE**: TikTok videos

### Educational
- **CourseraIE**: Coursera course videos
- **KhanAcademyIE**: Khan Academy content
- **TedIE**: TED Talks

## Usage Examples

### List Available Extractors
```python
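from youtube_dl import list_extractors

# List the extractors suitable for an age limit of 18
extractors = list_extractors(age_limit=18)
for extractor in extractors:
    # IE_DESC is optional, so fall back to an empty description
    description = getattr(extractor, 'IE_DESC', None) or ''
    print(f"{extractor.IE_NAME}: {description}")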
```

### Get Specific Extractor
```python
from youtube_dl.extractor import get_info_extractor

# Get YouTube extractor class
YoutubeIE = get_info_extractor('Youtube')
extractor = YoutubeIE()
```

### Extract Information Only
```python
from youtube_dl import YoutubeDL

ydl_opts = {'quiet': True}
with YoutubeDL(ydl_opts) as ydl:
    # download=False fetches metadata only
    info = ydl.extract_info('https://www.youtube.com/watch?v=dQw4w9WgXcQ', download=False)
    print(f"Title: {info['title']}")
    print(f"Duration: {info.get('duration')} seconds")
    print(f"Uploader: {info.get('uploader')}")

    # List available formats; entries without a height are labeled as audio
    for fmt in info.get('formats', []):
        height = fmt.get('height')
        label = f"{height}p" if height else 'audio'
        print(f"Format: {fmt['format_id']} - {fmt['ext']} - {label}")
```
