Tessl Tile for pypi/yt-dlp@2024.12.0

# Extractor System

The extractor system provides discovery and management of site-specific extractors that handle URL pattern matching, metadata extraction, and format enumeration for over 1000 supported video platforms, including YouTube, Vimeo, Twitch, TikTok, and many others.

## Capabilities

### Extractor Discovery Functions

Functions for discovering and listing available extractors in the system.

```python { .api }
def gen_extractors():
    """
    Return an instance of every available extractor.

    The order matters: the first extractor whose URL pattern matches
    handles the URL, and the generic extractor comes last.

    Returns:
    list[InfoExtractor]: extractor instances
    """

def list_extractors(age_limit=None):
    """
    Get all extractor instances suitable for the given age limit,
    sorted by extractor name (the generic extractor is listed last).

    Parameters:
    - age_limit: int|None, viewer age limit; None disables filtering

    Returns:
    list[InfoExtractor]: sorted list of extractor instances
    """

def gen_extractor_classes():
    """
    Return every available extractor class, in matching order.

    Returns:
    list[type[InfoExtractor]]: extractor classes
    """

def list_extractor_classes(age_limit=None):
    """
    Generate extractor classes suitable for the given age limit, sorted
    by extractor name (the generic extractor is yielded last). This is a
    generator; wrap it in list() if a list is needed.

    Parameters:
    - age_limit: int|None, viewer age limit; None disables filtering

    Yields:
    type[InfoExtractor]: extractor classes
    """

def get_info_extractor(ie_name):
    """
    Get a specific extractor class by its key.

    Parameters:
    - ie_name: str, extractor key as returned by ie_key(), i.e. the class
      name without the "IE" suffix (e.g. 'Youtube' for YoutubeIE);
      this is not the lowercase IE_NAME

    Returns:
    type[InfoExtractor]: extractor class

    Raises:
    AttributeError: if no extractor class with that key exists
    """
```
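
As a rough illustration of the ordering rules, the stdlib-only model below filters and sorts a list of stand-in extractors. It is a sketch, not yt-dlp's code. `Extractor`, `list_classes()` and the sample names are hypothetical. The model assumes the 2024.12 behaviour of `list_extractor_classes()`:

- extractors rated above `age_limit` are dropped;
- the rest are sorted case-insensitively by `IE_NAME`;
- the generic extractor is appended last and is not filtered.

```python
from typing import List, NamedTuple, Optional


class Extractor(NamedTuple):
    """Hypothetical stand-in for an extractor class."""

    ie_name: str
    age_limit: int = 0  # highest age rating the extractor is expected to serve


def list_classes(classes: List[Extractor], age_limit: Optional[int] = None) -> List[Extractor]:
    """Model of list_extractor_classes(): age filter, name sort, generic last."""
    generic = [ie for ie in classes if ie.ie_name == 'generic']
    others = [
        ie for ie in classes
        if ie.ie_name != 'generic'
        # None disables filtering; otherwise drop extractors rated above the limit
        and (age_limit is None or ie.age_limit <= age_limit)
    ]
    # Case-insensitive sort on the name, then the catch-all fallback at the end
    return sorted(others, key=lambda ie: ie.ie_name.lower()) + generic


# Hypothetical registry in arbitrary order
registry = [
    Extractor('charlie'),
    Extractor('generic'),
    Extractor('Bravo'),
    Extractor('delta', age_limit=18),  # hypothetical age-restricted site
    Extractor('alpha'),
]

print([ie.ie_name for ie in list_classes(registry)])
# ['alpha', 'Bravo', 'charlie', 'delta', 'generic']
print([ie.ie_name for ie in list_classes(registry, age_limit=17)])
# ['alpha', 'Bravo', 'charlie', 'generic']
```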

### Extractor Base Classes

Core extractor infrastructure providing the foundation for all site-specific extractors.

```python { .api }
class InfoExtractor:
    """
    Base class for all information extractors.

    Provides common functionality for URL matching, information extraction,
    and format processing across all supported sites.
    """

    IE_NAME: str  # Extractor name; defaults to the class name without the "IE" suffix
    IE_DESC = None  # Human-readable description
    _VALID_URL = None  # URL pattern regex; its named group "id" is used by _match_id()
    _TESTS = []  # Test cases

    @classmethod
    def ie_key(cls):
        """
        Key used to look up this extractor, e.g. by get_info_extractor()
        and YoutubeDL.extract_info(ie_key=...).

        Returns:
        str: class name without the "IE" suffix (e.g. 'Youtube')
        """

    @classmethod
    def suitable(cls, url):
        """
        Check if URL is suitable for this extractor.

        Parameters:
        - url: str, URL to check

        Returns:
        bool: True if the URL matches _VALID_URL
        """

    def extract(self, url):
        """
        Extract information from URL.

        Initializes the extractor, then delegates to _real_extract().

        Parameters:
        - url: str, URL to extract from

        Returns:
        dict: extracted information
        """

    def _real_extract(self, url):
        """
        Perform actual extraction (implemented by subclasses).

        Parameters:
        - url: str, URL to extract from

        Returns:
        dict: extracted information
        """

class GenericIE(InfoExtractor):
    """
    Generic extractor that attempts to extract from any URL.

    Used as a fallback when no specific extractor matches the URL.
    Attempts to find video/audio content using generic patterns.
    """

    IE_NAME = 'generic'
    IE_DESC = 'Generic downloader that works on some sites'
```
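
The URL-matching contract can be modelled with the standard `re` module. This is a simplified sketch, not yt-dlp's implementation. `MiniExtractor`, `ExampleSiteIE` and the example.com pattern are hypothetical.

The model assumes, as in yt-dlp, that:

- `suitable()` is a class-level `re.match` against `_VALID_URL`;
- `_match_id()` returns the pattern's `id` named group.

Unlike yt-dlp, the model raises `ValueError` on a non-matching URL to make the failure explicit.

```python
import re
from typing import Optional


class MiniExtractor:
    """Hypothetical, simplified model of InfoExtractor's URL matching."""

    _VALID_URL: Optional[str] = None

    @classmethod
    def _match_valid_url(cls, url: str) -> Optional[re.Match]:
        # re.match anchors at the start of the URL but not at the end
        return re.match(cls._VALID_URL, url)

    @classmethod
    def suitable(cls, url: str) -> bool:
        return cls._match_valid_url(url) is not None

    @classmethod
    def _match_id(cls, url: str) -> str:
        match = cls._match_valid_url(url)
        if match is None:
            raise ValueError(f'URL not handled by {cls.__name__}: {url}')
        return match.group('id')


class ExampleSiteIE(MiniExtractor):
    # Hypothetical site; the named group "id" captures the video ID
    _VALID_URL = r'https?://(?:www\.)?example\.com/video/(?P<id>[0-9]+)'


print(ExampleSiteIE.suitable('https://example.com/video/42'))       # True
print(ExampleSiteIE.suitable('https://example.com/about'))          # False
print(ExampleSiteIE._match_id('https://www.example.com/video/42'))  # 42
```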

### Popular Site Extractors

Key extractors for major video platforms (representative examples from 1000+ available). Where a class does not set `IE_NAME`, the name defaults to the class name without the "IE" suffix.

```python { .api }
class YoutubeIE(InfoExtractor):
    """YouTube video extractor supporting various YouTube URL formats."""

    IE_NAME = 'youtube'

class VimeoIE(InfoExtractor):
    """Vimeo video extractor."""

    IE_NAME = 'vimeo'

class TwitchVodIE(InfoExtractor):
    """Twitch VOD (Video on Demand) extractor."""

    IE_NAME = 'twitch:vod'

class TikTokIE(InfoExtractor):
    """TikTok video extractor."""

    # IE_NAME not set explicitly; defaults to 'TikTok'

class TwitterIE(InfoExtractor):
    """Twitter/X video extractor."""

    IE_NAME = 'twitter'

class InstagramIE(InfoExtractor):
    """Instagram video extractor."""

    # IE_NAME not set explicitly; defaults to 'Instagram'

class FacebookIE(InfoExtractor):
    """Facebook video extractor."""

    IE_NAME = 'facebook'
```
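
Two identifiers are easy to confuse here:

- `IE_NAME` is the display name, stored in an info dict as `extractor`.
- The key returned by the `ie_key()` classmethod is stored as `extractor_key`. This key, not `IE_NAME`, is what `get_info_extractor()` and `YoutubeDL.extract_info(ie_key=...)` expect.

The sketch below models only the key derivation. `ie_key_from_class_name()` is a hypothetical helper that assumes yt-dlp's convention of stripping the `IE` suffix from the class name.

```python
def ie_key_from_class_name(class_name: str) -> str:
    """Hypothetical helper: derive an extractor key the way ie_key() does."""
    if not class_name.endswith('IE'):
        raise ValueError(f'extractor class names end in "IE": {class_name!r}')
    return class_name[:-2]


# (class name, IE_NAME) pairs taken from the listing above
examples = [('YoutubeIE', 'youtube'), ('TwitchVodIE', 'twitch:vod')]
for class_name, ie_name in examples:
    print(f'{class_name}: key={ie_key_from_class_name(class_name)!r}, IE_NAME={ie_name!r}')
# YoutubeIE: key='Youtube', IE_NAME='youtube'
# TwitchVodIE: key='TwitchVod', IE_NAME='twitch:vod'
```

So looking up `'youtube'` by key fails, while `'Youtube'` finds `YoutubeIE`.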

## Usage Examples

### List Available Extractors

```python
from yt_dlp import list_extractors

# Get all extractors
extractors = list_extractors()
print(f"Total extractors: {len(extractors)}")

# Print first 10 extractor names
for ie in extractors[:10]:
    print(f"- {ie.IE_NAME}: {ie.IE_DESC}")
```

### Check URL Compatibility

```python
from yt_dlp import list_extractors

# YouTube video IDs are 11 characters; a shorter placeholder would not match YoutubeIE
url = "https://www.youtube.com/watch?v=BaW_jenozKc"

# Find compatible extractors (GenericIE accepts any URL, so 'generic' is always included)
compatible = [ie.IE_NAME for ie in list_extractors() if ie.suitable(url)]

print(f"Compatible extractors for {url}: {compatible}")
```
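
Several extractors can report the same URL as suitable, and the generic fallback always does. Without `ie_key`, `YoutubeDL.extract_info()` hands the URL to the first suitable extractor in its list, which is why the generic extractor sits at the end. The stdlib-only sketch below models that first-match rule. `REGISTRY`, its simplified patterns and `pick_extractor()` are hypothetical, not yt-dlp's code.

```python
import re
from typing import Sequence, Tuple

# Hypothetical registry of (name, URL pattern) pairs in matching order; the
# catch-all generic pattern is last, mirroring yt-dlp's ordering
REGISTRY: Tuple[Tuple[str, str], ...] = (
    ('youtube', r'https?://(?:www\.)?youtube\.com/watch\?v=(?P<id>[0-9A-Za-z_-]{11})'),
    ('vimeo', r'https?://(?:www\.)?vimeo\.com/(?P<id>[0-9]+)'),
    ('generic', r'.*'),
)


def pick_extractor(url: str, registry: Sequence[Tuple[str, str]] = REGISTRY) -> str:
    """Return the name of the first extractor whose pattern matches `url`."""
    for name, pattern in registry:
        if re.match(pattern, url):
            return name  # first suitable extractor wins; later ones are never tried
    raise LookupError(f'No suitable extractor found for URL {url}')


print(pick_extractor('https://www.youtube.com/watch?v=BaW_jenozKc'))  # youtube
print(pick_extractor('https://www.youtube.com/watch?v=example'))      # generic (ID too short)
print(pick_extractor('https://vimeo.com/123456'))                     # vimeo

# Putting the catch-all first would shadow every other extractor
shadowed = (REGISTRY[-1],) + REGISTRY[:-1]
print(pick_extractor('https://vimeo.com/123456', shadowed))           # generic
```

The last call shows why order matters: a catch-all placed early shadows every specific extractor behind it.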

### Use Specific Extractor

```python
import yt_dlp

with yt_dlp.YoutubeDL() as ydl:
    # Force a specific extractor by its key: the class name without "IE"
    # ('Youtube' for YoutubeIE), not the lowercase IE_NAME
    info = ydl.extract_info(
        'https://www.youtube.com/watch?v=BaW_jenozKc',
        download=False,  # only extract metadata
        ie_key='Youtube',
    )
    print(f"Extractor used: {info.get('extractor')}")  # IE_NAME, e.g. 'youtube'
    print(f"Extractor key: {info.get('extractor_key')}")  # e.g. 'Youtube'
```

### Get Extractor Information

```python
from yt_dlp.extractor import get_info_extractor

# Look up the class by key (class name without "IE"), not by IE_NAME
youtube_ie = get_info_extractor('Youtube')
print(f"Name: {youtube_ie.IE_NAME}")
print(f"Description: {youtube_ie.IE_DESC}")

# suitable() is a classmethod, so no instance is needed
url = "https://www.youtube.com/watch?v=BaW_jenozKc"
is_suitable = youtube_ie.suitable(url)
print(f"Suitable for {url}: {is_suitable}")
```

### Filter Extractors by Age Limit

```python
from yt_dlp import list_extractors

# age_limit is the viewer's age. Extractors judged (from the age ratings in
# their test cases) to serve content above it are dropped. A value of 18 keeps
# everything, so pass a lower value to exclude adult sites.
safe_extractors = list_extractors(age_limit=17)
all_extractors = list_extractors()

print(f"All extractors: {len(all_extractors)}")
print(f"Extractors suitable for age 17: {len(safe_extractors)}")
```
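
The comparison behind the age filter is small enough to show in full. The function below is a standalone re-implementation of the rule in yt-dlp's `yt_dlp.utils.age_restricted(content_limit, age_limit)` helper, assuming it is unchanged in 2024.12:

- a missing limit on either side never restricts;
- otherwise, content is blocked when the viewer's limit is below the content's rating.

```python
from typing import Optional


def age_restricted(content_limit: Optional[int], age_limit: Optional[int]) -> bool:
    """Return True if content rated `content_limit` should be blocked for `age_limit`."""
    if age_limit is None:  # no viewer limit configured
        return False
    if content_limit is None:  # content has no rating: available to everyone
        return False
    return age_limit < content_limit


# Content rated 18 is kept at age_limit=18 but dropped at age_limit=17
print(age_restricted(18, 18))    # False
print(age_restricted(18, 17))    # True
print(age_restricted(18, None))  # False
```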

### Custom Extractor Registration

```python
import yt_dlp
from yt_dlp.extractor.common import InfoExtractor


class CustomSiteIE(InfoExtractor):
    IE_NAME = 'customsite'
    IE_DESC = 'Custom site extractor'
    _VALID_URL = r'https?://customsite\.com/video/(?P<id>[0-9]+)'

    def _real_extract(self, url):
        # _match_id() returns the "id" named group of _VALID_URL
        video_id = self._match_id(url)
        # Custom extraction logic here (e.g. download and parse the page)
        return {
            'id': video_id,
            'title': f'Video {video_id}',
            'url': f'https://customsite.com/stream/{video_id}.mp4',
        }


# Register custom extractor
with yt_dlp.YoutubeDL() as ydl:
    ydl.add_info_extractor(CustomSiteIE())
    # add_info_extractor() appends to the end of the extractor list, after the
    # catch-all generic extractor, so select the new extractor by its key
    # (class name without "IE")
    info = ydl.extract_info(
        'https://customsite.com/video/123', download=False, ie_key='CustomSite')
    print(info['title'])
```

## Supported Platforms

The extractor system supports over 1000 video platforms, including:

### Major Platforms
- **YouTube** - Videos, playlists, channels, live streams
- **Vimeo** - Videos, albums, channels, groups
- **Twitch** - VODs, clips, live streams
- **TikTok** - Videos, user profiles
- **Instagram** - Videos, stories, IGTV
- **Twitter/X** - Videos, spaces
- **Facebook** - Videos, live streams

### Educational
- **Coursera** - Course videos and lectures
- **edX** - Educational content
- **Khan Academy** - Educational videos
- **MIT OCW** - Course materials
- **Udemy** - Course content

### News and Media
- **BBC iPlayer** - BBC content
- **CNN** - News videos
- **NPR** - Audio and video content
- **Reuters** - News videos
- **Associated Press** - News content

### Entertainment
- **Crunchyroll** - Anime content
- **Funimation** - Anime content

DRM-protected streaming services such as Netflix, Amazon Prime Video, and Hulu are not supported.

### Live Streaming
- **YouTube Live** - Live streams and premieres
- **Twitch** - Live gaming streams
- **Facebook Live** - Live videos
- **Periscope** - Live broadcasts
- **Dailymotion Live** - Live content

### Regional Platforms
- **Bilibili** - Chinese video platform
- **Niconico** - Japanese video platform
- **VK** - Russian social network videos
- **Youku** - Chinese video platform
- **Tudou** - Chinese video platform

And hundreds more platforms across different regions and specialties. Because sites are added and removed between releases, `yt-dlp --list-extractors` prints the exact set available in an installed version.

## Types

```python { .api }
import re
from typing import Any, Optional

from yt_dlp.extractor.common import InfoExtractor

# Extractor class type (as returned by get_info_extractor)
InfoExtractorClass = type[InfoExtractor]

# Extractor result information dictionary
ExtractorResult = dict[str, Any]

# URL pattern matching result
URLMatch = Optional[re.Match[str]]
```
