# Extractor System

The extractor system discovers and manages site-specific extractors, which handle URL pattern matching, metadata extraction, and format enumeration for more than 1000 supported video platforms, including YouTube, Vimeo, Twitch, and TikTok.

## Capabilities

### Extractor Discovery Functions

Functions for discovering and listing available extractors in the system.

```python { .api }
def gen_extractors():
    """
    Generate all available extractor instances.

    Yields:
    InfoExtractor: extractor instances
    """

def list_extractors(age_limit=None):
    """
    Get list of all available extractor instances, sorted by name.

    Parameters:
    - age_limit: int|None, filter by age limit

    Returns:
    list[InfoExtractor]: sorted list of extractor instances
    """

def gen_extractor_classes():
    """
    Generate all available extractor classes.

    Yields:
    type[InfoExtractor]: extractor classes
    """

def list_extractor_classes(age_limit=None):
    """
    Get list of all available extractor classes, sorted by name.

    Parameters:
    - age_limit: int|None, filter by age limit

    Returns:
    list[type[InfoExtractor]]: sorted list of extractor classes
    """

def get_info_extractor(ie_name):
    """
    Get specific extractor class by key.

    Parameters:
    - ie_name: str, extractor key, i.e. the class name without
      the trailing "IE" (for example 'Youtube')

    Returns:
    type[InfoExtractor]: extractor class

    Raises:
    AttributeError or KeyError (depending on the yt-dlp version): if extractor not found
    """
```

### Extractor Base Classes

Core extractor infrastructure providing the foundation for all site-specific extractors.

```python { .api }
class InfoExtractor:
    """
    Base class for all information extractors.

    Provides common functionality for URL matching, information extraction,
    and format processing across all supported sites.
    """

    IE_NAME = None  # Extractor identifier
    IE_DESC = None  # Human-readable description
    _VALID_URL = None  # URL pattern regex
    _TESTS = []  # Test cases

    @classmethod
    def suitable(cls, url):
        """
        Check if URL is suitable for this extractor.

        Parameters:
        - url: str, URL to check

        Returns:
        bool: True if URL matches
        """

    def extract(self, url):
        """
        Extract information from URL.

        Parameters:
        - url: str, URL to extract from

        Returns:
        dict: extracted information
        """

    def _real_extract(self, url):
        """
        Perform actual extraction (implemented by subclasses).

        Parameters:
        - url: str, URL to extract from

        Returns:
        dict: extracted information
        """

class GenericIE(InfoExtractor):
    """
    Generic extractor that attempts to extract from any URL.

    Used as a fallback when no specific extractor matches the URL.
    Attempts to find video/audio content using generic patterns.
    """

    IE_NAME = 'generic'
    IE_DESC = 'Generic downloader that works on many sites'
```
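
The split between `extract()` and `_real_extract()` above has a template-method shape: a public method wraps a hook that each subclass fills in. The following is a minimal, standard-library-only sketch of that shape. It assumes nothing about yt-dlp's internals; `SketchExtractor`, `ClipIE`, and the `example.invalid` URL are hypothetical names used only for illustration.

```python
import re


class SketchExtractor:
    """Toy stand-in for a base extractor; this is not yt-dlp's InfoExtractor."""

    IE_NAME = None
    _VALID_URL = None  # regex with a named group `id`

    @classmethod
    def suitable(cls, url):
        # A URL is suitable when it matches the class-level pattern.
        return re.match(cls._VALID_URL, url) is not None

    def extract(self, url):
        # Public entry point: validate, delegate to the hook, then tag the result.
        if not self.suitable(url):
            raise ValueError(f"{self.IE_NAME} cannot handle {url}")
        info = self._real_extract(url)
        info.setdefault("extractor", self.IE_NAME)
        return info

    def _real_extract(self, url):
        raise NotImplementedError("subclasses implement the site-specific part")


class ClipIE(SketchExtractor):
    IE_NAME = "clip"
    _VALID_URL = r"https?://example\.invalid/clip/(?P<id>\d+)"

    def _real_extract(self, url):
        # Site-specific step: here, only the ID is read from the URL.
        video_id = re.match(self._VALID_URL, url).group("id")
        return {"id": video_id, "title": f"Clip {video_id}"}


info = ClipIE().extract("https://example.invalid/clip/42")
print(info)
```

Keeping validation and result tagging in the base method means a subclass only has to supply a pattern and the site-specific hook.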

### Popular Site Extractors

Key extractors for major video platforms (representative examples from 1000+ available).

```python { .api }
class YoutubeIE(InfoExtractor):
    """YouTube video extractor supporting various YouTube URL formats."""

    IE_NAME = 'youtube'

class VimeoIE(InfoExtractor):
    """Vimeo video extractor."""

    IE_NAME = 'vimeo'

class TwitchVodIE(InfoExtractor):
    """Twitch VOD (Video on Demand) extractor."""

    IE_NAME = 'twitch:vod'

class TikTokIE(InfoExtractor):
    """TikTok video extractor."""

    IE_NAME = 'tiktok'

class TwitterIE(InfoExtractor):
    """Twitter/X video extractor."""

    IE_NAME = 'twitter'

class InstagramIE(InfoExtractor):
    """Instagram video extractor."""

    IE_NAME = 'instagram'

class FacebookIE(InfoExtractor):
    """Facebook video extractor."""

    IE_NAME = 'facebook'
```

## Usage Examples

### List Available Extractors

```python
from yt_dlp import list_extractors

# Get all extractors
extractors = list_extractors()
print(f"Total extractors: {len(extractors)}")

# Print first 10 extractor names
for ie in extractors[:10]:
    print(f"- {ie.IE_NAME}: {ie.IE_DESC}")
```

### Check URL Compatibility

```python
from yt_dlp import list_extractors

url = "https://www.youtube.com/watch?v=example"

# Find compatible extractors
compatible = []
for ie in list_extractors():
    if ie.suitable(url):
        compatible.append(ie.IE_NAME)

print(f"Compatible extractors for {url}: {compatible}")
```
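
A catch-all extractor such as `generic` is meant to accept any URL, so a loop like the one above will usually report it alongside the site-specific match. The sketch below shows ordered, first-match selection with the fallback kept last. The names and patterns are hypothetical and are not yt-dlp's real ones.

```python
import re

# Hypothetical (name, pattern) pairs; order matters and the catch-all goes last.
PATTERNS = [
    ("videosite", r"https?://(?:www\.)?videosite\.invalid/watch\?v=(?P<id>[\w-]+)"),
    ("clipsite", r"https?://clipsite\.invalid/c/(?P<id>\d+)"),
    ("generic", r".*"),
]


def matching_names(url):
    """Return every name whose pattern matches, in registration order."""
    return [name for name, pattern in PATTERNS if re.match(pattern, url)]


def pick_first(url):
    """Return the first matching name; the catch-all guarantees a result."""
    return matching_names(url)[0]


print(matching_names("https://www.videosite.invalid/watch?v=abc123"))
print(pick_first("https://unknown.invalid/page"))
```

Because the fallback matches everything, it only behaves as a fallback when it is consulted after the specific patterns.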

### Use Specific Extractor

```python
import yt_dlp

# Force use of specific extractor
ydl_opts = {
    'forcejson': True,  # Output JSON info
    'skip_download': True,
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    # Extract using a specific extractor key (the class name without "IE",
    # e.g. 'Youtube' for YoutubeIE; this is not the lowercase IE_NAME)
    info = ydl.extract_info(
        'https://www.youtube.com/watch?v=example',
        ie_key='Youtube'
    )
    print(f"Extractor used: {info.get('extractor')}")
```

### Get Extractor Information

```python
from yt_dlp.extractor import get_info_extractor

# Get specific extractor class by key (class name without the "IE" suffix)
youtube_ie = get_info_extractor('Youtube')
print(f"Name: {youtube_ie.IE_NAME}")
print(f"Description: {youtube_ie.IE_DESC}")

# Check if URL is suitable
url = "https://www.youtube.com/watch?v=example"
is_suitable = youtube_ie.suitable(url)
print(f"Suitable for {url}: {is_suitable}")
```
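
The lookup key and the display name are easy to confuse. This sketch assumes the convention that the key is the class name with the trailing `IE` removed, while `IE_NAME` is a separate, typically lowercase label. `ToyIE`, `ToyTubeIE`, `ToyTubePlaylistIE`, `REGISTRY`, and `lookup` are hypothetical and only mirror that naming convention.

```python
class ToyIE:
    """Toy base class; only the naming convention is modelled here."""

    IE_NAME = None

    @classmethod
    def ie_key(cls):
        # Key = class name without the trailing "IE".
        return cls.__name__[:-2]


class ToyTubeIE(ToyIE):
    IE_NAME = "toytube"


class ToyTubePlaylistIE(ToyIE):
    IE_NAME = "toytube:playlist"


# Registry keyed by ie_key(), not by IE_NAME.
REGISTRY = {cls.ie_key(): cls for cls in (ToyTubeIE, ToyTubePlaylistIE)}


def lookup(key):
    """Return the class registered under `key`, or raise KeyError."""
    return REGISTRY[key]


print(lookup("ToyTube").IE_NAME)
print(sorted(REGISTRY))
```

Under this convention, passing the lowercase display name where a key is expected fails the lookup.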

### Filter Extractors by Age Limit

```python
from yt_dlp import list_extractors

# Keep only extractors considered suitable for an age limit of 18
safe_extractors = list_extractors(age_limit=18)
all_extractors = list_extractors()

print(f"All extractors: {len(all_extractors)}")
print(f"Age-appropriate extractors: {len(safe_extractors)}")
```

### Custom Extractor Registration

```python
import yt_dlp
from yt_dlp.extractor.common import InfoExtractor

class CustomSiteIE(InfoExtractor):
    IE_NAME = 'customsite'
    IE_DESC = 'Custom site extractor'
    _VALID_URL = r'https?://customsite\.com/video/(?P<id>[0-9]+)'

    def _real_extract(self, url):
        # _match_id returns the `id` named group from _VALID_URL
        video_id = self._match_id(url)
        # Custom extraction logic here
        return {
            'id': video_id,
            'title': f'Video {video_id}',
            'url': f'https://customsite.com/stream/{video_id}.mp4',
        }

# Register custom extractor
with yt_dlp.YoutubeDL() as ydl:
    ydl.add_info_extractor(CustomSiteIE())
    # The extractor is now registered; pass ie_key='CustomSite' to
    # extract_info() to select it explicitly for custom site URLs
```
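
`_match_id` in the example above pulls the `id` named group out of `_VALID_URL`. A rough standard-library equivalent follows, written as a hypothetical `match_id` helper so that the regex step is visible on its own. The helper is an illustration, not part of yt-dlp.

```python
import re

# Same pattern as the custom extractor example, kept as a plain constant.
VALID_URL = r"https?://customsite\.com/video/(?P<id>[0-9]+)"


def match_id(pattern, url):
    """Return the `id` named group, or raise ValueError if the URL does not match."""
    match = re.match(pattern, url)
    if match is None:
        raise ValueError(f"URL does not match pattern: {url}")
    return match.group("id")


video_id = match_id(VALID_URL, "https://customsite.com/video/12345")
print(video_id)
```

The named group is what makes the ID available to the extraction step, so a pattern without `(?P<id>...)` would have nothing to return.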

## Supported Platforms

The extractor system supports over 1000 video platforms, including:

### Major Platforms
- **YouTube** - Videos, playlists, channels, live streams
- **Vimeo** - Videos, albums, channels, groups
- **Twitch** - VODs, clips, live streams
- **TikTok** - Videos, user profiles
- **Instagram** - Videos, stories, IGTV
- **Twitter/X** - Videos, spaces
- **Facebook** - Videos, live streams

### Educational
- **Coursera** - Course videos and lectures
- **edX** - Educational content
- **Khan Academy** - Educational videos
- **MIT OCW** - Course materials
- **Udemy** - Course content

### News and Media
- **BBC iPlayer** - BBC content
- **CNN** - News videos
- **NPR** - Audio and video content
- **Reuters** - News videos
- **Associated Press** - News content

### Entertainment
- **Netflix** - Limited support for accessible content
- **Amazon Prime Video** - Limited support
- **Hulu** - Limited support
- **Crunchyroll** - Anime content
- **Funimation** - Anime content

### Live Streaming
- **YouTube Live** - Live streams and premieres
- **Twitch** - Live gaming streams
- **Facebook Live** - Live videos
- **Periscope** - Live broadcasts
- **Dailymotion Live** - Live content

### Regional Platforms
- **Bilibili** - Chinese video platform
- **Niconico** - Japanese video platform
- **VK** - Russian social network videos
- **Youku** - Chinese video platform
- **Tudou** - Chinese video platform

And hundreds more platforms across different regions and specialties.

## Types

```python { .api }
import re
from typing import Any

# Base extractor type
InfoExtractor = type

# Extractor result information dictionary
ExtractorResult = dict[str, Any]

# URL pattern matching result
URLMatch = re.Match[str] | None
```