# Extractor System

The extractor system discovers and manages the site-specific extractors that handle URL pattern matching, metadata extraction, and format enumeration for more than 1000 supported video platforms, including YouTube, Vimeo, Twitch, and TikTok.

## Capabilities

### Extractor Discovery Functions

Functions for discovering and listing the available extractors.

```python { .api }
def gen_extractors():
    """
    Return an instance of every available extractor.

    Returns:
    list[InfoExtractor]: extractor instances
    """

def list_extractors(age_limit=None):
    """
    Get list of all available extractor instances, sorted by name.

    Parameters:
    - age_limit: int|None, filter by age limit

    Returns:
    list[InfoExtractor]: sorted list of extractor instances
    """

def gen_extractor_classes():
    """
    Return every available extractor class.

    Returns:
    list[type[InfoExtractor]]: extractor classes
    """

def list_extractor_classes(age_limit=None):
    """
    Get list of all available extractor classes, sorted by name.

    Parameters:
    - age_limit: int|None, filter by age limit

    Returns:
    list[type[InfoExtractor]]: sorted list of extractor classes
    """

def get_info_extractor(ie_name):
    """
    Get specific extractor class by key.

    Parameters:
    - ie_name: str, extractor key (the class name without the 'IE' suffix, e.g. 'Youtube')

    Returns:
    type[InfoExtractor]: extractor class

    Raises:
    KeyError: if extractor not found
    """
```

### Extractor Base Classes

Core extractor infrastructure providing the foundation for all site-specific extractors.

```python { .api }
class InfoExtractor:
    """
    Base class for all information extractors.

    Provides common functionality for URL matching, information extraction,
    and format processing across all supported sites.
    """

    IE_NAME = None  # Extractor identifier
    IE_DESC = None  # Human-readable description
    _VALID_URL = None  # URL pattern regex
    _TESTS = []  # Test cases

    @classmethod
    def suitable(cls, url):
        """
        Check if URL is suitable for this extractor.

        Parameters:
        - url: str, URL to check

        Returns:
        bool: True if URL matches
        """

    def extract(self, url):
        """
        Extract information from URL.

        Parameters:
        - url: str, URL to extract from

        Returns:
        dict: extracted information
        """

    def _real_extract(self, url):
        """
        Perform actual extraction (implemented by subclasses).

        Parameters:
        - url: str, URL to extract from

        Returns:
        dict: extracted information
        """

class GenericIE(InfoExtractor):
    """
    Generic extractor that attempts to extract from any URL.

    Used as a fallback when no specific extractor matches the URL.
    Attempts to find video/audio content using generic patterns.
    """

    IE_NAME = 'generic'
    IE_DESC = 'Generic downloader that works on many sites'
```
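
The split between `suitable`, `extract`, and `_real_extract` follows a template-method pattern. The standalone sketch below illustrates that pattern with toy classes and the standard `re` module only; `SketchExtractor` and `ExampleSiteIE` are hypothetical names, and the bookkeeping done in `extract` is an assumption made for illustration, not a description of what the library does internally.

```python
import re


class SketchExtractor:
    """Toy stand-in for an extractor base class (not the real InfoExtractor)."""

    IE_NAME = None
    _VALID_URL = None

    @classmethod
    def suitable(cls, url):
        # A URL is suitable when it matches the class-level pattern.
        return re.match(cls._VALID_URL, url) is not None

    def extract(self, url):
        # Public entry point: delegate to the subclass hook, then add a
        # field that every result shares in this toy model.
        info = self._real_extract(url)
        info.setdefault('extractor', self.IE_NAME)
        return info

    def _real_extract(self, url):
        raise NotImplementedError('subclasses must implement _real_extract')


class ExampleSiteIE(SketchExtractor):
    IE_NAME = 'examplesite'
    _VALID_URL = r'https?://example\.com/v/(?P<id>\w+)'

    def _real_extract(self, url):
        # Pull the video ID out of the named group in _VALID_URL.
        video_id = re.match(self._VALID_URL, url).group('id')
        return {'id': video_id, 'title': f'Video {video_id}'}


url = 'https://example.com/v/abc123'
if ExampleSiteIE.suitable(url):
    print(ExampleSiteIE().extract(url))
```

Because `suitable` is a class method here, a URL can be tested against an extractor class without creating an instance first.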

### Popular Site Extractors

Extractors for major video platforms (a representative sample of the 1000+ available).

```python { .api }
class YoutubeIE(InfoExtractor):
    """YouTube video extractor supporting various YouTube URL formats."""

    IE_NAME = 'youtube'

class VimeoIE(InfoExtractor):
    """Vimeo video extractor."""

    IE_NAME = 'vimeo'

class TwitchVodIE(InfoExtractor):
    """Twitch VOD (Video on Demand) extractor."""

    IE_NAME = 'twitch:vod'

class TikTokIE(InfoExtractor):
    """TikTok video extractor."""

    IE_NAME = 'tiktok'

class TwitterIE(InfoExtractor):
    """Twitter/X video extractor."""

    IE_NAME = 'twitter'

class InstagramIE(InfoExtractor):
    """Instagram video extractor."""

    IE_NAME = 'instagram'

class FacebookIE(InfoExtractor):
    """Facebook video extractor."""

    IE_NAME = 'facebook'
```

## Usage Examples

### List Available Extractors

```python
from yt_dlp import list_extractors

# Get all extractors
extractors = list_extractors()
print(f"Total extractors: {len(extractors)}")

# Print the first 10 extractor names
for ie in extractors[:10]:
    print(f"- {ie.IE_NAME}: {ie.IE_DESC}")
```

### Check URL Compatibility

```python
from yt_dlp import list_extractors

url = "https://www.youtube.com/watch?v=example"

# Find compatible extractors
compatible = []
for ie in list_extractors():
    if ie.suitable(url):
        compatible.append(ie.IE_NAME)

print(f"Compatible extractors for {url}: {compatible}")
```
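
Since the generic extractor is described as a fallback for any URL, a compatibility list like the one above would be expected to include it alongside the site-specific match. The standalone sketch below models that with a hypothetical `REGISTRY` of name and pattern pairs; the patterns are simplified stand-ins, and the first-match rule in `pick_first` is an assumption about dispatch order, shown here only to illustrate why a catch-all entry belongs at the end.

```python
import re

# Hypothetical (name, pattern) pairs; the catch-all entry comes last.
REGISTRY = [
    ('youtube', r'https?://(?:www\.)?youtube\.com/watch\?v=[\w-]+'),
    ('vimeo', r'https?://(?:www\.)?vimeo\.com/\d+'),
    ('generic', r'.*'),
]


def compatible_names(url):
    """Return every registry entry whose pattern matches the URL."""
    return [name for name, pattern in REGISTRY if re.match(pattern, url)]


def pick_first(url):
    """Return the first matching entry (first match wins)."""
    return compatible_names(url)[0]


print(compatible_names('https://www.youtube.com/watch?v=example'))
print(pick_first('https://vimeo.com/123456'))
print(pick_first('https://unknown.example/clip'))
```

In this toy registry the YouTube URL matches both `youtube` and `generic`, while an unrecognized URL falls through to `generic` alone.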

### Use Specific Extractor

```python
import yt_dlp

# Force use of a specific extractor
ydl_opts = {
    'forcejson': True,  # Output JSON info
    'skip_download': True,
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    # ie_key is the extractor key: the class name without the "IE" suffix
    info = ydl.extract_info(
        'https://www.youtube.com/watch?v=example',
        ie_key='Youtube',
    )
    print(f"Extractor used: {info.get('extractor')}")
```

### Get Extractor Information

```python
from yt_dlp.extractor import get_info_extractor

# Look up the extractor class by key (class name without the "IE" suffix)
youtube_ie = get_info_extractor('Youtube')
print(f"Name: {youtube_ie.IE_NAME}")
print(f"Description: {youtube_ie.IE_DESC}")

# Check if URL is suitable
url = "https://www.youtube.com/watch?v=example"
is_suitable = youtube_ie.suitable(url)
print(f"Suitable for {url}: {is_suitable}")
```

### Filter Extractors by Age Limit

```python
from yt_dlp import list_extractors

# Keep only extractors considered suitable for an 18-year-old viewer
safe_extractors = list_extractors(age_limit=18)
all_extractors = list_extractors()

print(f"All extractors: {len(all_extractors)}")
print(f"Age-appropriate extractors: {len(safe_extractors)}")
```
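
How the `age_limit` filter decides which extractors to keep is not spelled out above. The standalone sketch below assumes one plausible rule: an extractor is dropped only when the age rating of its content is higher than the viewer age passed in. `SketchExtractor`, `content_age_limit`, `is_suitable`, and `list_suitable` are hypothetical names used for illustration, not library APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchExtractor:
    name: str
    content_age_limit: Optional[int] = None  # None means "no rating known"


def is_suitable(extractor, viewer_age):
    """Assumed rule: block only when the content rating exceeds the viewer age."""
    if viewer_age is None or extractor.content_age_limit is None:
        return True
    return viewer_age >= extractor.content_age_limit


def list_suitable(extractors, age_limit=None):
    """Filter by age, then sort case-insensitively by name."""
    kept = (ie for ie in extractors if is_suitable(ie, age_limit))
    return sorted(kept, key=lambda ie: ie.name.lower())


catalog = [
    SketchExtractor('vimeo'),
    SketchExtractor('AdultSite', content_age_limit=18),
    SketchExtractor('kidsvideo', content_age_limit=0),
]

print([ie.name for ie in list_suitable(catalog)])                # nothing filtered
print([ie.name for ie in list_suitable(catalog, age_limit=18)])  # still all three
print([ie.name for ie in list_suitable(catalog, age_limit=12)])  # drops AdultSite
```

Under this assumption, an `age_limit` of 18 filters nothing, so a lower value is needed before the filtered list differs from the full one.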

### Custom Extractor Registration

```python
import yt_dlp
from yt_dlp.extractor.common import InfoExtractor


class CustomSiteIE(InfoExtractor):
    IE_NAME = 'customsite'
    IE_DESC = 'Custom site extractor'
    _VALID_URL = r'https?://customsite\.com/video/(?P<id>[0-9]+)'

    def _real_extract(self, url):
        video_id = self._match_id(url)
        # Custom extraction logic here
        return {
            'id': video_id,
            'title': f'Video {video_id}',
            'url': f'https://customsite.com/stream/{video_id}.mp4',
        }


# Register custom extractor
with yt_dlp.YoutubeDL() as ydl:
    ydl.add_info_extractor(CustomSiteIE())
    # URLs matching _VALID_URL can now be extracted, for example with
    # ydl.extract_info(url, ie_key='CustomSite')
```
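
The `_VALID_URL` pattern in the example above relies on a named group, `id`, to carry the video ID. The standalone sketch below uses only the standard `re` module to show what that pattern captures; `match_video_id` is a hypothetical helper written for illustration and is not part of yt-dlp.

```python
import re

# Same pattern as CustomSiteIE._VALID_URL in the example above.
VALID_URL = r'https?://customsite\.com/video/(?P<id>[0-9]+)'


def match_video_id(url):
    """Return the captured `id` group, or None if the URL does not match."""
    # re.match anchors only at the start of the string, so characters
    # after the numeric ID (such as a query string) are tolerated.
    match = re.match(VALID_URL, url)
    return match.group('id') if match else None


print(match_video_id('https://customsite.com/video/12345'))        # 12345
print(match_video_id('http://customsite.com/video/7?autoplay=1'))  # 7
print(match_video_id('https://customsite.com/channel/12345'))      # None
```

Keeping the ID in a named group means the same pattern serves both purposes: deciding whether a URL belongs to the site and pulling out the identifier.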

## Supported Platforms

The extractor system supports over 1000 video platforms including:

### Major Platforms

- **YouTube** - Videos, playlists, channels, live streams
- **Vimeo** - Videos, albums, channels, groups
- **Twitch** - VODs, clips, live streams
- **TikTok** - Videos, user profiles
- **Instagram** - Videos, stories, IGTV
- **Twitter/X** - Videos, spaces
- **Facebook** - Videos, live streams

### Educational

- **Coursera** - Course videos and lectures
- **edX** - Educational content
- **Khan Academy** - Educational videos
- **MIT OCW** - Course materials
- **Udemy** - Course content

### News and Media

- **BBC iPlayer** - BBC content
- **CNN** - News videos
- **NPR** - Audio and video content
- **Reuters** - News videos
- **Associated Press** - News content

### Entertainment

- **Netflix** - Limited support for accessible content
- **Amazon Prime Video** - Limited support
- **Hulu** - Limited support
- **Crunchyroll** - Anime content
- **Funimation** - Anime content

### Live Streaming

- **YouTube Live** - Live streams and premieres
- **Twitch** - Live gaming streams
- **Facebook Live** - Live videos
- **Periscope** - Live broadcasts
- **Dailymotion Live** - Live content

### Regional Platforms

- **Bilibili** - Chinese video platform
- **Niconico** - Japanese video platform
- **VK** - Russian social network videos
- **Youku** - Chinese video platform
- **Tudou** - Chinese video platform

And hundreds more platforms across different regions and specialties.

## Types

```python { .api }
import re
from typing import Any

# Base extractor type
InfoExtractor = type

# Extractor result information dictionary
ExtractorResult = dict[str, Any]

# URL pattern matching result
URLMatch = re.Match[str] | None
```