# Firecrawl Python SDK

A comprehensive Python SDK for the Firecrawl API that enables web scraping, crawling, and content extraction, with output formatted for use with large language models (LLMs). The SDK offers both synchronous and asynchronous clients, job monitoring, and support for multiple output formats, including markdown and HTML.

## Package Information

- **Package Name**: firecrawl-py
- **Package Type**: PyPI
- **Language**: Python 3.x
- **Installation**: `pip install firecrawl-py`
- **Documentation**: https://docs.firecrawl.dev

## Core Imports

```python
from firecrawl import Firecrawl, AsyncFirecrawl
```

Legacy compatibility (aliases):

```python
from firecrawl import FirecrawlApp, AsyncFirecrawlApp
```

Version-specific access:

```python
from firecrawl import V1FirecrawlApp, AsyncV1FirecrawlApp
```

Monitoring:

```python
from firecrawl import Watcher, AsyncWatcher
```

## Basic Usage

```python
from firecrawl import Firecrawl

# Initialize client
app = Firecrawl(api_key="your-api-key")

# Scrape a single URL
result = app.scrape("https://example.com")
print(result)

# Search the web
search_results = app.search("latest AI developments")
print(search_results)

# Crawl a website
crawl_result = app.crawl("https://example.com", limit=100)
print(crawl_result)
```

Async usage:

```python
import asyncio
from firecrawl import AsyncFirecrawl

async def main():
    app = AsyncFirecrawl(api_key="your-api-key")

    # Async scraping
    result = await app.scrape("https://example.com")
    print(result)

asyncio.run(main())
```

## Architecture

The firecrawl-py SDK provides a unified interface with dual API version support:

- **Unified Clients**: `Firecrawl` and `AsyncFirecrawl` expose the v2 API by default, with v1 access via the `.v1` property (see the sketch after this list)
- **Version-Specific Clients**: Direct access to v1 and v2 clients for explicit version control
- **Sync/Async Support**: Full synchronous and asynchronous operation support
- **Job Monitoring**: WebSocket-based watchers for real-time job progress tracking
- **Type Safety**: Comprehensive type definitions for all operations and responses
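
A minimal sketch of the unified client's version switching, assuming `.v1` exposes the `V1FirecrawlApp` methods listed under Legacy V1 API below:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# v2 API is the default surface
doc = app.scrape("https://example.com")

# v1 API via the .v1 property; scrape_url is the v1 method shown
# in the Legacy V1 API section (usage here is a sketch)
legacy_doc = app.v1.scrape_url("https://example.com")
```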

## Capabilities

### Core Scraping Operations

Essential web scraping functionality including single URL scraping, web search, and site mapping. These operations provide immediate results with comprehensive format options.

```python { .api }
def scrape(url: str, *, formats: Optional[List[str]] = None, **kwargs) -> Document
def search(query: str, *, sources: Optional[List[str]] = None, **kwargs) -> SearchData
def map(url: str, **kwargs) -> MapData
```
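
A short usage sketch based on the signatures above; the `formats` and `sources` values shown are illustrative assumptions:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Request specific output formats for a single page
doc = app.scrape("https://example.com", formats=["markdown", "html"])

# Search the web; source filtering is optional
results = app.search("firecrawl python sdk", sources=["web"])

# Discover the URLs of a site
site_map = app.map("https://example.com")
```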

[Scraping Operations](./scraping.md)

### Crawling Operations

Website crawling functionality for discovering and processing multiple pages from a website. Supports both complete crawling with result polling and asynchronous job-based crawling for large sites.

```python { .api }
def crawl(url: str, options: Optional[CrawlOptions] = None) -> CrawlResponse
def start_crawl(url: str, options: Optional[CrawlOptions] = None) -> str
def get_crawl_status(crawl_id: str) -> CrawlJobStatus
def cancel_crawl(crawl_id: str) -> dict
```
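
A sketch of the asynchronous job flow implied by `start_crawl` and `get_crawl_status`; the polling interval, the `status` attribute, and the terminal state names are assumptions:

```python
import time

from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Kick off a crawl job and poll until it reaches a terminal state
crawl_id = app.start_crawl("https://example.com")
status = app.get_crawl_status(crawl_id)
while status.status not in ("completed", "failed"):  # terminal states assumed
    time.sleep(5)
    status = app.get_crawl_status(crawl_id)
print(status)
```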

[Crawling Operations](./crawling.md)

### Batch Processing

Batch operations for processing multiple URLs efficiently. Includes both batch scraping with full result polling and asynchronous job management for large-scale operations.

```python { .api }
def batch_scrape(urls: List[str], options: Optional[ScrapeOptions] = None) -> BatchScrapeResponse
def start_batch_scrape(urls: List[str], options: Optional[ScrapeOptions] = None) -> str
def get_batch_scrape_status(batch_id: str) -> BatchScrapeJobStatus
def cancel_batch_scrape(batch_id: str) -> dict
```
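
A minimal sketch of a blocking batch scrape using `batch_scrape` from the signatures above:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

urls = [
    "https://example.com/pricing",
    "https://example.com/blog",
]

# Scrape several URLs in one call; waits for all results
batch = app.batch_scrape(urls)
print(batch)
```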

[Batch Processing](./batch.md)

### Data Extraction

AI-powered structured data extraction using custom schemas. Supports both immediate extraction with result polling and asynchronous job-based extraction for complex data processing.

```python { .api }
def extract(url: str, schema: dict, options: Optional[ExtractOptions] = None) -> ExtractResponse
def start_extract(url: str, schema: dict, options: Optional[ExtractOptions] = None) -> str
def get_extract_status(extract_id: str) -> ExtractJobStatus
```
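
A sketch of schema-driven extraction using `extract` from the signatures above; the schema contents are illustrative:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# JSON Schema describing the structure to extract (illustrative)
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["title"],
}

result = app.extract("https://example.com/product", schema)
print(result)
```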

[Data Extraction](./extraction.md)

### Job Monitoring

Real-time job monitoring using WebSocket connections for tracking long-running operations. Provides both synchronous and asynchronous monitoring interfaces.

```python { .api }
class Watcher:
    def watch(self, job_id: str, job_type: str) -> Iterator[dict]

class AsyncWatcher:
    def watch(self, job_id: str, job_type: str) -> AsyncIterator[dict]
```
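
A sketch of streaming job updates with `Watcher`; the constructor arguments and the shape of the yielded dicts are assumptions:

```python
from firecrawl import Firecrawl, Watcher

app = Firecrawl(api_key="your-api-key")
crawl_id = app.start_crawl("https://example.com")

# Iterate over WebSocket progress updates until the job finishes
watcher = Watcher(app)  # constructor signature is an assumption
for update in watcher.watch(crawl_id, "crawl"):
    print(update)
```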

[Job Monitoring](./monitoring.md)

### Usage & Statistics

Account usage monitoring including credit usage, token consumption, concurrency limits, and job queue status tracking. Includes both current usage and historical usage data.

```python { .api }
def get_credit_usage() -> CreditUsage
def get_token_usage() -> TokenUsage
def get_credit_usage_historical(by_api_key: bool = False) -> CreditUsageHistoricalResponse
def get_token_usage_historical(by_api_key: bool = False) -> TokenUsageHistoricalResponse
def get_concurrency() -> ConcurrencyInfo
def get_queue_status() -> QueueStatus
```
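
A short sketch of checking account usage, assuming the functions above are exposed as client methods:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Current credit usage and concurrency limits
print(app.get_credit_usage())
print(app.get_concurrency())

# Historical credit usage, broken down per API key
print(app.get_credit_usage_historical(by_api_key=True))
```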

[Usage & Statistics](./usage.md)

### Legacy V1 API

Complete v1 API support for backward compatibility with existing implementations. Includes all v1-specific operations and data types.

```python { .api }
class V1FirecrawlApp:
    def scrape_url(self, url: str, params: Optional[dict] = None) -> dict
    def crawl_url(self, url: str, params: Optional[dict] = None) -> dict
    def extract(self, data: dict, schema: dict, prompt: Optional[str] = None) -> dict
```
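
A minimal sketch of calling the v1 client directly; the `params` keys are illustrative assumptions:

```python
from firecrawl import V1FirecrawlApp

app = V1FirecrawlApp(api_key="your-api-key")

# v1-style scrape with a params dict (keys are illustrative)
page = app.scrape_url("https://example.com", params={"formats": ["markdown"]})
print(page)
```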

[Legacy V1 API](./v1-api.md)

## Types

Core type definitions used across the API:

```python { .api }
class Document:
    """Main document result structure"""
    url: str
    content: str
    metadata: dict

class ScrapeOptions:
    """Configuration options for scraping operations"""
    formats: Optional[List[str]]
    include_tags: Optional[List[str]]
    exclude_tags: Optional[List[str]]
    wait_for: Optional[int]
    screenshot: Optional[bool]

class CrawlOptions:
    """Configuration options for crawling operations"""
    limit: Optional[int]
    max_depth: Optional[int]
    allowed_domains: Optional[List[str]]
    ignored_paths: Optional[List[str]]
    scrape_options: Optional[ScrapeOptions]

class SearchOptions:
    """Configuration options for search operations"""
    limit: Optional[int]
    search_type: Optional[str]
    language: Optional[str]
    country: Optional[str]

class PaginationConfig:
    """Configuration for paginated requests"""
    auto_paginate: Optional[bool]
    max_pages: Optional[int]
    max_results: Optional[int]
    max_wait_time: Optional[int]

class CreditUsageHistoricalResponse:
    """Historical credit usage data"""
    data: List[CreditUsageHistoricalPeriod]

class CreditUsageHistoricalPeriod:
    """Credit usage for a specific period"""
    period_start: str
    period_end: str
    credits_used: int
    credits_remaining: int

class TokenUsageHistoricalResponse:
    """Historical token usage data"""
    data: List[TokenUsageHistoricalPeriod]

class TokenUsageHistoricalPeriod:
    """Token usage for a specific period"""
    period_start: str
    period_end: str
    tokens_used: int
    tokens_remaining: int

class Location:
    """Geographic location configuration"""
    country: Optional[str]
    languages: Optional[List[str]]

class Viewport:
    """Browser viewport configuration"""
    width: int
    height: int

class WebhookConfig:
    """Webhook configuration for job notifications"""
    url: str
    headers: Optional[Dict[str, str]]
    metadata: Optional[Dict[str, Any]]
    events: Optional[List[str]]

# Action Types for browser automation
class WaitAction:
    """Wait action for browser automation"""
    type: Literal["wait"]
    milliseconds: int

class ScreenshotAction:
    """Screenshot action for browser automation"""
    type: Literal["screenshot"]
    full_page: Optional[bool]

class ClickAction:
    """Click action for browser automation"""
    type: Literal["click"]
    selector: str

class WriteAction:
    """Write action for browser automation"""
    type: Literal["write"]
    text: str

class PressAction:
    """Press key action for browser automation"""
    type: Literal["press"]
    key: str

class ScrollAction:
    """Scroll action for browser automation"""
    type: Literal["scroll"]
    x: Optional[int]
    y: Optional[int]

class ScrapeAction:
    """Scrape action for browser automation"""
    type: Literal["scrape"]

class ExecuteJavascriptAction:
    """Execute JavaScript action for browser automation"""
    type: Literal["execute_javascript"]
    code: str

class PDFAction:
    """PDF action for browser automation"""
    type: Literal["pdf"]

# Format Types for advanced output formatting
class JsonFormat:
    """JSON format configuration"""
    type: Literal["json"]
    schema: Optional[Dict[str, Any]]
    prompt: Optional[str]

class ChangeTrackingFormat:
    """Change tracking format configuration"""
    type: Literal["change_tracking"]
    threshold: Optional[float]

class ScreenshotFormat:
    """Screenshot format configuration"""
    type: Literal["screenshot"]
    full_page: Optional[bool]
    viewport: Optional[Viewport]

class AttributesFormat:
    """Attributes format configuration"""
    type: Literal["attributes"]
    selectors: List[AttributeSelector]

class AttributeSelector:
    """Attribute selector for extraction"""
    selector: str
    attribute: str

class PDFParser:
    """PDF parser configuration"""
    type: Literal["pdf"]
    max_pages: Optional[int]
```
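
A sketch of composing these types, assuming they are importable from the package root and accept keyword arguments for the fields listed above:

```python
from firecrawl import Firecrawl
from firecrawl import CrawlOptions, ScrapeOptions  # import path is an assumption

app = Firecrawl(api_key="your-api-key")

# Nest scrape options inside crawl options, mirroring the fields above
options = CrawlOptions(
    limit=50,
    max_depth=2,
    scrape_options=ScrapeOptions(formats=["markdown"], wait_for=1000),
)
result = app.crawl("https://example.com", options)
print(result)
```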