Python SDK for Firecrawl API that enables web scraping, crawling, and content extraction with LLM-optimized output formats
npx @tessl/cli install tessl/pypi-firecrawl-py@4.3.00
# Firecrawl Python SDK

A comprehensive Python SDK for the Firecrawl API that enables web scraping, crawling, and content extraction with output formatted for use with language models (LLMs). The SDK offers both synchronous and asynchronous clients for scraping, crawling, and job monitoring, with support for multiple output formats including markdown and HTML.

## Package Information

- **Package Name**: firecrawl-py
- **Package Type**: PyPI
- **Language**: Python 3.x
- **Installation**: `pip install firecrawl-py`
- **Documentation**: https://docs.firecrawl.dev
## Core Imports

```python
from firecrawl import Firecrawl, AsyncFirecrawl
```

Legacy compatibility (aliases):

```python
from firecrawl import FirecrawlApp, AsyncFirecrawlApp
```

Version-specific access:

```python
from firecrawl import V1FirecrawlApp, AsyncV1FirecrawlApp
```

Monitoring:

```python
from firecrawl import Watcher, AsyncWatcher
```
## Basic Usage

```python
from firecrawl import Firecrawl

# Initialize the client
app = Firecrawl(api_key="your-api-key")

# Scrape a single URL
result = app.scrape("https://example.com")
print(result)

# Search the web
search_results = app.search("latest AI developments")
print(search_results)

# Crawl a website
crawl_result = app.crawl("https://example.com", limit=100)
print(crawl_result)
```

Async usage:

```python
import asyncio

from firecrawl import AsyncFirecrawl

async def main():
    app = AsyncFirecrawl(api_key="your-api-key")

    # Async scraping
    result = await app.scrape("https://example.com")
    print(result)

asyncio.run(main())
```
## Architecture

The firecrawl-py SDK provides a unified interface with dual API version support:

- **Unified Clients**: `Firecrawl` and `AsyncFirecrawl` expose the v2 API by default, with v1 access via the `.v1` property (see the sketch after this list)
- **Version-Specific**: Direct access to v1 and v2 clients for explicit version control
- **Sync/Async Support**: Full synchronous and asynchronous operation support
- **Job Monitoring**: WebSocket-based watchers for real-time job progress tracking
- **Type Safety**: Comprehensive type definitions for all operations and responses
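A minimal sketch of the version-selection pattern described above; the `.v1` property comes from the architecture notes, and the v1 method name `scrape_url` from the Legacy V1 API section below.

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# v2 API (default): unified client methods
doc = app.scrape("https://example.com", formats=["markdown"])

# v1 API via the .v1 property; v1 methods such as scrape_url are
# listed in the Legacy V1 API section below
legacy_doc = app.v1.scrape_url("https://example.com")
```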
## Capabilities

### Core Scraping Operations

Essential web scraping functionality including single URL scraping, web search, and site mapping. These operations provide immediate results with comprehensive format options.

```python { .api }
def scrape(url: str, *, formats: Optional[List[str]] = None, **kwargs) -> Document
def search(query: str, *, sources: Optional[List[str]] = None, **kwargs) -> SearchData
def map(url: str, **kwargs) -> MapData
```

[Scraping Operations](./scraping.md)
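A short usage sketch based on the signatures above; the specific format names and the `sources` value are illustrative assumptions (see ./scraping.md for the supported options).

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Scrape with explicit output formats (format names assumed for illustration)
doc = app.scrape("https://example.com", formats=["markdown", "html"])

# Search, optionally restricting sources (value illustrative)
results = app.search("firecrawl sdk", sources=["web"])

# Map a site to discover its URLs
site_map = app.map("https://example.com")
```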
### Crawling Operations

Website crawling functionality for discovering and processing multiple pages from a website. Supports both complete crawling with result polling and asynchronous job-based crawling for large sites.

```python { .api }
def crawl(url: str, options: Optional[CrawlOptions] = None) -> CrawlResponse
def start_crawl(url: str, options: Optional[CrawlOptions] = None) -> str
def get_crawl_status(crawl_id: str) -> CrawlJobStatus
def cancel_crawl(crawl_id: str) -> dict
```

[Crawling Operations](./crawling.md)
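A sketch of the job-based flow using the signatures above, assuming `CrawlJobStatus` exposes a `status` field with terminal values such as `"completed"`; see ./crawling.md for the actual fields.

```python
import time

from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Start an asynchronous crawl job and poll until it reaches a terminal state.
# The shape of CrawlJobStatus (a `status` field) is an assumption for illustration.
crawl_id = app.start_crawl("https://example.com")

while True:
    status = app.get_crawl_status(crawl_id)
    if getattr(status, "status", None) in ("completed", "failed", "cancelled"):
        break
    time.sleep(5)
```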
### Batch Processing

Batch operations for processing multiple URLs efficiently. Includes both batch scraping with full result polling and asynchronous job management for large-scale operations.

```python { .api }
def batch_scrape(urls: List[str], options: Optional[ScrapeOptions] = None) -> BatchScrapeResponse
def start_batch_scrape(urls: List[str], options: Optional[ScrapeOptions] = None) -> str
def get_batch_scrape_status(batch_id: str) -> BatchScrapeJobStatus
def cancel_batch_scrape(batch_id: str) -> dict
```

[Batch Processing](./batch.md)
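A minimal batch example based on the `batch_scrape` signature above, which polls for results before returning as described in the paragraph above.

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Scrape several URLs in one batch call; blocks until results are ready
batch = app.batch_scrape([
    "https://example.com",
    "https://example.org",
])
```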
### Data Extraction

AI-powered structured data extraction using custom schemas. Supports both immediate extraction with result polling and asynchronous job-based extraction for complex data processing.

```python { .api }
def extract(url: str, schema: dict, options: Optional[ExtractOptions] = None) -> ExtractResponse
def start_extract(url: str, schema: dict, options: Optional[ExtractOptions] = None) -> str
def get_extract_status(extract_id: str) -> ExtractJobStatus
```

[Data Extraction](./extraction.md)
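A sketch of schema-based extraction following the signature above; the JSON Schema layout, field names, and example URL are illustrative.

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Schema describing the structured data to pull from the page
# (the schema layout here is an assumption for illustration)
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["title"],
}

result = app.extract("https://example.com/product", schema)
```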
### Job Monitoring

Real-time job monitoring using WebSocket connections for tracking long-running operations. Provides both synchronous and asynchronous monitoring interfaces.

```python { .api }
class Watcher:
    def watch(self, job_id: str, job_type: str) -> Iterator[dict]

class AsyncWatcher:
    def watch(self, job_id: str, job_type: str) -> AsyncIterator[dict]
```

[Job Monitoring](./monitoring.md)
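A hedged sketch of job monitoring; how a `Watcher` is constructed is not specified in this overview, so the constructor call below is an assumption (see ./monitoring.md for the actual interface).

```python
from firecrawl import Firecrawl, Watcher

app = Firecrawl(api_key="your-api-key")
crawl_id = app.start_crawl("https://example.com")

# Constructing the Watcher from the client is an assumption for illustration
watcher = Watcher(app)

# Iterate over progress events delivered over the WebSocket connection
for event in watcher.watch(crawl_id, "crawl"):
    print(event)
```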
### Usage & Statistics

Account usage monitoring including credit usage, token consumption, concurrency limits, and job queue status tracking. Includes both current usage and historical usage data.

```python { .api }
def get_credit_usage() -> CreditUsage
def get_token_usage() -> TokenUsage
def get_credit_usage_historical(by_api_key: bool = False) -> CreditUsageHistoricalResponse
def get_token_usage_historical(by_api_key: bool = False) -> TokenUsageHistoricalResponse
def get_concurrency() -> ConcurrencyInfo
def get_queue_status() -> QueueStatus
```

[Usage & Statistics](./usage.md)
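A short sketch of the usage endpoints listed above, called on the unified client.

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Snapshot of current account usage and limits
credits = app.get_credit_usage()
tokens = app.get_token_usage()
concurrency = app.get_concurrency()

# Historical usage, optionally broken down by API key
history = app.get_credit_usage_historical(by_api_key=True)
```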
### Legacy V1 API

Complete v1 API support for backward compatibility with existing implementations. Includes all v1-specific operations and data types.

```python { .api }
class V1FirecrawlApp:
    def scrape_url(self, url: str, params: Optional[dict] = None) -> dict
    def crawl_url(self, url: str, params: Optional[dict] = None) -> dict
    def extract(self, data: dict, schema: dict, prompt: Optional[str] = None) -> dict
```

[Legacy V1 API](./v1-api.md)
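A sketch of the legacy client based on the v1 signatures above; the constructor argument and the `params` keys are illustrative assumptions (see ./v1-api.md).

```python
from firecrawl import V1FirecrawlApp

# Explicit v1 client for code written against the older API
v1_app = V1FirecrawlApp(api_key="your-api-key")

# v1 methods take an optional params dict rather than keyword arguments;
# the params keys shown here are assumptions for illustration
doc = v1_app.scrape_url("https://example.com", params={"formats": ["markdown"]})
crawl = v1_app.crawl_url("https://example.com", params={"limit": 10})
```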
## Types

Core type definitions used across the API:

```python { .api }
class Document:
    """Main document result structure"""
    url: str
    content: str
    metadata: dict

class ScrapeOptions:
    """Configuration options for scraping operations"""
    formats: Optional[List[str]]
    include_tags: Optional[List[str]]
    exclude_tags: Optional[List[str]]
    wait_for: Optional[int]
    screenshot: Optional[bool]

class CrawlOptions:
    """Configuration options for crawling operations"""
    limit: Optional[int]
    max_depth: Optional[int]
    allowed_domains: Optional[List[str]]
    ignored_paths: Optional[List[str]]
    scrape_options: Optional[ScrapeOptions]

class SearchOptions:
    """Configuration options for search operations"""
    limit: Optional[int]
    search_type: Optional[str]
    language: Optional[str]
    country: Optional[str]

class PaginationConfig:
    """Configuration for paginated requests"""
    auto_paginate: Optional[bool]
    max_pages: Optional[int]
    max_results: Optional[int]
    max_wait_time: Optional[int]

class CreditUsageHistoricalResponse:
    """Historical credit usage data"""
    data: List[CreditUsageHistoricalPeriod]

class CreditUsageHistoricalPeriod:
    """Credit usage for a specific period"""
    period_start: str
    period_end: str
    credits_used: int
    credits_remaining: int

class TokenUsageHistoricalResponse:
    """Historical token usage data"""
    data: List[TokenUsageHistoricalPeriod]

class TokenUsageHistoricalPeriod:
    """Token usage for a specific period"""
    period_start: str
    period_end: str
    tokens_used: int
    tokens_remaining: int

class Location:
    """Geographic location configuration"""
    country: Optional[str]
    languages: Optional[List[str]]

class Viewport:
    """Browser viewport configuration"""
    width: int
    height: int

class WebhookConfig:
    """Webhook configuration for job notifications"""
    url: str
    headers: Optional[Dict[str, str]]
    metadata: Optional[Dict[str, Any]]
    events: Optional[List[str]]

# Action Types for browser automation
class WaitAction:
    """Wait action for browser automation"""
    type: Literal["wait"]
    milliseconds: int

class ScreenshotAction:
    """Screenshot action for browser automation"""
    type: Literal["screenshot"]
    full_page: Optional[bool]

class ClickAction:
    """Click action for browser automation"""
    type: Literal["click"]
    selector: str

class WriteAction:
    """Write action for browser automation"""
    type: Literal["write"]
    text: str

class PressAction:
    """Press key action for browser automation"""
    type: Literal["press"]
    key: str

class ScrollAction:
    """Scroll action for browser automation"""
    type: Literal["scroll"]
    x: Optional[int]
    y: Optional[int]

class ScrapeAction:
    """Scrape action for browser automation"""
    type: Literal["scrape"]

class ExecuteJavascriptAction:
    """Execute JavaScript action for browser automation"""
    type: Literal["execute_javascript"]
    code: str

class PDFAction:
    """PDF action for browser automation"""
    type: Literal["pdf"]

# Format Types for advanced output formatting
class JsonFormat:
    """JSON format configuration"""
    type: Literal["json"]
    schema: Optional[Dict[str, Any]]
    prompt: Optional[str]

class ChangeTrackingFormat:
    """Change tracking format configuration"""
    type: Literal["change_tracking"]
    threshold: Optional[float]

class ScreenshotFormat:
    """Screenshot format configuration"""
    type: Literal["screenshot"]
    full_page: Optional[bool]
    viewport: Optional[Viewport]

class AttributesFormat:
    """Attributes format configuration"""
    type: Literal["attributes"]
    selectors: List[AttributeSelector]

class AttributeSelector:
    """Attribute selector for extraction"""
    selector: str
    attribute: str

class PDFParser:
    """PDF parser configuration"""
    type: Literal["pdf"]
    max_pages: Optional[int]
```
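A hedged sketch of how the action and format types above might be combined with `scrape`; the import path, keyword-argument construction, and the `actions` parameter are assumptions not confirmed by this overview (see ./scraping.md for the supported parameters).

```python
from firecrawl import Firecrawl
# Importing the action/format types from the top-level package is an assumption;
# their fields are taken from the type listing above.
from firecrawl import ClickAction, WaitAction, ScreenshotFormat

app = Firecrawl(api_key="your-api-key")

# Browser automation steps, assuming keyword-argument construction
actions = [
    ClickAction(type="click", selector="#load-more"),
    WaitAction(type="wait", milliseconds=1000),
]

# Passing actions and a format object to scrape is assumed here
doc = app.scrape(
    "https://example.com",
    formats=["markdown", ScreenshotFormat(type="screenshot", full_page=True)],
    actions=actions,
)
```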