# Firecrawl Python SDK

A comprehensive Python SDK for the Firecrawl API that enables web scraping, crawling, and content extraction, with output formatted for use with large language models (LLMs). The SDK offers both synchronous and asynchronous clients, job monitoring, and support for multiple output formats, including markdown and HTML.

## Package Information

- **Package Name**: firecrawl-py
- **Package Type**: PyPI
- **Language**: Python 3.x
- **Installation**: `pip install firecrawl-py`
- **Documentation**: https://docs.firecrawl.dev

## Core Imports

```python
from firecrawl import Firecrawl, AsyncFirecrawl
```

Legacy compatibility (aliases):

```python
from firecrawl import FirecrawlApp, AsyncFirecrawlApp
```

Version-specific access:

```python
from firecrawl import V1FirecrawlApp, AsyncV1FirecrawlApp
```

Monitoring:

```python
from firecrawl import Watcher, AsyncWatcher
```

## Basic Usage

```python
from firecrawl import Firecrawl

# Initialize client
app = Firecrawl(api_key="your-api-key")

# Scrape a single URL
result = app.scrape("https://example.com")
print(result)

# Search the web
search_results = app.search("latest AI developments")
print(search_results)

# Crawl a website
crawl_result = app.crawl("https://example.com", limit=100)
print(crawl_result)
```

Async usage:

```python
import asyncio
from firecrawl import AsyncFirecrawl

async def main():
    app = AsyncFirecrawl(api_key="your-api-key")

    # Async scraping
    result = await app.scrape("https://example.com")
    print(result)

asyncio.run(main())
```

## Architecture

The firecrawl-py SDK provides a unified interface with dual API version support:

- **Unified Clients**: `Firecrawl` and `AsyncFirecrawl` expose the v2 API by default, with v1 access via the `.v1` property (see the sketch after this list)
- **Version-Specific Clients**: Direct access to v1 and v2 clients for explicit version control
- **Sync/Async Support**: Full synchronous and asynchronous operation support
- **Job Monitoring**: WebSocket-based watchers for real-time job progress tracking
- **Type Safety**: Comprehensive type definitions for all operations and responses
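
A minimal sketch of the unified client's version switching, assuming `.v1` exposes the `V1FirecrawlApp` methods listed under Legacy V1 API below:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# v2 API is the default surface
doc = app.scrape("https://example.com")

# v1 API via the .v1 property; scrape_url is the v1 method shown
# in the Legacy V1 API section (usage here is a sketch)
legacy_doc = app.v1.scrape_url("https://example.com")
```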

## Capabilities

### Core Scraping Operations

Essential web scraping functionality including single URL scraping, web search, and site mapping. These operations provide immediate results with comprehensive format options.

```python { .api }
def scrape(url: str, *, formats: Optional[List[str]] = None, **kwargs) -> Document
def search(query: str, *, sources: Optional[List[str]] = None, **kwargs) -> SearchData
def map(url: str, **kwargs) -> MapData
```
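
A short usage sketch based on the signatures above; the `formats` and `sources` values shown are illustrative assumptions:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Request specific output formats for a single page
doc = app.scrape("https://example.com", formats=["markdown", "html"])

# Search the web; source filtering is optional
results = app.search("firecrawl python sdk", sources=["web"])

# Discover the URLs of a site
site_map = app.map("https://example.com")
```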

[Scraping Operations](./scraping.md)

### Crawling Operations

Website crawling functionality for discovering and processing multiple pages from a website. Supports both complete crawling with result polling and asynchronous job-based crawling for large sites.

```python { .api }
def crawl(url: str, options: Optional[CrawlOptions] = None) -> CrawlResponse
def start_crawl(url: str, options: Optional[CrawlOptions] = None) -> str
def get_crawl_status(crawl_id: str) -> CrawlJobStatus
def cancel_crawl(crawl_id: str) -> dict
```
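
A sketch of the asynchronous job flow implied by `start_crawl` and `get_crawl_status`; the polling interval, the `status` attribute, and the terminal state names are assumptions:

```python
import time

from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Kick off a crawl job and poll until it reaches a terminal state
crawl_id = app.start_crawl("https://example.com")
status = app.get_crawl_status(crawl_id)
while status.status not in ("completed", "failed"):  # terminal states assumed
    time.sleep(5)
    status = app.get_crawl_status(crawl_id)
print(status)
```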

[Crawling Operations](./crawling.md)

### Batch Processing

Batch operations for processing multiple URLs efficiently. Includes both batch scraping with full result polling and asynchronous job management for large-scale operations.

```python { .api }
def batch_scrape(urls: List[str], options: Optional[ScrapeOptions] = None) -> BatchScrapeResponse
def start_batch_scrape(urls: List[str], options: Optional[ScrapeOptions] = None) -> str
def get_batch_scrape_status(batch_id: str) -> BatchScrapeJobStatus
def cancel_batch_scrape(batch_id: str) -> dict
```
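
A minimal sketch of a blocking batch scrape using `batch_scrape` from the signatures above:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

urls = [
    "https://example.com/pricing",
    "https://example.com/blog",
]

# Scrape several URLs in one call; waits for all results
batch = app.batch_scrape(urls)
print(batch)
```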

[Batch Processing](./batch.md)

### Data Extraction

AI-powered structured data extraction using custom schemas. Supports both immediate extraction with result polling and asynchronous job-based extraction for complex data processing.

```python { .api }
def extract(url: str, schema: dict, options: Optional[ExtractOptions] = None) -> ExtractResponse
def start_extract(url: str, schema: dict, options: Optional[ExtractOptions] = None) -> str
def get_extract_status(extract_id: str) -> ExtractJobStatus
```
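
A sketch of schema-driven extraction using `extract` from the signatures above; the schema contents are illustrative:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# JSON Schema describing the structure to extract (illustrative)
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["title"],
}

result = app.extract("https://example.com/product", schema)
print(result)
```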

[Data Extraction](./extraction.md)

### Job Monitoring

Real-time job monitoring using WebSocket connections for tracking long-running operations. Provides both synchronous and asynchronous monitoring interfaces.

```python { .api }
class Watcher:
    def watch(self, job_id: str, job_type: str) -> Iterator[dict]

class AsyncWatcher:
    def watch(self, job_id: str, job_type: str) -> AsyncIterator[dict]
```
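
A sketch of streaming job updates with `Watcher`; the constructor arguments and the shape of the yielded dicts are assumptions:

```python
from firecrawl import Firecrawl, Watcher

app = Firecrawl(api_key="your-api-key")
crawl_id = app.start_crawl("https://example.com")

# Iterate over WebSocket progress updates until the job finishes
watcher = Watcher(app)  # constructor signature is an assumption
for update in watcher.watch(crawl_id, "crawl"):
    print(update)
```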

[Job Monitoring](./monitoring.md)

### Usage & Statistics

Account usage monitoring including credit usage, token consumption, concurrency limits, and job queue status tracking. Includes both current usage and historical usage data.

```python { .api }
def get_credit_usage() -> CreditUsage
def get_token_usage() -> TokenUsage
def get_credit_usage_historical(by_api_key: bool = False) -> CreditUsageHistoricalResponse
def get_token_usage_historical(by_api_key: bool = False) -> TokenUsageHistoricalResponse
def get_concurrency() -> ConcurrencyInfo
def get_queue_status() -> QueueStatus
```
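
A short sketch of checking account usage, assuming the functions above are exposed as client methods:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Current credit usage and concurrency limits
print(app.get_credit_usage())
print(app.get_concurrency())

# Historical credit usage, broken down per API key
print(app.get_credit_usage_historical(by_api_key=True))
```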

[Usage & Statistics](./usage.md)

### Legacy V1 API

Complete v1 API support for backward compatibility with existing implementations. Includes all v1-specific operations and data types.

```python { .api }
class V1FirecrawlApp:
    def scrape_url(self, url: str, params: Optional[dict] = None) -> dict
    def crawl_url(self, url: str, params: Optional[dict] = None) -> dict
    def extract(self, data: dict, schema: dict, prompt: Optional[str] = None) -> dict
```
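
A minimal sketch of calling the v1 client directly; the `params` keys are illustrative assumptions:

```python
from firecrawl import V1FirecrawlApp

app = V1FirecrawlApp(api_key="your-api-key")

# v1-style scrape with a params dict (keys are illustrative)
page = app.scrape_url("https://example.com", params={"formats": ["markdown"]})
print(page)
```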

[Legacy V1 API](./v1-api.md)

## Types

Core type definitions used across the API:

```python { .api }
class Document:
    """Main document result structure"""
    url: str
    content: str
    metadata: dict

class ScrapeOptions:
    """Configuration options for scraping operations"""
    formats: Optional[List[str]]
    include_tags: Optional[List[str]]
    exclude_tags: Optional[List[str]]
    wait_for: Optional[int]
    screenshot: Optional[bool]

class CrawlOptions:
    """Configuration options for crawling operations"""
    limit: Optional[int]
    max_depth: Optional[int]
    allowed_domains: Optional[List[str]]
    ignored_paths: Optional[List[str]]
    scrape_options: Optional[ScrapeOptions]

class SearchOptions:
    """Configuration options for search operations"""
    limit: Optional[int]
    search_type: Optional[str]
    language: Optional[str]
    country: Optional[str]

class PaginationConfig:
    """Configuration for paginated requests"""
    auto_paginate: Optional[bool]
    max_pages: Optional[int]
    max_results: Optional[int]
    max_wait_time: Optional[int]

class CreditUsageHistoricalResponse:
    """Historical credit usage data"""
    data: List[CreditUsageHistoricalPeriod]

class CreditUsageHistoricalPeriod:
    """Credit usage for a specific period"""
    period_start: str
    period_end: str
    credits_used: int
    credits_remaining: int

class TokenUsageHistoricalResponse:
    """Historical token usage data"""
    data: List[TokenUsageHistoricalPeriod]

class TokenUsageHistoricalPeriod:
    """Token usage for a specific period"""
    period_start: str
    period_end: str
    tokens_used: int
    tokens_remaining: int

class Location:
    """Geographic location configuration"""
    country: Optional[str]
    languages: Optional[List[str]]

class Viewport:
    """Browser viewport configuration"""
    width: int
    height: int

class WebhookConfig:
    """Webhook configuration for job notifications"""
    url: str
    headers: Optional[Dict[str, str]]
    metadata: Optional[Dict[str, Any]]
    events: Optional[List[str]]

# Action Types for browser automation
class WaitAction:
    """Wait action for browser automation"""
    type: Literal["wait"]
    milliseconds: int

class ScreenshotAction:
    """Screenshot action for browser automation"""
    type: Literal["screenshot"]
    full_page: Optional[bool]

class ClickAction:
    """Click action for browser automation"""
    type: Literal["click"]
    selector: str

class WriteAction:
    """Write action for browser automation"""
    type: Literal["write"]
    text: str

class PressAction:
    """Press key action for browser automation"""
    type: Literal["press"]
    key: str

class ScrollAction:
    """Scroll action for browser automation"""
    type: Literal["scroll"]
    x: Optional[int]
    y: Optional[int]

class ScrapeAction:
    """Scrape action for browser automation"""
    type: Literal["scrape"]

class ExecuteJavascriptAction:
    """Execute JavaScript action for browser automation"""
    type: Literal["execute_javascript"]
    code: str

class PDFAction:
    """PDF action for browser automation"""
    type: Literal["pdf"]

# Format Types for advanced output formatting
class JsonFormat:
    """JSON format configuration"""
    type: Literal["json"]
    schema: Optional[Dict[str, Any]]
    prompt: Optional[str]

class ChangeTrackingFormat:
    """Change tracking format configuration"""
    type: Literal["change_tracking"]
    threshold: Optional[float]

class ScreenshotFormat:
    """Screenshot format configuration"""
    type: Literal["screenshot"]
    full_page: Optional[bool]
    viewport: Optional[Viewport]

class AttributesFormat:
    """Attributes format configuration"""
    type: Literal["attributes"]
    selectors: List[AttributeSelector]

class AttributeSelector:
    """Attribute selector for extraction"""
    selector: str
    attribute: str

class PDFParser:
    """PDF parser configuration"""
    type: Literal["pdf"]
    max_pages: Optional[int]
```
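
A sketch of composing these types, assuming they are importable from the package root and accept keyword arguments for the fields listed above:

```python
from firecrawl import Firecrawl
from firecrawl import CrawlOptions, ScrapeOptions  # import path is an assumption

app = Firecrawl(api_key="your-api-key")

# Nest scrape options inside crawl options, mirroring the fields above
options = CrawlOptions(
    limit=50,
    max_depth=2,
    scrape_options=ScrapeOptions(formats=["markdown"], wait_for=1000),
)
result = app.crawl("https://example.com", options)
print(result)
```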