# Scraping Operations

Essential web scraping functionality for extracting content from single URLs, searching the web, and mapping website structures. These operations provide immediate results with comprehensive format and processing options.

## Capabilities

### Single URL Scraping

Extract content from a single webpage with extensive formatting and processing options including markdown conversion, HTML extraction, screenshots, and metadata collection.

```python { .api }
def scrape(
    url: str,
    *,
    formats: Optional[List[str]] = None,
    headers: Optional[Dict[str, str]] = None,
    include_tags: Optional[List[str]] = None,
    exclude_tags: Optional[List[str]] = None,
    only_main_content: Optional[bool] = None,
    timeout: Optional[int] = None,
    wait_for: Optional[int] = None,
    mobile: Optional[bool] = None,
    parsers: Optional[List[str]] = None,
    actions: Optional[List[dict]] = None,
    location: Optional[dict] = None,
    skip_tls_verification: Optional[bool] = None,
    remove_base64_images: Optional[bool] = None,
    fast_mode: Optional[bool] = None,
    use_mock: Optional[str] = None,
    block_ads: Optional[bool] = None,
    proxy: Optional[str] = None,
    max_age: Optional[int] = None,
    store_in_cache: Optional[bool] = None,
    integration: Optional[str] = None
) -> Document:
    """
    Scrape content from a single URL.

    Parameters:
    - url: str, target URL to scrape
    - formats: List[str], output formats ("markdown", "html", "rawHtml", "screenshot", "links")
    - headers: Dict[str, str], custom HTTP headers
    - include_tags: List[str], HTML tags to include
    - exclude_tags: List[str], HTML tags to exclude
    - only_main_content: bool, extract only main content
    - timeout: int, request timeout in milliseconds
    - wait_for: int, wait time before scraping in milliseconds
    - mobile: bool, use mobile user agent
    - parsers: List[str], content parsers to use
    - actions: List[dict], browser actions to perform
    - location: dict, geographic location settings
    - skip_tls_verification: bool, skip SSL certificate verification
    - remove_base64_images: bool, remove base64-encoded images
    - fast_mode: bool, use faster scraping mode
    - use_mock: str, use mock response for testing
    - block_ads: bool, block advertisements
    - proxy: str, proxy server to use
    - max_age: int, maximum cache age in milliseconds
    - store_in_cache: bool, store result in cache
    - integration: str, integration identifier

    Returns:
    - Document: scraped content and metadata
    """
```
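
For pages that need interaction before capture, `actions` drives the browser step by step and `location` sets the request's geography. A minimal sketch, assuming the `type`-keyed action dicts and `country`/`languages` location keys from Firecrawl's scrape API; the `#load-more` selector is hypothetical:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Actions run in order before the page content is captured.
result = app.scrape(
    "https://example.com",
    formats=["markdown", "screenshot"],
    actions=[
        {"type": "wait", "milliseconds": 2000},
        {"type": "click", "selector": "#load-more"},  # hypothetical selector
        {"type": "scroll", "direction": "down"},
    ],
    location={"country": "US", "languages": ["en"]},
)
```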

### Web Search

Search the web with optional content extraction, returning relevant results formatted for LLM consumption.

```python { .api }
def search(
    query: str,
    *,
    sources: Optional[List[str]] = None,
    categories: Optional[List[str]] = None,
    limit: Optional[int] = None,
    tbs: Optional[str] = None,
    location: Optional[str] = None,
    ignore_invalid_urls: Optional[bool] = None,
    timeout: Optional[int] = None,
    scrape_options: Optional[dict] = None,
    integration: Optional[str] = None
) -> SearchData:
    """
    Search the web and extract content from results.

    Parameters:
    - query: str, search query
    - sources: List[str], search sources to use
    - categories: List[str], content categories to filter
    - limit: int, maximum number of results
    - tbs: str, time-based search parameters
    - location: str, geographic location for search
    - ignore_invalid_urls: bool, skip invalid URLs in results
    - timeout: int, request timeout in milliseconds
    - scrape_options: dict, options for scraping search results
    - integration: str, integration identifier

    Returns:
    - SearchData: search results with extracted content
    """
```
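
When the full page content of each hit is needed, results can be scraped in the same call via `scrape_options`. A sketch, assuming the dict accepts the same keys as `scrape`:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Each result comes back with markdown extracted from the page
results = app.search(
    "firecrawl tutorials",
    limit=3,
    scrape_options={"formats": ["markdown"], "only_main_content": True},
)
```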

### Website Mapping

Generate a structural map of a website showing available pages and their relationships, useful for understanding site architecture before crawling.

```python { .api }
def map(
    url: str,
    *,
    search: Optional[str] = None,
    include_subdomains: Optional[bool] = None,
    limit: Optional[int] = None,
    sitemap: str = "include",
    timeout: Optional[int] = None,
    integration: Optional[str] = None,
    location: Optional[dict] = None
) -> MapData:
    """
    Generate a map of website structure.

    Parameters:
    - url: str, target website URL
    - search: Optional[str], search term to filter URLs
    - include_subdomains: Optional[bool], include subdomain URLs
    - limit: Optional[int], maximum number of URLs to return
    - sitemap: str, sitemap handling ("include", "exclude", "only")
    - timeout: Optional[int], request timeout in milliseconds
    - integration: Optional[str], integration identifier
    - location: Optional[dict], geographic location settings

    Returns:
    - MapData: website structure map with URLs and metadata
    """
```
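
The `sitemap` parameter chooses how discovery treats the site's sitemap: blend it with crawled links ("include", the default), ignore it ("exclude"), or rely on it exclusively ("only"). A sketch combining it with the `search` filter:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Restrict discovery to sitemap entries whose URLs match "docs"
docs_map = app.map(
    "https://example.com",
    sitemap="only",
    search="docs",
    limit=50,
)
```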

## Usage Examples

### Basic Scraping

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Simple scrape: returns a Document
result = app.scrape("https://example.com")
print(result.markdown)

# Scrape with options; screenshots are requested as a format
result = app.scrape(
    "https://example.com",
    formats=["markdown", "html", "screenshot"],
    include_tags=["article", "main"],
    wait_for=2000,
)
```
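
Repeat scrapes of the same URL can be served from Firecrawl's cache. A sketch, assuming `max_age` is expressed in milliseconds like the other timing parameters:

```python
# Accept a cached copy up to one hour old; store_in_cache=False
# would skip writing this result back to the cache.
result = app.scrape(
    "https://example.com",
    max_age=3600000,
    store_in_cache=True,
)
```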

### Web Search

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Basic search; results are grouped by source (web, news, ...)
results = app.search("latest AI developments")
for hit in results.web:
    print(f"Title: {hit.title}")
    print(f"URL: {hit.url}")

# Search with options
results = app.search(
    "AI breakthrough",
    limit=10,
    sources=["news"],
    tbs="qdr:w",  # time-based filter: past week
)
```

### Website Mapping

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="your-api-key")

# Generate a site map; use limit to bound the number of URLs returned
site_map = app.map("https://example.com", limit=100)

for link in site_map.links:
    print(f"URL: {link.url}")
    print(f"Title: {link.title}")
```

## Types

```python { .api }
class Document:
    """Scraped content and metadata returned by scrape().
    Fields are populated according to the requested formats;
    this is the commonly used subset."""
    markdown: Optional[str]      # Markdown conversion of the page
    html: Optional[str]          # Processed HTML
    raw_html: Optional[str]      # Unprocessed HTML
    links: Optional[List[str]]   # Links found on the page
    screenshot: Optional[str]    # Screenshot URL or base64 data
    metadata: Optional[dict]     # Title, description, status code, etc.

class SearchResult:
    """A single search hit; also carries Document fields
    (markdown, html, ...) when scrape_options is set."""
    url: str
    title: Optional[str]
    description: Optional[str]

class SearchData:
    """Search results returned by search(), grouped by source"""
    web: Optional[List[SearchResult]]
    news: Optional[List[SearchResult]]
    images: Optional[List[SearchResult]]

class LinkResult:
    """A single URL discovered by map()"""
    url: str
    title: Optional[str]
    description: Optional[str]

class MapData:
    """Website structure map returned by map()"""
    links: List[LinkResult]
```

## Async Usage

All scraping operations have async equivalents:

```python
import asyncio
from firecrawl import AsyncFirecrawl

async def scrape_async():
    app = AsyncFirecrawl(api_key="your-api-key")

    # Async scraping
    result = await app.scrape("https://example.com")

    # Async search
    search_results = await app.search("query")

    # Async mapping
    site_map = await app.map("https://example.com")

asyncio.run(scrape_async())
```
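
Because the async client returns awaitables, independent scrapes can run concurrently with `asyncio.gather`. A sketch using an illustrative `scrape_many` helper:

```python
import asyncio
from firecrawl import AsyncFirecrawl

async def scrape_many(urls):
    app = AsyncFirecrawl(api_key="your-api-key")
    # Launch every scrape at once; return_exceptions=True keeps one
    # failing URL from cancelling the rest of the batch.
    return await asyncio.gather(
        *(app.scrape(url) for url in urls),
        return_exceptions=True,
    )

docs = asyncio.run(scrape_many(["https://example.com", "https://example.org"]))
```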