Convert websites into LLM-ready data with Firecrawl API. Features: scrape, crawl, map, search, extract, agent (autonomous), batch operations, and change tracking. Handles JavaScript, anti-bot bypass, PDF/DOCX parsing, and branding extraction. Prevents 10 documented errors. Use when: scraping websites, crawling sites, web search + scrape, autonomous data gathering, monitoring content changes, extracting brand/design systems, or troubleshooting content not loading, JavaScript rendering, bot detection, v2 migration, job status errors, DNS resolution, or stealth mode pricing.
Install with Tessl CLI
```shell
npx tessl i github:jezweb/claude-skills --skill firecrawl-scraper86
```
Status: Production Ready | Last Updated: 2026-01-20 | Official Docs: https://docs.firecrawl.dev | API Version: v2 | SDK Versions: firecrawl-py 4.13.0+, @mendable/firecrawl-js 4.11.1+
Firecrawl is a Web Data API for AI that turns websites into LLM-ready markdown or structured data. It provides the following endpoints:
| Endpoint | Purpose | Use Case |
|---|---|---|
| `/scrape` | Single page | Extract article, product page |
| `/crawl` | Full site | Index docs, archive sites |
| `/map` | URL discovery | Find all pages, plan strategy |
| `/search` | Web search + scrape | Research with live data |
| `/extract` | Structured data | Product prices, contacts |
| `/agent` | Autonomous gathering | No URLs needed, AI navigates |
| `/batch-scrape` | Multiple URLs | Bulk processing |
## Scrape (`/v2/scrape`)

Scrapes a single webpage and returns clean, structured content.

```python
from firecrawl import Firecrawl
import os

app = Firecrawl(api_key=os.environ.get("FIRECRAWL_API_KEY"))

# Basic scrape
doc = app.scrape(
    url="https://example.com/article",
    formats=["markdown", "html"],
    only_main_content=True
)

print(doc.markdown)
print(doc.metadata)
```

```javascript
import Firecrawl from '@mendable/firecrawl-js';

const app = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

const result = await app.scrape('https://example.com/article', {
  formats: ['markdown', 'html'],
  onlyMainContent: true
});

console.log(result.markdown);
```

Available formats:

| Format | Description |
|---|---|
| `markdown` | LLM-optimized content |
| `html` | Full HTML |
| `rawHtml` | Unprocessed HTML |
| `screenshot` | Page capture (with viewport options) |
| `links` | All URLs on page |
| `json` | Structured data extraction |
| `summary` | AI-generated summary |
| `branding` | Design system data |
| `changeTracking` | Content change detection |
A fuller set of scrape options:

```python
doc = app.scrape(
    url="https://example.com",
    formats=["markdown", "screenshot"],
    only_main_content=True,
    remove_base64_images=True,
    wait_for=5000,  # Wait 5s for JS
    timeout=30000,
    # Location & language
    location={"country": "AU", "languages": ["en-AU"]},
    # Cache control
    max_age=0,  # Fresh content (no cache)
    store_in_cache=True,
    # Stealth proxy for complex sites (see Stealth Mode Options below)
    proxy="stealth",
    # Custom headers
    headers={"User-Agent": "Custom Bot 1.0"}
)
```

Perform interactions before scraping:
```python
doc = app.scrape(
    url="https://example.com",
    actions=[
        {"type": "click", "selector": "button.load-more"},
        {"type": "wait", "milliseconds": 2000},
        {"type": "scroll", "direction": "down"},
        {"type": "write", "selector": "input#search", "text": "query"},
        {"type": "press", "key": "Enter"},
        {"type": "screenshot"}  # Capture state mid-action
    ]
)
```

JSON extraction, with or without a schema:

```python
# With schema
doc = app.scrape(
    url="https://example.com/product",
    formats=["json"],
    json_options={
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "price": {"type": "number"},
                "in_stock": {"type": "boolean"}
            }
        }
    }
)

# Without schema (prompt-only)
doc = app.scrape(
    url="https://example.com/product",
    formats=["json"],
    json_options={
        "prompt": "Extract the product name, price, and availability"
    }
)
```

Extract design system and brand identity:
```python
doc = app.scrape(
    url="https://example.com",
    formats=["branding"]
)

# Returns:
# - Color schemes and palettes
# - Typography (fonts, sizes, weights)
# - Spacing and layout metrics
# - UI component styles
# - Logo and imagery URLs
# - Brand personality traits
```

## Crawl (`/v2/crawl`)

Crawls all accessible pages from a starting URL.
```python
result = app.crawl(
    url="https://docs.example.com",
    limit=100,
    max_discovery_depth=3,  # v2 name (was maxDepth in v1)
    allowed_domains=["docs.example.com"],
    exclude_paths=["/api/*", "/admin/*"],
    scrape_options={
        "formats": ["markdown"],
        "only_main_content": True
    }
)

for page in result.data:
    print(f"Scraped: {page.metadata.source_url}")
    print(f"Content: {page.markdown[:200]}...")
```

Asynchronous crawling:

```python
# Start crawl (returns immediately)
job = app.start_crawl(
    url="https://docs.example.com",
    limit=1000,
    webhook="https://your-domain.com/webhook"
)
print(f"Job ID: {job.id}")

# Or poll for status
status = app.get_crawl_status(job.id)
```

## Map (`/v2/map`)

Rapidly discover all URLs on a website without scraping content.
```python
urls = app.map(url="https://example.com")
print(f"Found {len(urls)} pages")

for url in urls[:10]:
    print(url)
```

Use for: sitemap discovery, crawl planning, website audits.
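A common pattern is to combine `/map` with batch scraping: discover URLs first, filter down to the pages you care about, then scrape only those. A minimal sketch; the helper name and its `prefix`/`limit` parameters are our own illustration, not SDK options:

```python
def plan_docs_batch(app, site, prefix="/docs/", limit=50):
    """Map a site, keep only matching pages, then batch-scrape them.

    `app` is a Firecrawl v2 client. `prefix` and `limit` are illustrative
    parameters of this helper, not part of the Firecrawl API.
    """
    urls = app.map(url=site)
    # Keep only documentation pages, capped at a batch-friendly size
    targets = [u for u in urls if prefix in u][:limit]
    return targets, app.batch_scrape(urls=targets, formats=["markdown"])
```

This keeps credit usage predictable: you pay for one map call plus only the pages that survive the filter.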
## Search (`/search`) - NEW

Perform web searches and optionally scrape the results in one operation.
```python
# Basic search
results = app.search(
    query="best practices for React server components",
    limit=10
)

for result in results:
    print(f"{result.title}: {result.url}")

# Search + scrape results
results = app.search(
    query="React server components tutorial",
    limit=5,
    scrape_options={
        "formats": ["markdown"],
        "only_main_content": True
    }
)

for result in results:
    print(f"{result.title}")
    print(result.markdown[:500])
```

Advanced search options:

```python
results = app.search(
    query="machine learning papers",
    limit=20,
    # Filter by source type
    sources=["web", "news", "images"],
    # Filter by category
    categories=["github", "research", "pdf"],
    # Location
    location={"country": "US"},
    # Time filter
    tbs="qdr:m",  # Past month (qdr:h=hour, qdr:d=day, qdr:w=week, qdr:y=year)
    timeout=30000
)
```

Cost: 2 credits per 10 results, plus scraping costs if enabled.
## Extract (`/v2/extract`)

AI-powered structured data extraction from single pages, multiple pages, or entire domains.
```python
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    description: str
    in_stock: bool

result = app.extract(
    urls=["https://example.com/product"],
    schema=Product,
    system_prompt="Extract product information"
)
print(result.data)
```

Domain-wide extraction and web search enrichment:

```python
# Extract from entire domain using wildcard
result = app.extract(
    urls=["example.com/*"],  # All pages on domain
    schema=Product,
    system_prompt="Extract all products"
)

# Enable web search for additional context
result = app.extract(
    urls=["example.com/products"],
    schema=Product,
    enable_web_search=True  # Follow external links
)
```

Prompt-only extraction (the LLM determines the output structure):

```python
result = app.extract(
    urls=["https://example.com/about"],
    prompt="Extract the company name, founding year, and key executives"
)
```

## Agent (`/agent`) - NEW

Autonomous web data gathering without requiring specific URLs. The agent searches, navigates, and gathers data using natural language prompts.
```python
# Basic agent usage
result = app.agent(
    prompt="Find the pricing plans for the top 3 headless CMS platforms and compare their features"
)
print(result.data)

# With schema for structured output
from pydantic import BaseModel
from typing import List

class CMSPricing(BaseModel):
    name: str
    free_tier: bool
    starter_price: float
    features: List[str]

result = app.agent(
    prompt="Find pricing for Contentful, Sanity, and Strapi",
    schema=CMSPricing
)

# Optional: focus on specific URLs
result = app.agent(
    prompt="Extract the enterprise pricing details",
    urls=["https://contentful.com/pricing", "https://sanity.io/pricing"]
)
```

| Model | Best For | Cost |
|---|---|---|
| `spark-1-mini` (default) | Simple extractions, high volume | Standard |
| `spark-1-pro` | Complex analysis, ambiguous data | 60% more |
```python
result = app.agent(
    prompt="Analyze competitive positioning...",
    model="spark-1-pro"  # For complex tasks
)
```

Asynchronous agent jobs:

```python
# Start agent (returns immediately)
job = app.start_agent(
    prompt="Research market trends..."
)

# Poll for results
status = app.check_agent_status(job.id)
if status.status == "completed":
    print(status.data)
```

Note: Agent is in Research Preview. 5 free daily requests, then credit-based billing.
## Batch Scrape (`/batch-scrape`)

Process multiple URLs efficiently in a single operation.
```python
results = app.batch_scrape(
    urls=[
        "https://example.com/page1",
        "https://example.com/page2",
        "https://example.com/page3"
    ],
    formats=["markdown"],
    only_main_content=True
)

for page in results.data:
    print(f"{page.metadata.source_url}: {len(page.markdown)} chars")
```

Asynchronous batch scraping with webhooks:

```python
job = app.start_batch_scrape(
    urls=url_list,
    formats=["markdown"],
    webhook="https://your-domain.com/webhook"
)
# Webhook receives events: started, page, completed, failed
```
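On the receiving side, a handler can route those webhook events by type. A stdlib-only sketch; the payload field names (`type`, `data`, `error`) are assumptions to check against the payloads your endpoint actually receives:

```python
import json

def route_firecrawl_event(raw_body):
    """Classify a webhook payload by the suffix of its event type.

    Field names ("type", "data", "error") are illustrative assumptions,
    not a documented payload contract.
    """
    event = json.loads(raw_body)
    etype = str(event.get("type", ""))
    if etype.endswith("started"):
        return ("started", None)
    if etype.endswith("page"):
        return ("page", event.get("data"))      # per-page results
    if etype.endswith("completed"):
        return ("completed", None)
    if etype.endswith("failed"):
        return ("failed", event.get("error"))   # failure details
    return ("unknown", None)
```

Return quickly from the webhook (2xx) and do heavy processing out of band, so the sender does not retry or time out.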
```javascript
const job = await app.startBatchScrape(urls, {
  formats: ['markdown'],
  webhook: 'https://your-domain.com/webhook'
});

// Poll for status
const status = await app.checkBatchScrapeStatus(job.id);
```

## Change Tracking

Monitor content changes over time by comparing scrapes.
```python
# Enable change tracking
doc = app.scrape(
    url="https://example.com/pricing",
    formats=["markdown", "changeTracking"]
)

# Response includes:
print(doc.change_tracking.status)              # new, same, changed, removed
print(doc.change_tracking.previous_scrape_at)
print(doc.change_tracking.visibility)          # visible, hidden
```

Git-diff and JSON comparison modes (5 credits per page):

```python
# Git-diff mode (default)
doc = app.scrape(
    url="https://example.com/docs",
    formats=["markdown", "changeTracking"],
    change_tracking_options={
        "mode": "diff"
    }
)
print(doc.change_tracking.diff)  # Line-by-line changes

# JSON mode (structured comparison)
doc = app.scrape(
    url="https://example.com/pricing",
    formats=["markdown", "changeTracking"],
    change_tracking_options={
        "mode": "json",
        "schema": {"type": "object", "properties": {"price": {"type": "number"}}}
    }
)
# Costs 5 credits per page
```

Change States:

- `new` - Page not seen before
- `same` - No changes since last scrape
- `changed` - Content modified
- `removed` - Page no longer accessible
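These states can drive a simple monitoring decision. A minimal helper sketch; the action names (`notify`, `alert`, `skip`) are our own convention, not part of the Firecrawl API:

```python
def change_action(status):
    """Map a changeTracking status to a monitoring action.

    Statuses follow the list above; the returned action names are
    illustrative conventions of this helper.
    """
    if status == "changed":
        return "notify"  # content was modified since the last scrape
    if status == "removed":
        return "alert"   # the page disappeared
    return "skip"        # "new" or "same" need no action
```

A scheduler can then call `change_action(doc.change_tracking.status)` after each scrape and fan out to whatever alerting you use.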
## Setup

```shell
# Get API key from https://www.firecrawl.dev/app
# Store in environment
FIRECRAWL_API_KEY=fc-your-api-key-here
```

Never hardcode API keys!
## Cloudflare Workers Compatibility

The Firecrawl SDK cannot run in Cloudflare Workers (it requires Node.js). Use the REST API directly:
```typescript
interface Env {
  FIRECRAWL_API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { url } = await request.json<{ url: string }>();

    const response = await fetch('https://api.firecrawl.dev/v2/scrape', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${env.FIRECRAWL_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        url,
        formats: ['markdown'],
        onlyMainContent: true
      })
    });

    const result = await response.json();
    return Response.json(result);
  }
};
```

## Stealth Mode Pricing

Stealth mode now costs 5 credits per request when actively used. The default "auto" mode only charges stealth credits if a basic scrape fails.
Recommended pattern:
```python
# Use auto mode (default) - only charges 5 credits if stealth is needed
doc = app.scrape(url, formats=["markdown"])

# Or conditionally enable stealth for specific errors
if error_status_code in [401, 403, 500]:
    doc = app.scrape(url, formats=["markdown"], proxy="stealth")
```

## Pricing

Credits and tokens are merged into a single system. The Extract endpoint uses credits (15 tokens = 1 credit).
| Tier | Credits/Month | Notes |
|---|---|---|
| Free | 500 | Good for testing |
| Hobby | 3,000 | $19/month |
| Standard | 100,000 | $99/month |
| Growth | 500,000 | $399/month |
Credit Costs:

- Basic scrape: 1 credit per page (charged even on failure)
- Stealth mode: 5 credits per request when actively used
- Change tracking (diff/JSON modes): 5 credits per page
- Search: 2 credits per 10 results, plus scraping costs if enabled
- Extract: billed in credits (15 tokens = 1 credit)

## Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Empty content | JS not loaded | Add `wait_for: 5000` or use actions |
| Rate limit exceeded | Over quota | Check dashboard, upgrade plan |
| Timeout error | Slow page | Increase `timeout`, try `proxy="stealth"` |
| Bot detection | Anti-scraping | Use `proxy="stealth"`, add `location` |
| Invalid API key | Wrong format | Must start with `fc-` |
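For the rate-limit and timeout rows above, a generic retry wrapper with exponential backoff is a practical mitigation. A sketch, assuming transient failures surface as exceptions whose message mentions the problem; match on your SDK's concrete exception types where possible:

```python
import time

def with_backoff(fn, *args, max_retries=4, base_delay=1.0, **kwargs):
    """Retry `fn` on transient errors (rate limits, timeouts),
    doubling the delay after each failed attempt."""
    for attempt in range(max_retries):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            message = str(exc).lower()
            transient = "rate limit" in message or "timeout" in message
            if not transient or attempt == max_retries - 1:
                raise  # permanent error, or out of retries
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical usage (assumes `app` is a Firecrawl client):
# doc = with_backoff(app.scrape, url, formats=["markdown"])
```

Keep `max_retries` small for interactive paths; each retry still consumes time and, on success, credits.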
## Known Issues Prevention

This skill prevents 10 documented issues:
Error: Unexpected credit costs when using stealth mode
Source: Stealth Mode Docs | Changelog
Why It Happens: Starting May 8th, 2025, Stealth Mode proxy requests cost 5 credits per request (previously included in standard pricing). This is a significant billing change.
Prevention: Use auto mode (the default), which only charges stealth credits if basic fails.
```python
# RECOMMENDED: Use auto mode (default)
doc = app.scrape(url, formats=['markdown'])
# Auto retries with stealth (5 credits) only if basic fails

# Or conditionally enable based on error status
try:
    doc = app.scrape(url, formats=['markdown'], proxy='basic')
except Exception as e:
    if e.status_code in [401, 403, 500]:
        doc = app.scrape(url, formats=['markdown'], proxy='stealth')
```

Stealth Mode Options:

- `auto` (default): Charges 5 credits only if stealth succeeds after basic fails
- `basic`: Standard proxies, 1 credit cost
- `stealth`: 5 credits per request when actively used

Error: AttributeError: 'FirecrawlApp' object has no attribute 'scrape_url'
Source: v2.0.0 Release | Migration Guide
Why It Happens: v2.0.0 (August 2025) renamed SDK methods across all languages
Prevention: Use new method names
JavaScript/TypeScript:
- `scrapeUrl()` → `scrape()`
- `crawlUrl()` → `crawl()` or `startCrawl()`
- `asyncCrawlUrl()` → `startCrawl()`
- `checkCrawlStatus()` → `getCrawlStatus()`

Python:

- `scrape_url()` → `scrape()`
- `crawl_url()` → `crawl()` or `start_crawl()`

```python
# OLD (v1)
doc = app.scrape_url("https://example.com")

# NEW (v2)
doc = app.scrape("https://example.com")
```

Error: 'extract' is not a valid format
Source: v2.0.0 Release
Why It Happens: Old "extract" format renamed to "json" in v2.0.0
Prevention: Use new object format for JSON extraction
```python
# OLD (v1)
doc = app.scrape_url(
    url="https://example.com",
    params={
        "formats": ["extract"],
        "extract": {"prompt": "Extract title"}
    }
)

# NEW (v2)
doc = app.scrape(
    url="https://example.com",
    formats=[{"type": "json", "prompt": "Extract title"}]
)

# With schema
doc = app.scrape(
    url="https://example.com",
    formats=[{
        "type": "json",
        "prompt": "Extract product info",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "price": {"type": "number"}
            }
        }
    }]
)
```

Screenshot format also changed:
```python
# NEW: Screenshot as object
formats=[{
    "type": "screenshot",
    "fullPage": True,
    "quality": 80,
    "viewport": {"width": 1920, "height": 1080}
}]
```

Error: 'allowBackwardCrawling' is not a valid parameter
Source: v2.0.0 Release
Why It Happens: Several crawl parameters renamed or removed in v2.0.0
Prevention: Use new parameter names
Parameter Changes:
- `allowBackwardCrawling` → Use `crawlEntireDomain` instead
- `maxDepth` → Use `maxDiscoveryDepth` instead
- `ignoreSitemap` (bool) → `sitemap` ("only", "skip", "include")

```python
# OLD (v1)
app.crawl_url(
    url="https://docs.example.com",
    params={
        "allowBackwardCrawling": True,
        "maxDepth": 3,
        "ignoreSitemap": False
    }
)

# NEW (v2)
app.crawl(
    url="https://docs.example.com",
    crawl_entire_domain=True,
    max_discovery_depth=3,
    sitemap="include"  # "only", "skip", or "include"
)
```

Error: Stale cached content returned unexpectedly
Source: v2.0.0 Release
Why It Happens: v2.0.0 changed several defaults
Prevention: Be aware of the new defaults
Default Changes:
- `maxAge` now defaults to 2 days (cached by default)
- `blockAds`, `skipTlsVerification`, `removeBase64Images` enabled by default

```python
# Force fresh data if needed
doc = app.scrape(url, formats=['markdown'], max_age=0)

# Disable cache entirely
doc = app.scrape(url, formats=['markdown'], store_in_cache=False)
```
Source: GitHub Issue #2662
Why It Happens: Database replication delay between job creation and status endpoint availability
Prevention: Wait 1-3 seconds before first status check, or implement retry logic
```python
import time

# Start crawl
job = app.start_crawl(url="https://docs.example.com")
print(f"Job ID: {job.id}")

# REQUIRED: Wait before first status check
time.sleep(2)  # 1-3 seconds recommended

# Now status check succeeds
status = app.get_crawl_status(job.id)

# Or implement retry logic
def get_status_with_retry(job_id, max_retries=3, delay=1):
    for attempt in range(max_retries):
        try:
            return app.get_crawl_status(job_id)
        except Exception as e:
            if "Job not found" in str(e) and attempt < max_retries - 1:
                time.sleep(delay)
                continue
            raise

status = get_status_with_retry(job.id)
```

Error: DNS resolution failures return success: false with HTTP 200 status instead of 4xx
Source: GitHub Issue #2402 | Fixed in v2.7.0
Why It Happens: Changed in v2.7.0 for consistent error handling
Prevention: Check success field and code field, don't rely on HTTP status alone
```javascript
const result = await app.scrape('https://nonexistent-domain-xyz.com');

// DON'T rely on HTTP status code
// Response: HTTP 200 with { success: false, code: "SCRAPE_DNS_RESOLUTION_ERROR" }

// DO check success field
if (!result.success) {
  if (result.code === 'SCRAPE_DNS_RESOLUTION_ERROR') {
    console.error('DNS resolution failed');
  }
  throw new Error(result.error);
}
```

Note: DNS resolution errors still charge 1 credit despite failure.
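The same check in Python, for code that calls the REST API directly. A sketch assuming `result` is the parsed JSON response body with the `success`/`code`/`error` fields shown above:

```python
def check_scrape_result(result):
    """Raise on failed scrapes instead of trusting the HTTP 200 status.

    `result` is the parsed JSON response body; field names follow the
    v2 response shape shown in the JavaScript example above.
    """
    if not result.get("success", False):
        if result.get("code") == "SCRAPE_DNS_RESOLUTION_ERROR":
            raise ValueError("DNS resolution failed")
        raise RuntimeError(result.get("error", "scrape failed"))
    return result
```

Running every response through a check like this keeps DNS and other soft failures from silently flowing into downstream processing.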
Error: Cloudflare error page returned as "successful" scrape, credits charged
Source: GitHub Issue #2413
Why It Happens: The Fire-1 engine charges credits even when bot detection prevents access
Prevention: Validate that content isn't an error page before processing; use stealth mode for protected sites
```python
# First attempt without stealth
doc = app.scrape(url="https://protected-site.com", formats=["markdown"])

# Validate content isn't an error page
if "cloudflare" in doc.markdown.lower() or "access denied" in doc.markdown.lower():
    # Retry with stealth (costs 5 credits if successful)
    doc = app.scrape(url, formats=["markdown"], proxy="stealth")
```

Cost Impact: A basic scrape charges 1 credit even on failure; a stealth retry charges an additional 5 credits.
Error: "All scraping engines failed!" (SCRAPE_ALL_ENGINES_FAILED) on sites with anti-bot measures
Source: GitHub Issue #2257
Why It Happens: Self-hosted Firecrawl lacks advanced anti-fingerprinting techniques present in cloud service
Prevention: Use Firecrawl cloud service for sites with strong anti-bot measures, or configure proxy
```shell
# Self-hosted fails on Cloudflare-protected sites
curl -X POST 'http://localhost:3002/v2/scrape' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "url": "https://www.example.com/",
    "pageOptions": { "engine": "playwright" }
  }'
# Error: "All scraping engines failed!"

# Workaround: use the cloud service instead,
# which has better anti-fingerprinting.
```

Note: This affects self-hosted v2.3.0+ with the default docker-compose setup. A warning is shown: "⚠️ WARNING: No proxy server provided. Your IP address may be blocked."
Suboptimal: Not leveraging the cache can make requests up to 500% slower
Source: Fast Scraping Docs | Blog Post
Why It Matters: Default maxAge is 2 days in v2+, but many use cases need different strategies
Prevention: Use appropriate cache strategy for your content type
```python
# Fresh data (real-time pricing, stock prices)
doc = app.scrape(url, formats=["markdown"], max_age=0)

# 10-minute cache (news, blogs)
doc = app.scrape(url, formats=["markdown"], max_age=600000)  # milliseconds

# Use default cache (2 days) for static content
doc = app.scrape(url, formats=["markdown"])  # maxAge defaults to 172800000

# Don't store in cache (one-time scrape)
doc = app.scrape(url, formats=["markdown"], store_in_cache=False)

# Require minimum age before re-scraping (v2.7.0+)
doc = app.scrape(url, formats=["markdown"], min_age=3600000)  # 1 hour minimum
```

Performance Impact: serving from cache avoids the up-to-500% slowdown of always scraping fresh.
| Package | Version | Last Checked |
|---|---|---|
| firecrawl-py | 4.13.0+ | 2026-01-20 |
| @mendable/firecrawl-js | 4.11.1+ | 2026-01-20 |
| API Version | v2 | Current |
Token Savings: ~65% vs manual integration
Error Prevention: 10 documented issues (v2 migration, stealth pricing, job status race, DNS errors, bot detection billing, self-hosted limitations, cache optimization)
Production Ready: Yes
Last verified: 2026-01-21 | Skill version: 2.0.0 | Changes: Added Known Issues Prevention section with 10 documented errors from TIER 1-2 research findings; added v2 migration guidance; documented stealth mode pricing change and unified billing model