firecrawl-core-workflow-a

Execute Firecrawl's primary workflow: scrape and crawl websites into LLM-ready markdown. Use when scraping single pages, crawling entire sites, or building content ingestion pipelines with Firecrawl's scrapeUrl and crawlUrl methods. Trigger with phrases like "firecrawl scrape", "firecrawl crawl site", "scrape page to markdown", "crawl documentation".

Firecrawl Core Workflow A — Scrape & Crawl

Overview

The primary Firecrawl workflow: converting websites into clean, LLM-ready markdown. Covers single-page scraping with scrapeUrl, multi-page crawling with crawlUrl, async crawl jobs with polling, and a content-processing pipeline for storing the results.

Prerequisites

  • @mendable/firecrawl-js installed (npm install @mendable/firecrawl-js)
  • FIRECRAWL_API_KEY environment variable set
  • Target URL(s) identified

Instructions

Step 1: Single-Page Scrape

import FirecrawlApp from "@mendable/firecrawl-js";

const firecrawl = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY!,
});

// Scrape a single page to clean markdown
const result = await firecrawl.scrapeUrl("https://docs.example.com/api", {
  formats: ["markdown"],
  onlyMainContent: true,  // strips nav, footer, sidebars
  waitFor: 2000,           // wait 2s for JS to render
});

if (result.success) {
  console.log("Title:", result.metadata?.title);
  console.log("Source:", result.metadata?.sourceURL);
  console.log("Markdown:", result.markdown?.substring(0, 200));
}
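
The failure path is worth handling as well. A minimal sketch, assuming the response carries an error string when success is false (the SDK may also throw on transport errors, so verify against your SDK version):

if (!result.success) {
  // Assumed: success: false responses include a human-readable error field
  console.error("Scrape failed:", result.error);
}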

Step 2: Multi-Page Synchronous Crawl

// Crawl a site — Firecrawl follows links, renders JS, returns all pages
const crawlResult = await firecrawl.crawlUrl("https://docs.example.com", {
  limit: 50,                   // max pages to crawl
  maxDepth: 3,                 // link depth from start URL
  includePaths: ["/docs/*", "/api/*"],   // only these paths
  excludePaths: ["/blog/*", "/changelog/*"],
  allowBackwardLinks: false,   // only crawl child paths
  scrapeOptions: {
    formats: ["markdown"],
    onlyMainContent: true,
  },
});

console.log(`Crawled ${crawlResult.data?.length} pages`);
for (const page of crawlResult.data || []) {
  console.log(`  ${page.metadata?.sourceURL}: ${page.markdown?.length} chars`);
}
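
One addition worth making before reading crawlResult.data: check the success flag. A sketch, assuming the crawl response mirrors the scrape response's success/error shape:

if (!crawlResult.success) {
  // Assumed: failed crawl responses expose an error string like scrapes do
  throw new Error(`Crawl failed: ${crawlResult.error}`);
}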

Step 3: Async Crawl for Large Sites

// Start an async crawl job — returns immediately with job ID
const job = await firecrawl.asyncCrawlUrl("https://docs.example.com", {
  limit: 500,
  scrapeOptions: { formats: ["markdown"] },
});

// Guard before polling: id is only present when the job started successfully
if (!job.success || !job.id) {
  throw new Error("Failed to start crawl job");
}
console.log(`Crawl started: ${job.id}`);

// Poll for completion with backoff
let pollInterval = 2000;
let status = await firecrawl.checkCrawlStatus(job.id);

while (status.status === "scraping") {
  console.log(`Progress: ${status.completed}/${status.total} pages`);
  await new Promise(r => setTimeout(r, pollInterval));
  pollInterval = Math.min(pollInterval * 1.5, 30000);
  status = await firecrawl.checkCrawlStatus(job.id);
}

if (status.status === "completed") {
  console.log(`Done: ${status.data?.length} pages scraped`);
} else {
  console.error("Crawl failed:", status.error);
}
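
The loop above also has no upper bound. A bounded helper as a sketch; the helper name and 15-minute budget are arbitrary assumptions:

async function waitForCrawl(id: string, timeoutMs = 15 * 60 * 1000) {
  const deadline = Date.now() + timeoutMs;
  let interval = 2000;
  while (Date.now() < deadline) {
    const s = await firecrawl.checkCrawlStatus(id);
    // Any status other than "scraping" means the job has settled
    if (s.status !== "scraping") return s;
    await new Promise((r) => setTimeout(r, interval));
    interval = Math.min(interval * 1.5, 30_000);
  }
  throw new Error(`Crawl ${id} still running after ${timeoutMs}ms`);
}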

Step 4: Process and Store Results

import { writeFileSync, mkdirSync } from "fs";

function processResults(pages: any[], outputDir: string) {
  mkdirSync(outputDir, { recursive: true });

  const manifest = pages.map((page, i) => {
    const url = page.metadata?.sourceURL || `page-${i}`;
    // Derive a filesystem-safe slug; the page-N fallback is not a valid URL,
    // so guard the parse instead of letting new URL() throw
    let slug: string;
    try {
      slug = new URL(url).pathname
        .replace(/\//g, "_")
        .replace(/^_|_$/g, "") || "index";
    } catch {
      slug = `page-${i}`;
    }
    const filename = `${slug}.md`;

    // Clean markdown: collapse whitespace, remove JS links
    const content = (page.markdown || "")
      .replace(/\n{3,}/g, "\n\n")
      .replace(/\[.*?\]\(javascript:.*?\)/g, "")
      .trim();

    writeFileSync(`${outputDir}/${filename}`, content);

    return { url, filename, chars: content.length };
  });

  writeFileSync(`${outputDir}/manifest.json`, JSON.stringify(manifest, null, 2));
  return manifest;
}
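
To wire Step 2's crawl output through this pipeline, for example (the output directory name is arbitrary):

const manifest = processResults(crawlResult.data || [], "./scraped-docs");
console.log(`Wrote ${manifest.length} markdown files plus manifest.json`);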

Output

  • Clean markdown files per crawled page
  • manifest.json with URL-to-file mapping
  • Crawl summary with page count and failures
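
For reference, each entry in the manifest.json written by Step 4 has this shape:

type ManifestEntry = {
  url: string;      // source page URL (or page-N fallback)
  filename: string; // markdown file written to the output directory
  chars: number;    // length of the cleaned markdown
};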

Error Handling

Error | Cause | Solution
Empty markdown | JS content not rendered | Increase waitFor to 5000ms
429 Too Many Requests | Rate limit hit | Back off, reduce concurrency
Crawl returns few pages | URL filters too strict | Widen includePaths patterns
402 Payment Required | Credits exhausted | Check balance, reduce limit
Partial crawl results | Site blocks bot on some pages | Use scrapeUrl for failed URLs individually
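
For the 429 case specifically, a minimal retry-with-backoff wrapper. This sketch assumes rate-limit failures surface as thrown errors whose message mentions the status code; verify that against your SDK version:

async function scrapeWithRetry(url: string, options: any, maxRetries = 3) {
  let delay = 2000;
  for (let attempt = 0; ; attempt++) {
    try {
      return await firecrawl.scrapeUrl(url, options);
    } catch (err) {
      // Assumed: 429s arrive as thrown errors mentioning the status code
      const retryable = String(err).includes("429");
      if (!retryable || attempt >= maxRetries) throw err;
      await new Promise((r) => setTimeout(r, delay));
      delay = Math.min(delay * 2, 30_000); // exponential backoff, capped
    }
  }
}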

Examples

Scrape with Multiple Formats

const result = await firecrawl.scrapeUrl("https://example.com", {
  formats: ["markdown", "html", "links"],
  onlyMainContent: true,
});

console.log("Markdown:", result.markdown?.length);
console.log("HTML:", result.html?.length);
console.log("Links:", result.links?.length);

Crawl with Webhook (No Polling)

const job = await firecrawl.asyncCrawlUrl("https://docs.example.com", {
  limit: 100,
  scrapeOptions: { formats: ["markdown"] },
  webhook: {
    url: "https://api.yourapp.com/webhooks/firecrawl",
    events: ["completed", "page"],
  },
});
console.log(`Crawl ${job.id} started — webhook will fire on completion`);
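
On the receiving side, a minimal handler sketch using Node's built-in http module. The payload fields (type, id) are assumptions based on Firecrawl's documented webhook events; verify them against the current API reference:

import { createServer } from "node:http";

createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/webhooks/firecrawl") {
    res.writeHead(404).end();
    return;
  }
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const event = JSON.parse(body);
    if (event.type === "crawl.page") {
      console.log(`Page received for crawl ${event.id}`); // assumed field names
    } else if (event.type === "crawl.completed") {
      console.log(`Crawl ${event.id} completed`);
    }
    res.writeHead(200).end("ok");
  });
}).listen(3000);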

Resources

  • Scrape Endpoint
  • Crawl Endpoint
  • Advanced Scraping Guide

Next Steps

For structured data extraction, see firecrawl-core-workflow-b.
