Transforms, validates, and loads data in ETL pipelines. Use when building scrapers, validating NDJSON feeds, or importing data into CMS/DB targets.
Generic pipeline patterns. For project-specific sources and full schema references see REFERENCE.md.
Launch a headless browser cluster (Puppeteer Cluster / Playwright) with retryLimit: 3, retryDelay: 5000, timeout: 30000, args: ['--no-sandbox', '--disable-setuid-sandbox'].
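The settings above could be wired into Puppeteer Cluster roughly as follows. This is a minimal sketch, assuming the `puppeteer-cluster` and `puppeteer` packages are installed. The helper names `buildClusterOptions` and `launchCluster` and the `maxConcurrency` value are illustrative assumptions, not part of this skill.

```javascript
// Sketch only: option values are taken from the text above.
// buildClusterOptions / launchCluster are hypothetical helper names.
function buildClusterOptions() {
  return {
    maxConcurrency: 2, // assumption: the text does not specify a concurrency level
    retryLimit: 3,     // retry a failed job up to 3 times
    retryDelay: 5000,  // wait 5000 ms between retries
    timeout: 30000,    // per-job timeout in ms
    puppeteerOptions: {
      args: ['--no-sandbox', '--disable-setuid-sandbox'],
    },
  };
}

async function launchCluster() {
  // Required lazily so the options helper can be used without the package installed.
  const { Cluster } = require('puppeteer-cluster');
  return Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT, // one incognito context per worker
    ...buildClusterOptions(),
  });
}
```

Keeping the options in a plain function makes them easy to log or override per environment before the browser is started.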
One record per line. Schema:
| Field | Type | Notes |
|---|---|---|
| `name` | Required | Preserve original encoding |
| `lat`/`lng` | Required | GPS coordinates |
| `address` | Required | Full text |
| `source` | Required | e.g. `google-maps` |
| `sourceId` | Required | Source-unique ID |
| `category` | Required | Domain category |
| `rating`, `reviewCount`, `phone`, `website`, `openingHours`, `photos`, `priceLevel` | Optional | |
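As an illustration of the required fields in the table, a dependency-free check for a single record might look like the sketch below. It assumes `lat` and `lng` are separate numeric fields and that the other required fields are non-empty strings; the table does not state the types, so treat those as assumptions. `checkRecord` is a hypothetical name, and this plain-JavaScript check is a stand-in for a schema library, not the project's validator.

```javascript
// Sketch: check one parsed NDJSON record against the required fields.
// Assumption: required text fields are non-empty strings; lat/lng are numbers.
const REQUIRED_STRINGS = ['name', 'address', 'source', 'sourceId', 'category'];

function checkRecord(record) {
  const problems = [];
  for (const field of REQUIRED_STRINGS) {
    if (typeof record[field] !== 'string' || record[field].length === 0) {
      problems.push(`${field}: required non-empty string`);
    }
  }
  // A coordinate is valid when it is a finite number within the given bound.
  const inRange = (value, limit) =>
    typeof value === 'number' && Number.isFinite(value) && Math.abs(value) <= limit;
  if (!inRange(record.lat, 90)) problems.push('lat: required number in [-90, 90]');
  if (!inRange(record.lng, 180)) problems.push('lng: required number in [-180, 180]');
  return problems; // empty array means the record passed
}
```

Returning a list of problems instead of throwing lets a caller report every failing field of a record in one pass.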
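1. Run the scraper with `--dry-run` to collect a sample (50–200 records).
2. Validate the sample (see the `validate-ndjson.js` example).
3. Use `ndjson-filter` to isolate failing records and inspect the source HTML.
4. Run a dry import with `createOrReplace` disabled; check counts and duplicates.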
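The "check counts and duplicates" step can be sketched without any import target. The example below assumes that a duplicate means two records sharing the same `source` and `sourceId` pair, which is an interpretation of "Source-unique ID" rather than something this skill states. `summarizeNdjson` is a hypothetical helper name.

```javascript
// Sketch: count records and list duplicate source/sourceId pairs in NDJSON text.
// Assumption: source + sourceId together identify a record.
function summarizeNdjson(text) {
  const seen = new Map(); // key -> number of occurrences
  let total = 0;
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue; // ignore blank lines
    const { source, sourceId } = JSON.parse(line); // throws on malformed JSON
    const key = `${source}:${sourceId}`;
    seen.set(key, (seen.get(key) || 0) + 1);
    total++;
  }
  const duplicates = [...seen].filter(([, count]) => count > 1).map(([key]) => key);
  return { total, unique: seen.size, duplicates };
}
```

Because `JSON.parse` throws on malformed lines, a check like this is best run after validation has passed, for example with `summarizeNdjson(fs.readFileSync('data.ndjson', 'utf8'))`.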
node ./scripts/scrape-to-ndjson.js --out=data.ndjson --pages=100
node ./scripts/validate-ndjson.js data.ndjson
node ./scripts/dry-import.js data.ndjson --target=staging
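node ./scripts/import.js data.ndjson --target=production

// validate-ndjson.js: validate each NDJSON line against the required fields.
const fs = require('fs');
const readline = require('readline');
const { z } = require('zod');
const schema = z.object({ name: z.string(), source: z.string(), sourceId: z.string() });
// `for await` is not allowed at the top level of a CommonJS script, so wrap it in an async function.
async function main() {
  const iface = readline.createInterface({ input: fs.createReadStream(process.argv[2]), crlfDelay: Infinity });
  let line = 0, errors = 0;
  for await (const l of iface) {
    line++;
    try { schema.parse(JSON.parse(l)); }
    catch (e) { console.error(`Line ${line}:`, e.message); errors++; }
  }
  // Exit code 2 signals validation failures to the calling pipeline.
  if (errors) { console.error(`${errors} errors`); process.exit(2); }
  console.log('OK');
}
main().catch((e) => { console.error(e); process.exit(1); });

Full scraper and extended validator: see REFERENCE.md.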