Transforms, validates, and loads data in ETL pipelines. Use when building scrapers, validating NDJSON feeds, or importing data into CMS/DB targets.
Generic pipeline patterns. For project-specific sources and full schema references, see REFERENCE.md.
Launch a headless browser cluster (Puppeteer Cluster / Playwright) with retryLimit: 3, retryDelay: 5000, timeout: 30000, args: ['--no-sandbox', '--disable-setuid-sandbox'].
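As a rough sketch, the settings above might be wired up with puppeteer-cluster as follows (a Playwright setup would look different). The `maxConcurrency` value, the `scrapeUrls` helper, and the `extract` callback are illustrative assumptions, not part of this skill. The library is loaded lazily so the options object can be inspected without it installed.

```javascript
// Settings taken from the text above; maxConcurrency is an assumed value.
const clusterOptions = {
  maxConcurrency: 4, // assumption: tune to the target site's rate limits
  retryLimit: 3,     // retry a failed task up to 3 times
  retryDelay: 5000,  // wait 5 s between retries
  timeout: 30000,    // fail a task after 30 s
  puppeteerOptions: {
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  },
};

// Hypothetical helper: visits each URL and collects whatever `extract` returns.
async function scrapeUrls(urls, extract) {
  const { Cluster } = require('puppeteer-cluster'); // loaded only when scraping
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    ...clusterOptions,
  });
  const records = [];

  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    records.push(await extract(page, url));
  });

  // Log tasks that failed so they can be inspected later.
  cluster.on('taskerror', (err, data) => {
    console.error(`Failed ${data}: ${err.message}`);
  });

  for (const url of urls) cluster.queue(url);
  await cluster.idle();
  await cluster.close();
  return records;
}
```

The `--no-sandbox` flags are typically needed when Chromium runs as root inside a container; they can likely be dropped elsewhere.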
One record per line. Schema:
| Field | Type | Notes |
|---|---|---|
| `name` | Required | Preserve original encoding |
| `lat` / `lng` | Required | GPS coordinates |
| `address` | Required | Full text |
| `source` | Required | e.g. `google-maps` |
| `sourceId` | Required | Source-unique ID |
| `category` | Required | Domain category |
| `rating`, `reviewCount`, `phone`, `website`, `openingHours`, `photos`, `priceLevel` | Optional | - |
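A dependency-free sketch of checking the required fields in the table might look like this. The field types (strings for text fields, numbers for `lat`/`lng`) are assumptions, since the table only says which fields are required; `checkRecord`, `toNdjsonLine`, and the sample record are hypothetical.

```javascript
// Required text fields from the schema table; string type is an assumption.
const REQUIRED_STRINGS = ['name', 'address', 'source', 'sourceId', 'category'];

// Returns a list of problems; an empty list means the record passes.
function checkRecord(record) {
  const problems = [];
  for (const field of REQUIRED_STRINGS) {
    if (typeof record[field] !== 'string' || record[field].trim() === '') {
      problems.push(`${field}: required non-empty string`);
    }
  }
  // Assumes lat/lng are decimal degrees stored as numbers.
  if (!Number.isFinite(record.lat) || Math.abs(record.lat) > 90) {
    problems.push('lat: required number in [-90, 90]');
  }
  if (!Number.isFinite(record.lng) || Math.abs(record.lng) > 180) {
    problems.push('lng: required number in [-180, 180]');
  }
  return problems;
}

// One record per line. JSON.stringify leaves non-ASCII characters unescaped,
// so names keep their original characters.
function toNdjsonLine(record) {
  return JSON.stringify(record) + '\n';
}

// Made-up sample record for illustration only.
const sample = {
  name: 'Café Ünal',
  lat: 52.52,
  lng: 13.405,
  address: 'Beispielstraße 1, Berlin',
  source: 'google-maps',
  sourceId: 'demo-1',
  category: 'cafe',
};

console.log(checkRecord(sample));       // no problems reported
console.log(checkRecord({ name: 'x' })); // lists the missing fields
```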
1. Run with `--dry-run` to collect a sample (50–200 records).
2. Validate the sample (see the `validate-ndjson.js` example).
3. Use `ndjson-filter` to isolate failing records; inspect the source HTML.
4. Dry-run the import with `createOrReplace` disabled; check counts and duplicates.
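Isolating failing records and checking for duplicates can also be done in plain Node, sketched below. This is not the `ndjson-filter` CLI; `partitionNdjson` and `findDuplicates` are hypothetical helpers, and treating `source` plus `sourceId` as the duplicate key is an assumption based on the "source-unique ID" note in the schema.

```javascript
// Split NDJSON text into records that pass `isValid` and lines that do not.
// Failing entries keep their 1-based line number and raw text for inspection.
function partitionNdjson(text, isValid) {
  const passing = [];
  const failing = [];
  text.split(/\r?\n/).forEach((raw, index) => {
    if (raw.trim() === '') return; // ignore blank lines
    let record;
    try {
      record = JSON.parse(raw);
    } catch (err) {
      failing.push({ line: index + 1, raw, reason: 'invalid JSON' });
      return;
    }
    if (isValid(record)) passing.push(record);
    else failing.push({ line: index + 1, raw, reason: 'failed check' });
  });
  return { passing, failing };
}

// Report keys that occur more than once (assumed key: source + sourceId).
function findDuplicates(records) {
  const counts = new Map();
  for (const r of records) {
    const key = `${r.source}:${r.sourceId}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([key]) => key);
}

// Made-up input for illustration.
const demoText = [
  '{"source":"demo","sourceId":"1","name":"A"}',
  '{"source":"demo","sourceId":"1","name":"A again"}',
  'not json',
  '{"source":"demo","sourceId":"2"}',
].join('\n');

const result = partitionNdjson(demoText, (r) => typeof r.name === 'string');
console.log(result.passing.length, result.failing.length);
console.log(findDuplicates(result.passing));
```

Keeping the line number with each failing entry makes it easier to trace a bad record back to the page it was scraped from.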
```bash
node ./scripts/scrape-to-ndjson.js --out=data.ndjson --pages=100
node ./scripts/validate-ndjson.js data.ndjson
node ./scripts/dry-import.js data.ndjson --target=staging
node ./scripts/import.js data.ndjson --target=production
```

```javascript
// validate-ndjson.js: check each NDJSON line against a minimal zod schema.
const fs = require('fs'), rl = require('readline'), { z } = require('zod');
const schema = z.object({ name: z.string(), source: z.string(), sourceId: z.string() });
// CommonJS has no top-level await, so the for-await loop runs inside an async function.
(async () => {
  const iface = rl.createInterface({ input: fs.createReadStream(process.argv[2]), crlfDelay: Infinity });
  let line = 0, errors = 0;
  for await (const l of iface) {
    line++;
    try { schema.parse(JSON.parse(l)); } catch (e) { console.error(`Line ${line}:`, e.message); errors++; }
  }
  if (errors) { console.error(`${errors} errors`); process.exit(2); }
  console.log('OK');
})();
```

Full scraper and extended validator: see REFERENCE.md.
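The four staged commands above could be chained so that a failing stage (for example, the validator exiting with code 2) stops the run before anything reaches production. This is a sketch only: `runPipeline` and `runNode` are hypothetical names, and the script paths are copied from the commands above.

```javascript
const { spawnSync } = require('child_process');

// Stages in order, mirroring the commands listed above.
const STEPS = [
  ['./scripts/scrape-to-ndjson.js', '--out=data.ndjson', '--pages=100'],
  ['./scripts/validate-ndjson.js', 'data.ndjson'],
  ['./scripts/dry-import.js', 'data.ndjson', '--target=staging'],
  ['./scripts/import.js', 'data.ndjson', '--target=production'],
];

// Run one stage with the current Node binary and return its exit status.
// The status is null if the process was killed by a signal.
function runNode(args) {
  return spawnSync(process.execPath, args, { stdio: 'inherit' }).status;
}

// Run stages in order; stop at the first one that does not exit with 0.
// `run` is injectable so the control flow can be exercised without the scripts.
function runPipeline(steps = STEPS, run = runNode) {
  for (const args of steps) {
    const status = run(args);
    if (status !== 0) return { ok: false, failedAt: args[0], status };
  }
  return { ok: true };
}

// Usage (not executed here): const outcome = runPipeline();
```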