Build a fully automated AI-powered data collection agent for any public source — job boards, prices, news, GitHub, sports, anything. Scrapes on a schedule, enriches data with a free LLM (Gemini Flash), stores results in Notion/Sheets/Supabase, and learns from user feedback. Runs 100% free on GitHub Actions. Use when the user wants to monitor, collect, or track any public data automatically.
Quality: 81% (does it follow best practices?)
Impact: Pending (no eval scenarios have been run)
Advisory: suggest reviewing before use
Discovery: 92%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong description that clearly articulates specific capabilities, includes natural trigger terms, and has an explicit 'Use when' clause. Its main weakness is the very broad scope ('any public source — anything') which could create overlap with more focused scraping or automation skills. The description is well-structured and informative, though slightly verbose.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions: scrapes on a schedule, enriches data with a free LLM (Gemini Flash), stores results in Notion/Sheets/Supabase, learns from user feedback, runs on GitHub Actions. Also enumerates concrete source examples (job boards, prices, news, GitHub, sports). | 3 / 3 |
| Completeness | Clearly answers both 'what' (builds an automated AI-powered data collection agent that scrapes, enriches, and stores data) and 'when', with an explicit trigger clause: 'Use when the user wants to monitor, collect, or track any public data automatically.' | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'scrape', 'monitor', 'collect', 'track', 'data collection', 'job boards', 'prices', 'news', 'GitHub Actions', 'Notion', 'Sheets', 'Supabase', 'schedule', 'automated'. Good coverage of the terms a user requesting web scraping or data monitoring would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | While the description is detailed, the extremely broad scope ('any public source', 'anything') could overlap with more targeted web scraping skills, API integration skills, or general automation skills. The combination of scheduled scraping + LLM enrichment + storage is somewhat distinctive, but the breadth increases conflict risk. | 2 / 3 |
| Total | | 11 / 12 Passed |
Implementation: 64%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a highly actionable skill with complete, executable code covering an end-to-end data scraping agent pipeline. Its main weaknesses are verbosity (redundant patterns, decorative examples list) and lack of validation checkpoints in the workflow — there's no guidance on testing individual components before integration. The monolithic structure would benefit from splitting into a concise overview with linked reference files for patterns, storage adapters, and configuration details.
Suggestions
- Add explicit validation checkpoints between steps, e.g. "Test scraper output: run `python -m scraper.sources.my_source` and verify it returns ≥1 item with all required fields before proceeding to Step 4."
- Remove the redundant 'Common Scraping Patterns' section, since the same patterns (REST, HTML, RSS) already appear in Step 3, or consolidate them into a single referenced PATTERNS.md file.
- Split the storage adapter examples (Sheets, Supabase) into a separate STORAGE.md reference file and keep only the Notion example inline, with a clear link to the alternatives.
- Remove or condense the 'Real-World Examples' section; these are just prompt strings that provide no actionable guidance beyond what 'When to Activate' already covers.
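The first suggestion can be sketched as a tiny standalone checkpoint that runs the scraper in isolation and asserts the output shape before anything downstream is wired up. This is an illustrative sketch only: the required field names and the sample item are assumptions, not the skill's actual schema.

```python
# Hypothetical validation checkpoint for a scraper module.
# Run it alone (e.g. as the last step of `python -m scraper.sources.my_source`)
# before connecting the AI client or storage.
REQUIRED_FIELDS = {"title", "url", "timestamp"}  # assumed item schema

def validate_items(items):
    """Return the items unchanged if they pass the checkpoint, else raise."""
    if len(items) < 1:
        raise ValueError("scraper returned no items")
    for item in items:
        missing = REQUIRED_FIELDS - item.keys()
        if missing:
            raise ValueError(f"item missing fields: {sorted(missing)}")
    return items

if __name__ == "__main__":
    # In the real skill this would come from the source module's fetch function.
    sample = [{"title": "Example", "url": "https://example.com",
               "timestamp": "2024-01-01"}]
    validate_items(sample)
    print(f"checkpoint passed: {len(sample)} item(s)")
```

A check like this gives each pipeline stage a cheap pass/fail gate, which is exactly the kind of explicit checkpoint the workflow currently lacks.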
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is quite long (765 lines, per the validation check) with some redundancy: scraping patterns appear both in Step 3 and again in the 'Common Scraping Patterns' section. The free-tier limits table and anti-patterns table add value, but the overall document could be tightened by ~30%. Some sections, like 'Real-World Examples', are just prompt strings that add no actionable guidance. | 2 / 3 |
| Actionability | Excellent actionability: every step includes complete, executable Python code with proper imports, error handling, and realistic patterns. The GitHub Actions YAML, config.yaml template, and requirements.txt are all copy-paste ready. The code covers the full pipeline from scraping to storage. | 3 / 3 |
| Workflow Clarity | The 10-step workflow is clearly sequenced and logically ordered (understand → design → build scraper → AI client → pipeline → feedback → storage → orchestrate → CI → config). However, there are no explicit validation checkpoints between steps (no 'test your scraper output before wiring AI', no 'verify the Notion connection before running the full pipeline'), and the feedback loop between storage and learning is described vaguely ('implement a separate feedback_sync.py') rather than concretely. | 2 / 3 |
| Progressive Disclosure | The content is entirely monolithic: everything lives in one massive file with no references to external documents. The directory structure suggests files like profile/context.md and config.yaml, but there are no linked supplementary docs (e.g., a FORMS.md equivalent for storage alternatives, or a separate patterns reference). For a skill this large, splitting the storage adapters, scraping patterns, or the anti-patterns table into referenced files would significantly improve navigability. | 2 / 3 |
| Total | | 9 / 12 Passed |
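The vaguely described feedback loop ('implement a separate feedback_sync.py') could be pinned down with something like the following sketch. The file location, row shape, and rating semantics here are assumptions for illustration, not the skill's actual design:

```python
import json
from pathlib import Path

# Hypothetical feedback_sync.py: pull user ratings back out of storage
# and fold them into a local preferences file the enrichment step can read.
FEEDBACK_FILE = Path("profile/feedback.json")  # assumed location

def sync_feedback(rated_rows):
    """Merge rated rows (dicts with 'id' and 'rating') into local feedback.

    Rows whose rating is still None (user hasn't rated yet) are skipped.
    Returns the merged {item_id: rating} mapping after writing it to disk.
    """
    existing = {}
    if FEEDBACK_FILE.exists():
        existing = json.loads(FEEDBACK_FILE.read_text())
    for row in rated_rows:
        if row.get("rating") is not None:
            existing[row["id"]] = row["rating"]
    FEEDBACK_FILE.parent.mkdir(parents=True, exist_ok=True)
    FEEDBACK_FILE.write_text(json.dumps(existing, indent=2))
    return existing
```

Spelling the loop out this way makes the storage-to-learning handoff concrete: the scheduled run calls `sync_feedback` on whatever rows came back rated, and the enrichment prompt can then read the merged file.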
Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

9 / 11 checks passed.
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (765 lines); consider splitting into references/ and linking | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 Passed |
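The frontmatter warning usually means non-spec keys sit at the top level of SKILL.md. A hypothetical before/after sketch follows; the actual keys in this skill are unknown, and `metadata` is simply the destination the check itself suggests:

```yaml
---
name: data-collection-agent        # spec-defined key (name assumed)
description: Build a fully automated AI-powered data collection agent...
# Keys the validator does not recognize (examples only) move under metadata:
metadata:
  author: example-user
  version: "0.1"
---
```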