Build a fully automated AI-powered data collection agent for any public source — job boards, prices, news, GitHub, sports, anything. Scrapes on a schedule, enriches data with a free LLM (Gemini Flash), stores results in Notion/Sheets/Supabase, and learns from user feedback. Runs 100% free on GitHub Actions. Use when the user wants to monitor, collect, or track any public data automatically.
Quality: 81% (does it follow best practices?)
Impact: Pending (no eval scenarios have been run)
Advisory: suggest reviewing before use
Discovery: 92%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong description that clearly articulates specific capabilities, includes natural trigger terms, and has an explicit 'Use when' clause. Its main weakness is the very broad scope ('any public source — anything') which could create overlap with more focused scraping or automation skills. The description is well-structured and informative, though slightly verbose.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions: scrapes on a schedule, enriches data with a free LLM (Gemini Flash), stores results in Notion/Sheets/Supabase, learns from user feedback, runs on GitHub Actions. Also enumerates concrete source examples (job boards, prices, news, GitHub, sports). | 3 / 3 |
| Completeness | Clearly answers both 'what' (builds an automated AI-powered data collection agent that scrapes, enriches, and stores data) and 'when', with an explicit trigger clause: 'Use when the user wants to monitor, collect, or track any public data automatically.' | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'scrape', 'monitor', 'collect', 'track', 'data collection', 'job boards', 'prices', 'news', 'GitHub Actions', 'Notion', 'Sheets', 'Supabase', 'schedule', 'automated'. Good coverage of the terms a user requesting web scraping or data monitoring would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | While the description is detailed, the extremely broad scope ('any public source', 'anything') could overlap with more targeted web scraping skills, API integration skills, or general automation skills. The combination of scheduled scraping + LLM enrichment + storage is somewhat distinctive, but the breadth increases conflict risk. | 2 / 3 |
| Total | | 11 / 12 Passed |
Implementation: 64%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a highly actionable skill with complete, executable code covering an end-to-end data scraping agent pipeline. Its main weaknesses are verbosity (redundant patterns, decorative examples list) and lack of validation checkpoints in the workflow — there's no guidance on testing individual components before integration. The monolithic structure would benefit from splitting into a concise overview with linked reference files for patterns, storage adapters, and configuration details.
Suggestions
- Add explicit validation checkpoints between steps, e.g. "Test scraper output: run `python -m scraper.sources.my_source` and verify it returns ≥1 item with all required fields before proceeding to Step 4."
- Remove the redundant 'Common Scraping Patterns' section, since the same patterns (REST, HTML, RSS) already appear in Step 3, or consolidate them into a single referenced PATTERNS.md file.
- Split the storage adapter examples (Sheets, Supabase) into a separate STORAGE.md reference file and keep only the Notion example inline, with a clear link to the alternatives.
- Remove or condense the 'Real-World Examples' section; these are just prompt strings that provide no actionable guidance beyond what 'When to Activate' already covers.
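The first suggestion can be sketched as a tiny standalone checkpoint that runs the scraper in isolation and asserts the output shape before anything downstream is wired up. This is an illustrative sketch only: the required field names and the sample item are assumptions, not the skill's actual schema.

```python
# Hypothetical validation checkpoint for a scraper module.
# Run it alone (e.g. as the last step of `python -m scraper.sources.my_source`)
# before connecting the AI client or storage.
REQUIRED_FIELDS = {"title", "url", "timestamp"}  # assumed item schema

def validate_items(items):
    """Return the items unchanged if they pass the checkpoint, else raise."""
    if len(items) < 1:
        raise ValueError("scraper returned no items")
    for item in items:
        missing = REQUIRED_FIELDS - item.keys()
        if missing:
            raise ValueError(f"item missing fields: {sorted(missing)}")
    return items

if __name__ == "__main__":
    # In the real skill this would come from the source module's fetch function.
    sample = [{"title": "Example", "url": "https://example.com",
               "timestamp": "2024-01-01"}]
    validate_items(sample)
    print(f"checkpoint passed: {len(sample)} item(s)")
```

A check like this gives each pipeline stage a cheap pass/fail gate, which is exactly the kind of explicit checkpoint the workflow currently lacks.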
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is quite long (765 lines, per the validation check) with some redundancy: scraping patterns appear both in Step 3 and again in the 'Common Scraping Patterns' section. The free-tier limits table and anti-patterns table add value, but the overall document could be tightened by ~30%. Some sections, like 'Real-World Examples', are just prompt strings that add no actionable guidance. | 2 / 3 |
| Actionability | Excellent actionability: every step includes complete, executable Python code with proper imports, error handling, and realistic patterns. The GitHub Actions YAML, config.yaml template, and requirements.txt are all copy-paste ready. The code covers the full pipeline from scraping to storage. | 3 / 3 |
| Workflow Clarity | The 10-step workflow is clearly sequenced and logically ordered (understand → design → build scraper → AI client → pipeline → feedback → storage → orchestrate → CI → config). However, there are no explicit validation checkpoints between steps (no 'test your scraper output before wiring AI', no 'verify the Notion connection before running the full pipeline'), and the feedback loop between storage and learning is described vaguely ('implement a separate feedback_sync.py') rather than concretely. | 2 / 3 |
| Progressive Disclosure | The content is entirely monolithic: everything lives in one massive file with no references to external documents. The directory structure suggests files like profile/context.md and config.yaml, but there are no linked supplementary docs (e.g., a FORMS.md equivalent for storage alternatives, or a separate patterns reference). For a skill this large, splitting the storage adapters, scraping patterns, or the anti-patterns table into referenced files would significantly improve navigability. | 2 / 3 |
| Total | | 9 / 12 Passed |
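The vaguely described feedback loop ('implement a separate feedback_sync.py') could be pinned down with something like the following sketch. The file location, row shape, and rating semantics here are assumptions for illustration, not the skill's actual design:

```python
import json
from pathlib import Path

# Hypothetical feedback_sync.py: pull user ratings back out of storage
# and fold them into a local preferences file the enrichment step can read.
FEEDBACK_FILE = Path("profile/feedback.json")  # assumed location

def sync_feedback(rated_rows):
    """Merge rated rows (dicts with 'id' and 'rating') into local feedback.

    Rows whose rating is still None (user hasn't rated yet) are skipped.
    Returns the merged {item_id: rating} mapping after writing it to disk.
    """
    existing = {}
    if FEEDBACK_FILE.exists():
        existing = json.loads(FEEDBACK_FILE.read_text())
    for row in rated_rows:
        if row.get("rating") is not None:
            existing[row["id"]] = row["rating"]
    FEEDBACK_FILE.parent.mkdir(parents=True, exist_ok=True)
    FEEDBACK_FILE.write_text(json.dumps(existing, indent=2))
    return existing
```

Spelling the loop out this way makes the storage-to-learning handoff concrete: the scheduled run calls `sync_feedback` on whatever rows came back rated, and the enrichment prompt can then read the merged file.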
Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

9 / 11 checks passed.
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (765 lines); consider splitting into references/ and linking | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 Passed |
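The frontmatter warning usually means non-spec keys sit at the top level of SKILL.md. A hypothetical before/after sketch follows; the actual keys in this skill are unknown, and `metadata` is simply the destination the check itself suggests:

```yaml
---
name: data-collection-agent        # spec-defined key (name assumed)
description: Build a fully automated AI-powered data collection agent...
# Keys the validator does not recognize (examples only) move under metadata:
metadata:
  author: example-user
  version: "0.1"
---
```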