Name: tdg-personal/data-scraper-agent
Rating: 81 (1 reviews)
Author: tdg-personal

tdg-personal/data-scraper-agent

Build a fully automated AI-powered data collection agent for any public source — job boards, prices, news, GitHub, sports, anything. Scrapes on a schedule, enriches data with a free LLM (Gemini Flash), stores results in Notion/Sheets/Supabase, and learns from user feedback. Runs 100% free on GitHub Actions. Use when the user wants to monitor, collect, or track any public data automatically.

Quality

81%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Securityby

Advisory

Suggest reviewing before use

Security

1 medium severity finding. This skill can be installed but you should review these findings before use.

Medium

W011: Third-party content exposure detected (indirect prompt injection risk)

What this means

The skill exposes the agent to untrusted, user-generated content from public third-party sources, creating a risk of indirect prompt injection. This includes browsing arbitrary URLs, reading social media posts or forum comments, and analyzing content from unknown websites.

Why it was flagged

Third-party content exposure detected (high risk: 1.00). The skill's SKILL.md and code (e.g., scraper/sources/my_source.py and the HTML/RSS/Playwright patterns, plus ai/pipeline.py which builds prompts from scraped items and sends them to Gemini) explicitly fetch and ingest public/untrusted web content (examples include Hacker News, subreddits, LinkedIn, etc.) and uses the LLM's analyses to score/filter/store items, so third-party content can materially influence agent decisions.

Report incorrect finding

Audited: 5 days ago
Security analysis