**Skill description under review:** "Screenshot-obsessed, fantasy-allergic QA specialist - defaults to finding 3-5 issues, requires visual proof for everything"
**Status**

- Evals: Pending (no eval scenarios have been run)
- Known issues: Passed (no known issues)
Optimize this skill with Tessl:

```
npx tessl skill review --optimize ./testing-evidence-collector/skills/SKILL.md
```

## Quality
### Discovery (7%)

*Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.*
This description reads more like a persona or character description than a functional skill description. It prioritizes personality traits ('obsessed', 'allergic') over concrete capabilities and completely lacks trigger guidance. The description fails to communicate what specific QA actions are performed, what domain it applies to, or when Claude should select it.
**Suggestions**

- Replace personality language with concrete actions, e.g. 'Reviews screenshots of web pages and UI elements to identify visual bugs, layout issues, and inconsistencies.'
- Add an explicit 'Use when...' clause with natural trigger terms like 'QA review', 'check screenshot', 'find bugs', 'visual testing', 'UI issues'.
- Specify the domain and scope clearly: what types of artifacts are being QA'd (web apps, mobile screens, designs) and what types of issues are found (visual regressions, broken layouts, accessibility problems).
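Applied together, these suggestions might yield frontmatter along these lines (a sketch only; the skill name is taken from the review command's path, and the exact wording is illustrative):

```markdown
---
name: testing-evidence-collector
description: >
  Reviews screenshots of web pages and UI elements to identify visual bugs,
  broken layouts, and accessibility problems, requiring captured evidence
  for every reported issue. Use when the user asks for a QA review, visual
  testing, screenshot comparison, or a check for UI issues before approval.
---
```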
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses personality-driven language ('screenshot-obsessed', 'fantasy-allergic') rather than concrete actions. It mentions 'finding 3-5 issues' but doesn't specify what kind of issues, what domain, or what concrete actions are performed (e.g., testing web pages, validating UI elements, checking accessibility). | 1 / 3 |
| Completeness | The 'what' is extremely vague (some kind of QA with screenshots) and there is no 'when' clause at all. There's no 'Use when...' guidance or explicit trigger conditions for Claude to know when to select this skill. | 1 / 3 |
| Trigger Term Quality | The description lacks natural keywords a user would say when needing QA help. Terms like 'screenshot-obsessed' and 'fantasy-allergic' are not things users would search for. Missing natural terms like 'test', 'bug', 'UI testing', 'visual regression', 'QA review', 'screenshot comparison', etc. | 1 / 3 |
| Distinctiveness / Conflict Risk | The combination of QA + screenshots + visual proof gives it some distinctiveness from generic coding or testing skills, but the lack of specificity about what is being tested (web apps? mobile? documents?) means it could still overlap with other QA-related skills. | 2 / 3 |
| **Total** | | **5 / 12** (Passed) |
### Implementation (35%)

*Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.*
This skill is a personality-driven QA agent definition that suffers from significant verbosity and repetition. The core idea — screenshot-based evidence gathering with a skeptical default — is sound, but it's expressed 3-4 times across different sections (beliefs, process, fail triggers, communication style, success metrics). The actionable parts (bash commands, report template, testing protocols) are useful but buried in personality fluff that Claude doesn't need.
**Suggestions**

- Cut the content by 50-60%: merge 'Core Beliefs', 'Communication Style', 'Success Metrics', and 'Learning & Memory' into 2-3 bullet points at the top. Claude doesn't need motivational framing to follow instructions.
- Move the full report template and testing protocol templates to a separate referenced file (e.g., `qa-report-template.md`) and keep only a brief summary in SKILL.md.
- Add a validation/feedback loop to the workflow: after Step 3 testing, explicitly define what happens, e.g. 'If critical issues are found, block approval and list fixes; if only low-priority issues remain, give a conditional pass with noted improvements.'
- Define or document the `qa-playwright-capture.sh` script's location and expected behavior, since the entire workflow depends on it but it is referenced without explanation.
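One way the condensed workflow with an explicit decision point could look (a hypothetical sketch: the step names follow the review's summary of the skill, and `qa-report-template.md` is the suggested, not-yet-existing, reference file):

```markdown
## Process

1. **Reality check:** capture current-state screenshots with `qa-playwright-capture.sh`.
2. **Visual evidence analysis:** compare captures against expected behavior.
3. **Interactive testing:** exercise accordions, forms, and mobile viewports.
4. **Decision:**
   - Critical issues found: block approval and list required fixes.
   - Only low-priority issues: conditional pass with noted improvements.
   - No issues, evidence attached: pass.

See [qa-report-template.md](qa-report-template.md) for the full report format.
```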
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~180+ lines. Extensive repetition of the same concepts (evidence-based, screenshot-obsessed, fantasy-allergic) across multiple sections. Personality traits, beliefs, communication style, success metrics, and learning sections all restate the same ideas. Claude doesn't need to be told its 'core beliefs' or 'success metrics' in this much detail. | 1 / 3 |
| Actionability | The bash commands in Step 1 are concrete and executable, and the testing protocols provide structured templates. However, much of the content is personality description rather than actionable instruction. The accordion/form/mobile testing protocols are template-based rather than providing executable test code. The script `qa-playwright-capture.sh` is referenced but not defined. | 2 / 3 |
| Workflow Clarity | There is a 3-step mandatory process (Reality Check → Visual Evidence Analysis → Interactive Testing) which provides reasonable sequencing. However, validation checkpoints are weak: there's no explicit feedback loop for what to do when issues are found beyond listing them in a report. The 'AUTOMATIC FAIL' section lists triggers but doesn't integrate them into the workflow as decision points. | 2 / 3 |
| Progressive Disclosure | The content references `ai/agents/qa.md` for detailed methodology at the end, which is good progressive disclosure. However, the SKILL.md itself is monolithic: the report template, testing protocols, personality description, beliefs, communication style, learning/memory, and success metrics are all inline when much of this could be split into referenced files. The massive inline report template especially should be a separate reference. | 2 / 3 |
| **Total** | | **7 / 12** (Passed) |
### Validation (90%)

*Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.*

**10 / 11 checks passed**
**Validation for skill structure**

| Criteria | Description | Result |
|---|---|---|
| `frontmatter_unknown_keys` | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| **Total** | | 10 / 11 Passed |
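The warning can be resolved by nesting any non-standard frontmatter keys under `metadata`, as the check suggests (a sketch; the `personality` key is hypothetical, standing in for whatever unknown key the validator flagged):

```markdown
---
name: testing-evidence-collector
description: Reviews screenshots to find visual bugs and UI issues.
metadata:
  personality: screenshot-obsessed QA specialist
---
```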