**Skill description under review:** "Screenshot-obsessed, fantasy-allergic QA specialist - defaults to finding 3-5 issues, requires visual proof for everything"
**Status**

- Evals: Pending (no eval scenarios have been run)
- Known issues: Passed (no known issues)
Optimize this skill with Tessl:

```
npx tessl skill review --optimize ./testing-evidence-collector/skills/SKILL.md
```

## Quality
### Discovery (7%)

*Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.*
This description reads more like a persona or character description than a functional skill description. It prioritizes personality traits ('obsessed', 'allergic') over concrete capabilities and completely lacks trigger guidance. The description fails to communicate what specific QA actions are performed, what domain it applies to, or when Claude should select it.
**Suggestions**

- Replace personality language with concrete actions, e.g. 'Reviews screenshots of web pages and UI elements to identify visual bugs, layout issues, and inconsistencies.'
- Add an explicit 'Use when...' clause with natural trigger terms like 'QA review', 'check screenshot', 'find bugs', 'visual testing', 'UI issues'.
- Specify the domain and scope clearly: what types of artifacts are being QA'd (web apps, mobile screens, designs) and what types of issues are found (visual regressions, broken layouts, accessibility problems).
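Applied together, these suggestions might yield frontmatter along these lines (a sketch only; the skill name is taken from the review command's path, and the exact wording is illustrative):

```markdown
---
name: testing-evidence-collector
description: >
  Reviews screenshots of web pages and UI elements to identify visual bugs,
  broken layouts, and accessibility problems, requiring captured evidence
  for every reported issue. Use when the user asks for a QA review, visual
  testing, screenshot comparison, or a check for UI issues before approval.
---
```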
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses personality-driven language ('screenshot-obsessed', 'fantasy-allergic') rather than concrete actions. It mentions 'finding 3-5 issues' but doesn't specify what kind of issues, what domain, or what concrete actions are performed (e.g., testing web pages, validating UI elements, checking accessibility). | 1 / 3 |
| Completeness | The 'what' is extremely vague (some kind of QA with screenshots) and there is no 'when' clause at all. There's no 'Use when...' guidance or explicit trigger conditions for Claude to know when to select this skill. | 1 / 3 |
| Trigger Term Quality | The description lacks natural keywords a user would say when needing QA help. Terms like 'screenshot-obsessed' and 'fantasy-allergic' are not things users would search for. Missing natural terms like 'test', 'bug', 'UI testing', 'visual regression', 'QA review', 'screenshot comparison', etc. | 1 / 3 |
| Distinctiveness / Conflict Risk | The combination of QA + screenshots + visual proof gives it some distinctiveness from generic coding or testing skills, but the lack of specificity about what is being tested (web apps? mobile? documents?) means it could still overlap with other QA-related skills. | 2 / 3 |
| **Total** | | **5 / 12** (Passed) |
### Implementation (35%)

*Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.*
This skill is a personality-driven QA agent definition that suffers from significant verbosity and repetition. The core idea — screenshot-based evidence gathering with a skeptical default — is sound, but it's expressed 3-4 times across different sections (beliefs, process, fail triggers, communication style, success metrics). The actionable parts (bash commands, report template, testing protocols) are useful but buried in personality fluff that Claude doesn't need.
**Suggestions**

- Cut the content by 50-60%: merge 'Core Beliefs', 'Communication Style', 'Success Metrics', and 'Learning & Memory' into 2-3 bullet points at the top. Claude doesn't need motivational framing to follow instructions.
- Move the full report template and testing protocol templates to a separate referenced file (e.g., `qa-report-template.md`) and keep only a brief summary in SKILL.md.
- Add a validation/feedback loop to the workflow: after Step 3 testing, explicitly define what happens, e.g. 'If critical issues are found, block approval and list fixes; if only low-priority issues remain, give a conditional pass with noted improvements.'
- Define or document the `qa-playwright-capture.sh` script's location and expected behavior, since the entire workflow depends on it but it is referenced without explanation.
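One way the condensed workflow with an explicit decision point could look (a hypothetical sketch: the step names follow the review's summary of the skill, and `qa-report-template.md` is the suggested, not-yet-existing, reference file):

```markdown
## Process

1. **Reality check:** capture current-state screenshots with `qa-playwright-capture.sh`.
2. **Visual evidence analysis:** compare captures against expected behavior.
3. **Interactive testing:** exercise accordions, forms, and mobile viewports.
4. **Decision:**
   - Critical issues found: block approval and list required fixes.
   - Only low-priority issues: conditional pass with noted improvements.
   - No issues, evidence attached: pass.

See [qa-report-template.md](qa-report-template.md) for the full report format.
```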
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~180+ lines. Extensive repetition of the same concepts (evidence-based, screenshot-obsessed, fantasy-allergic) across multiple sections. Personality traits, beliefs, communication style, success metrics, and learning sections all restate the same ideas. Claude doesn't need to be told its 'core beliefs' or 'success metrics' in this much detail. | 1 / 3 |
| Actionability | The bash commands in Step 1 are concrete and executable, and the testing protocols provide structured templates. However, much of the content is personality description rather than actionable instruction. The accordion/form/mobile testing protocols are template-based rather than providing executable test code. The script `qa-playwright-capture.sh` is referenced but not defined. | 2 / 3 |
| Workflow Clarity | There is a 3-step mandatory process (Reality Check → Visual Evidence Analysis → Interactive Testing) which provides reasonable sequencing. However, validation checkpoints are weak: there's no explicit feedback loop for what to do when issues are found beyond listing them in a report. The 'AUTOMATIC FAIL' section lists triggers but doesn't integrate them into the workflow as decision points. | 2 / 3 |
| Progressive Disclosure | The content references `ai/agents/qa.md` for detailed methodology at the end, which is good progressive disclosure. However, the SKILL.md itself is monolithic: the report template, testing protocols, personality description, beliefs, communication style, learning/memory, and success metrics are all inline when much of this could be split into referenced files. The massive inline report template especially should be a separate reference. | 2 / 3 |
| **Total** | | **7 / 12** (Passed) |
### Validation (90%)

*Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.*

**10 / 11 checks passed**
**Validation for skill structure**

| Criteria | Description | Result |
|---|---|---|
| `frontmatter_unknown_keys` | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| **Total** | | 10 / 11 Passed |
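The warning can be resolved by nesting any non-standard frontmatter keys under `metadata`, as the check suggests (a sketch; the `personality` key is hypothetical, standing in for whatever unknown key the validator flagged):

```markdown
---
name: testing-evidence-collector
description: Reviews screenshots to find visual bugs and UI issues.
metadata:
  personality: screenshot-obsessed QA specialist
---
```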