"GAN-inspired Generator-Evaluator agent harness for building high-quality applications autonomously. Based on Anthropic's March 2026 harness design paper."
Overall score: 36%

| Category | Status | Notes |
|---|---|---|
| Quality (Does it follow best practices?) | Passed | No known issues |
| Impact | Pending | No eval scenarios have been run |
## Discovery — 7%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description reads more like a marketing tagline or paper abstract than a functional skill description. It fails to specify concrete actions, lacks natural trigger terms users would employ, and provides no guidance on when Claude should select this skill. The reference to 'Anthropic's March 2026 harness design paper' adds no actionable information for skill selection.
### Suggestions

- Replace abstract language with specific concrete actions the skill performs, e.g., 'Generates application code iteratively using a generator-evaluator loop, automatically testing and refining outputs until quality thresholds are met.'
- Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks to build a complete application from scratch, wants iterative code refinement, or mentions autonomous code generation.'
- Remove the paper citation and instead describe the practical workflow or methodology in user-facing terms that help Claude distinguish this from other code-generation skills.
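Taken together, these suggestions might yield something like the following sketch. The field layout assumes typical SKILL.md frontmatter and the wording is illustrative, not a drop-in fix:

```yaml
# Illustrative only: a description rewritten per the suggestions above
name: gan-build
description: >-
  Generates application code iteratively using a generator-evaluator loop,
  automatically testing and refining outputs until quality thresholds are met.
  Use when the user asks to build a complete application from scratch, wants
  iterative code refinement, or mentions autonomous code generation.
```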
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses vague, abstract language like 'building high-quality applications autonomously' without listing any concrete actions. It names a concept (GAN-inspired Generator-Evaluator) but doesn't specify what the skill actually does in practical terms. | 1 / 3 |
| Completeness | The 'what' is extremely vague ('building high-quality applications autonomously') and there is no 'when' clause or any explicit trigger guidance for when Claude should select this skill. | 1 / 3 |
| Trigger Term Quality | The keywords are highly technical jargon ('GAN-inspired', 'Generator-Evaluator agent harness') that users would almost never naturally say. There are no natural trigger terms a user would use when needing this skill. | 1 / 3 |
| Distinctiveness / Conflict Risk | The GAN-inspired Generator-Evaluator framing is somewhat distinctive and unlikely to overlap with most other skills, but 'building high-quality applications' is generic enough to potentially conflict with any application-building skill. | 2 / 3 |
| **Total** | | 5 / 12 — Passed |
## Implementation — 35%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill provides a comprehensive conceptual overview of the GAN-style harness pattern but suffers from significant verbosity and incomplete actionability. It explains too many concepts Claude already understands (GANs, product management, what Playwright does) while leaving critical implementation details undefined (the actual prompt files, the harness script). The content would benefit greatly from being split into a concise overview with references to detailed companion files.
### Suggestions

- Cut the content by at least 50%: remove the GAN explanation, the 'When to Use/Not Use' sections, the model evolution history, and the results table, or move them to a separate REFERENCE.md file. Focus the main skill on the executable workflow.
- Provide or clearly define the referenced files (PLANNER_PROMPT.md, EVALUATOR_PROMPT.md, scripts/gan-harness.sh): either inline their essential content or create them as companion files with clear navigation links.
- Add explicit validation checkpoints to the workflow: verify the dev server is running (curl localhost:3000), verify spec.md was created successfully, and verify the feedback file exists before the next generation cycle.
- Move the evaluation rubric, configuration table, and anti-patterns into separate reference files, keeping only a 2-3 line summary of each in the main SKILL.md with clear links.
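The checkpoint suggestions above can be sketched as small shell guards. The dev-server URL, artifact names, and function names here are illustrative assumptions, not part of the skill:

```shell
# Hypothetical checkpoint helpers for the Plan -> Generate -> Evaluate loop.

check_dev_server() {
  # Fail fast if the dev server is not responding before evaluation starts
  curl -sf "${1:-http://localhost:3000}" > /dev/null
}

check_artifact() {
  # Confirm a pipeline artifact (e.g. spec.md or a feedback file) exists and is non-empty
  [ -s "$1" ]
}
```

A harness script could call these between steps and abort the current cycle when a guard fails, instead of letting a broken step cascade into the next one.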
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~300+ lines. It explains concepts Claude already knows (what GANs are, what a Product Manager does, what Playwright does), includes lengthy architecture diagrams, extensive rubric definitions, model evolution history, and results tables that are informational rather than actionable. Much of this could be condensed to 1/3 the length or split into reference files. | 1 / 3 |
| Actionability | The manual Claude Code usage section provides concrete, executable commands, and the shell script examples are specific. However, the /project:gan-build commands appear to reference non-existent slash commands without explaining how to set them up, and the shell script references (./scripts/gan-harness.sh) assume infrastructure that isn't provided. The PLANNER_PROMPT.md and EVALUATOR_PROMPT.md files are referenced but never defined. | 2 / 3 |
| Workflow Clarity | The manual usage section shows a clear sequence (Plan → Generate → Evaluate → Iterate), and the anti-patterns section addresses failure modes. However, there are no explicit validation checkpoints between steps (e.g., how to verify the dev server is running before evaluation, or how to confirm the spec is well-formed before generation). The feedback loop termination condition (pass threshold) is mentioned, but the actual decision logic for stopping vs. continuing is implicit. | 2 / 3 |
| Progressive Disclosure | The skill references external files (PLANNER_PROMPT.md, EVALUATOR_PROMPT.md, scripts/gan-harness.sh) but doesn't clearly signal where they are or provide them. The content that could live in separate reference files (the full evaluation rubric, the evolution stages, the results table, the configuration reference) is all inline, making the main file a monolithic wall. The external references section at the bottom links to papers rather than companion skill files. | 2 / 3 |
| **Total** | | 7 / 12 — Passed |
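The implicit stop/continue logic flagged in the Workflow Clarity row could be made explicit along these lines. The score file, threshold, and iteration cap are hypothetical names, not taken from the skill:

```shell
# Sketch of an explicit termination condition for the generate/evaluate loop.
run_until_pass() {
  score_file=$1; threshold=$2; max_iter=$3
  i=1
  while [ "$i" -le "$max_iter" ]; do
    # Read the evaluator's latest score; treat a missing file as 0
    score=$(cat "$score_file" 2>/dev/null || echo 0)
    if [ "$score" -ge "$threshold" ]; then
      echo "pass: score $score reached threshold $threshold on check $i"
      return 0
    fi
    # a real harness would run one generation cycle with evaluator feedback here
    i=$((i + 1))
  done
  echo "stop: iteration cap $max_iter reached below threshold"
  return 1
}
```

An explicit cap plus threshold check like this also guards against the unbounded-iteration failure mode that the skill's anti-patterns section reportedly warns about.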
## Validation — 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

### Skill structure checks — 10 / 11 passed
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| **Total** | | 10 / 11 Passed |
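The single warning could typically be resolved by nesting non-standard frontmatter keys under a metadata block. The key name below is hypothetical, since the report does not say which key triggered the warning:

```yaml
# Before: a non-standard top-level key (hypothetical name) trips the validator
# paper_reference: "Anthropic March 2026 harness design paper"

# After: nest custom keys under metadata so only known top-level keys remain
metadata:
  paper_reference: "Anthropic March 2026 harness design paper"
```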