This skill should be used when the user asks to "compare code", "compare worktrees", "compare solutions", "which solution is better", "compare branches", "best of", "diff worktrees", "evaluate solutions", "pick the better implementation", "compare implementations", "review both solutions", or wants a structured, criteria-driven comparison of code across two git worktrees. Also triggered by the /best-of command.
68
83%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Quality
Discovery
89%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description excels at trigger term coverage and completeness, providing an extensive list of natural phrases that would activate the skill and clearly stating both what it does and when to use it. Its main weakness is that it focuses heavily on trigger terms at the expense of describing the specific actions and outputs of the skill (e.g., what does the comparison produce?). The description could also be more concise by summarizing the trigger terms rather than exhaustively listing them.
Suggestions
Add specific concrete actions describing what the skill produces, e.g., 'Performs structured, criteria-driven comparison of code across two git worktrees, generating scored evaluations across dimensions like correctness, performance, and readability.'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description mentions comparing code across git worktrees and criteria-driven comparison, but it doesn't list specific concrete actions beyond 'compare' and 'evaluate'. It lacks detail on what the comparison produces (e.g., generates a report, scores implementations, produces a summary table). | 2 / 3 |
| Completeness | The description explicitly answers both 'what' (structured, criteria-driven comparison of code across two git worktrees) and 'when' (with a comprehensive list of trigger phrases and the /best-of command). The 'Use when' guidance is clearly present via 'This skill should be used when...'. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms: 'compare code', 'compare worktrees', 'which solution is better', 'compare branches', 'best of', 'diff worktrees', 'evaluate solutions', 'pick the better implementation', 'compare implementations', 'review both solutions', and the /best-of command. These are terms users would naturally say. | 3 / 3 |
| Distinctiveness Conflict Risk | The skill is clearly scoped to comparing code across two git worktrees, which is a distinct niche. The combination of 'worktrees', 'compare implementations', and 'criteria-driven comparison' makes it unlikely to conflict with general code review or diff skills. | 3 / 3 |
| Total | 11 / 12 Passed | |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-crafted, highly actionable skill with excellent workflow clarity and a sophisticated multi-agent comparison framework. Its main weaknesses are moderate verbosity (the three full agent prompts inline add significant length) and a referenced bundle file that doesn't actually exist. The scoring methodology with weighted criteria and structured output template is a strong design choice.
Suggestions
Extract the three agent prompt templates into a bundled reference file (e.g., `references/agent-prompts.md`) to reduce inline verbosity and improve progressive disclosure.
Provide the referenced `references/evaluation-criteria.md` bundle file, or remove the reference to avoid a phantom reference that could confuse Claude at runtime.
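The phantom-reference problem named in the second suggestion can be caught mechanically before publishing. The following is a minimal sketch, not the registry's own check: it assumes the skill's instructions live in a file called `SKILL.md` at the bundle root and that bundled references are Markdown files under `references/`. The function name `find_missing_references` and the regular expression are hypothetical and exist only for illustration.

```python
import re
import tempfile
from pathlib import Path

# Assumption: bundled references are written as relative paths such as
# references/evaluation-criteria.md somewhere in the skill text.
REFERENCE_PATTERN = re.compile(r"references/[\w./-]+\.md")


def find_missing_references(skill_dir: Path, skill_file: str = "SKILL.md") -> list[str]:
    """Return referenced paths named in the skill file that do not exist on disk."""
    text = (skill_dir / skill_file).read_text(encoding="utf-8")
    referenced = sorted(set(REFERENCE_PATTERN.findall(text)))
    return [ref for ref in referenced if not (skill_dir / ref).is_file()]


if __name__ == "__main__":
    # Build a throwaway bundle: two references are mentioned, one is provided.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "references").mkdir()
        (root / "references" / "agent-prompts.md").write_text("prompts", encoding="utf-8")
        (root / "SKILL.md").write_text(
            "See references/evaluation-criteria.md and references/agent-prompts.md.",
            encoding="utf-8",
        )
        print(find_missing_references(root))
```

Run against a real bundle, an empty list would mean every mentioned reference file is present; any path returned is a candidate phantom reference to either provide or remove.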
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is thorough but verbose in places: the agent prompts are fully spelled out inline (could be referenced), and some instructions like 'Extract the two worktree paths from the arguments' are obvious to Claude. However, most content is substantive and earns its place given the complexity of the task. | 2 / 3 |
| Actionability | Highly actionable with concrete bash commands, specific file patterns to glob, exact agent prompts, a detailed scoring table with explicit weights, and a complete output template. Nearly every step is copy-paste executable. | 3 / 3 |
| Workflow Clarity | The 6-step workflow is clearly sequenced with explicit validation (Step 1 validates worktrees and stops on error, Step 2 builds a checklist before analysis, Step 3 has three strategies with clear selection criteria, Step 5 has explicit scoring before verdict). The feedback loop of asking the user for missing paths and the truncation warning in Step 3 show good error recovery design. | 3 / 3 |
| Progressive Disclosure | References `references/evaluation-criteria.md`, which is appropriate, but no bundle file is actually provided to support it, so this is a phantom reference in practice. The agent prompts (which are lengthy) could be extracted to a reference file. The inline content is well-structured with headers but the skill is quite long and monolithic. | 2 / 3 |
| Total | 10 / 12 Passed | |
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
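A warning like `frontmatter_unknown_keys` can be reproduced locally with a small script. The sketch below is an illustration only and is not the validator used for this review: the set of allowed keys is an assumption (the review does not say which keys are recognized or which key triggered the warning), and the parser handles only simple top-level `key: value` lines rather than full YAML.

```python
# Assumption: these are the recognized top-level frontmatter keys.
# The review does not list them, so adjust this set to match the real spec.
ASSUMED_ALLOWED_KEYS = {
    "name", "description", "license", "compatibility", "metadata", "allowed-tools",
}


def unknown_frontmatter_keys(skill_text: str, allowed=ASSUMED_ALLOWED_KEYS) -> list[str]:
    """Return top-level frontmatter keys that are not in the allowed set."""
    lines = skill_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter block at the top of the file
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        # Keep only top-level keys: skip blank, indented, comment, and list lines.
        if not line or line[0].isspace() or line[0] in "#-" or ":" not in line:
            continue
        keys.append(line.split(":", 1)[0].strip())
    return [key for key in keys if key not in allowed]


if __name__ == "__main__":
    example = "\n".join([
        "---",
        "name: best-of",
        "description: Compare code across two git worktrees.",
        "version: 1.0.0",      # hypothetical key, used only for this example
        "metadata:",
        "  author: someone",   # nested under metadata, so not reported
        "---",
        "# Skill body",
    ])
    print(unknown_frontmatter_keys(example))
```

Following the validator's own advice, a key flagged this way would either be removed or nested under `metadata`.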
8fe6eb4