Audit and build the infrastructure a repo needs so agents can work autonomously — boot scripts, smoke tests, CI/CD gates, dev environment setup, observability, and isolation. Use when a repo can't boot, tests are broken or missing, there's no dev environment, agents can't verify their work, or agents need human help to get anything done. Do not use for reviewing an existing diff or for documentation-only cleanup.
94%
Does it follow best practices?
Impact
95%
1.13x
Average score across 3 eval scenarios
Passed
No known issues
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that hits all the marks. It provides specific concrete actions, includes natural trigger terms tied to real problem scenarios, explicitly addresses both 'what' and 'when' with a 'Use when' clause, and even includes a 'Do not use' clause to reduce conflict with adjacent skills. The description is concise yet comprehensive.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: boot scripts, smoke tests, CI/CD gates, dev environment setup, observability, and isolation. These are clearly defined infrastructure components rather than vague abstractions. | 3 / 3 |
| Completeness | Clearly answers 'what' (audit and build infrastructure: boot scripts, smoke tests, CI/CD gates, etc.) and 'when' (explicit 'Use when' clause with multiple trigger scenarios). Also includes a 'Do not use' clause that further clarifies scope, which is excellent for disambiguation. | 3 / 3 |
| Trigger Term Quality | Includes natural terms users and agents would encounter: 'repo can't boot', 'tests are broken or missing', 'no dev environment', 'agents can't verify their work', 'boot scripts', 'smoke tests', 'CI/CD gates'. These cover realistic problem scenarios users would describe. | 3 / 3 |
| Distinctiveness / Conflict Risk | Occupies a clear niche around repo infrastructure for agent autonomy. The explicit 'Do not use' clause for diff review and documentation cleanup actively reduces conflict risk with other skills. The focus on agent-readiness infrastructure is highly distinctive. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
85%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured, concise skill that defines a clear framework for making repos agent-ready. Its greatest strengths are its efficient use of tokens, clear multi-phase workflow with explicit stopping conditions, and good progressive disclosure to reference files. The main weakness is that actionability depends heavily on the referenced files (which weren't provided for evaluation), and the Setup phase could benefit from more inline executable examples rather than deferring almost entirely to setup-patterns.md.
Suggestions
Consider adding one concrete inline example in the Setup section showing a minimal boot script or verify script, so the skill is actionable even without the reference files.
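As an illustration of this suggestion, a minimal verify script might look like the sketch below. It is not taken from the skill or its reference files: the function name `run_checks`, the `EXAMPLE_CHECKS` list, and the placeholder commands are all hypothetical, and a real repo would substitute its own boot and smoke commands.

```python
import subprocess
import sys


def run_checks(checks):
    """Run each (name, argv) check in order and stop at the first failure.

    Returns a list of (name, passed) tuples for the checks that actually ran.
    """
    results = []
    for name, argv in checks:
        proc = subprocess.run(argv, capture_output=True, text=True)
        passed = proc.returncode == 0
        results.append((name, passed))
        print(f"[{'ok' if passed else 'FAIL'}] {name}")
        if not passed:
            # Surface stderr so an agent can see why the step failed.
            print(proc.stderr.strip(), file=sys.stderr)
            break
    return results


# Hypothetical placeholder steps (assumption): each one just invokes the
# current Python interpreter so the sketch runs anywhere. A real repo would
# list its own install, boot, and smoke-test commands here.
EXAMPLE_CHECKS = [
    ("boot", [sys.executable, "-c", "print('booted')"]),
    ("smoke", [sys.executable, "-c", "import sys; sys.exit(0)"]),
]
```

Calling `run_checks(EXAMPLE_CHECKS)` and exiting non-zero when any entry reports `False` would give an agent a single pass/fail signal to rely on. Stopping at the first failure keeps the output short and points at the earliest broken layer.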
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is lean and efficient throughout. It assumes Claude's competence, avoids explaining basic concepts, and uses terse but clear formatting. Every section earns its place: the 7-layer stack is presented as a compact list with one-line examples, the workflow steps are minimal, and the output format is tightly specified with explicit instructions to keep things compact. | 3 / 3 |
| Actionability | The skill provides a clear framework and concrete examples (curl commands, pnpm/cargo/docker commands, audit output format), but much of the actual executable guidance is deferred to reference files (grading.md, setup-patterns.md) which are not provided in the bundle. The audit output example is helpful but the setup step lacks inline executable code for building each layer. | 2 / 3 |
| Workflow Clarity | The four-phase workflow (Audit → Setup → Improve → Stop) is clearly sequenced with explicit ordering within Setup (Boot → Smoke → Interact → E2e → Enforce → Observe → Isolate). There are clear stopping conditions ('when the repo reaches C+ and can be judged honestly'), validation is embedded in the approach (grading dimensions, 'reliably verifiable'), and the instruction to stop and report rather than expand scope serves as a feedback loop. | 3 / 3 |
| Progressive Disclosure | The skill provides a clear overview with well-signaled one-level-deep references to three specific files (grading.md, setup-patterns.md, industry-examples.md). The main SKILL.md contains enough context to understand the framework while appropriately deferring detailed patterns and criteria to reference files. Navigation is straightforward. | 3 / 3 |
| Total | | 11 / 12 Passed |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
Reviewed