Operate as an agentic engineer using eval-first execution, decomposition, and cost-aware model routing.
45%
Does it follow best practices?
Impact: Pending (no eval scenarios have been run)
Status: Passed (no known issues)
Quality
Discovery: 0%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is almost entirely composed of abstract buzzwords and jargon without specifying concrete actions, use cases, or trigger conditions. It fails to communicate what the skill actually does in practical terms and provides no guidance for when Claude should select it. A user or Claude would have no clear basis for choosing this skill over any other engineering-related skill.
Suggestions
Replace abstract terms with concrete actions the skill performs, e.g., 'Breaks down complex coding tasks into subtasks, writes test cases before implementation, and routes work to appropriate models based on complexity and cost.'
Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks to build a feature end-to-end, needs test-driven development, or wants to optimize API costs across multiple model calls.'
Remove or define jargon like 'eval-first execution' and 'cost-aware model routing' — describe the actual behaviors these represent in plain language so Claude can match user requests to this skill.
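Putting these three suggestions together, a rewritten description might look like the sketch below. This is purely illustrative: the skill name and exact wording are assumptions, not part of the reviewed skill.

```yaml
---
name: eval-driven-engineering
description: >-
  Breaks complex coding tasks into small subtasks, writes eval or test
  cases before implementing, and routes each subtask to a cheaper or
  stronger model based on difficulty and cost budget. Use when the user
  asks to build a feature end-to-end, wants test-driven development, or
  needs to control API costs across many model calls.
---
```

Note how the rewrite replaces each jargon term with the behavior it stands for and ends with an explicit "Use when" clause containing phrases a user might actually type.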
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses abstract, buzzword-heavy language like 'agentic engineer', 'eval-first execution', 'decomposition', and 'cost-aware model routing' without listing any concrete actions the skill performs. No specific tasks or outputs are described. | 1 / 3 |
| Completeness | The description vaguely addresses 'what' (operate as an agentic engineer) but in extremely abstract terms, and completely lacks any 'when' clause or explicit trigger guidance for when Claude should select this skill. | 1 / 3 |
| Trigger Term Quality | The terms used are technical jargon ('eval-first execution', 'cost-aware model routing', 'decomposition') that users would almost never naturally say when requesting help. There are no natural user-facing keywords. | 1 / 3 |
| Distinctiveness / Conflict Risk | The description is so vague and broad ('operate as an agentic engineer') that it could conflict with virtually any coding or engineering skill. There are no distinct triggers that would help differentiate it from other skills. | 1 / 3 |
| Total | | 4 / 12 (Passed) |
Implementation: 64%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured, concise process skill that efficiently communicates agentic engineering principles without over-explaining. Its main weakness is the lack of concrete, executable examples—no sample eval definitions, no cost-tracking templates, no example decomposition—which limits actionability. The workflow steps would also benefit from explicit validation checkpoints and error-recovery paths.
Suggestions
Add a concrete example of an eval-first loop: show a sample capability eval definition, a baseline result, and a post-implementation comparison to make the workflow actionable.
Include a sample cost-tracking template (e.g., a markdown table or JSON schema) so Claude can immediately apply cost discipline without inventing a format.
Add explicit feedback loops to the Eval-First Loop: what happens when evals regress? When should Claude retry vs. escalate model tier vs. abort?
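As a concrete illustration of the first and third suggestions, a minimal eval-first loop with a regression decision point might look like the following sketch. The skill under review ships no code, so every name here is hypothetical.

```python
# Minimal sketch of an eval-first loop (illustrative only): record a
# baseline score, score the candidate change, and decide whether to
# accept or escalate based on regression.

def run_evals(solve, cases):
    """Fraction of (input, expected) eval cases the implementation passes."""
    passed = sum(1 for x, want in cases if solve(x) == want)
    return passed / len(cases)

def eval_first_step(baseline, candidate, cases):
    """Accept the candidate only if it does not regress the baseline."""
    base_score = run_evals(baseline, cases)
    cand_score = run_evals(candidate, cases)
    if cand_score >= base_score:
        return "accept", cand_score
    # Regression: per the suggestion, this is the point where the skill
    # should say whether to retry, escalate model tier, or abort.
    return "escalate", cand_score

cases = [(1, 2), (2, 4), (3, 6)]    # eval definition: double the input
baseline = lambda x: x * 2          # current behavior
candidate = lambda x: x + x         # proposed change
decision, score = eval_first_step(baseline, candidate, cases)
```

The point of the example is the explicit branch on regression, which is exactly what the current workflow leaves unspecified.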
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Every section is lean and assumes Claude's competence. No unnecessary explanations of what agents are, what models are, or how evals work. Each bullet earns its place as a concrete directive. | 3 / 3 |
| Actionability | Guidance is specific and directive (e.g., 15-minute unit rule, model routing tiers, review priorities), but lacks concrete executable examples: no code snippets, no example eval definitions, no sample cost-tracking output. It describes what to do at a process level but doesn't provide copy-paste-ready artifacts. | 2 / 3 |
| Workflow Clarity | The Eval-First Loop provides a clear 4-step sequence, but lacks explicit validation checkpoints or feedback loops (e.g., what to do if evals regress, when to abort vs. retry). The decomposition and session strategy sections are lists of heuristics rather than sequenced workflows with error recovery. | 2 / 3 |
| Progressive Disclosure | Content is well-organized into clearly labeled sections with good headers, but everything is inline in a single file. For a skill of this breadth (evals, decomposition, routing, cost tracking, review), some sections could benefit from references to deeper guides (e.g., an EVALS.md or COST_TRACKING.md). However, the content is under ~60 lines and reasonably scoped. | 2 / 3 |
| Total | | 9 / 12 (Passed) |
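The Actionability row notes the absence of a sample cost-tracking artifact. One possible shape for such a template, purely as an illustration (the field names are assumptions, not defined by the skill):

```json
{
  "task": "implement pagination for /users endpoint",
  "calls": [
    {"model": "small", "purpose": "draft tests", "cost_usd": 0.004},
    {"model": "large", "purpose": "implementation", "cost_usd": 0.090}
  ],
  "total_cost_usd": 0.094,
  "budget_usd": 0.250
}
```

Shipping any fixed schema like this would let Claude apply cost discipline immediately instead of inventing a format per session.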
Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 10 / 11 checks passed
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing them or moving them under `metadata` | Warning |
| Total | | 10 / 11 (Passed) |
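The single warning concerns unknown top-level frontmatter keys. Assuming the spec permits arbitrary keys under `metadata` (as the warning's wording implies), the usual fix is to nest them there; the key name below is hypothetical.

```yaml
---
name: eval-driven-engineering
description: Breaks down coding tasks, writes evals first, routes work by cost.
# Before: a top-level key like `cost_tier: low` triggers the warning.
# After: nest it under metadata, which the validator accepts.
metadata:
  cost_tier: low
---
```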