Generate closure-grade HE eval and drift proof for one execution slice. Use when Linear, milestone, or source-prompt closure needs validation evidence.
50
55%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./Plugins/harness-engineering/skills/he-eval-report/SKILL.mdQuality
Discovery
75%Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description has good structural completeness with explicit 'what' and 'when' clauses, and its niche terminology makes it highly distinctive. However, the heavy use of domain-specific jargon ('HE eval', 'drift proof', 'execution slice') reduces clarity and may hinder trigger matching for users who don't use these exact terms. The specificity of actions is moderate—it names one action ('Generate') but doesn't elaborate on what the output entails.
Suggestions
Expand the jargon terms to be more self-explanatory, e.g., replace 'HE eval' with its full meaning and briefly clarify what 'drift proof' entails as a concrete deliverable.
Add natural language trigger terms that users might actually say, such as 'verify task completion', 'generate validation report', or 'prove milestone is done'.
| Dimension | Reasoning | Score |
|---|---|---|
Specificity | It names a domain ('HE eval and drift proof') and an action ('Generate'), but the terms 'closure-grade', 'execution slice', and 'drift proof' are jargon-heavy and don't clearly describe concrete, understandable actions. It's not as vague as 'helps with documents' but falls short of listing multiple specific concrete actions. | 2 / 3 |
Completeness | It explicitly answers both 'what' (generate closure-grade HE eval and drift proof for one execution slice) and 'when' (Use when Linear, milestone, or source-prompt closure needs validation evidence), with a clear 'Use when...' clause containing explicit triggers. | 3 / 3 |
Trigger Term Quality | It includes some relevant keywords like 'Linear', 'milestone', 'closure', and 'validation evidence', which could match user queries. However, terms like 'HE eval', 'drift proof', and 'execution slice' are highly specialized jargon that users are unlikely to naturally say, and common variations are missing. | 2 / 3 |
Distinctiveness Conflict Risk | The description is highly specialized with niche terminology ('closure-grade HE eval', 'drift proof', 'execution slice') that is unlikely to conflict with other skills. Its narrow focus makes it clearly distinguishable. | 3 / 3 |
Total | 10 / 12 Passed |
Implementation
35%Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill reads more like an internal policy specification than an actionable skill for Claude. While it demonstrates thoroughness in covering edge cases, safety boundaries, and evidence requirements, the extreme verbosity and abstract procedural language significantly reduce its effectiveness. The concrete validation commands and output format are strengths, but they're buried in dense policy prose that could be dramatically compressed or offloaded to reference files.
Suggestions
Compress the Procedure section to 4-5 concrete steps with clear decision gates, moving policy rationale (first-principles checks, XP, gate-selection, etc.) to a referenced contract file.
Remove or drastically shorten sections that restate general safety principles Claude already follows (Safety Boundaries, much of Evidence Requirements) — keep only domain-specific constraints.
Convert the 'When Not to Use' and 'Handoff Rules' sections into a single compact table mapping scenarios to actions.
Add a concrete example of a minimal eval report output (even abbreviated) so Claude can see the expected structure rather than inferring it from abstract descriptions.
| Dimension | Reasoning | Score |
|---|---|---|
Conciseness | The skill is extremely verbose (~200+ lines) with extensive procedural detail, many of which describe internal process concepts Claude doesn't need explained at length. Sections like 'Philosophy', 'When Not to Use', 'Evidence Requirements', and 'Safety Boundaries' contain significant redundancy and could be dramatically compressed. Many bullet points restate the same principle in slightly different ways. | 1 / 3 |
Actionability | The skill provides concrete validation commands (three Python scripts with paths) and a clear output file naming convention, which is good. However, the 10-step procedure is largely abstract guidance ('Apply first-principles, XP, gate-selection... checks only when relevant') rather than executable steps. The procedure reads more like a policy document than actionable instructions. | 2 / 3 |
Workflow Clarity | The 10-step procedure provides a sequence, and step 10 includes a validation-before-closure checkpoint. The Validation section has explicit commands. However, the workflow is dense and mixes high-level policy with operational steps, making it hard to follow. The 'fail fast' instruction in Validation is good but the overall procedure lacks clear decision points and feedback loops between steps. | 2 / 3 |
Progressive Disclosure | The References section provides extensive one-level-deep links to contracts, templates, schemas, and taxonomies with clear 'Read when...' guidance for each. However, without bundle files to verify these references exist, and given the massive amount of inline content that could be offloaded to reference files, the skill body itself is overloaded. The inline content-to-reference ratio is poor — much of the procedure and evidence requirements could live in referenced contracts. | 2 / 3 |
Total | 7 / 12 Passed |
Validation
90%Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
metadata_version | 'metadata.version' is missing | Warning |
Total | 10 / 11 Passed | |
4c78f98
Table of Contents
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.