Agent skill for benchmark-suite - invoke with $agent-benchmark-suite
0% (Does it follow best practices?)
Impact: 89% (2.17x average score across 3 eval scenarios)
Passed: No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./.agents/skills/agent-benchmark-suite/SKILL.md

Quality
Discovery
0%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an extremely minimal description that fails on all dimensions. It provides no information about what the skill does, when it should be used, or what distinguishes it from other skills. It reads more like an internal label than a functional description.
Suggestions
- Add concrete actions describing what the skill does, e.g., 'Runs performance benchmarks, generates comparison reports, and tracks regression metrics across test suites.'
- Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks to run benchmarks, measure performance, compare test results, or evaluate system throughput.'
- Replace the invocation instruction ('invoke with $agent-benchmark-suite') with functional context — invocation syntax belongs in the skill body, not in the description used for skill selection.
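Concretely, a description revised along those lines might look like the following SKILL.md frontmatter. This is a hypothetical sketch: the wording is illustrative and not the skill's actual metadata.

```yaml
---
name: agent-benchmark-suite
description: >
  Runs performance benchmarks, generates comparison reports, and tracks
  regression metrics across test suites. Use when the user asks to run
  benchmarks, measure performance, compare test results, or evaluate
  system throughput.
---
```

This keeps concrete actions and natural trigger terms in the description while leaving invocation details to the skill body.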
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description contains no concrete actions whatsoever. 'Agent skill for benchmark-suite' is entirely vague and does not describe what the skill actually does. | 1 / 3 |
| Completeness | Neither 'what does this do' nor 'when should Claude use it' is answered. The description only states it's an 'agent skill' and how to invoke it, providing no functional or contextual information. | 1 / 3 |
| Trigger Term Quality | The only keyword is 'benchmark-suite', which is a technical/internal name rather than a natural term a user would say. There are no natural-language trigger terms like 'run benchmarks', 'performance testing', etc. | 1 / 3 |
| Distinctiveness / Conflict Risk | The description is so generic that it provides no distinguishing characteristics. 'Agent skill for benchmark-suite' could overlap with any benchmarking, testing, or performance-related skill without clear differentiation. | 1 / 3 |
| Total | | 4 / 12 Passed |
Implementation
0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is an architectural design document masquerading as an actionable skill. It consists almost entirely of non-executable JavaScript pseudocode defining class structures with undefined dependencies, providing no concrete guidance Claude could actually follow. The CLI commands at the end hint at actionability but lack installation, configuration, or usage context.
Suggestions
- Replace pseudocode class hierarchies with actual executable examples showing how to run a specific benchmark end-to-end using real tools or the actual claude-flow CLI.
- Add a clear step-by-step workflow: install dependencies → configure benchmarks → run suite → interpret results → detect regressions, with validation at each step.
- Cut content by 80%+ — remove all illustrative class definitions and focus on the concrete CLI commands and their expected inputs/outputs.
- Move detailed benchmark definitions and configuration schemas to separate reference files, keeping SKILL.md as a concise overview with quick-start instructions.
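To illustrate what "actual executable examples" could mean here, the following Node.js snippet is a minimal, copy-paste-runnable latency benchmark. The helper name `benchmark` and the result fields are illustrative assumptions, not part of the skill's actual API or the claude-flow CLI.

```javascript
// Minimal runnable latency benchmark (hypothetical helper, not the
// skill's real API). Times a function over many iterations and reports
// mean and 95th-percentile latency in milliseconds.
function benchmark(fn, iterations = 1000) {
  const timings = [];
  for (let i = 0; i < iterations; i++) {
    const start = process.hrtime.bigint();
    fn();
    // Convert the nanosecond bigint delta to a millisecond number.
    timings.push(Number(process.hrtime.bigint() - start) / 1e6);
  }
  timings.sort((a, b) => a - b);
  return {
    meanMs: timings.reduce((sum, t) => sum + t, 0) / timings.length,
    p95Ms: timings[Math.floor(0.95 * (timings.length - 1))],
  };
}

// Example: measure a trivial workload.
const result = benchmark(() => JSON.parse('{"a":1}'), 200);
console.log(result); // logs { meanMs: ..., p95Ms: ... }
```

Unlike the pseudocode class hierarchies the review describes, a snippet like this can be pasted into a file and run with `node` as-is, which is the bar the Actionability dimension is measuring.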
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose with ~500 lines of non-executable pseudocode. Explains class hierarchies, constructors, and architectural patterns that Claude already understands. The code is illustrative rather than functional—none of it can be copy-pasted and run. Massive token waste. | 1 / 3 |
| Actionability | The code examples are elaborate pseudocode with undefined classes (ThroughputBenchmark, LatencyBenchmark, etc.) that don't exist. The CLI commands reference 'npx claude-flow' tools without explaining how to install or configure them. Nothing is executable or copy-paste ready. | 1 / 3 |
| Workflow Clarity | There is no clear step-by-step workflow for actually running benchmarks. The content describes abstract class structures and method signatures but never walks through a concrete process with validation checkpoints. No feedback loops or error-recovery steps are defined. | 1 / 3 |
| Progressive Disclosure | Monolithic wall of code blocks with no references to external files. All content is dumped inline with no clear hierarchy or navigation. The sections are organized by class rather than by user task or progressive complexity. | 1 / 3 |
| Total | | 4 / 12 Passed |
Validation
90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (670 lines); consider splitting into references/ and linking | Warning |
| Total | | 10 / 11 Passed |
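One way to clear the line-count warning is the split the Implementation suggestions already describe. A hypothetical layout, with all file names under references/ purely illustrative:

```
.agents/skills/agent-benchmark-suite/
├── SKILL.md              # concise overview and quick-start commands
└── references/
    ├── benchmarks.md     # detailed benchmark definitions
    └── configuration.md  # configuration schemas
```

SKILL.md then links to the reference files, so agents load the detail only when a task actually needs it.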
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.