Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker
Impact: 81% (2.89x average score across 3 eval scenarios)
Passed; no known issues

Optimize this skill with Tessl:
npx tessl skill review --optimize ./.agents/skills/agent-performance-benchmarker/SKILL.md

Quality
0%. Does it follow best practices?
Discovery
0%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an extremely minimal description that fails on all dimensions. It provides no information about what the skill does, when to use it, or what types of user requests should trigger it. It reads more like an invocation instruction than a skill description.
Suggestions
Add concrete actions describing what the skill does, e.g., 'Runs performance benchmarks, measures latency and throughput, generates comparison reports across test runs.'
Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks about performance testing, benchmarking, load testing, measuring response times, or comparing performance metrics.'
Remove the invocation instruction ('invoke with $agent-performance-benchmarker') from the description and replace it with functional content that helps Claude decide when to select this skill; a possible rewrite is sketched below.
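For illustration only, the suggestions above could be combined into SKILL.md frontmatter along these lines (the wording is a sketch assembled from the suggested phrases, not a statement of the skill's actual behavior):

```yaml
---
name: performance-benchmarker
description: >
  Runs performance benchmarks, measures latency and throughput, and generates
  comparison reports across test runs. Use when the user asks about performance
  testing, benchmarking, load testing, measuring response times, or comparing
  performance metrics.
---
```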
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description contains no concrete actions whatsoever. It only says 'Agent skill for performance-benchmarker', which is entirely vague about what the skill actually does. | 1 / 3 |
| Completeness | Neither 'what does this do' nor 'when should Claude use it' is answered. The description only states it's an agent skill and how to invoke it, providing no functional or contextual information. | 1 / 3 |
| Trigger Term Quality | The only keyword is 'performance-benchmarker', which is a tool name, not a natural term a user would say. There are no natural-language trigger terms like 'benchmark', 'performance testing', 'latency', 'throughput', etc. | 1 / 3 |
| Distinctiveness / Conflict Risk | The term 'performance-benchmarker' is somewhat specific as a tool name, but the description is so vague that it's unclear what domain it operates in, making it impossible to distinguish from other performance-related skills. | 1 / 3 |
| Total | | 4 / 12 Passed |
Implementation
0%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is an extremely verbose collection of non-executable pseudocode masquerading as a comprehensive benchmarking framework. It references dozens of fictional classes and APIs, making it entirely non-actionable. The content spends an enormous token budget on code that Claude cannot execute while providing no actual workflow, real tools, or concrete guidance for performing performance benchmarks.
Suggestions
Replace fictional class hierarchies with actual executable code using real benchmarking tools (e.g., autocannon for HTTP throughput, pidusage for process monitoring, or clinic.js for Node.js profiling); see the sketch after these suggestions
Add a clear step-by-step workflow: 1) Set up benchmark environment, 2) Run specific benchmark command, 3) Validate results, 4) Generate report — with actual commands at each step
Reduce the content to under 100 lines focused on the essentials: what tool to use, how to configure it, how to run it, and how to interpret the results
Split detailed reference material (metric definitions, analysis algorithms) into separate files and link from a concise overview in SKILL.md
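To make the first two suggestions concrete, here is a rough sketch of the kind of executable workflow the skill could contain. The server entry point, endpoint, and output paths are hypothetical, and the tool flags should be checked against each tool's --help before relying on them:

```bash
# 1) Set up: start the service under test (entry point is hypothetical)
mkdir -p results
node server.js &
SERVER_PID=$!
sleep 2   # crude wait for the server to start listening

# 2) Run the benchmark: 50 connections for 30 seconds against one endpoint
npx autocannon -c 50 -d 30 --latency http://localhost:3000/api/health \
  | tee results/autocannon.txt

# 3) Validate: surface any reported errors before trusting the numbers
grep -i "error" results/autocannon.txt \
  && echo "Errors reported; investigate before comparing runs"

# 4) Tear down, then (optionally) profile a separate run with clinic.js
kill $SERVER_PID
npx clinic flame -- node server.js   # produces a flamegraph for CPU hotspots
```

Even a short block like this, plus a few sentences on interpreting p50/p99 latency and requests per second, would replace hundreds of lines of fictional class definitions with something an agent can actually run.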
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~600+ lines of non-executable pseudocode. The code references fictional classes (TimeSeriesDatabase, SystemMonitor, PerformanceModel, etc.) that don't exist, making this essentially elaborate pseudocode dressed as real code. Massive token waste explaining concepts Claude already understands about benchmarking, percentile calculations, and resource monitoring. | 1 / 3 |
| Actionability | Despite the volume of code, none of it is executable. Every class depends on undefined imports and fictional infrastructure (PerformanceAlertSystem, LoadGenerator, SystemMonitor, MetricsCollector, etc.). There are no real commands, no installable packages, no actual tools referenced. Claude cannot copy-paste and run any of this. | 1 / 3 |
| Workflow Clarity | There is no clear workflow or sequence of steps for actually performing benchmarks. The content is a collection of class definitions without explaining when or how to invoke them. No validation checkpoints, no error recovery guidance, and no clear entry point for executing a benchmark. | 1 / 3 |
| Progressive Disclosure | Monolithic wall of code with no references to external files, no clear navigation structure, and no separation of overview from detail. Everything is dumped inline with no hierarchy or signposting. The content would benefit enormously from splitting into separate reference files. | 1 / 3 |
| Total | | 4 / 12 Passed |
Validation
90%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (856 lines); consider splitting into references/ and linking | Warning |
| Total | 10 / 11 Passed | |
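One way to address the line-count warning is the split suggested above; the file names below are illustrative only:

```
agent-performance-benchmarker/
├── SKILL.md           # concise overview and workflow, links to the files below
└── references/
    ├── metrics.md     # metric definitions (latency percentiles, throughput, resource usage)
    └── analysis.md    # result-analysis and comparison guidance
```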
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.