Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker
Quality: 31% (Does it follow best practices?)
Impact: 81%, 2.89x average score across 3 eval scenarios
Passed: No known issues
Optimize this skill with Tessl: npx tessl skill review --optimize ./.agents/skills/agent-performance-benchmarker/SKILL.md

Quality
Discovery: 0%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an extremely minimal description that fails on all dimensions. It provides no information about what the skill does, when to use it, or what types of user requests should trigger it. It reads more like an invocation instruction than a skill description.
Suggestions
- Add concrete actions describing what the skill does, e.g., 'Runs performance benchmarks, measures latency and throughput, generates comparison reports across test runs.'
- Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks about performance testing, benchmarking, load testing, measuring response times, or comparing performance metrics.'
- Remove the invocation instruction ('invoke with $agent-performance-benchmarker') from the description and replace it with functional content that helps Claude decide when to select this skill.
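Putting the suggestions above together, a reworked description might look like the sketch below. The frontmatter field names and exact wording are illustrative assumptions, not taken from the skill itself:

```markdown
---
name: performance-benchmarker
description: >
  Runs performance benchmarks, measures latency and throughput, and
  generates comparison reports across test runs. Use when the user asks
  about performance testing, benchmarking, load testing, measuring
  response times, or comparing performance metrics.
---
```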
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description contains no concrete actions whatsoever. It only says 'Agent skill for performance-benchmarker' which is entirely vague about what the skill actually does. | 1 / 3 |
| Completeness | Neither 'what does this do' nor 'when should Claude use it' is answered. The description only states it's an agent skill and how to invoke it, providing no functional or contextual information. | 1 / 3 |
| Trigger Term Quality | The only keyword is 'performance-benchmarker' which is a tool name, not a natural term a user would say. There are no natural language trigger terms like 'benchmark', 'performance testing', 'latency', 'throughput', etc. | 1 / 3 |
| Distinctiveness / Conflict Risk | The term 'performance-benchmarker' is somewhat specific as a tool name, but the description is so vague that it's unclear what domain it operates in, making it impossible to distinguish from other performance-related skills. | 1 / 3 |
| Total | | 4 / 12 Passed |
Implementation: 0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is a massive dump of non-executable pseudocode that describes a theoretical performance benchmarking framework. It provides no actionable guidance for Claude — every class depends on undefined dependencies, and there are no real commands, tools, or workflows Claude can follow. The content wastes significant token budget explaining algorithmic concepts (percentile calculation, standard deviation, load generation) that Claude already knows.
Suggestions
- Replace the fictional class implementations with actual actionable instructions: what tools should Claude use, what commands to run, what output format to produce when asked to benchmark something.
- Define a clear step-by-step workflow (e.g., 1. Identify protocol to benchmark, 2. Run specific commands, 3. Collect metrics, 4. Validate results, 5. Generate report) with explicit validation checkpoints.
- Remove all undefined dependencies (TimeSeriesDatabase, SystemMonitor, PerformanceModel, neural_patterns MCP tool, etc.) and replace with real, available tools or clearly document what must be installed.
- Reduce content to under 100 lines focusing on what Claude should actually do when invoked, with concrete examples of expected inputs and outputs.
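To illustrate the 'concrete examples' point: a minimal latency benchmark that an agent could actually run needs nothing beyond the standard library. This is a sketch, not part of the skill under review; the timed lambda stands in for whatever command or request is being benchmarked:

```python
"""Minimal latency benchmark sketch using only the standard library."""
import statistics
import time


def benchmark(fn, runs=30):
    """Time fn() repeatedly and return a latency summary in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # "inclusive" keeps percentile cut points inside [min, max]
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "runs": runs,
        "mean_ms": statistics.fmean(samples),
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    report = benchmark(lambda: sum(range(10_000)))
    for key, value in report.items():
        if isinstance(value, float):
            print(f"{key}: {value:.3f}")
        else:
            print(f"{key}: {value}")
```

Unlike the reviewed skill's pseudocode, this has no undefined dependencies and produces a concrete, checkable output format.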
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~600+ lines of code that Claude cannot execute. The code references fictional classes (TimeSeriesDatabase, SystemMonitor, PerformanceModel, etc.) and explains concepts Claude already understands like calculating averages, percentiles, and standard deviations. The entire content is illustrative pseudocode masquerading as implementation. | 1 / 3 |
| Actionability | None of this code is executable — it depends on numerous undefined classes (BenchmarkSuite, LoadGenerator, SystemMonitor, PerformanceModel, MetricsCollector, etc.) and fictional MCP tools (neural_patterns, neural_predict, metrics_collect). There are no concrete commands, real tool invocations, or copy-paste-ready instructions that Claude could actually use. | 1 / 3 |
| Workflow Clarity | Despite being a multi-step benchmarking process, there is no clear workflow sequence for Claude to follow. The content describes class architectures rather than actionable steps. There are no validation checkpoints, no explicit ordering of what Claude should do when invoked, and no error recovery guidance. | 1 / 3 |
| Progressive Disclosure | The content is a monolithic wall of code with no structure for progressive disclosure. There are no references to supporting files, no separation of overview from detail, and no navigation aids. Everything is dumped inline with no organization beyond class-level headings. | 1 / 3 |
| Total | | 4 / 12 Passed |
Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 10 / 11 passed
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (856 lines); consider splitting into references/ and linking | Warning |
| Total | | 10 / 11 Passed |
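One way to resolve the skill_md_line_count warning also satisfies the progressive-disclosure critique above: keep SKILL.md as a short overview and link out to supporting files. The file names here are hypothetical:

```markdown
<!-- SKILL.md: short overview, detail moved to references/ -->
## Further reading
- [Metrics and thresholds](references/metrics.md)
- [Report template](references/report-template.md)
```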