Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker
31
0%
Does it follow best practices?
Impact
81%
2.89x
Average score across 3 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./.agents/skills/agent-performance-benchmarker/SKILL.md
Quality
Discovery
0%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an extremely weak description that provides virtually no useful information for skill selection. It reads more like a stub or placeholder than a functional description, containing only the skill's name and invocation command without any explanation of capabilities, use cases, or trigger conditions.
Suggestions
Add concrete actions describing what the skill does, e.g., 'Runs performance benchmarks, measures latency, throughput, and response times for APIs, databases, or applications.'
Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks about performance testing, benchmarking, load testing, measuring response times, or profiling system performance.'
Specify the domain or type of performance benchmarking to distinguish it from other potential performance-related skills (e.g., web performance, database performance, code profiling).
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description contains no concrete actions whatsoever. It only says 'Agent skill for performance-benchmarker' which is entirely vague and abstract, giving no indication of what the skill actually does. | 1 / 3 |
| Completeness | The description fails to answer both 'what does this do' and 'when should Claude use it'. It provides neither meaningful capability information nor any trigger guidance. The only instruction is how to invoke it. | 1 / 3 |
| Trigger Term Quality | The only potentially relevant term is 'performance-benchmarker', which is a tool name rather than a natural keyword a user would say. There are no natural trigger terms like 'benchmark', 'performance testing', 'latency', 'throughput', etc. | 1 / 3 |
| Distinctiveness Conflict Risk | The description is so vague that 'performance' could overlap with many other skills. Without specifying what kind of performance benchmarking (web, database, API, CPU, etc.), it lacks any clear niche or distinct triggers. | 1 / 3 |
| Total | | 4 / 12 Passed |
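The Discovery findings above can be turned into a quick self-check before publishing a description. The following is a minimal sketch, not Tessl's scorer: the function name `lint_description`, the `TRIGGER_TERMS` list (taken from the examples quoted in the review), and the 12-word minimum are all assumptions made for illustration.

```python
import re

# Hypothetical trigger terms, borrowed from the examples in the review;
# a real skill would pick terms that match its own domain.
TRIGGER_TERMS = ("benchmark", "benchmarks", "benchmarking", "performance testing",
                 "load testing", "latency", "throughput", "response times")


def lint_description(description: str) -> list[str]:
    """Return the problems found in a skill description (empty if none)."""
    text = description.lower()
    problems = []
    # Completeness: look for an explicit "Use when..." clause.
    if not re.search(r"\buse when\b", text):
        problems.append("missing an explicit 'Use when...' clause")
    # Trigger term quality: whole-word matches only, so the tool name
    # 'performance-benchmarker' does not count as the word 'benchmark'.
    if not any(re.search(rf"\b{re.escape(term)}\b", text) for term in TRIGGER_TERMS):
        problems.append("no natural trigger terms")
    # Specificity: assumed heuristic, very short text rarely names actions.
    if len(text.split()) < 12:
        problems.append("too short to describe concrete actions")
    return problems


current = ("Agent skill for performance-benchmarker - invoke with "
           "$agent-performance-benchmarker")
improved = ("Runs performance benchmarks and measures latency and throughput "
            "for HTTP APIs. Use when the user asks about benchmarking, load "
            "testing, or measuring response times.")

print(lint_description(current))
print(lint_description(improved))
```

The current description trips all three checks, while the reworded one (assembled from the suggestions in this section) passes them. Passing these checks does not guarantee a higher review score; it only confirms that the specific gaps named above have been addressed.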
Implementation
0%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is an architectural design document masquerading as an actionable skill. It consists of hundreds of lines of non-executable pseudocode referencing fictional classes and infrastructure, with no real libraries, commands, or concrete steps Claude could follow. The content would be better served as a brief overview with actual executable examples using real benchmarking tools.
Suggestions
Replace fictional class hierarchies with actual executable code using real benchmarking libraries (e.g., autocannon for HTTP throughput, pidusage for resource monitoring, or custom scripts with real Node.js APIs).
Add a clear step-by-step workflow: 1) Set up benchmark environment, 2) Configure protocols, 3) Run benchmarks with specific commands, 4) Validate results, 5) Generate reports — with concrete validation checkpoints.
Reduce content to under 100 lines focusing on what Claude needs to know that it doesn't already: specific tool configurations, project-specific conventions, and concrete command sequences.
Extract detailed reference material (metric definitions, analysis algorithms) into separate bundle files and reference them from a concise SKILL.md overview.
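To make the first two suggestions concrete, here is a rough sketch of what a short, executable benchmark following the five-step workflow might look like. It uses only the Python standard library as a stand-in for the Node.js tools the review names, so it illustrates the shape of the workflow rather than the skill's intended tooling. The names `run_benchmark` and `workload` and the default `min_samples` of 100 are assumptions.

```python
import statistics
import time
from typing import Callable


def run_benchmark(target: Callable[[], object], iterations: int = 200,
                  warmup: int = 20, min_samples: int = 100) -> dict[str, float]:
    """Time repeated calls to `target` and summarise latency and throughput."""
    # Step 1: set up. Warm-up calls are discarded so one-off costs
    # (imports, cache fills) do not skew the samples.
    for _ in range(warmup):
        target()

    # Steps 2-3: run with the configured iteration count, one sample per call.
    latencies_ms = []
    started = time.perf_counter()
    for _ in range(iterations):
        t0 = time.perf_counter()
        target()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed_s = time.perf_counter() - started

    # Step 4: validation checkpoint. Refuse to report on too few samples
    # (the threshold is an assumed default, not a standard).
    if len(latencies_ms) < min_samples:
        raise ValueError(f"need at least {min_samples} samples, got {len(latencies_ms)}")

    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(latencies_ms, n=100)
    return {
        "throughput_per_s": iterations / elapsed_s,
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
    }


def workload() -> int:
    # Hypothetical stand-in for the code under test.
    return sum(i * i for i in range(1_000))


# Step 5: report.
report = run_benchmark(workload)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

The whole example fits well under the 100-line budget suggested above, and every call in it exists in the standard library, which is the property the review finds missing from the skill's current code.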
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~600+ lines of non-executable pseudocode. The code references fictional classes (TimeSeriesDatabase, SystemMonitor, PerformanceModel, etc.) that don't exist, making this essentially elaborate pseudocode dressed as real code. Massive token waste explaining concepts Claude already understands about benchmarking, percentile calculations, and resource monitoring. | 1 / 3 |
| Actionability | None of this code is executable. Every class depends on undefined imports and fictional infrastructure (PerformanceAlertSystem, LoadGenerator, SystemMonitor, MetricsCollector, etc.). There are no real commands, no real libraries, no installable dependencies — it's entirely aspirational architecture documentation rather than actionable guidance. | 1 / 3 |
| Workflow Clarity | There is no clear workflow for actually performing a benchmark. The skill describes class architectures but never provides a step-by-step process for Claude to follow. No validation checkpoints, no error recovery steps, no concrete sequence of actions to execute. The 'Core Responsibilities' section lists abstract goals rather than actionable steps. | 1 / 3 |
| Progressive Disclosure | Monolithic wall of code with no structure for progressive disclosure. Everything is dumped into a single file with no references to supporting documents. The content that could be split into separate reference files (throughput measurement, latency analysis, resource monitoring, adaptive optimization) is all inlined, creating an overwhelming and poorly navigable document. | 1 / 3 |
| Total | | 4 / 12 Passed |
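The Progressive Disclosure finding suggests moving inlined sections into separate reference files. As a rough illustration of that restructuring, the sketch below splits a Markdown body on level-2 headings and produces a short overview that links to each extracted section. The function name `split_skill`, the slug scheme, and the sample text are assumptions; the `references/` directory name follows the validator message on this page.

```python
import re


def split_skill(markdown: str) -> tuple[str, dict[str, str]]:
    """Split a SKILL.md body on '## ' headings.

    Returns a short overview that links to each section, plus a mapping
    of reference file path -> section text.
    """
    # Split before each level-2 heading, keeping the heading with its body.
    parts = re.split(r"(?m)^(?=## )", markdown)
    intro, sections = parts[0], parts[1:]

    references = {}
    links = []
    for section in sections:
        title = section.splitlines()[0][3:].strip()
        # Slug: lowercase, runs of non-alphanumerics collapsed to hyphens.
        slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
        path = f"references/{slug}.md"
        references[path] = section
        links.append(f"- [{title}]({path})")

    overview = intro.rstrip() + "\n\n" + "\n".join(links) + "\n"
    return overview, references


# Hypothetical miniature SKILL.md, using two section names from the review.
sample = (
    "# Benchmarker\nOverview.\n\n"
    "## Throughput Measurement\nDetails A.\n\n"
    "## Latency Analysis\nDetails B.\n"
)
overview, refs = split_skill(sample)
print(overview)
print(sorted(refs))
```

The function only computes the split and leaves writing files to the caller, so the proposed layout can be reviewed before anything on disk changes.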
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (856 lines); consider splitting into references/ and linking | Warning |
| Total | 10 / 11 Passed | |
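A length check like `skill_md_line_count` is easy to reproduce locally before submitting a skill for review. The sketch below is an approximation under stated assumptions: the validator's actual threshold is not given on this page, so `max_lines=500` is a hypothetical default, and `check_line_count` is an illustrative name rather than part of any Tessl tooling.

```python
import tempfile
from pathlib import Path


def check_line_count(path: Path, max_lines: int = 500) -> tuple[str, str]:
    """Return (result, message) describing a skill file's length."""
    count = len(path.read_text(encoding="utf-8").splitlines())
    if count > max_lines:  # max_lines is an assumed threshold
        return ("Warning",
                f"{path.name} is long ({count} lines); "
                "consider splitting into references/ and linking")
    return ("Passed", f"{path.name} has {count} lines")


# Demonstrate on a throwaway file with the line count reported above.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "SKILL.md"
    skill.write_text("line\n" * 856, encoding="utf-8")
    result, message = check_line_count(skill)
    print(result, "-", message)
```

Running it against the real SKILL.md only requires passing that file's path instead of the temporary one.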
d29d87f