Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker
40
Quality
13%
Does it follow best practices?
Impact
81%
2.89x
Average score across 3 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./.agents/skills/agent-performance-benchmarker/SKILL.md
Quality
Discovery
0%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is critically deficient across all dimensions. It functions as a label/invocation instruction rather than a skill description, providing no information about capabilities, use cases, or trigger conditions. Claude would have no basis for selecting this skill appropriately from a skill library.
Suggestions
Add specific concrete actions: describe what the skill benchmarks (e.g., 'Measures code execution time, memory usage, and CPU performance for functions and scripts')
Add explicit 'Use when...' clause with natural trigger terms users would say (e.g., 'Use when the user asks about performance testing, speed optimization, benchmarking code, or measuring execution time')
Specify the domain/scope to distinguish from other potential performance tools (e.g., is this for Python code, API endpoints, database queries, etc.)
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description contains no concrete actions whatsoever - only states it's an 'agent skill for performance-benchmarker' without explaining what it actually does. | 1 / 3 |
| Completeness | Missing both 'what does this do' and 'when should Claude use it'. The description only provides invocation syntax, not purpose or triggers. | 1 / 3 |
| Trigger Term Quality | Contains only technical jargon ('agent skill', 'invoke') and a command syntax. No natural keywords a user would say like 'benchmark', 'speed test', 'measure performance', etc. | 1 / 3 |
| Distinctiveness Conflict Risk | Extremely generic - 'performance-benchmarker' could apply to any type of performance testing (code, system, network, database). No clear niche or distinguishing characteristics. | 1 / 3 |
| Total | 4 / 12 Passed | |
Implementation
27%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is an extensive code dump that reads more like a library implementation than actionable guidance. It explains standard benchmarking concepts Claude already understands while failing to provide protocol-specific insights or executable examples. The lack of structure, validation checkpoints, and progressive disclosure makes it difficult to use effectively.
Suggestions
Reduce to a concise overview with 2-3 executable examples showing actual protocol benchmarking commands, moving detailed implementation to separate reference files
Add explicit workflow steps with validation checkpoints (e.g., '1. Run baseline benchmark 2. Verify metrics collected 3. Apply optimization 4. Compare results')
Replace framework pseudocode with concrete, copy-paste ready snippets that work with actual MCP tools
Split content into SKILL.md (quick start + overview) and separate files for throughput measurement, latency analysis, and resource monitoring details
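The checkpointed workflow suggested above (run baseline, verify metrics, apply optimization, compare results) could be sketched roughly as follows. This is a minimal illustration only, not the skill's actual code: the function names `run_benchmark`, `summarize`, and `compare`, the 30-run default, and the 5% `min_speedup` threshold are all assumptions chosen for the example.

```python
import statistics
import time
from typing import Callable, Dict, List


def run_benchmark(fn: Callable[[], object], runs: int = 30) -> List[float]:
    """Steps 1 and 3: time `fn` `runs` times, returning latencies in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Step 2 (checkpoint): fail loudly if too few metrics were collected."""
    if len(samples) < 2:
        raise ValueError("not enough samples collected to compare runs")
    return {"median": statistics.median(samples), "max": max(samples)}


def compare(baseline: Dict[str, float], candidate: Dict[str, float],
            min_speedup: float = 1.05) -> str:
    """Step 4: keep the change only if median latency improved enough.

    `min_speedup` is a hypothetical decision threshold (5% faster).
    """
    if candidate["median"] * min_speedup <= baseline["median"]:
        return "keep"
    return "revert"


if __name__ == "__main__":
    # Hypothetical workloads standing in for "before" and "after" code paths.
    before = summarize(run_benchmark(lambda: sum(range(20_000))))
    after = summarize(run_benchmark(lambda: sum(range(10_000))))
    print(compare(before, after))
```

Making the keep-or-revert rule an explicit function is one way to supply the "clear decision criteria" the review finds missing; a real skill would likely replace the lambdas with the actual operation under test.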
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~600+ lines with extensive code that explains concepts Claude already knows (how to measure latency, calculate percentiles, monitor resources). Much of this is standard benchmarking boilerplate that doesn't add protocol-specific value. | 1 / 3 |
| Actionability | Provides extensive code examples but they are framework-like pseudocode rather than executable snippets. References undefined classes (SystemMonitor, PerformanceModel, TimeSeriesDatabase) and MCP tools without showing how to actually invoke them in practice. | 2 / 3 |
| Workflow Clarity | The code implies a workflow (setup -> benchmark -> analyze -> optimize) but lacks explicit step-by-step instructions with validation checkpoints. The adaptive optimization section mentions reverting failed optimizations but doesn't provide clear decision criteria or verification steps. | 2 / 3 |
| Progressive Disclosure | Monolithic wall of code with no references to external files or clear section organization. All content is inline regardless of complexity. No separation between quick-start usage and advanced implementation details. | 1 / 3 |
| Total | 6 / 12 Passed | |
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (856 lines); consider splitting into references/ and linking | Warning |
| Total | 10 / 11 Passed | |
b2618f9