
agent-performance-benchmarker

Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker

40 · 2.89x

Quality: 13%. Does it follow best practices?

Impact: 81% (2.89x). Average score across 3 eval scenarios.

Security by Snyk: Passed. No known issues.

Optimize this skill with Tessl

npx tessl skill review --optimize ./.agents/skills/agent-performance-benchmarker/SKILL.md

Quality

Discovery

0%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description is critically deficient across all dimensions. It functions as a label/invocation instruction rather than a skill description, providing no information about capabilities, use cases, or trigger conditions. Claude would have no basis for selecting this skill appropriately from a skill library.

Suggestions

- Add specific, concrete actions: describe what the skill benchmarks (e.g., 'Measures code execution time, memory usage, and CPU performance for functions and scripts')

- Add an explicit 'Use when...' clause with natural trigger terms users would say (e.g., 'Use when the user asks about performance testing, speed optimization, benchmarking code, or measuring execution time')

- Specify the domain/scope to distinguish it from other potential performance tools (e.g., is this for Python code, API endpoints, database queries, etc.)
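Taken together, these suggestions amount to rewriting the skill's frontmatter description. A sketch of what that could look like (the wording below is illustrative, not the skill's actual metadata):

```yaml
---
name: agent-performance-benchmarker
description: >-
  Benchmarks code performance: measures execution time, throughput, memory
  usage, and CPU utilization for functions and scripts, and compares results
  before and after an optimization. Use when the user asks about performance
  testing, speed optimization, benchmarking code, or measuring execution time.
---
```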

Dimension / Reasoning / Score

Specificity: The description contains no concrete actions whatsoever; it only states that it is an 'agent skill for performance-benchmarker' without explaining what it actually does. (1 / 3)

Completeness: Missing both 'what does this do' and 'when should Claude use it'. The description only provides invocation syntax, not purpose or triggers. (1 / 3)

Trigger Term Quality: Contains only technical jargon ('agent skill', 'invoke') and a command syntax. No natural keywords a user would say, such as 'benchmark', 'speed test', or 'measure performance'. (1 / 3)

Distinctiveness / Conflict Risk: Extremely generic; 'performance-benchmarker' could apply to any type of performance testing (code, system, network, database). No clear niche or distinguishing characteristics. (1 / 3)

Total: 4 / 12 (Passed)

Implementation

27%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is an extensive code dump that reads more like a library implementation than actionable guidance. It explains standard benchmarking concepts Claude already understands while failing to provide protocol-specific insights or executable examples. The lack of structure, validation checkpoints, and progressive disclosure makes it difficult to use effectively.

Suggestions

- Reduce the skill to a concise overview with 2-3 executable examples showing actual protocol benchmarking commands, moving detailed implementation to separate reference files

- Add explicit workflow steps with validation checkpoints (e.g., '1. Run baseline benchmark 2. Verify metrics collected 3. Apply optimization 4. Compare results')

- Replace framework pseudocode with concrete, copy-paste ready snippets that work with actual MCP tools

- Split content into SKILL.md (quick start + overview) and separate files for throughput measurement, latency analysis, and resource monitoring details
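The checkpoint workflow named in the second suggestion can be sketched generically. The snippet below is an illustrative standard-library sketch, not the skill's actual MCP tooling; the benchmarked functions are placeholders:

```python
import timeit
import statistics

def run_benchmark(fn, repeats=5, number=1000):
    """Collect repeated timings for fn and summarize them."""
    timings = timeit.repeat(fn, repeat=repeats, number=number)
    return {"mean": statistics.mean(timings), "stdev": statistics.stdev(timings)}

# 1. Run baseline benchmark
baseline = run_benchmark(lambda: sorted(range(1000), reverse=True))

# 2. Verify metrics were collected before proceeding
assert baseline["mean"] > 0, "baseline produced no timing data"

# 3. Apply an optimization (here: a hypothetical alternative implementation)
optimized = run_benchmark(lambda: list(range(999, -1, -1)))

# 4. Compare results and decide whether to keep or revert the change
speedup = baseline["mean"] / optimized["mean"]
print(f"speedup: {speedup:.2f}x")
```

Each numbered step maps to a validation checkpoint: the workflow halts if metrics were not collected, and the final comparison provides the decision criterion for reverting a failed optimization.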

Dimension / Reasoning / Score

Conciseness: Extremely verbose at ~600+ lines, with extensive code that explains concepts Claude already knows (how to measure latency, calculate percentiles, monitor resources). Much of this is standard benchmarking boilerplate that doesn't add protocol-specific value. (1 / 3)

Actionability: Provides extensive code examples, but they are framework-like pseudocode rather than executable snippets. References undefined classes (SystemMonitor, PerformanceModel, TimeSeriesDatabase) and MCP tools without showing how to actually invoke them in practice. (2 / 3)

Workflow Clarity: The code implies a workflow (setup -> benchmark -> analyze -> optimize) but lacks explicit step-by-step instructions with validation checkpoints. The adaptive optimization section mentions reverting failed optimizations but doesn't provide clear decision criteria or verification steps. (2 / 3)

Progressive Disclosure: A monolithic wall of code with no references to external files and no clear section organization. All content is inline regardless of complexity, with no separation between quick-start usage and advanced implementation details. (1 / 3)

Total: 6 / 12 (Passed)

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria / Description / Result

skill_md_line_count: SKILL.md is long (856 lines); consider splitting into references/ and linking. (Warning)
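The line-count warning can be addressed by keeping SKILL.md as a short overview that links out to reference files. A sketch with hypothetical file names:

```markdown
# agent-performance-benchmarker

Quick start and overview (keep this file short).

Detailed guides live in references/:

- [Throughput measurement](references/throughput.md)
- [Latency analysis](references/latency.md)
- [Resource monitoring](references/resource-monitoring.md)
```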

Total: 10 / 11 (Passed)

Repository: ruvnet/claude-flow (Reviewed)

