
agent-performance-benchmarker

Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker

Quality: 0% (Does it follow best practices?)

Impact: 81% (2.89x average score across 3 eval scenarios)

Security (by Snyk): Passed, no known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./.agents/skills/agent-performance-benchmarker/SKILL.md

Quality

Discovery

0%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an extremely minimal description that fails on all dimensions. It provides no information about what the skill does, when it should be used, or what types of user requests should trigger it. It reads more like an invocation instruction than a skill description.

Suggestions

Add concrete actions describing what the skill does, e.g., 'Runs performance benchmarks on APIs/services, measures latency, throughput, and response times, generates comparison reports.'

Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks about performance testing, benchmarking, load testing, measuring response times, or comparing performance metrics.'

Specify the domain or type of performance benchmarking (e.g., web APIs, database queries, code execution) to distinguish it from other performance-related skills.
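A hypothetical SKILL.md frontmatter incorporating these suggestions might look like the following (the description wording is illustrative, not the maintainer's actual text):

```yaml
---
name: agent-performance-benchmarker
description: >
  Runs performance benchmarks against web APIs and services: measures
  latency, throughput, and response times, and generates comparison
  reports. Use when the user asks about performance testing,
  benchmarking, load testing, measuring response times, or comparing
  performance metrics.
---
```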

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | The description contains no concrete actions whatsoever. It only says 'Agent skill for performance-benchmarker' which is entirely vague about what the skill actually does. | 1 / 3 |
| Completeness | Neither 'what does this do' nor 'when should Claude use it' is answered. The description only states it's an agent skill and how to invoke it, providing no functional or contextual information. | 1 / 3 |
| Trigger Term Quality | The only keyword is 'performance-benchmarker', which is a tool name, not a natural term a user would say. There are no natural language trigger terms like 'benchmark', 'performance testing', 'latency', 'throughput', etc. | 1 / 3 |
| Distinctiveness / Conflict Risk | The term 'performance-benchmarker' is too vague to carve out a clear niche. Without specifying what kind of performance (web, database, API, CPU) or what benchmarking entails, it could overlap with many performance-related skills. | 1 / 3 |
| Total | | 4 / 12 |

Passed

Implementation

0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is an extremely verbose collection of non-executable pseudocode masquerading as a performance benchmarking framework. It provides no actionable guidance—every class depends on fictional infrastructure that doesn't exist. The content would be better served by a concise overview of what to benchmark, specific tools/commands to use, and a clear step-by-step workflow with validation checkpoints.

Suggestions

Replace fictional class hierarchies with actual executable code using real benchmarking tools (e.g., autocannon for HTTP throughput, perf_hooks for latency measurement, os module for resource monitoring)

Add a clear step-by-step workflow: 1) Set up benchmark environment, 2) Run throughput test, 3) Validate results, 4) Run latency test, 5) Generate comparison report—with concrete commands at each step

Reduce content by 80%+ by removing the elaborate class structures and focusing on the specific commands, configurations, and patterns Claude needs to actually perform benchmarking

Split detailed protocol-specific benchmarking guidance into separate referenced files and keep SKILL.md as a concise overview with quick-start instructions

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | Extremely verbose at ~600+ lines of non-executable pseudocode. The code references fictional classes (TimeSeriesDatabase, SystemMonitor, PerformanceModel, etc.) that don't exist, making this essentially elaborate pseudocode dressed as real code. Massive token waste explaining concepts Claude already understands about benchmarking, percentile calculations, and resource monitoring. | 1 / 3 |
| Actionability | Despite the volume of code, none of it is executable. Every class depends on undefined imports and fictional infrastructure (PerformanceAlertSystem, LoadGenerator, SystemMonitor, MetricsCollector, etc.). There are no real commands, no installable packages, no concrete steps Claude could actually follow to benchmark anything. | 1 / 3 |
| Workflow Clarity | There is no clear workflow or sequence of steps to follow. The content is a collection of class definitions without any guidance on how to actually run a benchmark. No validation checkpoints, no error recovery steps, no ordered process for conducting performance analysis. | 1 / 3 |
| Progressive Disclosure | Monolithic wall of code with no references to external files and no meaningful structure beyond class-level organization. Everything is dumped inline with no navigation aids, no quick-start section, and no separation of overview from detailed implementation. | 1 / 3 |
| Total | | 4 / 12 |

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| skill_md_line_count | SKILL.md is long (856 lines); consider splitting into references/ and linking | Warning |
| Total | | 10 / 11 |

Passed

Repository: ruvnet/claude-flow (Reviewed)

