
agent-performance-benchmarker

Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker

31

Quality: 0% - Does it follow best practices?

Impact: 81% (2.89x) - Average score across 3 eval scenarios

Security by Snyk: Passed - No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./.agents/skills/agent-performance-benchmarker/SKILL.md

Quality

Discovery

0%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description is essentially a placeholder that provides no useful information about the skill's capabilities, use cases, or triggers. It only names the skill and provides an invocation command, which is insufficient for Claude to make informed skill selection decisions. This is among the weakest possible descriptions.

Suggestions

Add concrete actions describing what the skill does, e.g., 'Runs performance benchmarks on APIs/services, measures response times, throughput, and latency under load, and generates comparison reports.'

Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks about performance testing, benchmarking, load testing, measuring latency, throughput analysis, or stress testing.'

Specify the domain or type of performance benchmarking (e.g., web APIs, database queries, code execution) to distinguish it from other performance-related skills.
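A description rewritten along these lines might look like the following hypothetical SKILL.md frontmatter. The field names follow the common name/description convention, and the capabilities listed are illustrative assumptions, not the skill's documented behavior:

```yaml
# Hypothetical frontmatter sketch; capabilities are assumptions for illustration.
name: agent-performance-benchmarker
description: >
  Runs performance benchmarks on APIs and services, measures response
  times, throughput, and latency under load, and generates comparison
  reports. Use when the user asks about performance testing,
  benchmarking, load testing, measuring latency, throughput analysis,
  or stress testing.
```

Note how the first sentence supplies concrete actions and the second supplies natural trigger terms, addressing the three suggestions above in a single description.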

Dimension / Reasoning / Score

Specificity

The description contains no concrete actions whatsoever. It only says 'Agent skill for performance-benchmarker' which is entirely vague about what the skill actually does.

1 / 3

Completeness

Neither 'what does this do' nor 'when should Claude use it' is answered. The description only states it's an agent skill and how to invoke it, providing no functional or contextual information.

1 / 3

Trigger Term Quality

The only keyword is 'performance-benchmarker' which is a tool name, not a natural term a user would say. There are no natural language trigger terms like 'benchmark', 'performance testing', 'load test', 'latency', etc.

1 / 3

Distinctiveness Conflict Risk

The term 'performance-benchmarker' is too vague to carve out a clear niche. Without specifying what kind of performance (web, database, API, code) or what benchmarking entails, it could overlap with many performance-related skills.

1 / 3

Total: 4 / 12

Passed

Implementation

0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is an extremely verbose, non-executable code dump that provides no actionable guidance for Claude. It consists entirely of illustrative JavaScript class definitions with undefined dependencies, explaining benchmarking concepts Claude already understands. There are no concrete steps, no real commands, no validation checkpoints, and no progressive disclosure — just ~600 lines of pseudocode masquerading as implementation.

Suggestions

Replace the entire body with a concise workflow: define 2-5 clear steps Claude should follow when asked to benchmark consensus protocols, with actual executable commands or real tool invocations.

Remove all illustrative class definitions and replace with concrete, copy-paste-ready code snippets that use real, available libraries (e.g., actual Node.js benchmarking tools like 'benchmark.js' or system monitoring via 'os' module).

Add a quick-start section of under 20 lines that shows the most common benchmarking task end-to-end, then link to separate files for advanced scenarios like latency analysis or resource monitoring.

Include explicit validation checkpoints (e.g., 'verify metrics collection is working before running full benchmark suite') and error recovery steps for when benchmarks fail.

Dimension / Reasoning / Score

Conciseness

Extremely verbose at ~600+ lines of non-executable pseudocode. Explains concepts Claude already knows (what percentiles are, how to calculate standard deviation, basic monitoring patterns). The vast majority of code is illustrative class definitions that cannot be executed and don't teach Claude anything actionable.

1 / 3

Actionability

Despite the massive amount of code, none of it is executable. All classes reference undefined dependencies (TimeSeriesDatabase, SystemMonitor, PerformanceModel, LoadGenerator, etc.). This is elaborate pseudocode dressed up as implementation — Claude cannot copy-paste and run any of it. No concrete commands, no real tool usage, no actual benchmarking instructions.

1 / 3

Workflow Clarity

There is no clear workflow or sequence of steps for Claude to follow. The content is organized as class definitions rather than actionable procedures. No validation checkpoints, no error recovery steps, no clear 'do this, then this' guidance. The 'Core Responsibilities' section lists abstract goals without concrete steps.

1 / 3

Progressive Disclosure

Monolithic wall of code with no references to external files and no meaningful content hierarchy. Everything is dumped inline — hundreds of lines of class implementations that could be split into separate reference files. No quick-start section, no overview that points to detailed materials.

1 / 3

Total: 4 / 12

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria / Description / Result

skill_md_line_count

SKILL.md is long (856 lines); consider splitting into references/ and linking

Warning

Total: 10 / 11

Passed
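The split suggested by the line-count warning could look something like this hypothetical layout (directory and file names are illustrative, not taken from the repository):

```
.agents/skills/agent-performance-benchmarker/
├── SKILL.md                      # short overview + quick start, links below
└── references/
    ├── latency-analysis.md       # detailed latency/percentile guidance
    └── resource-monitoring.md    # CPU/memory monitoring material
```

Keeping SKILL.md short and linking out to reference files would clear the warning and also improve the progressive-disclosure score noted above.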

Repository: ruvnet/ruflo (Reviewed)
