
agent-performance-benchmarker

Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker

31

Quality: 0% (Does it follow best practices?)

Impact: 81% (2.89x), average score across 3 eval scenarios

Security by Snyk: Passed, no known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./.agents/skills/agent-performance-benchmarker/SKILL.md

Quality

Content

0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is an extremely verbose, non-executable code dump that provides no actionable guidance for performance benchmarking. It consists entirely of fictional class implementations that reference undefined dependencies, so it cannot be used in practice. The content fails on every quality dimension: it spends tokens on concepts Claude already knows, provides no executable code, lacks workflow structure, and puts everything into a single monolithic file.

Suggestions

Replace the fictional class implementations with actual executable benchmarking code using real libraries (e.g., Benchmark.js, autocannon, or clinic.js) that Claude can run.

Add a clear step-by-step workflow: 1) Set up benchmark environment, 2) Run throughput tests, 3) Validate results, 4) Generate report — with explicit validation checkpoints.

Reduce content to under 100 lines focusing on the specific commands and configurations needed, moving detailed reference material to separate files.

Remove all fictional infrastructure classes (TimeSeriesDatabase, SystemMonitor, PerformanceModel, etc.) and replace with concrete tool invocations or real library calls.
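
The four-step workflow suggested above (set up, run throughput tests, validate results, generate report) could look roughly like the following. This is a minimal sketch, not the skill's code: it swaps in Python's standard library for the Node tools named in the suggestions so that it runs without dependencies, and `workload`, `min_samples`, and the report field names are hypothetical choices made for illustration.

```python
import statistics
import time


def run_benchmark(func, iterations=1000, warmup=100):
    """Time `func` repeatedly and return per-call latencies in seconds."""
    # Step 1: set up. Warm-up calls run first and are not recorded.
    for _ in range(warmup):
        func()
    # Step 2: run the throughput test, timing each call individually.
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        func()
        latencies.append(time.perf_counter() - start)
    return latencies


def validate(latencies, min_samples=30):
    """Step 3: validation checkpoint. Reject runs that cannot be trusted."""
    # min_samples is an assumed threshold, not a value from the review.
    if len(latencies) < min_samples:
        raise ValueError(f"need at least {min_samples} samples, got {len(latencies)}")
    if sum(latencies) <= 0:
        raise ValueError("total elapsed time is zero; timer resolution too coarse")


def report(latencies):
    """Step 4: summarize the run as a plain dictionary."""
    ordered = sorted(latencies)
    total = sum(ordered)
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    p95_index = (95 * len(ordered) + 99) // 100 - 1
    return {
        "samples": len(ordered),
        "mean_ms": statistics.mean(ordered) * 1000,
        "median_ms": statistics.median(ordered) * 1000,
        "p95_ms": ordered[p95_index] * 1000,
        "throughput_per_s": len(ordered) / total if total > 0 else float("inf"),
    }


def workload():
    # Hypothetical stand-in for the code under test.
    sum(i * i for i in range(1000))


if __name__ == "__main__":
    samples = run_benchmark(workload, iterations=500, warmup=50)
    validate(samples)
    for key, value in report(samples).items():
        print(f"{key}: {value}")
```

Keeping the validation step separate from the report means a run with too few samples fails loudly before any numbers are printed, which is the kind of explicit checkpoint the suggestion asks for.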

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | Extremely verbose at ~600+ lines of code that Claude cannot execute. The code references fictional classes (TimeSeriesDatabase, SystemMonitor, PerformanceModel, etc.) and explains concepts Claude already understands. The entire content could be reduced to a fraction of its size with actual actionable guidance. | 1 / 3 |
| Actionability | Despite the massive amount of code, none of it is executable. Every class depends on undefined imports and fictional infrastructure (SystemMonitor, PerformanceModel, LoadGenerator, LatencyHistogram, etc.). This is elaborate pseudocode dressed as real code: Claude cannot copy-paste and run any of it. | 1 / 3 |
| Workflow Clarity | There is no clear step-by-step workflow for how to actually perform benchmarking. The content is a collection of class definitions without sequencing, validation checkpoints, or error recovery guidance. The 'Core Responsibilities' section lists goals but never provides an actionable process. | 1 / 3 |
| Progressive Disclosure | Monolithic wall of code with no structure beyond class-level headings. No references to external files, no separation of overview from detail, and no navigation aids. Everything is dumped inline with no progressive layering. | 1 / 3 |
| Total | | 4 / 12 |

Passed

Description

0%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description is essentially a stub that provides no useful information for skill selection. It fails on every dimension: it names no concrete actions, includes no natural trigger terms, answers neither 'what' nor 'when', and is too generic to be distinguishable from other skills. The only content is the invocation command, which is operational metadata rather than a functional description.

Suggestions

Add specific concrete actions the skill performs, e.g., 'Runs load tests, measures response times, profiles CPU/memory usage, and generates benchmark reports for applications and APIs.'

Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks about performance testing, benchmarking, load testing, latency measurement, throughput analysis, or stress testing.'

Specify the domain or type of performance benchmarking (e.g., web APIs, database queries, system resources) to reduce conflict risk with other potentially similar skills.
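
To make the gap between the current stub and the suggested wording concrete, the sketch below runs a simple check over both descriptions. It is an illustrative heuristic only and does not reproduce how Tessl scores descriptions: `check_description`, the `TRIGGER_TERMS` list, and the two-term threshold are all hypothetical, and the improved description is assembled from the example sentences in the suggestions above.

```python
# Hypothetical trigger terms, adapted from the suggestions above.
TRIGGER_TERMS = (
    "performance testing",
    "benchmarking",
    "load test",
    "latency",
    "throughput",
    "stress test",
)


def check_description(description):
    """Return a list of problems found in a skill description."""
    text = description.lower()
    problems = []
    # An explicit 'Use when...' clause tells the agent when to select the skill.
    if "use when" not in text:
        problems.append("no explicit 'Use when...' clause")
    # Count natural-language terms a user might actually say (assumed minimum: 2).
    found = [term for term in TRIGGER_TERMS if term in text]
    if len(found) < 2:
        problems.append("fewer than two natural trigger terms")
    return problems


stub = (
    "Agent skill for performance-benchmarker - "
    "invoke with $agent-performance-benchmarker"
)
improved = (
    "Runs load tests, measures response times, profiles CPU/memory usage, "
    "and generates benchmark reports for applications and APIs. "
    "Use when the user asks about performance testing, benchmarking, "
    "load testing, latency measurement, throughput analysis, or stress testing."
)

if __name__ == "__main__":
    print("stub:", check_description(stub))
    print("improved:", check_description(improved))
```

The stub fails both checks because its only keyword is the tool name, while the improved text passes both.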

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | The description contains no concrete actions whatsoever. It only says 'Agent skill for performance-benchmarker' which is entirely vague about what the skill actually does. | 1 / 3 |
| Completeness | Neither 'what does this do' nor 'when should Claude use it' is answered. The description only states it's an agent skill and how to invoke it, providing no functional or contextual information. | 1 / 3 |
| Trigger Term Quality | The only keyword is 'performance-benchmarker' which is a tool name, not a natural term a user would say. There are no natural language trigger terms like 'benchmark', 'performance testing', 'load test', 'latency', etc. | 1 / 3 |
| Distinctiveness Conflict Risk | The term 'performance-benchmarker' is too vague to carve out a clear niche. Without specifying what kind of performance (web, database, CPU, API) or what benchmarking actions it performs, it could overlap with many tools. | 1 / 3 |
| Total | | 4 / 12 |

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| skill_md_line_count | SKILL.md is long (856 lines); consider splitting into references/ and linking | Warning |
| Total | | 10 / 11 |

Passed
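
A length check like the skill_md_line_count criterion can be approximated locally before submitting a skill. The sketch below is an assumption-laden stand-in, not the validator's implementation: `check_line_count` is a hypothetical helper, and `max_lines=500` is a guessed threshold, since the review reports only that 856 lines triggered the warning.

```python
def check_line_count(text, max_lines=500):
    """Return a warning string if `text` exceeds `max_lines`, else None."""
    # max_lines is an assumed threshold; the validator's real limit is not stated.
    count = len(text.splitlines())
    if count > max_lines:
        return (
            f"SKILL.md is long ({count} lines); "
            "consider splitting into references/ and linking"
        )
    return None


if __name__ == "__main__":
    # Hypothetical 856-line file body, matching the count reported above.
    print(check_line_count("line\n" * 856))
```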

Repository
ruvnet/ruflo
Reviewed
