
agent-performance-benchmarker

Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker

31

2.89x
Quality: 0% (Does it follow best practices?)

Impact: 81% (2.89x), average score across 3 eval scenarios

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./.agents/skills/agent-performance-benchmarker/SKILL.md

Quality

Discovery

0%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an extremely weak description that provides virtually no useful information for skill selection. It reads more like an invocation instruction than a description, failing to communicate what the skill does, what actions it performs, or when it should be used. It scores at the lowest level across all dimensions.

Suggestions

Add concrete actions describing what the skill does, e.g., 'Runs performance benchmarks on APIs/services, measures response times, throughput, and latency under load, and generates comparison reports.'

Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks about performance testing, benchmarking, load testing, measuring latency, throughput analysis, or stress testing.'

Remove the invocation instruction ('invoke with $agent-performance-benchmarker') from the description and replace it with capability and trigger information that helps Claude select the right skill.

Dimension | Reasoning | Score

Specificity

The description contains no concrete actions whatsoever. It only says 'Agent skill for performance-benchmarker' which is entirely vague and abstract, giving no indication of what the skill actually does.

1 / 3

Completeness

The description fails to answer both 'what does this do' and 'when should Claude use it'. It provides neither capability information nor usage triggers — only an invocation command.

1 / 3

Trigger Term Quality

The only potentially relevant term is 'performance-benchmarker', which is a tool name rather than a natural keyword a user would say. There are no natural trigger terms like 'benchmark', 'performance testing', 'load test', 'latency', etc.

1 / 3

Distinctiveness Conflict Risk

The term 'performance-benchmarker' is too vague to carve out a clear niche. Without specifying what kind of performance (web, database, API, code) or what benchmarking entails, it could easily conflict with other performance-related skills.

1 / 3

Total: 4 / 12

Passed

Implementation

0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is an extremely verbose, non-executable pseudocode dump that provides no actionable guidance for Claude. It explains concepts Claude already knows (statistical calculations, monitoring patterns) through hundreds of lines of fictional class implementations that reference non-existent dependencies. The content fails on every dimension: it wastes tokens, provides no executable code, lacks workflow structure, and has no progressive disclosure.

Suggestions

Replace the entire pseudocode implementation with a concise overview of benchmarking responsibilities and 2-3 short, actually executable code snippets or concrete command examples that Claude can use.

Define a clear step-by-step workflow: e.g., 1) Configure benchmark parameters, 2) Run benchmarks, 3) Validate results against thresholds, 4) Generate report — with explicit validation checkpoints.

Remove all references to fictional classes and MCP tools that don't exist, or replace them with real, usable tools and libraries.

Extract detailed reference material into separate bundle files and keep SKILL.md as a lean overview with navigation links to those files.
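The four-step workflow suggested above (configure, run, validate, report) can be sketched with the Python standard library alone. This is a minimal illustration, not the skill's actual implementation: the function names `run_benchmark`, `summarize`, `validate`, and `report`, the metric keys, and the 50 ms threshold are all hypothetical choices made for the example.

```python
import statistics
import time
from typing import Callable


def run_benchmark(fn: Callable[[], object], iterations: int = 200, warmup: int = 20) -> list[float]:
    """Step 2: call fn repeatedly and return per-call latencies in milliseconds."""
    for _ in range(warmup):  # warm-up calls are executed but not recorded
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples: list[float]) -> dict[str, float]:
    """Reduce raw latencies to mean, standard deviation, and percentiles."""
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "mean_ms": statistics.fmean(samples),
        "stdev_ms": statistics.stdev(samples),
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
    }


def validate(summary: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Step 3: return one message per metric that exceeds its threshold."""
    return [
        f"{name}: {summary[name]:.3f} > {limit:.3f}"
        for name, limit in thresholds.items()
        if summary[name] > limit
    ]


def report(summary: dict[str, float], failures: list[str]) -> str:
    """Step 4: render a plain-text report with an overall PASS/FAIL line."""
    lines = [f"{key}: {value:.3f}" for key, value in summary.items()]
    lines.extend(f"violated {failure}" for failure in failures)
    lines.append("RESULT: " + ("FAIL" if failures else "PASS"))
    return "\n".join(lines)


if __name__ == "__main__":
    thresholds = {"p95_ms": 50.0}  # Step 1: hypothetical threshold for the example
    samples = run_benchmark(lambda: sum(range(10_000)))
    summary = summarize(samples)
    print(report(summary, validate(summary, thresholds)))
```

Keeping validation separate from reporting gives the agent an explicit checkpoint: a non-empty list from `validate` is the signal to stop and investigate rather than proceed to the report.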

Dimension | Reasoning | Score

Conciseness

Extremely verbose: more than 600 lines of code that Claude doesn't need spelled out. The entire implementation consists of pseudocode-style class definitions for hypothetical classes (SystemMonitor, PerformanceModel, etc.) that don't actually exist. It explains concepts Claude already understands (how to calculate percentiles, standard deviations, etc.) and wastes a large share of the token budget on non-executable illustrative code.

1 / 3

Actionability

Despite the massive amount of code, none of it is executable. It references non-existent classes (TimeSeriesDatabase, SystemMonitor, PerformanceModel, LoadGenerator, etc.) and fictional MCP tools (neural_patterns, neural_predict, metrics_collect). There are no real commands, no real libraries, and nothing copy-paste ready. This is elaborate pseudocode dressed up as implementation.

1 / 3

Workflow Clarity

There is no clear workflow or sequence of steps for Claude to follow. The content is a collection of class definitions without any guidance on when to use them, in what order, or how to validate results. No validation checkpoints, no error recovery steps, no decision points are articulated as a workflow.

1 / 3

Progressive Disclosure

The content is a monolithic wall of code with no structure for progressive disclosure. Everything is dumped into a single file with no references to supporting documents. The massive inline code blocks should be separated into reference files, with the SKILL.md providing a concise overview and navigation.

1 / 3

Total: 4 / 12

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria | Description | Result

skill_md_line_count

SKILL.md is long (856 lines); consider splitting into references/ and linking

Warning

Total: 10 / 11

Passed
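A check like skill_md_line_count can be approximated in a few lines. The sketch below is an assumption about how such a check might behave, not Tessl's implementation: the function name `check_line_count` and the 500-line limit are hypothetical, since the report does not state the validator's actual threshold.

```python
def check_line_count(text: str, max_lines: int = 500) -> tuple[str, str]:
    """Return (result, description) for a SKILL.md body.

    max_lines is a hypothetical limit; the real validator's threshold is not
    given in the report.
    """
    count = len(text.splitlines())
    if count > max_lines:
        return (
            "Warning",
            f"SKILL.md is long ({count} lines); consider splitting into references/ and linking",
        )
    return ("Passed", f"SKILL.md has {count} lines")


if __name__ == "__main__":
    # An 856-line body, matching the length reported for this skill.
    result, description = check_line_count("line\n" * 856)
    print(result, "-", description)
```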

Repository: ruvnet/ruflo (Reviewed)

