
agent-performance-benchmarker

Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker

45

2.89x
Quality

Does it follow best practices?

Impact

81%

2.89x

Average score across 3 eval scenarios

Security by Snyk

Passed

No known issues

Quality

Content

35%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The body is a large monolithic dump of non-executable framework code with no external references, organized by headers but far exceeding what a SKILL.md overview should hold. It offers concrete structural guidance but lacks runnability, explicit validation workflows, and any progressive disclosure into separate files.

Suggestions

Move the full class implementations into scripts/ files and reduce SKILL.md to a concise overview with pointers, so the main file respects token budget.

Make code examples self-contained and executable, or explicitly justify the stubbed dependencies (TimeSeriesDatabase, SystemMonitor, mcpTools) instead of leaving them undefined.

Add an explicit numbered workflow with validation checkpoints (e.g. run benchmark -> validate results -> compare -> tune -> re-measure) rather than encoding the procedure only inside class methods.
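The last two suggestions can be illustrated together. What follows is a minimal sketch, not the skill's actual code: every function name here (runBenchmark, validateSamples, compare, tuneAndRemeasure) and both thresholds (10 samples, 5% gain) are hypothetical choices made for illustration. It assumes Node.js 16 or later for the global `performance` object. Collaborators such as the workload and the clock are passed in as arguments, so nothing is left undefined, and each step of the run -> validate -> compare -> tune -> re-measure flow is a separate function with an explicit checkpoint.

```javascript
// Hypothetical sketch: none of these names come from the reviewed SKILL.md.

// Step 1: run the benchmark. The workload and the clock are injected so the
// example has no undefined collaborators.
function runBenchmark(workload, { iterations = 100, now = () => performance.now() } = {}) {
  const samples = [];
  for (let i = 0; i < iterations; i++) {
    const start = now();
    workload();
    samples.push(now() - start);
  }
  return samples;
}

// Step 2 (checkpoint): refuse to continue with too few or malformed samples.
// The default of 10 samples is an assumed threshold, not a recommendation.
function validateSamples(samples, minSamples = 10) {
  if (samples.length < minSamples) {
    throw new Error(`need at least ${minSamples} samples, got ${samples.length}`);
  }
  if (samples.some((s) => !Number.isFinite(s) || s < 0)) {
    throw new Error("samples must be finite, non-negative durations");
  }
  return samples;
}

// Median is used as the summary statistic because it is robust to outliers.
function median(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Step 3: compare a candidate against the baseline. A change is kept only if
// the relative gain reaches minGain (an assumed 5% by default).
function compare(baselineMs, candidateMs, minGain = 0.05) {
  if (baselineMs <= 0) return { gain: 0, keep: false };
  const gain = (baselineMs - candidateMs) / baselineMs;
  return { gain, keep: gain >= minGain };
}

// Steps 4 and 5: apply a tuning, re-measure, and revert if it did not help.
function tuneAndRemeasure(measure, applyTuning, revertTuning) {
  const baseline = median(validateSamples(measure()));
  applyTuning();
  const candidate = median(validateSamples(measure()));
  const result = compare(baseline, candidate);
  if (!result.keep) revertTuning();
  return { baseline, candidate, ...result };
}
```

A caller would pass something like `() => runBenchmark(myWorkload)` as `measure`, where `myWorkload` is whatever operation is under test. Keeping the checkpoints as standalone functions means the same sequence can be written as a numbered list in SKILL.md while the code itself lives in a separate script.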

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The body is an ~850-line dump of full JavaScript class implementations; a SKILL.md should be a lean overview, not a monolithic code listing, so it is heavily padded and not token-efficient, matching the 'verbose; padded with unnecessary context' anchor rather than the tightened score-2 anchor. | 1 / 3 |
| Actionability | The code is concrete and detailed, but it is not executable as written: it depends on many undefined collaborators (TimeSeriesDatabase, AdaptiveOptimizer, MetricsCollector, SystemMonitor, this.mcpTools.*, this.sleep), fitting 'some concrete guidance but incomplete; pseudocode instead of executable code' rather than the copy-paste-ready score-3 anchor. | 2 / 3 |
| Workflow Clarity | A sequence exists (the numbered Core Responsibilities and the run-benchmarks/analyze/optimize flow, including an apply-measure-revert feedback loop in the optimizer), but there is no explicit validation-checkpoint workflow for Claude to follow, fitting 'steps listed but validation gaps; checkpoints missing or implicit'. | 2 / 3 |
| Progressive Disclosure | Section headers provide some organization, but no bundle files exist and the entire implementation is inline in SKILL.md, so 'content that should be separate is inline' applies; it does not reach score 3 because there are no well-signaled one-level references, and it avoids score 1 only because the headers keep it from being an unorganized monolith. | 2 / 3 |
| Total | | 7 / 12 (Passed) |

Description

7%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is a templated placeholder that identifies the skill and its invocation syntax but says nothing about what it does or when to use it. It fails on specificity, trigger terms, and completeness, with only the niche-specific name providing marginal distinctiveness.

Suggestions

Replace the placeholder with a concrete statement of actions, e.g. 'Measure throughput, latency, and resource usage of distributed consensus protocols and generate optimization recommendations.'

Add an explicit 'Use when...' trigger clause naming natural user terms such as 'benchmark', 'throughput', 'latency', 'Raft', 'Byzantine', or 'consensus protocol performance'.

Use third-person voice and drop the '$agent-performance-benchmarker' invocation syntax, which is not a natural user trigger.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | The description ('Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker') names no concrete actions at all; it only identifies the skill and its invocation syntax, matching the 'vague or no actions' anchor rather than the score-2 anchor that names a domain and some actions. | 1 / 3 |
| Completeness | Neither 'what does this do' (no benchmarking/throughput mention) nor 'when should Claude use it' (no 'Use when...' trigger) is present; both are missing or very weak, and the missing trigger clause caps completeness at 2 at most, so it lands at 1. | 1 / 3 |
| Trigger Term Quality | 'performance-benchmarker' and '$agent-performance-benchmarker' are the skill's own identifier and invocation syntax, not natural terms a user would say (e.g. 'benchmark', 'throughput', 'latency'), fitting the 'no natural keywords; technical jargon or overly generic' anchor. | 1 / 3 |
| Distinctiveness Conflict Risk | The embedded name 'performance-benchmarker' carves a somewhat specific niche, but the description provides no trigger context to distinguish it from other skills, matching 'somewhat specific but could still overlap' rather than the fully-distinct score-3 anchor. | 2 / 3 |
| Total | | 5 / 12 (Passed) |

Validation

93%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 15 / 16 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| skill_md_line_count | SKILL.md is long (856 lines); consider splitting into references/ and linking | Warning |
| Total | | 15 / 16 (Passed) |

Repository
ruvnet/claude-flow
Reviewed
