agent-performance-benchmarker

Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker

Quality

Does it follow best practices?

Impact

81%

2.89x

Average score across 3 eval scenarios

Security

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./.agents/skills/agent-performance-benchmarker/SKILL.md

Quality

Discovery

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an extremely weak description that provides virtually no useful information for skill selection. It reads more like a stub or placeholder than a functional description, containing only the skill's name and invocation command without any explanation of capabilities, use cases, or trigger conditions.

Suggestions

Add concrete actions describing what the skill does, e.g., 'Runs performance benchmarks, measures latency, throughput, and response times for APIs, databases, or applications.'

Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks about performance testing, benchmarking, load testing, measuring response times, or profiling system performance.'

Specify the domain or type of performance benchmarking to distinguish it from other potential performance-related skills (e.g., web performance, database performance, code profiling).
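As a rough illustration of the suggestions above, the following Python sketch checks a candidate description for a 'Use when' clause and for natural trigger terms. The term list, the function name `check_description`, and the improved sample description are assumptions made for illustration; this is not Tessl's scoring logic.

```python
import re

# Hypothetical trigger terms, taken from the examples in the review text.
TRIGGER_TERMS = ("benchmarking", "performance testing", "load testing", "latency", "throughput")


def check_description(description: str) -> dict:
    """Report whether a description has a 'Use when' clause and which trigger terms it uses."""
    text = description.lower()
    # Whole-word matching, so the tool name 'performance-benchmarker' does not count as a term.
    found = [t for t in TRIGGER_TERMS if re.search(r"\b" + re.escape(t) + r"\b", text)]
    return {"has_use_when": "use when" in text, "trigger_terms": found}


original = "Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker"
improved = (
    "Runs performance benchmarks and measures latency and throughput for HTTP APIs. "
    "Use when the user asks about performance testing, benchmarking, or load testing."
)

print(check_description(original))
print(check_description(improved))
```

The original description yields no 'Use when' clause and no trigger terms, while the rewritten one contains both, which is the gap the Discovery scores point at.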

Dimension	Reasoning	Score
Specificity	The description contains no concrete actions whatsoever. It only says 'Agent skill for performance-benchmarker' which is entirely vague and abstract, giving no indication of what the skill actually does.	1 / 3
Completeness	The description fails to answer both 'what does this do' and 'when should Claude use it'. It provides neither meaningful capability information nor any trigger guidance. The only instruction is how to invoke it.	1 / 3
Trigger Term Quality	The only potentially relevant term is 'performance-benchmarker', which is a tool name rather than a natural keyword a user would say. There are no natural trigger terms like 'benchmark', 'performance testing', 'latency', 'throughput', etc.	1 / 3
Distinctiveness Conflict Risk	The description is so vague that 'performance' could overlap with many other skills. Without specifying what kind of performance benchmarking (web, database, API, CPU, etc.), it lacks any clear niche or distinct triggers.	1 / 3
	Total	4 / 12 Passed

Implementation

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is an architectural design document masquerading as an actionable skill. It consists of hundreds of lines of non-executable pseudocode referencing fictional classes and infrastructure, with no real libraries, commands, or concrete steps Claude could follow. The content would be better served as a brief overview with actual executable examples using real benchmarking tools.

Suggestions

Replace fictional class hierarchies with actual executable code using real benchmarking libraries (e.g., autocannon for HTTP throughput, pidusage for resource monitoring, or custom scripts with real Node.js APIs).

Add a clear step-by-step workflow with concrete validation checkpoints: 1) Set up benchmark environment, 2) Configure protocols, 3) Run benchmarks with specific commands, 4) Validate results, 5) Generate reports.

Reduce content to under 100 lines, focusing on what Claude does not already know: specific tool configurations, project-specific conventions, and concrete command sequences.

Extract detailed reference material (metric definitions, analysis algorithms) into separate bundle files and reference them from a concise SKILL.md overview.
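The workflow these suggestions call for (configure, run, validate, report) can be sketched in a few dozen executable lines. The example below is a minimal sketch that swaps in Python's standard library for the Node.js tools the review names (autocannon, pidusage). The workload, the function names, and the iteration counts are all hypothetical.

```python
import statistics
import time


def run_benchmark(workload, iterations=200, warmup=20):
    """Time `workload` repeatedly and return per-call latencies in milliseconds."""
    for _ in range(warmup):  # warm-up calls are discarded
        workload()
    latencies_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Compute throughput and latency percentiles from the raw samples."""
    cuts = statistics.quantiles(latencies_ms, n=100)  # 99 cut points
    total_s = sum(latencies_ms) / 1000.0
    return {
        "samples": len(latencies_ms),
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "throughput_per_s": len(latencies_ms) / total_s if total_s > 0 else float("inf"),
    }


def validate(summary, expected_samples):
    """Validation checkpoint: fail loudly on incomplete or inconsistent results."""
    if summary["samples"] != expected_samples:
        raise ValueError("sample count does not match configured iterations")
    if not summary["p50_ms"] <= summary["p95_ms"] <= summary["p99_ms"]:
        raise ValueError("percentiles are not monotonic")


def sample_workload():
    """Hypothetical workload: sort a reversed range."""
    sorted(range(10_000, 0, -1))


config = {"iterations": 200, "warmup": 20}      # configure
samples = run_benchmark(sample_workload, **config)  # run
report = summarize(samples)
validate(report, config["iterations"])          # validate
for key, value in report.items():               # report
    print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

Raising an error at the validation step keeps an incomplete run from reaching the report.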

Dimension	Reasoning	Score
Conciseness	Extremely verbose at ~600+ lines of non-executable pseudocode. The code references fictional classes (TimeSeriesDatabase, SystemMonitor, PerformanceModel, etc.) that don't exist, making this essentially elaborate pseudocode dressed as real code. Massive token waste explaining concepts Claude already understands about benchmarking, percentile calculations, and resource monitoring.	1 / 3
Actionability	None of this code is executable. Every class depends on undefined imports and fictional infrastructure (PerformanceAlertSystem, LoadGenerator, SystemMonitor, MetricsCollector, etc.). There are no real commands, no real libraries, no installable dependencies — it's entirely aspirational architecture documentation rather than actionable guidance.	1 / 3
Workflow Clarity	There is no clear workflow for actually performing a benchmark. The skill describes class architectures but never provides a step-by-step process for Claude to follow. No validation checkpoints, no error recovery steps, no concrete sequence of actions to execute. The 'Core Responsibilities' section lists abstract goals rather than actionable steps.	1 / 3
Progressive Disclosure	Monolithic wall of code with no structure for progressive disclosure. Everything is dumped into a single file with no references to supporting documents. The content that could be split into separate reference files (throughput measurement, latency analysis, resource monitoring, adaptive optimization) is all inlined, creating an overwhelming and poorly navigable document.	1 / 3
	Total	4 / 12 Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 10 / 11 Passed

Validation for skill structure

Criteria	Description	Result
skill_md_line_count	SKILL.md is long (856 lines); consider splitting into references/ and linking	Warning

	Total	10 / 11 Passed
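A line-count check like `skill_md_line_count` is simple to reproduce locally. The sketch below assumes a limit of 500 lines; the review does not state the real threshold, only that 856 lines counts as long. The function name `line_count_result` is hypothetical.

```python
MAX_LINES = 500  # assumed limit; the review only says that 856 lines is "long"


def line_count_result(skill_text: str, max_lines: int = MAX_LINES) -> str:
    """Return a Warning or Passed message based on the number of lines in the text."""
    count = len(skill_text.splitlines())
    if count > max_lines:
        return f"Warning: SKILL.md is long ({count} lines); consider splitting into references/ and linking"
    return f"Passed: SKILL.md has {count} lines"


print(line_count_result("line\n" * 856))
print(line_count_result("line\n" * 90))
```

In practice the text would come from reading the SKILL.md file. The synthetic strings here only keep the example self-contained.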

Repository: ruvnet/claude-flow
Commit: d29d87f

Reviewed: 3 days ago
