Agent skill for benchmark-suite - invoke with $agent-benchmark-suite
Install with Tessl CLI
npx tessl i github:ruvnet/claude-flow --skill agent-benchmark-suite
44
Does it follow best practices?
If you maintain this skill, you can automatically optimize it using the tessl CLI to improve its score:
npx tessl skill review --optimize ./path/to/skill
Evaluation: 89%
↑ 2.17x agent success when using this skill
Validation for skill structure
Benchmark suite configuration and execution structure
Criterion | Without context | With context
Warmup duration default | 0% | 100%
Cooldown duration default | 0% | 100%
Default benchmark duration | 0% | 100%
Default iterations | 100% | 100%
Core benchmark categories | 50% | 100%
Extended benchmark categories | 0% | 100%
Parallel execution mode | 0% | 100%
Sequential execution mode | 100% | 100%
Inter-benchmark pause | 0% | 0%
Warmup phase invoked | 62% | 100%
Cooldown phase invoked | 0% | 100%
Baseline comparison support | 0% | 100%
Without context: $0.5250 · 1m 58s · 21 turns · 28 in / 7,867 out tokens
With context: $0.9676 · 3m 44s · 23 turns · 279 in / 16,449 out tokens
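The criteria in this scenario (warmup, measured iterations, cooldown, inter-benchmark pause, sequential mode) can be illustrated with a minimal sketch. Everything named here, including `SuiteConfig`, `run_suite`, and every default value, is a hypothetical placeholder and not the skill's actual configuration; warmup is simplified to an iteration count rather than a duration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SuiteConfig:
    # Placeholder defaults for illustration only; the skill's real
    # defaults are not stated in this report.
    warmup_iterations: int = 2
    iterations: int = 5
    cooldown_seconds: float = 0.0
    pause_seconds: float = 0.0  # pause between consecutive benchmarks


def run_suite(
    benchmarks: Dict[str, Callable[[], None]], cfg: SuiteConfig
) -> Dict[str, List[float]]:
    """Run benchmarks sequentially and return per-iteration timings in seconds."""
    results: Dict[str, List[float]] = {}
    names = list(benchmarks)
    for index, name in enumerate(names):
        fn = benchmarks[name]
        # Warmup phase: execute without recording timings.
        for _ in range(cfg.warmup_iterations):
            fn()
        # Measured phase: record wall-clock time of each iteration.
        samples: List[float] = []
        for _ in range(cfg.iterations):
            start = time.perf_counter()
            fn()
            samples.append(time.perf_counter() - start)
        # Cooldown phase: let the system settle before the next benchmark.
        time.sleep(cfg.cooldown_seconds)
        results[name] = samples
        # Inter-benchmark pause, skipped after the last benchmark.
        if index < len(names) - 1:
            time.sleep(cfg.pause_seconds)
    return results
```

A parallel execution mode would replace the outer loop with concurrent workers; that variant is left out here because concurrent runs can interfere with each other's timings.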
Performance regression detection with CUSUM and multi-detector aggregation
Criterion | Without context | With context
CUSUM algorithm | 0% | 100%
Multiple detector types | 100% | 100%
Parallel detector execution | 0% | 100%
Aggregate confidence score | 100% | 100%
Severity classification | 100% | 100%
Default sensitivity 0.95 | 0% | 100%
Change point timestamp | 0% | 100%
Change point magnitude | 100% | 100%
Change point direction | 28% | 100%
Change point significance | 100% | 100%
detection-report.json produced | 100% | 100%
Regression correctly identified | 100% | 100%
Without context: $0.4800 · 1m 58s · 21 turns · 26 in / 8,588 out tokens
With context: $0.8577 · 3m 5s · 28 turns · 33 in / 12,158 out tokens
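For readers unfamiliar with the CUSUM criterion above, the following is a minimal sketch of a two-sided tabular CUSUM detector that reports a change point's position, direction, and magnitude. It is not the skill's implementation: the function name `cusum_change_point`, the baseline window, the slack `k`, and the decision threshold `h` are all assumptions chosen for illustration, and they are not derived from the 0.95 sensitivity named in the criteria.

```python
from statistics import mean, stdev
from typing import Dict, Optional, Sequence


def cusum_change_point(
    samples: Sequence[float],
    baseline_n: int = 10,  # assumed: first N samples define "normal"
    k: float = 0.5,        # assumed slack, in baseline standard deviations
    h: float = 5.0,        # assumed decision threshold, same units
) -> Optional[Dict[str, object]]:
    """Return the first detected change point after the baseline, or None."""
    baseline = samples[:baseline_n]
    mu = mean(baseline)
    sigma = stdev(baseline) or 1e-9  # guard against a perfectly flat baseline
    upper = lower = 0.0
    for i in range(baseline_n, len(samples)):
        z = (samples[i] - mu) / sigma
        # Accumulate deviations beyond the slack; reset to zero otherwise.
        upper = max(0.0, upper + z - k)
        lower = max(0.0, lower - z - k)
        if upper > h or lower > h:
            return {
                "index": i,
                "direction": "increase" if upper > h else "decrease",
                "magnitude": samples[i] - mu,
            }
    return None
```

For a latency series, an "increase" result would indicate a regression. A multi-detector setup, as named in the scenario title, would run several such detectors and combine their outputs into one confidence score; that aggregation is not shown.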
Load and stress test design with SLA thresholds and scalability targets
Criterion | Without context | With context
Load test ramp-up phase | 100% | 100%
Load test sustained phase | 71% | 100%
Load test ramp-down phase | 0% | 100%
Stress test breaking point | 100% | 100%
Stress test degradation curve | 100% | 100%
Latency p50 threshold | 0% | 0%
Latency p90 threshold | 0% | 0%
Latency p95 threshold | 0% | 100%
Latency p99 threshold | 0% | 100%
Scalability load points | 0% | 0%
Linear coefficient target | 0% | 100%
Efficiency retention target | 100% | 100%
test-plan.json produced | 100% | 100%
Without context: $0.5483 · 2m 36s · 21 turns · 26 in / 10,082 out tokens
With context: $1.0959 · 4m 49s · 28 turns · 35 in / 16,688 out tokens
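The latency threshold criteria above (p50, p90, p95, p99) amount to comparing observed percentiles against SLA limits. A minimal sketch of that check follows, using the nearest-rank percentile definition. The function names and the threshold values in the usage note are hypothetical; the report does not state the skill's actual SLA limits.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_sla(
    latencies_ms: Sequence[float], thresholds_ms: Dict[int, float]
) -> Dict[int, Dict[str, object]]:
    """Compare each configured percentile against its limit."""
    report: Dict[int, Dict[str, object]] = {}
    for p, limit in thresholds_ms.items():
        observed = percentile(latencies_ms, p)
        report[p] = {"observed": observed, "limit": limit, "ok": observed <= limit}
    return report
```

Calling `check_sla(latencies, {50: 60, 95: 90, 99: 100})` with illustrative limits in milliseconds returns one entry per percentile, each with an `ok` flag that a test plan could use as a pass/fail gate.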