
agent-benchmark-suite

Agent skill for benchmark-suite - invoke with $agent-benchmark-suite

Install with Tessl CLI

npx tessl i github:ruvnet/claude-flow --skill agent-benchmark-suite

44

Does it follow best practices?

Evaluation: 89% (2.17x agent success when using this skill)

Validation for skill structure


Evaluation results

92%

68%

Performance Benchmarking Framework for Distributed API Service

Benchmark suite configuration and execution structure

| Criteria | Without context | With context |
| --- | --- | --- |
| Warmup duration default | 0% | 100% |
| Cooldown duration default | 0% | 100% |
| Default benchmark duration | 0% | 100% |
| Default iterations | 100% | 100% |
| Core benchmark categories | 50% | 100% |
| Extended benchmark categories | 0% | 100% |
| Parallel execution mode | 0% | 100% |
| Sequential execution mode | 100% | 100% |
| Inter-benchmark pause | 0% | 0% |
| Warmup phase invoked | 62% | 100% |
| Cooldown phase invoked | 0% | 100% |
| Baseline comparison support | 0% | 100% |
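
To make the warmup, iteration, and cooldown criteria above concrete, here is a minimal Python sketch of a benchmark run with those three phases. It is an illustration only, not the skill's implementation: the function name `run_benchmark` and its parameters are hypothetical, and this page does not state the skill's actual default durations or iteration count, so none are hard-coded.

```python
import time
from typing import Callable, List


def run_benchmark(
    fn: Callable[[], object],
    iterations: int,
    warmup_s: float,
    cooldown_s: float,
) -> List[float]:
    """Run fn through warmup, measurement, and cooldown phases.

    All names and parameters here are hypothetical; the caller supplies
    every duration and count explicitly.
    """
    # Warmup phase: exercise fn until the warmup window elapses.
    # Timings from this phase are discarded.
    deadline = time.perf_counter() + warmup_s
    while time.perf_counter() < deadline:
        fn()

    # Measurement phase: record the wall-clock time of each iteration.
    samples: List[float] = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    # Cooldown phase: idle so the next benchmark starts from a quiet state.
    time.sleep(cooldown_s)
    return samples
```

A call such as `run_benchmark(lambda: sum(range(1000)), iterations=10, warmup_s=0.1, cooldown_s=0.1)` returns ten per-iteration timings in seconds; the values passed here are placeholders, not the skill's defaults.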

Without context: $0.5250 · 1m 58s · 21 turns · 28 in / 7,867 out tokens

With context: $0.9676 · 3m 44s · 23 turns · 279 in / 16,449 out tokens

100%

43%

Automated Performance Regression Detector

Performance regression detection with CUSUM and multi-detector aggregation

| Criteria | Without context | With context |
| --- | --- | --- |
| CUSUM algorithm | 0% | 100% |
| Multiple detector types | 100% | 100% |
| Parallel detector execution | 0% | 100% |
| Aggregate confidence score | 100% | 100% |
| Severity classification | 100% | 100% |
| Default sensitivity 0.95 | 0% | 100% |
| Change point timestamp | 0% | 100% |
| Change point magnitude | 100% | 100% |
| Change point direction | 28% | 100% |
| Change point significance | 100% | 100% |
| detection-report.json produced | 100% | 100% |
| Regression correctly identified | 100% | 100% |
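
For readers unfamiliar with CUSUM, the following is a minimal Python sketch of a two-sided CUSUM detector that reports a change point with a direction and a magnitude, two of the fields named in the criteria above. It is a textbook formulation under stated assumptions, not the skill's code: the names `cusum_detect` and `ChangePoint`, the baseline window, and the slack `k` and threshold `h` (both in baseline standard deviations) are hypothetical choices, and the 0.95 sensitivity default from the criteria is not modeled here.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional, Sequence


@dataclass
class ChangePoint:
    index: int        # position in the series where the alarm fired
    direction: str    # "increase" or "decrease"
    magnitude: float  # mean after the alarm minus the baseline mean


def cusum_detect(
    values: Sequence[float],
    baseline_n: int = 20,  # assumed: samples used to estimate the baseline
    k: float = 0.5,        # assumed: slack, in baseline standard deviations
    h: float = 5.0,        # assumed: alarm threshold, in standard deviations
) -> Optional[ChangePoint]:
    """Two-sided CUSUM over standardized deviations from a baseline."""
    if baseline_n < 2 or len(values) <= baseline_n:
        raise ValueError("need at least 2 baseline samples plus data to test")

    base = values[:baseline_n]
    mu = mean(base)
    sigma = stdev(base) or 1e-9  # guard against a perfectly flat baseline

    upper = lower = 0.0
    for i in range(baseline_n, len(values)):
        z = (values[i] - mu) / sigma
        # Accumulate deviations beyond the slack; reset to zero when negative.
        upper = max(0.0, upper + z - k)
        lower = max(0.0, lower - z - k)
        if upper > h or lower > h:
            direction = "increase" if upper > h else "decrease"
            return ChangePoint(i, direction, mean(values[i:]) - mu)
    return None
```

With a latency series that holds near 100 ms and then jumps to 120 ms, the detector fires on the first shifted sample and reports an increase of about 20 ms; a series that stays within its baseline noise returns `None`.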

Without context: $0.4800 · 1m 58s · 21 turns · 26 in / 8,588 out tokens

With context: $0.8577 · 3m 5s · 28 turns · 33 in / 12,158 out tokens

75%

31%

Performance Testing Strategy for Pre-Launch API Validation

Load and stress test design with SLA thresholds and scalability targets

| Criteria | Without context | With context |
| --- | --- | --- |
| Load test ramp-up phase | 100% | 100% |
| Load test sustained phase | 71% | 100% |
| Load test ramp-down phase | 0% | 100% |
| Stress test breaking point | 100% | 100% |
| Stress test degradation curve | 100% | 100% |
| Latency p50 threshold | 0% | 0% |
| Latency p90 threshold | 0% | 0% |
| Latency p95 threshold | 0% | 100% |
| Latency p99 threshold | 0% | 100% |
| Scalability load points | 0% | 0% |
| Linear coefficient target | 0% | 100% |
| Efficiency retention target | 100% | 100% |
| test-plan.json produced | 100% | 100% |
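
The latency threshold criteria above amount to computing percentiles over observed latencies and comparing each against a limit. The Python sketch below shows one way to do that using the nearest-rank percentile definition. It is an assumption-laden illustration: the function names `percentile` and `check_sla` are hypothetical, and the threshold values in the usage note are placeholders, since this page does not list the skill's actual SLA limits.

```python
import math
from typing import Dict, Sequence, Tuple


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def check_sla(
    samples_ms: Sequence[float],
    thresholds_ms: Dict[float, float],
) -> Dict[float, Tuple[float, bool]]:
    """Map each percentile to (observed latency, whether it meets the limit).

    thresholds_ms maps a percentile (e.g. 95) to a hypothetical limit in ms.
    """
    results = {}
    for p, limit in thresholds_ms.items():
        observed = percentile(samples_ms, p)
        results[p] = (observed, observed <= limit)
    return results
```

For example, `check_sla(latencies, {95: 200.0, 99: 500.0})` returns the observed p95 and p99 alongside a pass/fail flag for each; the 200 ms and 500 ms limits are made up for illustration.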

Without context: $0.5483 · 2m 36s · 21 turns · 26 in / 10,082 out tokens

With context: $1.0959 · 4m 49s · 28 turns · 35 in / 16,688 out tokens

Evaluated with:
Agent: Claude Code
Model: Unknown
