agent-benchmark-suite

Agent skill for benchmark-suite - invoke with $agent-benchmark-suite

Impact

89%

2.17x

Average score across 3 eval scenarios

Security

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./.agents/skills/agent-benchmark-suite/SKILL.md

Evaluation results

92%

68%

Performance Benchmarking Framework for Distributed API Service

Benchmark suite configuration and execution structure

Criteria

Without context

With context

Warmup duration default

100%

Cooldown duration default

100%

Default benchmark duration

100%

Default iterations

100%

Core benchmark categories

50%

100%

Extended benchmark categories

100%

Parallel execution mode

100%

Sequential execution mode

100%

Inter-benchmark pause

Warmup phase invoked

62%

100%

Cooldown phase invoked

100%

Baseline comparison support
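
The criteria above describe a suite with a warmup phase, a fixed number of iterations, an inter-benchmark pause, a cooldown phase, and baseline comparison. A minimal sketch of sequential execution under those terms might look like the following. The names `SuiteConfig`, `run_suite`, and `compare_to_baseline`, and every default value, are hypothetical; this page does not list the skill's actual defaults or API.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class SuiteConfig:
    # Placeholder values only; the skill's real defaults are not shown on this page.
    warmup_s: float = 0.0
    cooldown_s: float = 0.0
    iterations: int = 3
    pause_s: float = 0.0  # inter-benchmark pause


def run_suite(benchmarks: Dict[str, Callable[[], None]],
              cfg: SuiteConfig) -> Dict[str, List[float]]:
    """Run benchmarks one after another and return per-iteration timings in seconds."""
    results: Dict[str, List[float]] = {}
    for position, (name, fn) in enumerate(benchmarks.items()):
        if position > 0:
            time.sleep(cfg.pause_s)  # pause between benchmarks
        # Warmup phase: run untimed until the warmup duration has elapsed.
        deadline = time.perf_counter() + cfg.warmup_s
        while time.perf_counter() < deadline:
            fn()
        timings = []
        for _ in range(cfg.iterations):
            start = time.perf_counter()
            fn()
            timings.append(time.perf_counter() - start)
        results[name] = timings
    time.sleep(cfg.cooldown_s)  # cooldown phase after the last benchmark
    return results


def compare_to_baseline(timings: List[float], baseline_mean_s: float) -> float:
    """Return mean(current) / baseline; values above 1.0 mean slower than baseline."""
    return mean(timings) / baseline_mean_s
```

A parallel execution mode would replace the outer loop with a worker pool; that variant is omitted here because concurrent benchmarks can distort each other's timings.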

100%

43%

Automated Performance Regression Detector

Performance regression detection with CUSUM and multi-detector aggregation

Criteria

Without context

With context

CUSUM algorithm

100%

Multiple detector types

100%

Parallel detector execution

100%

Aggregate confidence score

100%

Severity classification

100%

Default sensitivity 0.95

100%

Change point timestamp

100%

Change point magnitude

100%

Change point direction

28%

100%

Change point significance

100%

detection-report.json produced

100%

Regression correctly identified

100%
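
For readers unfamiliar with CUSUM, the following is a minimal sketch of a two-sided CUSUM detector that reports a change point's position, direction, and magnitude, as the criteria above require. It is a generic textbook formulation, not the skill's implementation. The function name, the `baseline_n`, `k`, and `h` parameters, and their values are assumptions, and how the skill's 0.95 sensitivity maps onto a decision threshold is not shown on this page.

```python
from statistics import mean, stdev
from typing import Dict, Optional, Sequence


def cusum_change_point(samples: Sequence[float],
                       baseline_n: int = 10,
                       k: float = 0.5,
                       h: float = 5.0) -> Optional[Dict[str, object]]:
    """Two-sided CUSUM on z-scores relative to the first `baseline_n` samples.

    k is the slack (in standard deviations) and h the decision threshold;
    both are illustrative values, not defaults taken from the skill.
    """
    base = samples[:baseline_n]
    mu = mean(base)
    sigma = stdev(base) or 1e-9  # guard against a constant baseline
    upper = lower = 0.0
    for i in range(baseline_n, len(samples)):
        z = (samples[i] - mu) / sigma
        upper = max(0.0, upper + z - k)  # accumulates upward drift
        lower = max(0.0, lower - z - k)  # accumulates downward drift
        if upper > h or lower > h:
            return {
                "index": i,  # position of the detection in the series
                "direction": "increase" if upper > h else "decrease",
                "magnitude": mean(samples[i:]) - mu,  # shift vs. baseline mean
                "statistic": max(upper, lower),
            }
    return None  # no change point found
```

Several detectors of this kind could be run side by side and their results combined into an aggregate confidence score; the aggregation rule itself is not described on this page, so it is left out.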

75%

31%

Performance Testing Strategy for Pre-Launch API Validation

Load and stress test design with SLA thresholds and scalability targets

Criteria

Without context

With context

Load test ramp-up phase

100%

Load test sustained phase

71%

100%

Load test ramp-down phase

100%

Stress test breaking point

100%

Stress test degradation curve

100%

Latency p50 threshold

Latency p90 threshold

Latency p95 threshold

100%

Latency p99 threshold

100%

Scalability load points

Linear coefficient target

100%

Efficiency retention target

100%

test-plan.json produced

100%
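
The latency criteria above call for thresholds at p50, p90, p95, and p99. A minimal sketch of checking measured latencies against such thresholds follows. The function names and the threshold values in the usage note are hypothetical; the page does not state the SLA numbers the scenario expects.

```python
from statistics import quantiles
from typing import Dict, Sequence


def latency_percentiles(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Compute p50/p90/p95/p99 using linear interpolation over the sample."""
    # n=100 yields 99 cut points; cut point i (1-based) is the i-th percentile.
    cuts = quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89], "p95": cuts[94], "p99": cuts[98]}


def check_sla(latencies_ms: Sequence[float],
              thresholds_ms: Dict[str, float]) -> Dict[str, bool]:
    """Return, per percentile, whether the observed latency is within its threshold."""
    observed = latency_percentiles(latencies_ms)
    return {name: observed[name] <= limit for name, limit in thresholds_ms.items()}
```

For example, `check_sla(samples, {"p95": 250.0, "p99": 500.0})` would report which of two made-up thresholds a run satisfies.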

Repository: ruvnet/claude-flow
Commit: d29d87f

Evaluated: 3 months ago
Agent: Claude Code
Model: Claude Sonnet 4.6
