
agent-benchmark-suite

Agent skill for benchmark-suite; invoke with $agent-benchmark-suite.

33

Quality: 0% (Does it follow best practices?)
Impact: 89% (2.17x), average score across 3 eval scenarios
Security by Snyk: Passed, no known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./.agents/skills/agent-benchmark-suite/SKILL.md

Evaluation results

Performance Benchmarking Framework for Distributed API Service
Benchmark suite configuration and execution structure
Score: 92% with context, 68% without context

Criterion                        Without context   With context
Warmup duration default          0%                100%
Cooldown duration default        0%                100%
Default benchmark duration       0%                100%
Default iterations               100%              100%
Core benchmark categories        50%               100%
Extended benchmark categories    0%                100%
Parallel execution mode          0%                100%
Sequential execution mode        100%              100%
Inter-benchmark pause            0%                0%
Warmup phase invoked             62%               100%
Cooldown phase invoked           0%                100%
Baseline comparison support      0%                100%
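This scenario scores whether a benchmark suite configures warmup, measured iterations, cooldown, sequential or parallel execution, an inter-benchmark pause, and baseline comparison. The Python sketch below shows one way those pieces can fit together. It is not the skill's implementation: `BenchmarkConfig`, `run_suite`, and `compare_to_baseline` are hypothetical names, and the default values are placeholders because this page does not state the skill's actual defaults.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Tuple


@dataclass
class BenchmarkConfig:
    # Placeholder defaults for illustration; not the skill's documented values.
    warmup_s: float = 0.0         # time spent exercising the code before measuring
    cooldown_s: float = 0.0       # idle time after measuring
    iterations: int = 5           # measured runs per benchmark (must be >= 1)
    parallel: bool = False        # run benchmarks concurrently instead of one by one
    pause_between_s: float = 0.0  # sequential mode only: gap between benchmarks


def _run_one(name: str, fn: Callable[[], object], cfg: BenchmarkConfig) -> Tuple[str, float]:
    # Warmup phase: call fn repeatedly without recording timings.
    deadline = time.perf_counter() + cfg.warmup_s
    while time.perf_counter() < deadline:
        fn()
    # Measurement phase: time each iteration separately.
    samples = []
    for _ in range(cfg.iterations):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    # Cooldown phase: let the system settle before the next benchmark.
    time.sleep(cfg.cooldown_s)
    return name, mean(samples)


def run_suite(benchmarks: Dict[str, Callable[[], object]], cfg: BenchmarkConfig) -> Dict[str, float]:
    """Return the mean wall-clock seconds per iteration for each benchmark."""
    if cfg.parallel:
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(_run_one, n, f, cfg) for n, f in benchmarks.items()]
            return dict(fut.result() for fut in futures)
    results: Dict[str, float] = {}
    for i, (n, f) in enumerate(benchmarks.items()):
        if i > 0:
            time.sleep(cfg.pause_between_s)  # inter-benchmark pause
        name, avg = _run_one(n, f, cfg)
        results[name] = avg
    return results


def compare_to_baseline(results: Dict[str, float], baseline: Dict[str, float]) -> Dict[str, float]:
    """Ratio of current to baseline time; values above 1.0 mean slower than baseline."""
    return {n: t / baseline[n] for n, t in results.items() if baseline.get(n, 0) > 0}


if __name__ == "__main__":
    suite = {
        "sum_1k": lambda: sum(range(1_000)),
        "sort_1k": lambda: sorted(range(1_000, 0, -1)),
    }
    print(run_suite(suite, BenchmarkConfig(warmup_s=0.1, iterations=10)))
```

Running benchmarks in parallel shortens wall-clock time but lets them compete for CPU, so sequential mode with a pause between benchmarks is usually the safer choice when individual timings will be compared against a baseline.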

Automated Performance Regression Detector
Performance regression detection with CUSUM and multi-detector aggregation
Score: 100% with context, 43% without context

Criterion                        Without context   With context
CUSUM algorithm                  0%                100%
Multiple detector types          100%              100%
Parallel detector execution      0%                100%
Aggregate confidence score       100%              100%
Severity classification          100%              100%
Default sensitivity 0.95         0%                100%
Change point timestamp           0%                100%
Change point magnitude           100%              100%
Change point direction           28%               100%
Change point significance        100%              100%
detection-report.json produced   100%              100%
Regression correctly identified  100%              100%
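This scenario checks for a CUSUM detector that reports a change point's timestamp, magnitude, and direction. As a rough illustration, here is a standard two-sided tabular CUSUM in Python over standardized samples. The names `cusum_detect` and `ChangePoint` are hypothetical, sample positions stand in for timestamps, and the allowance `k` and decision threshold `h` (in standard deviations) are conventional textbook choices rather than values taken from the skill. How the skill's default sensitivity of 0.95 maps onto these parameters is not stated on this page, so the sketch does not model it, nor does it model significance testing or multi-detector aggregation.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional, Sequence


@dataclass
class ChangePoint:
    index: int        # first sample estimated to belong to the shifted regime
    alarm_index: int  # sample at which a CUSUM statistic first exceeded h
    direction: str    # "increase" or "decrease"
    magnitude: float  # mean of samples[index..alarm_index] minus the baseline mean


def cusum_detect(values: Sequence[float], baseline_n: int = 10,
                 k: float = 0.5, h: float = 5.0) -> Optional[ChangePoint]:
    """Two-sided tabular CUSUM; the first baseline_n samples define the in-control mean and spread."""
    if baseline_n < 2 or len(values) <= baseline_n:
        return None
    base = list(values[:baseline_n])
    mu0 = mean(base)
    sigma = stdev(base) or 1e-9  # guard against a perfectly flat baseline
    s_hi = s_lo = 0.0
    start_hi = start_lo = baseline_n  # where the current nonzero run of each statistic began
    for i in range(baseline_n, len(values)):
        z = (values[i] - mu0) / sigma
        s_hi = max(0.0, s_hi + z - k)  # accumulates upward drift (e.g. rising latency)
        s_lo = max(0.0, s_lo - z - k)  # accumulates downward drift
        if s_hi == 0.0:
            start_hi = i + 1
        if s_lo == 0.0:
            start_lo = i + 1
        if s_hi > h or s_lo > h:
            up = s_hi > h
            start = start_hi if up else start_lo
            shift = mean(values[start:i + 1]) - mu0
            return ChangePoint(start, i, "increase" if up else "decrease", shift)
    return None


if __name__ == "__main__":
    latencies_ms = [100.0, 101.0, 99.0, 100.0, 100.5, 99.5, 100.0, 101.0, 99.0, 100.0] + [104.0] * 8
    print(cusum_detect(latencies_ms))
```

Estimating the change point as the sample after the statistic last sat at zero is the usual way to recover when a shift began, since the alarm itself fires only after enough evidence has accumulated.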

Performance Testing Strategy for Pre-Launch API Validation
Load and stress test design with SLA thresholds and scalability targets
Score: 75% with context, 31% without context

Criterion                        Without context   With context
Load test ramp-up phase          100%              100%
Load test sustained phase        71%               100%
Load test ramp-down phase        0%                100%
Stress test breaking point       100%              100%
Stress test degradation curve    100%              100%
Latency p50 threshold            0%                0%
Latency p90 threshold            0%                0%
Latency p95 threshold            0%                100%
Latency p99 threshold            0%                100%
Scalability load points          0%                0%
Linear coefficient target        0%                100%
Efficiency retention target      100%              100%
test-plan.json produced          100%              100%
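This scenario expects SLA thresholds at several latency percentiles. A minimal Python sketch of such a check follows, using the nearest-rank percentile definition. `percentile`, `check_latency_sla`, and the threshold values in the example are hypothetical, since this page does not list the SLA numbers the skill uses.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def check_latency_sla(samples: Sequence[float], thresholds_ms: Dict[str, float]) -> Dict[str, bool]:
    """Map keys like "p95" to whether that percentile is within its limit (True = pass)."""
    results = {}
    for name, limit in thresholds_ms.items():
        p = float(name.lstrip("p"))
        results[name] = percentile(samples, p) <= limit
    return results


if __name__ == "__main__":
    latencies_ms = [12.0, 15.0, 11.0, 40.0, 13.0, 14.0, 90.0, 12.5, 16.0, 13.5]
    # Hypothetical SLA limits in milliseconds.
    print(check_latency_sla(latencies_ms, {"p50": 20.0, "p90": 50.0, "p95": 80.0, "p99": 100.0}))
```

Nearest-rank always returns an observed sample, which keeps the result easy to trace back to a real request; interpolating definitions can report a value that no request actually had.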

Repository: ruvnet/claude-flow
Evaluated with agent Claude Code, model Claude Sonnet 4.6
