Runs load tests, profiles application bottlenecks, analyzes response time metrics, and generates performance optimization recommendations. Use when you need to benchmark a system, run load or stress tests, measure latency or throughput, identify performance bottlenecks, optimize Core Web Vitals (LCP, FID, CLS), plan capacity, enforce performance budgets in CI/CD pipelines, or investigate slow API response times, database query performance, or frontend rendering delays.
Overall score: 90 (Risky: do not use without reviewing)

Quality: 88% (Does it follow best practices?)
Impact: 92% (1.08x average score across 3 eval scenarios)

Discovery

92%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly articulates specific capabilities and provides an extensive 'Use when...' clause with many natural trigger terms. Its main weakness is the very broad scope, which could cause overlap with more specialized skills in areas like database optimization, frontend performance, or CI/CD pipelines. The description uses proper third-person voice throughout.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'runs load tests', 'profiles application bottlenecks', 'analyzes response time metrics', and 'generates performance optimization recommendations'. These are clear, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (runs load tests, profiles bottlenecks, analyzes metrics, generates recommendations) and 'when' with an explicit 'Use when...' clause listing numerous specific trigger scenarios. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'benchmark', 'load tests', 'stress tests', 'latency', 'throughput', 'performance bottlenecks', 'Core Web Vitals', 'LCP', 'FID', 'CLS', 'capacity', 'performance budgets', 'CI/CD', 'slow API response times', 'database query performance', 'frontend rendering delays'. These span multiple common user phrasings. | 3 / 3 |
| Distinctiveness / Conflict Risk | While the performance testing/profiling niche is fairly distinct, the breadth of the description (covering load testing, database query performance, frontend rendering, CI/CD pipelines, Core Web Vitals) is so wide that it could overlap with more specialized skills for database optimization, frontend performance, or CI/CD pipeline management. | 2 / 3 |
| Total | | 11 / 12 Passed |
Implementation
85%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, well-structured skill that provides concrete, executable guidance for performance benchmarking with clear multi-step workflows and validation checkpoints. The k6 script is production-ready and the triage steps are specific and actionable. Minor verbosity in some sections (test type table, Core Web Vitals fixes) prevents a perfect conciseness score, but overall the content earns its token budget.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is mostly efficient and avoids explaining basic concepts Claude already knows, but some sections could be tightened. For example, the test type table is useful but the descriptions are somewhat generic, and the Core Web Vitals fixes section restates well-known optimization patterns. Overall reasonably lean but not maximally token-efficient. | 2 / 3 |
| Actionability | The skill provides a fully executable k6 script with thresholds, checks, and authentication flow; concrete bash commands; specific metric thresholds (p95 < 500 ms, error rate < 1%); and actionable triage steps per layer (EXPLAIN ANALYZE, py-spy, Lighthouse). Guidance is copy-paste ready and specific. | 3 / 3 |
| Workflow Clarity | The 5-step workflow is clearly sequenced with explicit validation checkpoints at multiple stages: error rate checks during ramp-up (Step 2), threshold checks after each run (Step 3), statistical significance validation with 3 iterations (Step 4), and CI/CD quality gates (Step 5). Feedback loops for error recovery are clearly stated (e.g., 'stop the test, fix, restart from Step 2'). | 3 / 3 |
| Progressive Disclosure | The skill provides a clear overview with well-organized sections, then appropriately delegates detailed content to one-level-deep references (REPORT_TEMPLATE.md, CAPACITY_PLANNING.md). The main file contains enough actionable content to be useful standalone while pointing to supplementary materials for deeper topics. | 3 / 3 |
| Total | | 11 / 12 Passed |
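The pass/fail thresholds cited in the Actionability row (p95 below 500 ms, error rate below 1%) amount to a simple gate over collected latency samples. A minimal sketch in plain Python, using hypothetical sample data rather than the skill's actual k6 output:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ranked = sorted(samples)
    k = max(1, math.ceil(p / 100 * len(ranked)))  # 1-indexed rank
    return ranked[k - 1]

def passes_thresholds(latencies_ms, errors, total_requests):
    """Gate mirroring the thresholds above: p95 < 500 ms and error rate < 1%."""
    return (percentile(latencies_ms, 95) < 500
            and errors / total_requests < 0.01)

# Hypothetical run: 100 requests, none failed.
latencies = [120] * 90 + [480] * 5 + [700] * 5
print(passes_thresholds(latencies, 0, 100))  # → True
```

In a real k6 run the same gate is expressed declaratively in the script's `thresholds` block, so a failing run exits non-zero and can fail a CI/CD stage directly.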
Validation
90%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation for skill structure: 10 / 11 Passed

| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 Passed |
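The `frontmatter_unknown_keys` warning corresponds to a simple set-difference check over the skill file's parsed frontmatter. A sketch of that check, where the allowed key set is an assumption for illustration (the actual spec's key list may differ):

```python
# Assumed allowed frontmatter keys for a skill file; the real spec may differ.
ALLOWED_KEYS = {"name", "description", "metadata"}

def unknown_frontmatter_keys(frontmatter: dict) -> list:
    """Return frontmatter keys outside the allowed set, sorted for stable output."""
    return sorted(k for k in frontmatter if k not in ALLOWED_KEYS)

# Hypothetical frontmatter with one unrecognized key.
fm = {"name": "performance-benchmarker", "description": "Runs load tests", "version": "1.0"}
print(unknown_frontmatter_keys(fm))  # → ['version']
```

A key such as `version` would trigger the warning above; per the validator's suggestion, it could be dropped or nested under `metadata`.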