Runs load tests, profiles application bottlenecks, analyzes response time metrics, and generates performance optimization recommendations. Use when you need to benchmark a system, run load or stress tests, measure latency or throughput, identify performance bottlenecks, optimize Core Web Vitals (LCP, FID, CLS), plan capacity, enforce performance budgets in CI/CD pipelines, or investigate slow API response times, database query performance, or frontend rendering delays.
Does it follow best practices? 92%
Impact: Pending (no eval scenarios have been run)
Risky: Do not use without reviewing
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly articulates specific capabilities, provides comprehensive trigger terms covering both backend and frontend performance scenarios, and includes an explicit 'Use when...' clause. It uses proper third-person voice throughout and covers a well-defined niche with minimal conflict risk. The description is thorough without being unnecessarily verbose.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'runs load tests', 'profiles application bottlenecks', 'analyzes response time metrics', and 'generates performance optimization recommendations'. These are clear, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (runs load tests, profiles bottlenecks, analyzes metrics, generates recommendations) and 'when' with an explicit 'Use when...' clause covering numerous specific trigger scenarios. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'benchmark', 'load test', 'stress test', 'latency', 'throughput', 'performance bottlenecks', 'Core Web Vitals', 'LCP', 'FID', 'CLS', 'capacity', 'performance budgets', 'CI/CD', 'slow API response times', 'database query performance', 'frontend rendering delays'. | 3 / 3 |
| Distinctiveness Conflict Risk | Clearly occupies a distinct niche around performance testing and optimization. Specific triggers such as 'load tests', 'stress tests', 'Core Web Vitals', 'performance budgets', and 'throughput' are unlikely to conflict with other skills such as general code review or deployment skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
85%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, well-structured skill that provides concrete, executable guidance for performance benchmarking with clear multi-step workflows and validation checkpoints throughout. The k6 script is production-ready, the bottleneck triage is specific and layered, and progressive disclosure is well-handled. Minor conciseness improvements could be made by trimming some generic content (test type descriptions, well-known Web Vitals fixes) that Claude likely already knows.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is mostly efficient and avoids explaining basic concepts Claude already knows, but some sections could be tightened: the test type table is useful but its descriptions are somewhat generic, and the Core Web Vitals fixes section restates well-known optimization patterns. Overall it is reasonably lean but not maximally token-efficient. | 2 / 3 |
| Actionability | The skill provides a fully executable k6 script with thresholds, checks, and an authentication flow; concrete bash commands; specific metric thresholds (p95<500, error rate<1%); and actionable triage steps per layer (EXPLAIN ANALYZE, py-spy, Lighthouse). Guidance is copy-paste ready and specific. | 3 / 3 |
| Workflow Clarity | The 5-step workflow is clearly sequenced with explicit validation checkpoints at multiple stages: error rate checks during ramp-up (Step 2), threshold checks after each run (Step 3), statistical significance validation with 3 iterations (Step 4), and CI/CD quality gates (Step 5). Feedback loops for error recovery are clearly stated (stop the test, fix, restart from Step 2). | 3 / 3 |
| Progressive Disclosure | The skill provides a clear overview with well-organized sections, then appropriately delegates detailed content to one-level-deep references (REPORT_TEMPLATE.md, CAPACITY_PLANNING.md). The main file contains enough actionable content to be useful standalone while pointing to supplementary materials for deeper topics. | 3 / 3 |
| Total | | 11 / 12 Passed |
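The review cites concrete thresholds (p95<500, error rate<1%) and a check across 3 iterations, but the skill's k6 script itself is not reproduced on this page. The following is a minimal Python sketch of how such a gate could be evaluated from raw per-run results, assuming latencies are in milliseconds and assuming the rule is that every one of three runs must pass. The names `p95`, `run_passes`, and `gate` are hypothetical. Requiring all runs to pass is a simple consistency rule, not a formal significance test.

```python
# Thresholds quoted in the review; the millisecond unit is an assumption.
P95_BUDGET_MS = 500.0
MAX_ERROR_RATE = 0.01   # "error rate<1%"
REQUIRED_RUNS = 3       # "3 iterations"


def p95(latencies_ms):
    """Nearest-rank 95th percentile, using integer math to avoid float rounding."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # ceil(0.95 * n), 1-based rank
    return ordered[rank - 1]


def run_passes(latencies_ms, errors, total):
    """One run passes only if p95 and error rate are both strictly under budget."""
    if total <= 0:
        raise ValueError("total requests must be positive")
    error_rate = errors / total
    return p95(latencies_ms) < P95_BUDGET_MS and error_rate < MAX_ERROR_RATE


def gate(runs):
    """runs: list of (latencies_ms, errors, total) tuples, one per iteration.

    Hypothetical CI rule: require at least REQUIRED_RUNS runs, all passing.
    """
    if len(runs) < REQUIRED_RUNS:
        return False
    return all(run_passes(lat, err, tot) for lat, err, tot in runs)
```

In a pipeline, a non-zero exit code when `gate(...)` returns `False` is one way to turn this into a quality gate; k6 itself can also fail a run when its configured thresholds are crossed.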
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation: 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 Passed |
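The single warning concerns unknown frontmatter keys. The validator's exact rules are not shown on this page, so the following is only a minimal Python sketch of that kind of check. It assumes a hypothetical allowed key set (`ALLOWED_KEYS`, modeled on common skill frontmatter fields; check the spec you actually target) and reads only unindented `key:` lines between the opening and closing `---`. That is enough for flat frontmatter but is not a full YAML parser.

```python
# Hypothetical allowed set; the real validator's list may differ.
ALLOWED_KEYS = {"name", "description", "license", "compatibility",
                "allowed-tools", "metadata"}


def frontmatter_keys(text):
    """Return top-level keys from a '---'-delimited frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter present
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        # Skip blank, indented (nested), comment, and list-item lines.
        if not line or line[0].isspace() or line.startswith(("#", "-")):
            continue
        if ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys


def unknown_keys(text):
    """Keys that would trigger a 'frontmatter_unknown_keys'-style warning."""
    return [k for k in frontmatter_keys(text) if k not in ALLOWED_KEYS]
```

Following the warning's own advice, a key such as a hypothetical `version:` would be moved under `metadata:`, where nested keys are not checked by this sketch.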
010799b