Runs load tests, profiles application bottlenecks, analyzes response time metrics, and generates performance optimization recommendations. Use when you need to benchmark a system, run load or stress tests, measure latency or throughput, identify performance bottlenecks, optimize Core Web Vitals (LCP, FID, CLS), plan capacity, enforce performance budgets in CI/CD pipelines, or investigate slow API response times, database query performance, or frontend rendering delays.
Overall score: 90 (Risky: do not use without reviewing)

Quality: 88% (Does it follow best practices?)
Impact: 92% (1.08x average score across 3 eval scenarios)

Discovery

92%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly articulates specific capabilities and provides an extensive 'Use when...' clause with many natural trigger terms. Its main weakness is the very broad scope, which could cause overlap with more specialized skills in areas like database optimization, frontend performance, or CI/CD pipelines. The description uses proper third-person voice throughout.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'runs load tests', 'profiles application bottlenecks', 'analyzes response time metrics', and 'generates performance optimization recommendations'. These are clear, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (runs load tests, profiles bottlenecks, analyzes metrics, generates recommendations) and 'when' with an explicit 'Use when...' clause listing numerous specific trigger scenarios. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'benchmark', 'load tests', 'stress tests', 'latency', 'throughput', 'performance bottlenecks', 'Core Web Vitals', 'LCP', 'FID', 'CLS', 'capacity', 'performance budgets', 'CI/CD', 'slow API response times', 'database query performance', 'frontend rendering delays'. These span multiple common user phrasings. | 3 / 3 |
| Distinctiveness / Conflict Risk | While the performance testing/profiling niche is fairly distinct, the breadth of the description (covering load testing, database query performance, frontend rendering, CI/CD pipelines, Core Web Vitals) is so wide that it could overlap with more specialized skills for database optimization, frontend performance, or CI/CD pipeline management. | 2 / 3 |
| Total | | 11 / 12 Passed |
Implementation
85%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, well-structured skill that provides concrete, executable guidance for performance benchmarking with clear multi-step workflows and validation checkpoints. The k6 script is production-ready and the triage steps are specific and actionable. Minor verbosity in some sections (test type table, Core Web Vitals fixes) prevents a perfect conciseness score, but overall the content earns its token budget.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is mostly efficient and avoids explaining basic concepts Claude already knows, but some sections could be tightened. For example, the test type table is useful but the descriptions are somewhat generic, and the Core Web Vitals fixes section restates well-known optimization patterns. Overall reasonably lean but not maximally token-efficient. | 2 / 3 |
| Actionability | The skill provides a fully executable k6 script with thresholds, checks, and authentication flow; concrete bash commands; specific metric thresholds (p95 < 500 ms, error rate < 1%); and actionable triage steps per layer (EXPLAIN ANALYZE, py-spy, Lighthouse). Guidance is copy-paste ready and specific. | 3 / 3 |
| Workflow Clarity | The 5-step workflow is clearly sequenced with explicit validation checkpoints at multiple stages: error rate checks during ramp-up (Step 2), threshold checks after each run (Step 3), statistical significance validation with 3 iterations (Step 4), and CI/CD quality gates (Step 5). Feedback loops for error recovery are clearly stated (e.g., 'stop the test, fix, restart from Step 2'). | 3 / 3 |
| Progressive Disclosure | The skill provides a clear overview with well-organized sections, then appropriately delegates detailed content to one-level-deep references (REPORT_TEMPLATE.md, CAPACITY_PLANNING.md). The main file contains enough actionable content to be useful standalone while pointing to supplementary materials for deeper topics. | 3 / 3 |
| Total | | 11 / 12 Passed |
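The pass/fail thresholds cited in the Actionability row (p95 below 500 ms, error rate below 1%) amount to a simple gate over collected latency samples. A minimal sketch in plain Python, using hypothetical sample data rather than the skill's actual k6 output:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ranked = sorted(samples)
    k = max(1, math.ceil(p / 100 * len(ranked)))  # 1-indexed rank
    return ranked[k - 1]

def passes_thresholds(latencies_ms, errors, total_requests):
    """Gate mirroring the thresholds above: p95 < 500 ms and error rate < 1%."""
    return (percentile(latencies_ms, 95) < 500
            and errors / total_requests < 0.01)

# Hypothetical run: 100 requests, none failed.
latencies = [120] * 90 + [480] * 5 + [700] * 5
print(passes_thresholds(latencies, 0, 100))  # → True
```

In a real k6 run the same gate is expressed declaratively in the script's `thresholds` block, so a failing run exits non-zero and can fail a CI/CD stage directly.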
Validation
90%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation for skill structure: 10 / 11 Passed

| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 Passed |
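The `frontmatter_unknown_keys` warning corresponds to a simple set-difference check over the skill file's parsed frontmatter. A sketch of that check, where the allowed key set is an assumption for illustration (the actual spec's key list may differ):

```python
# Assumed allowed frontmatter keys for a skill file; the real spec may differ.
ALLOWED_KEYS = {"name", "description", "metadata"}

def unknown_frontmatter_keys(frontmatter: dict) -> list:
    """Return frontmatter keys outside the allowed set, sorted for stable output."""
    return sorted(k for k in frontmatter if k not in ALLOWED_KEYS)

# Hypothetical frontmatter with one unrecognized key.
fm = {"name": "performance-benchmarker", "description": "Runs load tests", "version": "1.0"}
print(unknown_frontmatter_keys(fm))  # → ['version']
```

A key such as `version` would trigger the warning above; per the validator's suggestion, it could be dropped or nested under `metadata`.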