Automatically detect performance regressions in CI/CD pipelines by comparing metrics against baselines. Use when validating builds or analyzing performance trends. Trigger with phrases like "detect performance regression", "compare performance metrics", or "analyze performance degradation".
44%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./plugins/performance/performance-regression-detector/skills/detecting-performance-regressions/SKILL.md
Quality
Discovery
89%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a solid skill description that clearly communicates its purpose and provides explicit trigger guidance. Its main weakness is that the capability description could be more specific about concrete actions (e.g., types of metrics analyzed, output formats, threshold configuration). The trigger terms and completeness are strong points.
Suggestions
Add more specific concrete actions such as 'generate regression reports', 'set threshold alerts', 'compare latency/throughput/memory metrics' to improve specificity.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain (CI/CD pipelines, performance regressions) and mentions actions like 'detect', 'compare metrics against baselines', and 'analyze performance trends', but it doesn't list multiple concrete specific actions (e.g., what metrics, what baselines, what output formats). It stays somewhat high-level. | 2 / 3 |
| Completeness | Clearly answers both 'what' (detect performance regressions by comparing metrics against baselines) and 'when' (validating builds, analyzing performance trends) with explicit trigger phrases provided. | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms: 'detect performance regression', 'compare performance metrics', 'analyze performance degradation', 'CI/CD pipelines', 'validating builds', 'performance trends'. These are terms users would naturally use when needing this skill. | 3 / 3 |
| Distinctiveness Conflict Risk | The description carves out a clear niche around performance regression detection in CI/CD pipelines. The specific trigger phrases and domain focus make it unlikely to conflict with general monitoring, testing, or other CI/CD skills. | 3 / 3 |
| Total | 11 / 12 Passed | |
Implementation
0%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is almost entirely abstract description with no actionable content. It explains what performance regression detection is and when you'd want it, but never provides concrete code, statistical methods, threshold values, data formats, or executable commands. It reads like a product marketing page rather than an instruction set for Claude.
Suggestions
Replace the abstract 'Instructions' section with concrete, executable code showing how to load baseline data, compute statistical comparisons (e.g., using scipy.stats for t-tests or z-scores), and determine regressions against specific thresholds.
Add a concrete example with sample input data (e.g., JSON metrics format) and expected output (e.g., regression report format) so Claude knows exactly what to produce.
Remove the 'Overview', 'How It Works', 'When to Use', 'Best Practices', 'Integration', and 'Resources' sections entirely — they explain concepts Claude already knows and consume tokens without adding actionable value.
Provide actual baseline file format specifications and include bundle files (e.g., a sample baseline JSON, a validation script) referenced from the SKILL.md to support the workflow.
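To illustrate the kind of concrete example these suggestions ask for, here is a minimal sketch of a baseline comparison. Everything in it is an assumption made for illustration: the baseline JSON layout (metric name mapped to samples from earlier builds), the metric names, the `detect_regressions` helper, and both thresholds are hypothetical and are not taken from the skill itself. It uses a z-score from Python's standard `statistics` module rather than `scipy.stats`, so it runs without extra dependencies.

```python
import json
import statistics

# Hypothetical baseline format: metric name -> samples from previous builds.
BASELINE_JSON = """
{
  "latency_ms": [101.0, 99.0, 100.0, 102.0, 98.0],
  "throughput_rps": [500.0, 505.0, 495.0, 502.0, 498.0]
}
"""

# Assumed direction of "worse" per metric (True: higher values are worse).
HIGHER_IS_WORSE = {"latency_ms": True, "throughput_rps": False}


def detect_regressions(baseline, current, z_threshold=3.0, min_rel_change=0.05):
    """Flag a metric when it is worse than baseline both statistically
    (z-score above z_threshold) and practically (relative change above
    min_rel_change). Both thresholds are illustrative defaults."""
    report = {}
    for name, samples in baseline.items():
        if name not in current or len(samples) < 2:
            continue  # not enough data to compare
        mean = statistics.mean(samples)
        stdev = statistics.stdev(samples)  # sample standard deviation
        delta = current[name] - mean
        if not HIGHER_IS_WORSE.get(name, True):
            delta = -delta  # normalize so a positive delta always means "worse"
        if stdev > 0:
            z_score = delta / stdev
        else:
            z_score = float("inf") if delta > 0 else 0.0
        relative_change = delta / abs(mean) if mean != 0 else 0.0
        report[name] = {
            "z_score": z_score,
            "relative_change": relative_change,
            "regression": z_score > z_threshold and relative_change > min_rel_change,
        }
    return report


baseline = json.loads(BASELINE_JSON)
current = {"latency_ms": 112.0, "throughput_rps": 499.0}  # hypothetical new build
report = detect_regressions(baseline, current)

for name, row in report.items():
    status = "REGRESSION" if row["regression"] else "ok"
    print(f"{name}: {status} (z={row['z_score']:.2f}, worse by {row['relative_change']:+.1%})")
```

Requiring both a statistical and a relative threshold is one possible design choice: it keeps very stable metrics, where the standard deviation is tiny, from flagging on negligible absolute changes.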
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose with extensive explanations of concepts Claude already knows. The 'Overview', 'How It Works', 'When to Use This Skill', 'Best Practices', 'Integration', and 'Resources' sections are all padded filler that add no actionable value. The content could be reduced by 70%+ without losing useful information. | 1 / 3 |
| Actionability | No concrete code, commands, scripts, or executable examples anywhere. The 'Examples' section describes what the skill 'will do' in abstract terms rather than showing how. The 'Instructions' section is a vague 6-step list with no specifics on how to collect metrics, what statistical methods to use, what thresholds to apply, or what format baselines should be in. | 1 / 3 |
| Workflow Clarity | The workflow in the 'Instructions' section is a high-level abstract list with no concrete commands, no validation checkpoints, no feedback loops, and no error recovery steps. The 'Error Handling' section is just a checklist of things to verify with no actionable guidance on how to fix issues. | 1 / 3 |
| Progressive Disclosure | Monolithic wall of text with no references to external files despite mentioning a baselines directory. No bundle files exist to support the content. The content is poorly organized with redundant sections (Overview, How It Works, When to Use, and Examples all repeat similar information at different abstraction levels). | 1 / 3 |
| Total | 4 / 12 Passed | |
Validation
81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |
6e9558f