Automatically detect performance regressions in CI/CD pipelines by comparing metrics against baselines. Use when validating builds or analyzing performance trends. Trigger with phrases like "detect performance regression", "compare performance metrics", or "analyze performance degradation".
Optimize this skill with Tessl:

    npx tessl skill review --optimize ./plugins/performance/performance-regression-detector/skills/detecting-performance-regressions/SKILL.md

Quality
Discovery: 89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a solid skill description that clearly communicates its purpose and provides explicit trigger guidance. Its main weakness is that the capability description could be more specific about concrete actions (e.g., what types of metrics, what output is produced, what baselines are supported). Overall it performs well across all dimensions.
Suggestions
Add more specific concrete actions such as 'generate regression reports', 'flag metric thresholds', or 'compare latency/throughput/memory metrics' to improve specificity.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain (CI/CD pipelines, performance regressions) and mentions actions like 'detect', 'compare metrics against baselines', and 'analyze performance trends', but it doesn't list multiple concrete specific actions (e.g., what metrics, what baselines, what output formats). It stays somewhat high-level. | 2 / 3 |
| Completeness | Clearly answers both 'what' (detect performance regressions by comparing metrics against baselines) and 'when' (validating builds, analyzing performance trends) with explicit trigger phrases provided. | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms: 'detect performance regression', 'compare performance metrics', 'analyze performance degradation', 'CI/CD pipelines', 'validating builds', 'performance trends'. These are terms users would naturally use when needing this skill. | 3 / 3 |
| Distinctiveness / Conflict Risk | The description carves out a clear niche around performance regression detection in CI/CD pipelines. The trigger terms are specific enough ('performance regression', 'performance degradation', 'compare performance metrics') that it's unlikely to conflict with general monitoring or generic CI/CD skills. | 3 / 3 |
| Total | | 11 / 12 Passed |
Implementation: 0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is almost entirely abstract description with no actionable content. It reads like a product marketing page rather than an instruction set — it describes what the skill does conceptually but never shows Claude how to actually do anything. There are no code examples, no specific statistical methods, no file format specifications, no concrete commands, and no real workflow with validation steps.
Suggestions
Add concrete, executable code examples for each step: loading baseline data from specific file formats (JSON/CSV), performing statistical comparisons (e.g., using scipy.stats for t-tests or z-scores), and generating reports.
Define the actual baseline data format with a schema or example file, and specify how metrics should be structured (e.g., JSON schema with fields for response_time, throughput, timestamp).
Replace the abstract 'Instructions' section with a concrete workflow including validation checkpoints, e.g., 'Verify baseline file exists and contains >= 30 data points before proceeding to statistical analysis'.
Remove the 'Overview', 'How It Works', 'When to Use This Skill', 'Best Practices', 'Integration', and 'Resources' sections — they add no actionable information and waste token budget. Replace with concrete threshold configuration examples and statistical method specifications.
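To make the first three suggestions concrete, here is a minimal sketch using only the Python standard library. A scipy.stats t-test would be the fuller approach the suggestions mention; a z-score against the baseline sample illustrates the same comparison. The baseline field names, threshold, and sample values below are illustrative assumptions, not a format the skill currently defines.

```python
import statistics

# Illustrative baseline format (field names are assumptions):
BASELINE = {
    "metric": "response_time_ms",
    "samples": [120.0, 118.5, 121.2, 119.8, 122.1, 120.5] * 6,  # 36 points
}

def detect_regression(baseline: dict, current_value: float,
                      z_threshold: float = 3.0) -> bool:
    """Flag a regression when the current value sits more than
    z_threshold standard deviations above the baseline mean."""
    samples = baseline["samples"]
    # Validation checkpoint from the suggestions: require enough
    # history for a stable estimate before running the statistics.
    if len(samples) < 30:
        raise ValueError("baseline must contain >= 30 data points")
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)
    return (current_value - mean) / stdev > z_threshold

print(detect_regression(BASELINE, 135.0))  # large latency spike -> True
print(detect_regression(BASELINE, 120.3))  # within normal variation -> False
```

A real skill would load the baseline from a JSON or CSV file and emit a structured report, but even a sketch like this gives the agent something executable rather than the abstract "apply statistical analysis" instruction the review criticizes.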
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose with extensive explanations of concepts Claude already knows. The 'Overview', 'How It Works', 'When to Use This Skill', and 'Best Practices' sections are largely redundant filler. The examples describe what the skill 'will do' rather than showing how to do it. Multiple sections repeat the same information in slightly different ways. | 1 / 3 |
| Actionability | No concrete code, commands, scripts, or executable guidance anywhere. Everything is described abstractly ('apply statistical analysis', 'collect performance metrics', 'load historical baseline data') without specifying how. No actual statistical methods, no code snippets, no file formats for baselines, no threshold definitions, no tool commands. | 1 / 3 |
| Workflow Clarity | The 'Instructions' section lists 6 steps but they are entirely abstract with no specifics on how to perform any step. No validation checkpoints, no feedback loops, no error recovery within the workflow. The 'Error Handling' section is a generic checklist of things to verify without actionable remediation steps. | 1 / 3 |
| Progressive Disclosure | Monolithic wall of text with no references to supporting files despite mentioning a baselines directory. No bundle files exist to support the content. The content that could be split (statistical methods, threshold configuration, baseline format) is neither inline with useful detail nor referenced externally; it is simply absent. | 1 / 3 |
| Total | | 4 / 12 Passed |
Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 Passed |
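Both warnings point at the SKILL.md frontmatter. As a rough sketch (assuming the standard skill frontmatter keys; the tool names shown are placeholders, not the skill's actual configuration), a cleaned-up header would keep only recognized keys and canonical tool names:

```yaml
---
name: detecting-performance-regressions
description: Automatically detect performance regressions in CI/CD pipelines
  by comparing metrics against baselines.
# Use canonical tool names only; anything the validator flags as
# "unusual" should be renamed or dropped.
allowed-tools: Read, Bash
# Unknown keys belong under metadata rather than at the top level.
---
```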