Automatically detect performance regressions in CI/CD pipelines by comparing metrics against baselines. Use when validating builds or analyzing performance trends. Trigger with phrases like "detect performance regression", "compare performance metrics", or "analyze performance degradation".
Optimize this skill with Tessl:

    npx tessl skill review --optimize ./plugins/performance/performance-regression-detector/skills/detecting-performance-regressions/SKILL.md

Quality
Discovery: 89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a solid skill description that clearly communicates its purpose and provides explicit trigger guidance. Its main weakness is that the capability description could be more specific about concrete actions (e.g., what types of metrics, what output is produced, what baselines are supported). Overall it performs well across all dimensions.
Suggestions
Add more specific concrete actions such as 'generate regression reports', 'flag metric thresholds', or 'compare latency/throughput/memory metrics' to improve specificity.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain (CI/CD pipelines, performance regressions) and mentions actions like 'detect', 'compare metrics against baselines', and 'analyze performance trends', but it doesn't list multiple concrete specific actions (e.g., what metrics, what baselines, what output formats). It stays somewhat high-level. | 2 / 3 |
| Completeness | Clearly answers both 'what' (detect performance regressions by comparing metrics against baselines) and 'when' (validating builds, analyzing performance trends) with explicit trigger phrases provided. | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms: 'detect performance regression', 'compare performance metrics', 'analyze performance degradation', 'CI/CD pipelines', 'validating builds', 'performance trends'. These are terms users would naturally use when needing this skill. | 3 / 3 |
| Distinctiveness / Conflict Risk | The description carves out a clear niche around performance regression detection in CI/CD pipelines. The trigger terms are specific enough ('performance regression', 'performance degradation', 'compare performance metrics') that it's unlikely to conflict with general monitoring or generic CI/CD skills. | 3 / 3 |
| Total | | 11 / 12 Passed |
Implementation: 0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is almost entirely abstract description with no actionable content. It reads like a product marketing page rather than an instruction set — it describes what the skill does conceptually but never shows Claude how to actually do anything. There are no code examples, no specific statistical methods, no file format specifications, no concrete commands, and no real workflow with validation steps.
Suggestions
Add concrete, executable code examples for each step: loading baseline data from specific file formats (JSON/CSV), performing statistical comparisons (e.g., using scipy.stats for t-tests or z-scores), and generating reports.
Define the actual baseline data format with a schema or example file, and specify how metrics should be structured (e.g., JSON schema with fields for response_time, throughput, timestamp).
Replace the abstract 'Instructions' section with a concrete workflow including validation checkpoints, e.g., 'Verify baseline file exists and contains >= 30 data points before proceeding to statistical analysis'.
Remove the 'Overview', 'How It Works', 'When to Use This Skill', 'Best Practices', 'Integration', and 'Resources' sections — they add no actionable information and waste token budget. Replace with concrete threshold configuration examples and statistical method specifications.
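To make the first three suggestions concrete, here is a minimal sketch using only the Python standard library. A scipy.stats t-test would be the fuller approach the suggestions mention; a z-score against the baseline sample illustrates the same comparison. The baseline field names, threshold, and sample values below are illustrative assumptions, not a format the skill currently defines.

```python
import statistics

# Illustrative baseline format (field names are assumptions):
BASELINE = {
    "metric": "response_time_ms",
    "samples": [120.0, 118.5, 121.2, 119.8, 122.1, 120.5] * 6,  # 36 points
}

def detect_regression(baseline: dict, current_value: float,
                      z_threshold: float = 3.0) -> bool:
    """Flag a regression when the current value sits more than
    z_threshold standard deviations above the baseline mean."""
    samples = baseline["samples"]
    # Validation checkpoint from the suggestions: require enough
    # history for a stable estimate before running the statistics.
    if len(samples) < 30:
        raise ValueError("baseline must contain >= 30 data points")
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)
    return (current_value - mean) / stdev > z_threshold

print(detect_regression(BASELINE, 135.0))  # large latency spike -> True
print(detect_regression(BASELINE, 120.3))  # within normal variation -> False
```

A real skill would load the baseline from a JSON or CSV file and emit a structured report, but even a sketch like this gives the agent something executable rather than the abstract "apply statistical analysis" instruction the review criticizes.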
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose with extensive explanations of concepts Claude already knows. The 'Overview', 'How It Works', 'When to Use This Skill', and 'Best Practices' sections are largely redundant filler. The examples describe what the skill 'will do' rather than showing how to do it. Multiple sections repeat the same information in slightly different ways. | 1 / 3 |
| Actionability | No concrete code, commands, scripts, or executable guidance anywhere. Everything is described abstractly ('apply statistical analysis', 'collect performance metrics', 'load historical baseline data') without specifying how. No actual statistical methods, no code snippets, no file formats for baselines, no threshold definitions, no tool commands. | 1 / 3 |
| Workflow Clarity | The 'Instructions' section lists 6 steps but they are entirely abstract with no specifics on how to perform any step. No validation checkpoints, no feedback loops, no error recovery within the workflow. The 'Error Handling' section is a generic checklist of things to verify without actionable remediation steps. | 1 / 3 |
| Progressive Disclosure | Monolithic wall of text with no references to supporting files despite mentioning a baselines directory. No bundle files exist to support the content. The content that could be split (statistical methods, threshold configuration, baseline format) is neither inline with useful detail nor referenced externally; it is simply absent. | 1 / 3 |
| Total | | 4 / 12 Passed |
Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 Passed |
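Both warnings point at the SKILL.md frontmatter. As a rough sketch (assuming the standard skill frontmatter keys; the tool names shown are placeholders, not the skill's actual configuration), a cleaned-up header would keep only recognized keys and canonical tool names:

```yaml
---
name: detecting-performance-regressions
description: Automatically detect performance regressions in CI/CD pipelines
  by comparing metrics against baselines.
# Use canonical tool names only; anything the validator flags as
# "unusual" should be renamed or dropped.
allowed-tools: Read, Bash
# Unknown keys belong under metadata rather than at the top level.
---
```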