tracking-regression-tests

This skill enables Claude to track and run regression tests so that new changes don't break existing functionality. It is triggered when the user asks to "track regression" or "run regression tests", or uses the shortcut "reg". The skill identifies critical tests, automates their execution, and analyzes the impact of changes. It also reports on test history and flags flaky tests. The skill uses the `regression-test-tracker` plugin.
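To make the "flaky test" idea concrete, here is a minimal sketch of one common way to classify tests from their run history. It is not the `regression-test-tracker` plugin's method, which this page does not describe; the function name `classify_tests`, the input shape, and the test names are all hypothetical.

```python
from collections import defaultdict


def classify_tests(runs):
    """Label each test as 'stable', 'flaky', or 'consistently failing'.

    `runs` is a list of dicts, one per run of the same code revision,
    mapping a test name to True (passed) or False (failed).
    This input format is an assumption made for illustration.
    """
    outcomes = defaultdict(list)
    for run in runs:
        for name, passed in run.items():
            outcomes[name].append(passed)

    labels = {}
    for name, results in outcomes.items():
        if all(results):
            labels[name] = "stable"
        elif not any(results):
            labels[name] = "consistently failing"
        else:
            # Mixed outcomes on unchanged code suggest a flaky test.
            labels[name] = "flaky"
    return labels


# Hypothetical history: three runs of the same revision.
history = [
    {"test_checkout_total": True, "test_concurrent_update": True, "test_low_stock_alert": False},
    {"test_checkout_total": True, "test_concurrent_update": False, "test_low_stock_alert": False},
    {"test_checkout_total": True, "test_concurrent_update": True, "test_low_stock_alert": False},
]
labels = classify_tests(history)
```

A test that always fails points at a real regression, while one with mixed outcomes on unchanged code is a candidate for flakiness triage.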

Install with Tessl CLI

npx tessl i github:jeremylongshore/claude-code-plugins-plus-skills --skill tracking-regression-tests

87

1.39x

Does it follow best practices?

Evaluation: 92%

1.39x agent success when using this skill

Validation for skill structure


Evaluation results

96%

55%

Setting Up a Regression Test Suite for a Payment Service

Plugin usage and mark flag

| Criteria | Without context | With context |
| --- | --- | --- |
| Plugin reference | 0% | 100% |
| Mark flag syntax | 0% | 100% |
| Confirmation language | 90% | 60% |
| Critical test selection | 100% | 100% |
| Run script uses plugin | 0% | 100% |
| Runbook mark workflow | 20% | 100% |
| Runbook run workflow | 100% | 100% |
| Deployment frequency guidance | 100% | 100% |

Without context: $0.4298 · 2m 54s · 22 turns · 23 in / 6,590 out tokens

With context: $0.5349 · 3m 12s · 28 turns · 282 in / 7,214 out tokens
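The run-statistics lines in this report share one format: cost, wall time, turns, then input and output tokens. As a rough sketch, assuming that format holds for every such line, they can be parsed for side-by-side comparison as follows. `parse_stat_line` and its field names are hypothetical helpers, not part of Tessl or the plugin.

```python
import re

# Matches lines such as:
#   "Without context: $0.4298 · 2m 54s · 22 turns · 23 in / 6,590 out tokens"
STAT_LINE = re.compile(
    r"(?P<label>With(?:out)? context): \$(?P<cost>[\d.]+) · "
    r"(?:(?P<min>\d+)m )?(?P<sec>\d+)s · (?P<turns>\d+) turns · "
    r"(?P<tokens_in>[\d,]+) in / (?P<tokens_out>[\d,]+) out tokens"
)


def parse_stat_line(line):
    """Turn one run-statistics line into a dict of typed fields."""
    m = STAT_LINE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognized statistics line: {line!r}")
    return {
        "label": m["label"],
        "cost_usd": float(m["cost"]),
        "seconds": int(m["min"] or 0) * 60 + int(m["sec"]),
        "turns": int(m["turns"]),
        "tokens_in": int(m["tokens_in"].replace(",", "")),
        "tokens_out": int(m["tokens_out"].replace(",", "")),
    }


without = parse_stat_line(
    "Without context: $0.4298 · 2m 54s · 22 turns · 23 in / 6,590 out tokens"
)
with_ctx = parse_stat_line(
    "With context: $0.5349 · 3m 12s · 28 turns · 282 in / 7,214 out tokens"
)
# Relative cost of running the scenario with the skill loaded.
cost_ratio = with_ctx["cost_usd"] / without["cost_usd"]
```

Dividing the two `cost_usd` values gives the relative cost of running a scenario with the skill loaded, which can be weighed against the per-criterion score changes.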

100%

3%

Diagnosing Test Instability in a Distributed Inventory Service

Flaky test detection and failure analysis

| Criteria | Without context | With context |
| --- | --- | --- |
| Failures highlighted | 100% | 100% |
| Flaky test identified | 100% | 100% |
| Consistent failure identified | 100% | 100% |
| Root cause for reorder trigger | 100% | 100% |
| Root cause for concurrent update | 100% | 100% |
| Root cause for low stock alert | 100% | 100% |
| Prioritized action list | 80% | 100% |

Without context: $0.3098 · 1m 28s · 14 turns · 15 in / 5,123 out tokens

With context: $0.3736 · 2m 52s · 17 turns · 17 in / 5,539 out tokens

75%

5%

Automating Regression Safety Checks for a SaaS Deployment Pipeline

CI/CD integration and deployment frequency

| Criteria | Without context | With context |
| --- | --- | --- |
| Plugin in CI script | 0% | 0% |
| Pre-deployment gate | 100% | 100% |
| Deployment blocking | 100% | 100% |
| Critical test selection | 100% | 100% |
| Mark flag documented | 0% | 33% |
| Run frequency guidance | 100% | 100% |
| Flaky test mention | 100% | 100% |
| Results interpretation | 100% | 100% |

Without context: $0.2896 · 1m 12s · 18 turns · 19 in / 4,212 out tokens

With context: $0.4280 · 2m 32s · 23 turns · 23 in / 5,796 out tokens

90%

38%

Establishing a Regression Safety Net Before First Production Launch

Pre-launch regression baseline setup

| Criteria | Without context | With context |
| --- | --- | --- |
| Plugin invoked for marking | 0% | 100% |
| Mark flag used | 0% | 100% |
| Confirmation per test | 0% | 0% |
| Critical path tests selected | 100% | 100% |
| Change-risk tests included | 100% | 100% |
| Plugin used for run | 0% | 100% |
| Pre-deployment frequency | 100% | 100% |
| Results include failures | 100% | 100% |
| Flaky test awareness | 100% | 100% |
| Runbook completeness | 100% | 100% |

Without context: $0.3901 · 3m 16s · 14 turns · 15 in / 7,774 out tokens

With context: $0.5657 · 3m 50s · 25 turns · 279 in / 9,306 out tokens

97%

36%

Verifying Checkout Logic After a Pricing Engine Refactor

Change-driven regression run and analysis

| Criteria | Without context | With context |
| --- | --- | --- |
| Change-affected tests identified | 100% | 100% |
| Plugin used for marking | 0% | 100% |
| Mark flag syntax correct | 0% | 100% |
| Plugin used for run | 0% | 100% |
| Failures highlighted in report | 100% | 100% |
| Root cause per failure | 100% | 100% |
| Flaky tests flagged | 100% | 100% |
| Confirmation of addition | 37% | 62% |
| Prioritized action list | 100% | 100% |
| Critical tests not omitted | 100% | 100% |

Without context: $0.4312 · 2m 54s · 19 turns · 20 in / 6,692 out tokens

With context: $0.4678 · 3m 28s · 21 turns · 102 in / 6,816 out tokens

96%

18%

Cleaning Up a Noisy Regression Suite Before a Major Release

Flaky test triage and suite refinement

| Criteria | Without context | With context |
| --- | --- | --- |
| Flaky tests identified | 100% | 100% |
| Consistent failures identified | 100% | 100% |
| Stable tests identified | 100% | 100% |
| Root cause per failing test | 100% | 100% |
| Flaky root cause or hypothesis | 100% | 100% |
| Plugin used for refinement | 0% | 100% |
| Mark flag for additions | 0% | 100% |
| Critical tests retained | 100% | 100% |
| Prioritized recommendations | 100% | 100% |
| Confirmation output | 100% | 50% |

Without context: $0.3939 · 2m 54s · 16 turns · 17 in / 7,153 out tokens

With context: $0.3922 · 1m 34s · 20 turns · 278 in / 5,491 out tokens

100%

22%

Regression Suite Health Review

Test history insights and trend analysis

| Criteria | Without context | With context |
| --- | --- | --- |
| Flaky tests identified | 100% | 100% |
| Consistently failing identified | 100% | 100% |
| Stable tests identified | 100% | 100% |
| Trend patterns reported | 100% | 100% |
| Root cause for flaky tests | 100% | 100% |
| Root cause for consistent failure | 100% | 100% |
| Plugin used for suite update | 0% | 100% |
| Mark flag for additions | 0% | 100% |
| Prioritized action recommendations | 100% | 100% |
| Suite update confirmation | 100% | 100% |

Without context: $0.4440 · 3m 54s · 17 turns · 18 in / 8,173 out tokens

With context: $0.4759 · 3m 29s · 23 turns · 21 in / 7,176 out tokens

79%

22%

Hotfix Regression Check Before Emergency Deployment

Emergency hotfix regression verification

| Criteria | Without context | With context |
| --- | --- | --- |
| Plugin used for marking | 0% | 50% |
| Mark flag syntax | 0% | 80% |
| Affected tests identified | 100% | 100% |
| Plugin used for run | 0% | 60% |
| Pre-deployment urgency | 100% | 100% |
| Confirmation per test added | 100% | 100% |
| Failures highlighted | 75% | 62% |
| Root cause per failure | 100% | 90% |
| Flaky test handling | 0% | 37% |
| Critical path tests selected | 100% | 100% |

Without context: $0.4057 · 1m 38s · 20 turns · 21 in / 6,130 out tokens

With context: $0.4300 · 1m 36s · 23 turns · 103 in / 5,637 out tokens

100%

40%

Bootstrap Regression Suite for New Inventory Module

Critical test selection for new module regression suite

| Criteria | Without context | With context |
| --- | --- | --- |
| Plugin used for marking | 0% | 100% |
| Mark flag per test | 0% | 100% |
| Critical functionality tests selected | 100% | 100% |
| Change-likely tests included | 100% | 100% |
| Confirmation per test | 100% | 100% |
| Selection rationale documented | 100% | 100% |
| Low-value tests excluded | 100% | 100% |
| Run frequency guidance | 25% | 100% |
| Plugin run step documented | 0% | 100% |
| Flaky test awareness | 100% | 100% |

Without context: $0.4046 · 3m 13s · 21 turns · 21 in / 6,562 out tokens

With context: $0.3282 · 1m 23s · 19 turns · 1,934 in / 4,298 out tokens

Evaluated
Agent: Claude Code
Model: Unknown
