flaky-test-detector

Identifies non-deterministic or unreliable tests through static code analysis and test result analysis. Use when Claude needs to find flaky tests, analyze test reliability, or investigate intermittent test failures. Supports Python (pytest, unittest) and Java (JUnit, TestNG) test frameworks. Trigger when users mention "flaky tests", "intermittent failures", "non-deterministic tests", "unreliable tests", or ask to "find flaky tests", "analyze test stability", or "why tests fail randomly".

Install with Tessl CLI

npx tessl i github:ArabelaTso/Skills-4-SE --skill flaky-test-detector

94

Does it follow best practices?

Validation for skill structure


Evaluation results

100%

7%

CI/CD Flakiness Investigation

Test result analysis workflow

| Criteria | Without context | With context |
| --- | --- | --- |
| Script invocation | 80% | 100% |
| analysis_output.txt created | 100% | 100% |
| Report summary section | 100% | 100% |
| High priority section | 100% | 100% |
| Category grouping | 100% | 100% |
| Flakiness scores reported | 70% | 100% |
| Pass/fail pattern reported | 100% | 100% |
| Pass rate reported | 87% | 100% |
| Alternating pattern noted | 100% | 100% |
| Duration variance noted | 100% | 100% |
| Specific test details | 100% | 100% |

Without context: $0.3675 · 1m 46s · 15 turns · 22 in / 6,621 out tokens

With context: $0.5926 · 1m 52s · 23 turns · 5,592 in / 7,161 out tokens
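
The criteria above refer to a pass rate, a pass/fail pattern, an alternating pattern, and duration variance, but this page does not show how the skill computes them. As a minimal sketch, assuming a test's history is a chronological list of pass/fail outcomes plus run durations, these signals could be derived as follows. The names `RunHistory`, `flip_rate`, and `duration_cv` are hypothetical and are not taken from the skill's own script.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RunHistory:
    """Hypothetical record of one test's recent CI runs (not the skill's format)."""
    name: str
    outcomes: list[bool]     # True = pass, in chronological order
    durations: list[float]   # seconds per run


def pass_rate(outcomes):
    """Fraction of runs that passed."""
    return sum(outcomes) / len(outcomes)


def flip_rate(outcomes):
    """Fraction of consecutive run pairs whose outcome changed."""
    if len(outcomes) < 2:
        return 0.0
    flips = sum(a != b for a, b in zip(outcomes, outcomes[1:]))
    return flips / (len(outcomes) - 1)


def is_alternating(outcomes):
    """True when every run flips the previous outcome (assumed: at least 4 runs)."""
    return len(outcomes) >= 4 and flip_rate(outcomes) == 1.0


def duration_cv(durations):
    """Coefficient of variation of run time; a high value suggests timing sensitivity."""
    avg = mean(durations)
    return pstdev(durations) / avg if avg else 0.0


def pattern(outcomes):
    """Compact pass/fail pattern string, e.g. 'PFPF'."""
    return "".join("P" if ok else "F" for ok in outcomes)
```

A test that passes every run has a flip rate of 0, while one that strictly alternates has a flip rate of 1 even though its pass rate is only 50%, which is why the two numbers are worth reporting separately.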

100%

1%

Test Suite Reliability Review

Python static analysis and remediation

| Criteria | Without context | With context |
| --- | --- | --- |
| Report structure per issue | 100% | 100% |
| Line numbers included | 100% | 100% |
| Issues grouped by category | 100% | 100% |
| Shared state identified | 100% | 100% |
| Fixed: setUp/fixture used | 87% | 100% |
| Fixed: tmp_path or tempfile | 100% | 100% |
| Fixed: time mock | 100% | 100% |
| Fixed: network mock | 100% | 100% |
| Fixed: sleep removed | 100% | 100% |
| Fixed: random seeded | 100% | 100% |
| Confidence or priority noted | 100% | 100% |
| File cleanup issue noted | 100% | 100% |

Without context: $0.4350 · 2m 5s · 18 turns · 25 in / 7,425 out tokens

With context: $0.7972 · 3m 12s · 25 turns · 30 in / 10,966 out tokens
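
Several of the remediation criteria above (seeded randomness, a mocked clock, temporary files) correspond to well-known ways of making Python tests deterministic. The sketch below illustrates those three patterns with the standard library only. The helper functions `make_token`, `is_expired`, and `write_report` are hypothetical stand-ins for code under test and do not come from the evaluated scenario.

```python
import random
import tempfile
import time
from pathlib import Path
from unittest import mock


# Hypothetical code under test.
def make_token(rng: random.Random) -> int:
    return rng.randint(0, 999_999)


def is_expired(deadline: float) -> bool:
    return time.time() > deadline


def write_report(directory: Path, text: str) -> Path:
    path = directory / "report.txt"
    path.write_text(text)
    return path


def test_token_is_reproducible():
    # Seeded generators replace the shared global random state.
    assert make_token(random.Random(42)) == make_token(random.Random(42))


def test_expiry_uses_mocked_clock():
    # Freezing time.time() removes the dependence on the wall clock.
    with mock.patch("time.time", return_value=1_000.0):
        assert is_expired(999.0)
        assert not is_expired(1_001.0)


def test_report_written_to_isolated_dir():
    # A per-test temporary directory avoids leftover files between runs.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_report(Path(tmp), "ok")
        assert path.read_text() == "ok"
    assert not path.exists()  # directory is cleaned up on exit
```

Under pytest, the `tmp_path` fixture would serve the same purpose as `tempfile.TemporaryDirectory` here; the standard-library form is used only to keep the example self-contained.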

92%

4%

Java Test Suite Stabilization

Java test flakiness detection and fix

| Criteria | Without context | With context |
| --- | --- | --- |
| Report structure per issue | 100% | 100% |
| Line numbers in report | 100% | 100% |
| Issues grouped by category | 100% | 100% |
| Fixed: @Before setup | 100% | 100% |
| Fixed: @After teardown | 50% | 100% |
| Fixed: Thread.sleep removed | 100% | 100% |
| Fixed: Clock/time mock | 0% | 100% |
| Fixed: resource management | 100% | 100% |
| Fixed: race condition | 100% | 100% |
| Fixed: network mock | 100% | 100% |
| Confidence or severity noted | 100% | 0% |
| Shared static state identified | 100% | 100% |

Without context: $0.3171 · 1m 55s · 13 turns · 20 in / 6,739 out tokens

With context: $0.7189 · 2m 47s · 25 turns · 4,222 in / 9,488 out tokens

Evaluated agent: Claude Code
