Identifies non-deterministic or unreliable tests through static code analysis and test result analysis. Use when Claude needs to find flaky tests, analyze test reliability, or investigate intermittent test failures. Supports Python (pytest, unittest) and Java (JUnit, TestNG) test frameworks. Trigger when users mention "flaky tests", "intermittent failures", "non-deterministic tests", "unreliable tests", or ask to "find flaky tests", "analyze test stability", or "why tests fail randomly".
Install with Tessl CLI:

```
npx tessl i github:ArabelaTso/Skills-4-SE --skill flaky-test-detector94
```
Identify and fix non-deterministic tests that intermittently fail without code changes.
When a user reports flaky tests or asks for test reliability analysis:
Flaky tests fail intermittently without code changes, typically due to timing issues, shared state, external dependencies, unseeded randomness, and wall-clock time dependencies.
Analyze test code for common flaky patterns.
When to use:
Process:
Common patterns to detect:
Timing issues:
- `time.sleep()`, `Thread.sleep()` - fixed waits
- `await` in async functions

State issues:
- Shared state between tests (e.g., class-level variables)

External dependencies:
- `requests.get()`, `http.client` - real network calls

Randomness:
- `random.*` without a seed
- `UUID.randomUUID()` without mocking

Time dependencies:
- `datetime.now()`, `System.currentTimeMillis()`

Analyze test execution history to find inconsistent results.
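The static patterns listed above can be approximated with a small AST scan. This is an illustrative sketch only, not the skill's actual implementation; `FLAKY_CALLS` and the sample test are assumptions for demonstration.

```python
# Illustrative sketch of static flaky-pattern detection using Python's ast
# module. FLAKY_CALLS and the sample below are assumptions; the skill's real
# detector may use different patterns and heuristics.
import ast

FLAKY_CALLS = {
    "sleep": "fixed wait (timing issue)",
    "get": "possible real network call (external dependency)",
    "now": "wall-clock dependency (time issue)",
    "randint": "randomness without a visible seed",
}

def scan_source(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, reason) pairs for suspicious calls in test code."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Handle both `time.sleep(...)` (Attribute) and bare `sleep(...)` (Name)
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
            if name in FLAKY_CALLS:
                findings.append((node.lineno, f"{name}(): {FLAKY_CALLS[name]}"))
    return sorted(findings)

sample = '''import time, requests

def test_user_login():
    time.sleep(2)
    resp = requests.get("https://api.example.com/login")
    assert resp.ok
'''

for lineno, reason in scan_source(sample):
    print(f"Line {lineno}: {reason}")
```

A real detector would also track imports and aliases so that, for example, `from time import sleep` is still caught.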
When to use:
Process:
- Run `scripts/analyze_test_results.py` to analyze patterns

Script usage:

```
python scripts/analyze_test_results.py test_results.json
```

Input format (JSON):
```json
[
  {
    "test_name": "test_user_login",
    "status": "passed",
    "timestamp": "2024-01-01T10:00:00",
    "duration": 1.23
  },
  {
    "test_name": "test_user_login",
    "status": "failed",
    "timestamp": "2024-01-01T11:00:00",
    "duration": 1.45
  }
]
```

Metrics:
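As a hedged sketch, metrics such a script might compute from records in this format include a per-test pass rate and a flakiness score; the actual `analyze_test_results.py` may use different formulas.

```python
# Hedged sketch: per-test pass rate and a simple flakiness score computed
# from records in the JSON format shown above. The real script's metrics
# may differ.
import json
from collections import defaultdict

def flakiness_scores(records):
    outcomes = defaultdict(list)
    for rec in records:
        outcomes[rec["test_name"]].append(rec["status"] == "passed")
    metrics = {}
    for name, results in outcomes.items():
        pass_rate = sum(results) / len(results)
        # 0.0 for always-pass or always-fail, 1.0 for an even pass/fail mix
        metrics[name] = {
            "runs": len(results),
            "pass_rate": pass_rate,
            "flakiness": 1 - abs(2 * pass_rate - 1),
        }
    return metrics

data = json.loads("""
[
  {"test_name": "test_user_login", "status": "passed",
   "timestamp": "2024-01-01T10:00:00", "duration": 1.23},
  {"test_name": "test_user_login", "status": "failed",
   "timestamp": "2024-01-01T11:00:00", "duration": 1.45}
]
""")
print(flakiness_scores(data))
```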
Common issues (Python):
- Missing `tmp_path` for file operations
- Missing `@pytest.mark.django_db` for database tests
- Unseeded `random` module usage

Best practices (Python):
- Use the `tmp_path` fixture for file operations
- Mock external calls with `pytest-mock` or `unittest.mock`
- Use `freezegun` for time mocking
- Seed randomness with `random.seed()`

Common issues (Java):
- Missing `@Before`/`@After` cleanup
- Missing `@Transactional` for database tests
- `Thread.sleep()` calls

Best practices (Java):
- Use `@Before`/`@After` for setup/cleanup
- Use `@Transactional` for automatic rollback
- Use `Clock` for time mocking

Ask clarifying questions:
Static analysis if:
Result analysis if:
For static analysis:
For result analysis:
- Run the `analyze_test_results.py` script

Structure the report:
Example format:
```
Found 5 potentially flaky tests:

HIGH PRIORITY:
- test_user_login (flakiness: 0.85)
  - Line 45: time.sleep(2) - fixed wait
  - Line 52: Shared class variable 'user_data'

MEDIUM PRIORITY:
- test_api_call (flakiness: 0.62)
  - Line 23: requests.get() - unmocked network call
```

For each issue, provide:
Reference remediation-strategies.md for detailed fixes.
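As one example of what such a remediation can look like (a hedged sketch; remediation-strategies.md remains the authoritative reference), a fixed `time.sleep(2)` can be replaced with a bounded poll that returns as soon as the condition holds:

```python
# Sketch of replacing a fixed wait with a bounded poll. `condition` is any
# zero-argument callable; names here are illustrative.
import time

def wait_until(condition, timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll `condition` until it is truthy or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# The test now waits only as long as needed, up to a hard upper bound.
state = {"ready": False}
state["ready"] = True
assert wait_until(lambda: state["ready"], timeout=1.0)
```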
- User: "Our test_checkout test keeps failing randomly" → Analyze test code for flaky patterns, report findings with fixes
- User: "Find all flaky tests in the test suite" → Scan all test files for common flaky patterns
- User: "This test has a 60% pass rate, why?" → Analyze test code and suggest specific fixes
- User: "Analyze these test results for flakiness" → Use the analyze_test_results.py script on provided data
- User: "How do I fix this race condition in my test?" → Provide remediation strategy with code examples
- User: "Review this test for potential flakiness" → Static analysis of the specific test with recommendations
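For the seeding and time-mocking fixes recommended above, a minimal sketch follows; `make_token` is a hypothetical function under test, and dependency injection is just one of several possible designs.

```python
# Hedged sketch: make a test deterministic by seeding randomness and
# injecting a fixed clock. `make_token` is a hypothetical function.
import random
from datetime import datetime

def make_token(rng: random.Random, now: datetime) -> str:
    """Code under test with its sources of nondeterminism injected."""
    return f"{now:%Y%m%d}-{rng.randint(1000, 9999)}"

def test_make_token_is_deterministic():
    fixed_now = datetime(2024, 1, 1)                   # fixed clock, not datetime.now()
    first = make_token(random.Random(42), fixed_now)   # seeded RNG
    second = make_token(random.Random(42), fixed_now)  # same seed, same result
    assert first == second
    assert first.startswith("20240101-")

test_make_token_is_deterministic()
```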