test-suite-prioritizer

Orders tests so failures surface earliest — runs tests covering changed code first, historically flaky/failing tests early, and slow low-value tests last. Use when the suite is too slow to run in full on every change, when CI feedback takes too long, or when deciding what to run in a smoke-test tier.

Install with Tessl CLI

npx tessl i github:santosomar/general-secure-coding-agent-skills --skill test-suite-prioritizer

Test Suite Prioritizer

If the suite takes 40 minutes and fails at minute 38, you wasted 38 minutes. Run the test that's going to fail first. Prioritization is predicting failures and front-loading them.

Signals for priority

| Signal | Why it predicts failure | How to get it |
| ------ | ----------------------- | ------------- |
| Covers changed code | This change might have broken it | Coverage map + `git diff --name-only` |
| Failed recently | What failed yesterday fails today | CI history, last N runs |
| Flaky | Runs early → flakes detected early, can be retried | CI history, pass/fail variance |
| Fast | More tests per minute of budget | Duration from last run |
| High code coverage (this test) | Covers more → more chance of catching something | Per-test coverage |
| Co-change with modified files | Historically changes with these files | `git log` correlation |

Combine into a score. Sort. Run in order.

Scoring formula (starting point — tune it)

priority(test) =
    10.0 * covers_changed_lines(test, diff)       # binary: 1 if any overlap
  +  3.0 * recent_failure_rate(test, last_20_runs)
  +  1.0 * flake_rate(test)
  +  0.1 * (1.0 / (duration_seconds(test) + 1))   # tie-break: faster first
  -  5.0 * is_quarantined(test)                   # known-broken go last

The covers_changed_lines weight dominates: if you changed foo.py, tests that cover foo.py run first. Everything else is tie-breaking.
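As a concrete sketch, the formula translates directly to Python. The shape of the per-test stats record (`covered_lines`, `recent_failure_rate`, `flake_rate`, `duration_seconds`, `quarantined`) is an assumption for illustration, not a fixed schema — feed it from whatever your CI history actually exposes:

```python
def priority(test, changed_lines):
    """Score one test; higher runs earlier. `test` is a dict of per-test stats."""
    covers_change = 1.0 if test["covered_lines"] & changed_lines else 0.0
    return (
        10.0 * covers_change                              # binary: any overlap with the diff
        + 3.0 * test["recent_failure_rate"]               # failures / last 20 runs
        + 1.0 * test["flake_rate"]                        # pass/fail variance in CI history
        + 0.1 * (1.0 / (test["duration_seconds"] + 1))    # tie-break: faster first
        - 5.0 * (1.0 if test.get("quarantined") else 0.0) # known-broken go last
    )

# Toy suite scored against a diff touching lines 45-60 of one file.
changed = set(range(45, 61))
tests = {
    "test_session_expiry": {"covered_lines": {58}, "recent_failure_rate": 0.1,
                            "flake_rate": 0.0, "duration_seconds": 0.2},
    "test_unrelated":      {"covered_lines": {900}, "recent_failure_rate": 0.0,
                            "flake_rate": 0.0, "duration_seconds": 0.4},
}
order = sorted(tests, key=lambda t: priority(tests[t], changed), reverse=True)
```

The covering test lands above 10 while the unrelated one stays below 1, which is the dominance the text describes: everything after the first term is tie-breaking.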

Building the change → test map

You need: which tests cover which lines. Per-test coverage:

| Ecosystem | How |
| --------- | --- |
| Python | `pytest-testmon` maintains the map incrementally; or `coverage run --parallel` per test + `coverage combine` |
| Java | JaCoCo per-test coverage is tricky (needs an agent per test); or approximate via `git diff` → changed classes → tests importing them |
| JS | `jest --changedSince=<ref>` does this natively |
| Go | `go test -run` with package granularity; or `gotestsum --junitfile` + parse |

The map is expensive to build from scratch (run every test in isolation, collect coverage). Build it once, update incrementally.
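The lookup side of the map is small. A minimal sketch, assuming the map is stored as test → file → set-of-lines (the storage shape and the function name here are illustrative, not a standard format):

```python
def affected_tests(coverage_map, diff):
    """Return tests whose recorded per-test coverage overlaps the diff.

    coverage_map: {test_name: {file_path: set_of_covered_lines}}
                  -- built once, updated incrementally.
    diff:         {file_path: set_of_changed_lines}
                  -- e.g. parsed from `git diff -U0`.
    """
    hits = set()
    for test, files in coverage_map.items():
        for path, lines in files.items():
            if lines & diff.get(path, set()):
                hits.add(test)
                break  # one overlap is enough to schedule this test
    return hits

# Toy map mirroring a diff that touches auth/session.py lines 45-60.
cov = {
    "test_session_expiry": {"auth/session.py": {58}},
    "test_billing":        {"billing/invoice.py": {10, 11}},
}
diff = {"auth/session.py": set(range(45, 61))}
```

Note that a brand-new file has no entries in a stale map, so new files need a fallback: run the tests added in the same diff, or rebuild that slice of the map.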

Tiered execution

Don't just reorder — cut off:

| Tier | What | Runs on | Time budget |
| ---- | ---- | ------- | ----------- |
| Smoke | Top 20 by priority | Every commit | < 2 min |
| Affected | Everything covering changed code | Every PR | < 10 min |
| Full | Everything, priority-ordered | Merge to main | Whatever it takes |
| Nightly | Full + slow integration + flake retry loop | Cron | Hours OK |

Fail fast at each tier. Smoke fail → don't run affected. Affected fail → don't run full.
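The cut-off logic itself is tiny. A sketch, where `run_test` stands in for whatever actually invokes your runner (the callback shape and tier structure are assumptions):

```python
def run_tiers(tiers, run_test):
    """tiers: ordered list of (name, [tests]); run_test: test -> bool (pass).

    Runs tiers in order and stops at the first tier containing a failure,
    so later (more expensive) tiers never execute on a broken change.
    """
    for name, tests in tiers:
        failures = [t for t in tests if not run_test(t)]
        if failures:
            return name, failures  # report where it broke
    return None, []                # everything passed

# Toy run: smoke passes, affected fails, full is never executed.
results = {"t_smoke": True, "t_affected": False, "t_full": True}
tiers = [("smoke", ["t_smoke"]), ("affected", ["t_affected"]), ("full", ["t_full"])]
stopped_at, failed = run_tiers(tiers, results.get)
```

In CI this maps naturally onto dependent pipeline stages: each stage runs one tier and a red stage cancels the rest.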

Worked example

Change: PR touches auth/session.py (lines 45–60) and auth/tokens.py (new file).

Coverage map says:

| Test | Covers session.py 45–60 | Covers tokens.py | Last 20 runs | Duration |
| ---- | ----------------------- | ---------------- | ------------ | -------- |
| test_session_refresh | Yes | No | 20/20 pass | 0.3s |
| test_token_issuance | No | Yes | (new) | 0.1s |
| test_session_expiry | Yes (line 58) | No | 18/20 pass | 0.2s |
| test_login_e2e | Yes (indirectly) | Yes | 20/20 pass | 12s |
| test_unrelated_billing | No | No | 20/20 pass | 0.4s |
| ... | (300 more unrelated tests) | | | |

Priority order:

  1. test_session_expiry — covers change, recent failures (2/20), fast → score ≈ 10.4
  2. test_session_refresh — covers change, clean, fast → ≈ 10.1
  3. test_token_issuance — covers new file, clean, fastest → ≈ 10.1
  4. test_login_e2e — covers both, but slow → ≈ 10.008
  5–305. Everything else (scores < 1)

Affected tier runs 1–4 (12.6s total). If they pass, high confidence the change is safe. Full suite runs on merge.

Flake handling

Flaky tests are a prioritization headache: run them early (catch flakes fast, retry within budget) or run them late (don't block good changes on noise)?

Answer: run them early, with automatic retry. A flake that fails then passes on retry costs you 5 seconds. A flake that fails at minute 38 with no retry budget costs you 38 minutes.
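The retry half of that policy is a few lines. A sketch, where `run_once` stands in for your actual runner and the retry budget of 2 is an arbitrary example:

```python
def run_with_retry(test, run_once, retries=2):
    """Pass if any of (1 + retries) attempts passes.

    Also reports whether the test flaked: first attempt failed but a
    retry succeeded -- useful for feeding flake_rate back into scoring.
    """
    for attempt in range(retries + 1):
        if run_once(test):
            return True, attempt > 0  # (passed, was_flaky)
    return False, False               # genuinely failing

# Toy flaky test: fails once, then passes on the retry.
attempts = iter([False, True])
passed, flaky = run_with_retry("test_session_expiry", lambda t: next(attempts))
```

Pytest users can get equivalent behavior from the `pytest-rerunfailures` plugin (`--reruns N`); the win of doing it yourself is that the flake signal flows straight back into the priority score.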

Do not

  • Do not trust static import-based "affected test" approximations for dynamic languages. getattr(module, name)() isn't an import. Use real coverage.
  • Do not let the priority order stagnate. Rebuild it when coverage changes significantly (nightly at minimum).
  • Do not skip the full suite entirely. Prioritization is for fast feedback, not for replacing comprehensive checks. Something always slips past "affected."
  • Do not deprioritize a test into oblivion. A test that's always last never runs if the budget is tight — at that point delete it or fix it.

Output format

## Change
<diff summary — files, line ranges>

## Coverage map freshness
Last rebuilt: <when>
Staleness: <N tests may have stale coverage>

## Priority order (top 20)
| Rank | Test | Score | Covers change | Recent fails | Duration |
| ---- | ---- | ----- | ------------- | ------------ | -------- |

## Tiers
| Tier | Tests | Est. duration | Run condition |
| ---- | ----- | ------------- | ------------- |

## Run plan
<pytest/jest/mvn command(s) — ordered>

## Flakes
<tests with flake_rate > 0 — retry policy>