
szz-bug-identifier

Applies the SZZ algorithm to VCS history to identify which commits introduced bugs by correlating bug-fix commits with earlier changes. Use when mining a repository for bug-introducing commits, when building a defect-prediction dataset, or when the user asks which commit introduced a given fixed bug.

Install with Tessl CLI

npx tessl i github:santosomar/general-secure-coding-agent-skills --skill szz-bug-identifier

SZZ Bug Identifier

SZZ (Śliwerski, Zimmermann, Zeller, 2005) answers: given a bug-fix commit, which earlier commit introduced the bug? It works by blaming the lines the fix touched.

The algorithm — classic SZZ

  1. Identify the fix commit. Usually via a link to an issue tracker (Fixes #1234, BUG-567) or a message keyword (fix, bug, patch).
  2. Extract the deleted/modified lines. In the fix's diff, every line removed or changed is a line that was (potentially) buggy.
  3. Blame each of those lines. Run git blame -L <n>,<n> <fix>^ -- <file> for each deleted or modified line <n> to find the commit that last touched it before the fix.
  4. Those blamed commits are bug-introducing candidates.

That's the whole algorithm. The rest is noise filtering.
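Steps 2 and 3 can be sketched in Python as below. This is a minimal illustration, not a reference implementation: it assumes the diff text comes from `git show` or `git diff` with the default `a/` and `b/` path prefixes and unquoted paths, and the names `deleted_lines`, `blame_command`, and the sample line numbers are hypothetical.

```python
import re

# Unified-diff hunk header, e.g. "@@ -10,4 +10,4 @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def deleted_lines(diff_text):
    """Map each old-side path to the old-side line numbers the fix removed."""
    result = {}
    path = None
    old_no = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("-"):
                if path is not None:
                    result.setdefault(path, []).append(old_no)
                old_no += 1
                old_left -= 1
            elif line.startswith("+"):
                new_left -= 1
            elif not line.startswith("\\"):  # skip "\ No newline at end of file"
                old_no += 1  # context line, present on both sides
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("--- "):
            target = line[4:]
            if target == "/dev/null":
                path = None  # file created by the fix: nothing to blame
            else:
                path = target[2:] if target.startswith("a/") else target
        else:
            m = HUNK_RE.match(line)
            if m:
                old_no = int(m.group(1))
                old_left = int(m.group(2) or 1)
                new_left = int(m.group(4) or 1)
    return result


def blame_command(fix_sha, path, line_no):
    """Build the step-3 command: blame one line as of the fix's parent."""
    return ["git", "blame", "-w", "--porcelain",
            "-L", f"{line_no},{line_no}", f"{fix_sha}^", "--", path]


# Hypothetical fix diff; the line numbers are made up for illustration.
sample = "\n".join([
    "diff --git a/UserService.java b/UserService.java",
    "--- a/UserService.java",
    "+++ b/UserService.java",
    "@@ -10,4 +10,4 @@",
    " public String getUserEmail(long id) {",
    "     User u = repo.find(id);",
    "-    return u.getEmail();",
    "+    return u == null ? null : u.getEmail();",
    " }",
])

for path, line_numbers in deleted_lines(sample).items():
    for n in line_numbers:
        print(" ".join(blame_command("c4a9f1b", path, n)))
```

Tracking the remaining hunk length, rather than matching on the `--- ` prefix alone, keeps a deleted line whose content begins with `-- ` from being mistaken for a file header.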

Noise filters — classic SZZ's known weaknesses

| Noise source | Why it's wrong | Filter |
| --- | --- | --- |
| Whitespace / formatting changes | Blame hits a prettier run, not the real introducer | git blame -w; ignore commits that only touch formatting |
| Comment-only changes | The fix edited a comment too; that line is not the bug | Strip comment lines before blaming |
| Large refactor commits | Every line blames to the Great Refactor of 2019 | git blame --ignore-revs-file with a curated ignore-list (or --ignore-rev <sha> for a single commit) |
| The line was added by the fix | No blame target; added lines didn't exist before | Only blame deleted/modified lines, not added |
| Bug predates the repo | Blame hits the initial import commit | Flag as unattributable |
| Moved or copied code | Blame stops at the commit that moved the lines, not the one that wrote them | git blame -M -C to follow lines moved within a file or moved/copied from other files |
| Blamed commit is newer than bug report | The bug existed before that commit; blame is wrong | Discard candidates with commit-date > bug-report-date |
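The filters that do not map onto a `git blame` flag can be applied after blaming. The sketch below is one possible shape under stated assumptions: the `Candidate` fields, the `filter_candidates` signature, and the comment prefixes are all hypothetical, and the comment check is deliberately crude (a `#` prefix is a comment in some languages and a preprocessor directive in others).

```python
from dataclasses import dataclass
from datetime import date

# Assumed comment markers; adjust per language.
COMMENT_PREFIXES = ("//", "/*", "#")


@dataclass
class Candidate:
    sha: str            # blamed commit
    commit_date: date   # date of the blamed commit
    line_text: str      # old-side text of the blamed line


def is_noise_line(text):
    """True for blank lines and lines that are only a comment."""
    stripped = text.strip()
    return stripped == "" or stripped.startswith(COMMENT_PREFIXES)


def filter_candidates(candidates, ignore_revs, root_commits, report_date=None):
    """Return (kept, flagged): plausible introducers and unattributable hits."""
    kept, flagged = [], []
    for c in candidates:
        if is_noise_line(c.line_text):
            continue  # comment-only or blank line: not the bug
        if c.sha in ignore_revs:
            continue  # known refactor/formatting commit
        if report_date is not None and c.commit_date > report_date:
            continue  # committed after the bug was reported
        if c.sha in root_commits:
            flagged.append(c)  # initial import: bug predates the repo
        else:
            kept.append(c)
    return kept, flagged


kept, flagged = filter_candidates(
    [
        Candidate("a17d3e0", date(2021, 3, 4), "return u.getEmail();"),
        Candidate("bbbbbbb", date(2019, 6, 1), "// TODO: tidy up"),
        Candidate("ddddddd", date(2024, 1, 1), "audit(u);"),
    ],
    ignore_revs=set(),
    root_commits=set(),
    report_date=date(2023, 11, 1),
)
print([c.sha for c in kept])
```

Dropping a candidate because its commit is on the ignore-list is a fallback. Passing the same list to `git blame --ignore-revs-file` is usually preferable, because blame then attributes the line to an earlier commit instead of producing no candidate at all.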

Worked example

Fix commit: c4a9f1b — Fix: null check in getUserEmail (closes #892)

  public String getUserEmail(long id) {
      User u = repo.find(id);
-     return u.getEmail();
+     return u == null ? null : u.getEmail();
  }

Step 2: Modified line is return u.getEmail(); (the old version).

Step 3: git blame c4a9f1b^ -- UserService.java at that line → a17d3e0 — Add UserService (Jane, 2021-03-04).

Step 4: Candidate = a17d3e0.

Filters:

  • Whitespace? No, it's a real code line. ✓
  • Comment? No. ✓
  • Refactor in ignore-list? No. ✓
  • Added by the fix? No — this is the old version of a modified line. ✓
  • Newer than bug report? Issue #892 opened 2023-11-01; a17d3e0 is 2021. ✓

Verdict: a17d3e0 introduced the bug. The null-check was never there.
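When step 3 is scripted, `git blame --porcelain` is easier to parse than the default output. The parser below is a sketch that assumes the documented porcelain layout: a header line of `<40-hex sha> <orig line> <final line> [<count>]`, key/value lines the first time a commit appears, then the line content prefixed by a tab. The sample output is synthetic, with a padded hash, a made-up email address, and a made-up line number.

```python
import re
from datetime import datetime, timezone

# "<sha> <orig-line> <final-line> [<lines-in-group>]"
HEADER_RE = re.compile(r"^([0-9a-f]{40}) (\d+) (\d+)(?: (\d+))?$")


def parse_blame_porcelain(text):
    """Return one dict per blamed line, carrying the commit's metadata."""
    commits = {}   # sha -> metadata; git prints it only on first occurrence
    entries = []
    sha = None
    final_line = None
    for raw in text.splitlines():
        if raw.startswith("\t"):  # the blamed line's content
            if sha is not None:
                entries.append({"sha": sha, "line": final_line,
                                "text": raw[1:], **commits[sha]})
            continue
        m = HEADER_RE.match(raw)
        if m:
            sha, final_line = m.group(1), int(m.group(3))
            commits.setdefault(sha, {})
        elif sha is not None:
            key, _, value = raw.partition(" ")
            commits[sha][key] = value
    return entries


def author_date(entry):
    """Author date of a parsed entry, in UTC."""
    ts = int(entry["author-time"])
    return datetime.fromtimestamp(ts, tz=timezone.utc).date()


SHA = "a17d3e0" + "0" * 33  # synthetic full-length hash
SAMPLE = "\n".join([
    f"{SHA} 12 12 2",
    "author Jane",
    "author-mail <jane@example.com>",
    "author-time 1614859200",
    "author-tz +0000",
    "summary Add UserService",
    "filename UserService.java",
    "\t    return u.getEmail();",
    f"{SHA} 13 13",
    "\t}",
])

first = parse_blame_porcelain(SAMPLE)[0]
print(first["sha"][:7], first["summary"], first["author"], author_date(first))
```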

Edge cases

  • Multi-line fix where lines blame to different commits: Each blamed commit is a candidate. Usually one is the real introducer and the rest are incidental; use → semantic-szz-analyzer to distinguish them.
  • Fix is a one-line addition (no deletion): Classic SZZ has nothing to blame. Heuristic: blame the surrounding lines (the ones the added line sits between). Low confidence.
  • The fix reverts an earlier commit entirely: git revert — the reverted commit IS the bug-introducer, definitionally. Shortcut: check if the fix is a revert before running SZZ.
  • Cherry-picked commits: The blame may point at the cherry-pick, not the original. Check for (cherry picked from commit …) trailers and follow the chain.

Do not

  • Do not treat SZZ output as ground truth. Published precision is 40–70% without filters. It's a candidate generator for human review.
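  • Do not run SZZ on a repo without a curated .git-blame-ignore-revs, and pass it explicitly (git blame --ignore-revs-file .git-blame-ignore-revs, or set blame.ignoreRevsFile), because git does not read the file on its own. Without it, results tend to blame the last formatting pass.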
  • Do not use commit-message keyword matching (fix, bug) as your only fix-identification signal. False-positive rate is brutal. Prefer issue-tracker links.
  • Do not run SZZ at bug-fix time for a single bug — that's what → regression-root-cause-analyzer is for (bisect is more precise). SZZ is for batch mining.
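One way to act on the fix-identification advice is to record how each fix commit was identified, so keyword-only matches can be reviewed or dropped. The patterns below are illustrative assumptions, not a complete grammar: the first matches GitHub-style closing keywords followed by `#<number>`, the second matches tracker keys shaped like `BUG-567` and will also hit tokens such as `UTF-8`.

```python
import re

# "Fixes #1234", "closes #892", "resolved #7" (case-insensitive).
CLOSING_REF_RE = re.compile(
    r"\b(?:fix(?:es|ed)?|close[sd]?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE)
# Tracker keys such as "BUG-567"; crude, also matches e.g. "UTF-8".
TRACKER_KEY_RE = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")
# Weak signal: a bare keyword anywhere in the message.
KEYWORD_RE = re.compile(r"\b(?:fix|bug|patch)", re.IGNORECASE)


def classify_fix(message):
    """Return (signal, refs); signal is 'linked', 'keyword-only' or 'none'."""
    refs = ["#" + n for n in CLOSING_REF_RE.findall(message)]
    refs += TRACKER_KEY_RE.findall(message)
    if refs:
        return "linked", refs
    if KEYWORD_RE.search(message):
        return "keyword-only", []
    return "none", []


for msg in [
    "Fix: null check in getUserEmail (closes #892)",
    "BUG-567: handle empty result set",
    "Add test fixture for login",   # keyword false positive: "fixture"
    "Add UserService",
]:
    print(classify_fix(msg), "<-", msg)
```

The third message shows why keyword matching alone is unreliable: "fixture" trips the keyword pattern even though the commit fixes nothing.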

Output format

fix: <sha> — <subject>
  candidates:
    <sha> — <subject>  (<date>, <author>)
      blamed from: <file>:<line>
      filters passed: whitespace ✓  comment ✓  date ✓
      confidence: <high|medium|low>
    ...
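A small renderer keeps batch output consistent with the template above. This sketch assumes a hypothetical `Candidate` record and fills it with the worked example's values; the line number and the confidence level are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    sha: str
    subject: str
    date: str
    author: str
    file: str
    line: int
    filters_passed: List[str]
    confidence: str  # "high", "medium" or "low"


def render_report(fix_sha, fix_subject, candidates):
    """Render one fix and its candidates in the output format above."""
    out = [f"fix: {fix_sha} — {fix_subject}", "  candidates:"]
    for c in candidates:
        checks = "  ".join(f"{name} ✓" for name in c.filters_passed)
        out += [
            f"    {c.sha} — {c.subject}  ({c.date}, {c.author})",
            f"      blamed from: {c.file}:{c.line}",
            f"      filters passed: {checks}",
            f"      confidence: {c.confidence}",
        ]
    return "\n".join(out)


report = render_report(
    "c4a9f1b",
    "Fix: null check in getUserEmail (closes #892)",
    [Candidate("a17d3e0", "Add UserService", "2021-03-04", "Jane",
               "UserService.java", 12,
               ["whitespace", "comment", "date"], "high")],
)
print(report)
```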
Repository
santosomar/general-secure-coding-agent-skills