
code-pattern-extractor

Identifies recurring structural patterns in a codebase — idioms, copy-paste clones, homegrown abstractions — and characterizes each as a reusable template. Use when learning a codebase's conventions, when hunting for copy-paste that should be a function, or when documenting how this team does things.

Install with Tessl CLI

npx tessl i github:santosomar/general-secure-coding-agent-skills --skill code-pattern-extractor

Code Pattern Extractor

Every codebase has patterns — some intentional (house idioms), some accidental (copy-paste). Finding them tells you how code gets written here, and which duplication should be consolidated.

Three pattern types

| Type | What it is | What to do with it |
| ---- | ---------- | ------------------ |
| Idiom | The team's standard way of doing X | Document it. New code should follow it. |
| Clone | Copy-pasted code with minor tweaks | Extract to a function. → code-refactoring-assistant |
| Anti-pattern | A recurring mistake | Flag it. → code-smell-detector |

The same structural pattern can be any of the three — it depends on whether the repetition is good, accidental, or bad.

Finding clones — structural similarity

Exact-text clones are easy (rg + sort + uniq -c). Near-clones — same structure, different variable names — need normalization:

  1. Tokenize each function.
  2. Normalize: replace local identifiers with placeholders ($1, $2, ...) and literals with type tags (42 → NUM, "foo" → STR). Shared names such as db.get and NotFound stay as written.
  3. Hash the normalized token stream.
  4. Group by hash. Collisions are clone candidates.
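
The four steps above can be sketched with the standard-library tokenize module. This is a minimal illustration, not a full clone detector: the KEEP set (names treated as shared API and left untouched), the helper names normalize and clone_groups, and the choice of SHA-256 are all assumptions made for the example. It also takes ready-made snippets as input rather than splitting a file into functions.

```python
import hashlib
import io
import keyword
import tokenize
from collections import defaultdict

# Assumption: these names are shared API and stay literal; every other
# identifier is treated as a renameable local.
KEEP = frozenset({"db", "get", "NotFound"})

# Structural tokens are emitted by name so block shape survives normalization.
STRUCTURAL = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}
SKIPPED = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}


def normalize(source, keep=KEEP):
    """Steps 1-2: tokenize, then replace locals and literals with placeholders."""
    placeholders = {}
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        if tok.type in STRUCTURAL:
            out.append(tokenize.tok_name[tok.type])
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        elif (tok.type == tokenize.NAME
              and not keyword.iskeyword(tok.string)
              and tok.string not in keep):
            # First new name becomes $1, the next $2, and so on.
            out.append(placeholders.setdefault(tok.string, f"${len(placeholders) + 1}"))
        else:
            out.append(tok.string)
    return " ".join(out)


def clone_groups(snippets):
    """Steps 3-4: hash each normalized stream and group names by hash."""
    buckets = defaultdict(list)
    for name, source in snippets.items():
        digest = hashlib.sha256(normalize(source).encode()).hexdigest()
        buckets[digest].append(name)
    return [names for names in buckets.values() if len(names) > 1]


A = 'user = db.get(user_id)\nif user is None:\n    raise NotFound("user")\n'
B = 'order = db.get(order_id)\nif order is None:\n    raise NotFound("order")\n'
print(clone_groups({"a": A, "b": B, "c": "total = 1 + 2\n"}))
```

In a real run the KEEP set would likely be derived from the codebase (imported names, attribute names) rather than hard-coded, and f-strings may need extra handling on newer Python versions where they tokenize into several pieces.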
Original A:   user = db.get(user_id);  if user is None: raise NotFound("user")
Original B:   order = db.get(order_id); if order is None: raise NotFound("order")

Normalized:   $1 = db.get($2);         if $1 is None: raise NotFound(STR)
              $1 = db.get($2);         if $1 is None: raise NotFound(STR)

→ Same hash. Clone pair found.

Extractable as: def get_or_404(model, id, name): ...
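
One possible body for that helper is sketched below. The source gives only the signature, so the body is an assumption: it treats the first argument as any object exposing .get(id), and NotFound and FakeStore are hypothetical stand-ins defined here so the example runs on its own.

```python
class NotFound(Exception):
    """Stand-in for the NotFound error used in the snippets above."""


class FakeStore:
    """Hypothetical in-memory store with the same .get(id) shape as db."""

    def __init__(self, rows):
        self._rows = rows

    def get(self, key):
        return self._rows.get(key)


def get_or_404(model, id, name):
    """Fetch a row, raising NotFound(name) when it is missing.

    The parameter is called id to match the signature in the text; it
    shadows the builtin only inside this function.
    """
    row = model.get(id)
    if row is None:
        raise NotFound(name)
    return row


users = FakeStore({1: "alice"})
print(get_or_404(users, 1, "user"))
```

Each clone site then shrinks to a single call such as user = get_or_404(db, user_id, "user").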

Finding idioms — frequency of small patterns

Idioms are short (2–5 lines) and appear everywhere. Mine them by n-gram frequency on normalized AST nodes:

| Normalized n-gram | Count | Interpretation |
| ----------------- | ----- | -------------- |
| if $1 is None: return None | 89 | Null-propagation idiom; this codebase uses it heavily |
| with self._lock: $BODY | 34 | Locking idiom; _lock is the house convention |
| logger.info(f"{$1.__class__.__name__}: ...") | 27 | Logging idiom; always includes class name |
| try: $X except Exception: pass | 11 | Anti-pattern; swallowing all exceptions |

The count tells you: high-count idioms are conventions to follow; high-count anti-patterns are systemic problems.
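
A rough way to produce counts like these is to slide a window of n sibling statements over every block in the AST, normalize each window, and tally the results. The sketch below uses the standard-library ast module (Python 3.9+ for ast.unparse). The names Normalizer and ngram_counts are hypothetical, and the normalization is deliberately blunt: every variable and function name is renamed, and f-strings collapse to STR.

```python
import ast
import copy
from collections import Counter


class Normalizer(ast.NodeTransformer):
    """Rename variables to $1, $2, ... and collapse literals to NUM / STR."""

    def __init__(self):
        self.names = {}

    def _placeholder(self, name):
        return self.names.setdefault(name, f"${len(self.names) + 1}")

    def visit_Name(self, node):
        return ast.Name(id=self._placeholder(node.id), ctx=node.ctx)

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)
        return node

    def visit_JoinedStr(self, node):
        # Treat a whole f-string as one string literal.
        return ast.Name(id="STR", ctx=ast.Load())

    def visit_Constant(self, node):
        if node.value is None or isinstance(node.value, bool):
            return node
        if isinstance(node.value, (int, float, complex)):
            return ast.Name(id="NUM", ctx=ast.Load())
        if isinstance(node.value, str):
            return ast.Name(id="STR", ctx=ast.Load())
        return node


def normalized(stmts):
    """Normalize a run of statements with one shared placeholder table."""
    norm = Normalizer()
    return "\n".join(ast.unparse(norm.visit(copy.deepcopy(s))) for s in stmts)


def ngram_counts(source, n=1):
    """Count normalized windows of n consecutive sibling statements."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if not isinstance(block, list):
                continue  # e.g. a lambda body is an expression, not a block
            for i in range(len(block) - n + 1):
                counts[normalized(block[i:i + n])] += 1
    return counts


SAMPLE = '''
def find_user(uid):
    if uid is None:
        return None
    return lookup(uid)

def find_order(oid):
    if oid is None:
        return None
    return lookup(oid)
'''
for pattern, count in ngram_counts(SAMPLE).most_common(3):
    print(count, repr(pattern))
```

Sorting the counter by frequency gives the raw material for the table; deciding whether a frequent pattern is an idiom or an anti-pattern still takes a human read.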

Worked example — extracting a house idiom

Observed in 23 places:

def get_foo(self, foo_id):
    resp = self._client.get(f"/foos/{foo_id}")
    resp.raise_for_status()
    data = resp.json()
    return Foo.from_dict(data)

def get_bar(self, bar_id):
    resp = self._client.get(f"/bars/{bar_id}")
    resp.raise_for_status()
    data = resp.json()
    return Bar.from_dict(data)
# ... ×21 more

Pattern template:

def get_$RESOURCE(self, ${RESOURCE}_id):
    resp = self._client.get(f"/${RESOURCE}s/{${RESOURCE}_id}")
    resp.raise_for_status()
    return $MODEL.from_dict(resp.json())

Parameters: $RESOURCE (string, e.g. foo), $MODEL (class, e.g. Foo).

Verdict: This is a clone, not an idiom. 23 copies of the same 4 lines with 2 parameters → extract:

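from typing import TypeVar

T = TypeVar("T")  # the model class being returned

def _get_resource(self, path: str, model: type[T]) -> T:
    # Shared fetch-and-parse logic; error handling lives here only.
    resp = self._client.get(path)
    resp.raise_for_status()
    return model.from_dict(resp.json())

def get_foo(self, foo_id): return self._get_resource(f"/foos/{foo_id}", Foo)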

23 × 4 = 92 lines become 23 × 1 + 4 = 27 lines, and error handling now changes in one place.
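
To see the extracted helper run end to end, here is a self-contained sketch with a stubbed HTTP client. StubClient, StubResponse, the Api wrapper class, and the shape of Foo are all hypothetical; only _get_resource and get_foo follow the text above.

```python
from dataclasses import dataclass
from typing import TypeVar

T = TypeVar("T")


class StubResponse:
    """Hypothetical response object with the two methods the pattern uses."""

    def __init__(self, payload, status=200):
        self.payload = payload
        self.status = status

    def raise_for_status(self):
        if self.status >= 400:
            raise RuntimeError(f"HTTP {self.status}")

    def json(self):
        return self.payload


class StubClient:
    """Hypothetical client that serves canned payloads by path."""

    def __init__(self, routes):
        self.routes = routes

    def get(self, path):
        if path in self.routes:
            return StubResponse(self.routes[path])
        return StubResponse({}, status=404)


@dataclass
class Foo:
    id: int

    @classmethod
    def from_dict(cls, data):
        return cls(id=data["id"])


class Api:
    def __init__(self, client):
        self._client = client

    def _get_resource(self, path: str, model: type[T]) -> T:
        # The one place where fetch, status check, and parsing happen.
        resp = self._client.get(path)
        resp.raise_for_status()
        return model.from_dict(resp.json())

    def get_foo(self, foo_id):
        return self._get_resource(f"/foos/{foo_id}", Foo)


api = Api(StubClient({"/foos/7": {"id": 7}}))
print(api.get_foo(7))
```

Adding retries or a custom exception now means editing _get_resource once rather than touching every get_* method.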

Distinguishing idiom from clone

| Signal | Points to idiom | Points to clone |
| ------ | --------------- | --------------- |
| Pattern length | Short (2–3 lines) | Long (5+ lines) |
| Parameter count | 0–1 | 2+ |
| Repetition within one file | Rare | Common |
| Language/framework requires this shape | Yes: it's an idiom | No: it's duplication |
| Would extraction make callsites clearer? | No: the idiom reads fine | Yes: the callsite becomes a name |

with self._lock: (1 line, 0 params, language-required shape) → idiom. The get_$RESOURCE block above (4 lines, 2 params, nothing requires this shape) → clone.

Do not

  • Do not extract every 2-line pattern. if x is None: return None appearing 89 times isn't duplication — it's an idiom, and extracting it to propagate_none(x) makes code less readable.
  • Do not report clone pairs without the extraction proposal. "These two functions are similar" is not actionable. "Extract this helper" is.
  • Do not ignore the parameter count. A pattern with 6 parameters that differ each time isn't extractable — the "common" part is tiny.
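  • Do not miss semantic clones that differ textually. if not user vs if user is None: different text, often the same intent. Normalize aggressively, but confirm equivalence before merging, since not user is also true for 0, empty strings, and empty collections.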

Output format

## Idioms (follow these)
| Pattern | Count | Example location |
| ------- | ----- | ---------------- |

## Clones (extract these)
### <pattern name>
Occurrences: <N>
Template:
<normalized pattern with $PARAMS>
Parameters: <list — what varies>
Proposed extraction:
<function signature + body>
Affected files: <list>

## Anti-patterns (fix these)
| Pattern | Count | Why bad | Locations |
| ------- | ----- | ------- | --------- |
Repository
santosomar/general-secure-coding-agent-skills