tessleng/agent-insight-experiment

Scan a repository to surface actionable findings about agent performance. Analyzes source code, git history, GitHub data, agent logs, and agent context, then synthesizes cross-referenced findings with targeted actions informed by Tessl product awareness. Supports incremental multi-developer contributions and produces a self-contained HTML report.


name:: analyze-git-history
description:: Analyze a repository's git history to identify patterns that affect coding agent performance. Examines file churn, co-change relationships, revert frequency, contributor concentration, and commit patterns to find areas where agents are likely to struggle. Use when running an insight scan's git history analysis phase, analyzing repository churn for agent readiness, auditing change patterns for AI agent risks, or understanding what a repo's change history reveals about agent performance risks.

Analyze Git History for Agent Performance Insights

Name: tessleng/agent-insight-experiment
Rating: 70.98 (1 review)
Author: tessleng

Examine the repository's git history to surface patterns that indicate where coding agents are likely to struggle.

Scope: Focus on git commands (log, shortlog, diff, blame, show, rev-list, etc.) for temporal and change pattern analysis.

Before You Start

Read the shared reference files:

Read the APEX taxonomy for the insight categories
Read the insight report schema for the exact report structure

Resolving reference paths: The links above use relative paths (../../references/...) that work when this skill is read from its tile directory. If those paths do not resolve (e.g. when activated via a .claude/skills/ symlink), find the shared references at .tessl/tiles/*/agent-insight-experiment/references/ relative to the repository root.

Your report prefix is GIT (e.g., GIT-001, GIT-002).

Quick Start (Recommended)

Run the data collection script to gather all git metrics in a single pass:

bash "<skill-dir>/scripts/git-data-collector.sh" --root "$(git rev-parse --show-toplevel)"

Resolving script path: <skill-dir> stands for this skill's tile directory. If the skill is activated via a .claude/skills/ symlink, locate the script at .tessl/tiles/*/agent-insight-experiment/skills/analyze-git-history/scripts/git-data-collector.sh relative to the repository root. Pass --months <n> to adjust the analysis window (default: 6), or --out <path> to write to a file.

The script outputs JSON containing: repository vitals (commit count, authors, last commit, recent activity), file churn (top 40), co-change pairs (top 10 files with their co-changed files), reverts, fix-up commits, contributor concentration (top 20 directories with author breakdown), large commits (>20 files), commit message prefix conventions, and pattern shifts (recent vs older half). Read the output and proceed directly to insight generation — skip the manual collection steps below.

Manual Collection (Fallback)

If the script is unavailable, collect data manually using the steps below.

Step 1: Repository Vitals

git rev-list --count HEAD                        # total commits
git log --format='%ae' | sort -u | wc -l         # unique authors
git log --since="6 months ago" --oneline | wc -l  # recent activity
git log -1 --format='%ci'                         # last commit date

Checkpoint: If total commits <50 or recent activity is zero, adjust scope — the repo may be too small or inactive for meaningful churn analysis. Note this in the report and focus on contributor patterns and structural observations instead.
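The checkpoint can be expressed as a small guard. This is a sketch only: the function name `churn_scope` is hypothetical, and the thresholds (fewer than 50 total commits, or zero commits in the recent window) are the ones stated above.

```shell
#!/usr/bin/env bash
# Hypothetical helper applying the Step 1 checkpoint: prints "full" when churn
# analysis is worthwhile, or "limited" when the repository has fewer than 50
# commits or no commits in the recent window.
churn_scope() {
  local total="$1" recent="$2"
  if [ "$total" -lt 50 ] || [ "$recent" -eq 0 ]; then
    echo "limited"   # report contributor and structural patterns instead
  else
    echo "full"
  fi
}

# Usage, inside the repository:
#   churn_scope "$(git rev-list --count HEAD)" \
#               "$(git rev-list --count --since='6 months ago' HEAD)"
```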

Step 2: File Churn Analysis

git log --since="6 months ago" --name-only --pretty=format: | grep -v '^$' | sort | uniq -c | sort -rn | head -40

Step 3: Co-Change Analysis

Find files that always change together — this reveals hidden coupling where an agent might miss a required co-change:

# For each of the top 10 high-churn files, check what else changes in the same commits
file='<file>'   # substitute one of the high-churn paths
git log --since="6 months ago" --format='%H' --max-count=20 -- "$file" | while read -r sha; do
  git diff-tree --root --no-commit-id --name-only -r "$sha"
done | grep -vxF -- "$file" | sort | uniq -c | sort -rn | head -10
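Raw co-change counts are easier to interpret as a fraction of the target file's commits: a ratio near 1.00 means two files almost always move together. A sketch, assuming a hypothetical `co_change_ratio` function that reads a single `git log --format='commit %H' --name-only` stream (the `commit ` marker line is an arbitrary choice used to separate commits), so the whole window is scanned in one pass instead of one `git diff-tree` call per SHA:

```shell
#!/usr/bin/env bash
# Hypothetical helper: for one target file, report how often each other file
# changed in the same commit, as "<together>/<target commits> <ratio> <path>".
# Input: output of  git log --format='commit %H' --name-only
# ("commit <hash>" lines start a new commit; blank lines are ignored).
co_change_ratio() {
  awk -v target="$1" '
    function flush(   f) {
      if (has_target) {
        commits++
        for (f in files) if (f != target) pair[f]++
      }
      split("", files)   # portable way to clear an array
      has_target = 0
    }
    /^commit [0-9a-f]+$/ { flush(); next }
    NF { files[$0] = 1; if ($0 == target) has_target = 1 }
    END {
      flush()
      for (f in pair) printf "%d/%d %.2f %s\n", pair[f], commits, pair[f] / commits, f
    }' | LC_ALL=C sort -k2,2nr -k3
}

# Usage (placeholder path):
#   git log --since="6 months ago" --format='commit %H' --name-only | co_change_ratio path/to/file
```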

Step 4: Revert and Fix-up Analysis

git log --since="6 months ago" --oneline --grep="revert" -i
git log --since="6 months ago" --oneline --grep="fix" -i | head -30

Check whether reverts and fixes cluster around specific areas.
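One way to check for clustering is to count how many revert or fix commits touch each directory. A sketch; `cluster_by_dir` and its depth argument are hypothetical, and matching on the words "revert" and "fix" is the same heuristic as the commands above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many commits touch each directory, truncated to
# the first <depth> path components (default 2); root-level files count as ".".
# Input: output of  git log ... --format='commit %h' --name-only
cluster_by_dir() {
  awk -v depth="${1:-2}" '
    /^commit [0-9a-f]+$/ { split("", seen); next }   # new commit: reset per-commit set
    NF {
      n = split($0, parts, "/")
      dir = (n == 1) ? "." : parts[1]
      for (i = 2; i <= depth + 0 && i < n; i++) dir = dir "/" parts[i]
      if (!(dir in seen)) { seen[dir] = 1; hits[dir]++ }   # count each commit once per dir
    }
    END { for (d in hits) print hits[d], d }' | LC_ALL=C sort -k1,1nr -k2
}

# Usage: directories most affected by revert/fix commits in the window
#   git log --since="6 months ago" -i --grep=revert --grep=fix --format='commit %h' --name-only |
#     cluster_by_dir 2
```

Multiple `--grep` patterns are OR'ed by git unless `--all-match` is given, so one pass covers both reverts and fixes.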

Step 5: Contributor Concentration

# For each of the top 20 most-changed directories
git log --since="6 months ago" --format='%ae' -- <directory> | sort | uniq -c | sort -rn

Single-contributor areas signal concentrated implicit knowledge — high KCG-3 risk.
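A single number per directory makes concentration easier to compare across the sampled directories. A sketch, assuming a hypothetical `top_author_share` helper that reports the dominant author's share of commits; what share counts as "concentrated" is a judgment call, with 100% being the single-contributor case above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one author email per line and prints the top
# author's share of commits as "<percent>% <email> (<top> of <total> commits)".
# Ties are broken arbitrarily; exits non-zero when there is no input.
top_author_share() {
  awk 'NF { n[$0]++; total++ }
       END {
         if (!total) exit 1
         for (a in n) if (n[a] > max) { max = n[a]; top = a }
         printf "%d%% %s (%d of %d commits)\n", int(100 * max / total + 0.5), top, max, total
       }'
}

# Usage (placeholder directory):
#   git log --since="6 months ago" --format='%ae' -- <directory> | top_author_share
```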

Step 6: Commit Pattern Analysis

# Large commits (>20 files changed) suggest complex, coupled changes
git log --since="6 months ago" --format='%h %s' --shortstat |
  awk '/^[0-9a-f]+ / { c = $0 } /^ [0-9]+ files? changed/ && $1 > 20 { print $1 " files: " c }' |
  sort -rn

# Commit message conventions (text before the first "(", "!" or ":")
git log --since="6 months ago" --format='%s' | sed -E 's/[(!:].*//' | sort | uniq -c | sort -rn | head -20
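The prefix tally shows which conventions exist; a companion measure is how consistently they are followed. A sketch that assumes Conventional Commits-style prefixes (`type: ...`, `type(scope): ...`, `type!: ...`); the helper name `prefix_adoption` is hypothetical, and the pattern should be adapted if the repository uses a different convention:

```shell
#!/usr/bin/env bash
# Hypothetical helper: share of subjects that follow a Conventional Commits-style
# prefix, printed as "<matching>/<total> (<percent>%)".
# Input: one subject per line, e.g. from  git log --since="6 months ago" --format='%s'
prefix_adoption() {
  awk 'NF {
         total++
         if ($0 ~ /^[a-z]+([(][^)]*[)])?!?: /) hit++
       }
       END { if (total) printf "%d/%d (%d%%)\n", hit, total, int(100 * hit / total + 0.5) }'
}
```

Running it separately on the recent and older halves of the window shows whether a convention is being adopted or abandoned.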

Step 7: Recent Pattern Shifts

Compare where work happened in the last 3 months with the 3 months before that; a shift in activity can indicate conventions that changed over time:

# Compare directory activity: last 3 months vs 3-6 months ago
git log --since="3 months ago" --name-only --pretty=format: | sed -e '/^$/d' -e 's|^[^/]*$|.|' -e 's|/[^/]*$||' | sort | uniq -c | sort -rn | head -20
git log --since="6 months ago" --until="3 months ago" --name-only --pretty=format: | sed -e '/^$/d' -e 's|^[^/]*$|.|' -e 's|/[^/]*$||' | sort | uniq -c | sort -rn | head -20
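Raw counts from the two windows are hard to compare when overall activity changed, so normalizing each directory to its share of the window's file changes makes shifts easier to spot. A sketch; `activity_shift` is hypothetical and expects the `uniq -c` output of the two commands above saved to files, without the `head -20` truncation so that shares cover every directory:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare each directory's share of file changes in two windows.
# Arguments: two files of "uniq -c" output ("<count> <dir>"), recent window first.
# Prints "<percentage-point change> <dir>", largest increase first.
activity_shift() {
  awk '
    {
      w = (FILENAME == ARGV[1]) ? 1 : 2       # which window this line belongs to
      n = $1; d = $0
      sub(/^[ \t]*[0-9]+[ \t]/, "", d)         # strip the count, keep the directory
      if (d == "") next
      cnt[w, d] += n; tot[w] += n; dirs[d] = 1
    }
    END {
      for (d in dirs) {
        r = tot[1] ? 100 * cnt[1, d] / tot[1] : 0
        o = tot[2] ? 100 * cnt[2, d] / tot[2] : 0
        printf "%.1f %s\n", r - o, d
      }
    }' "$1" "$2" | LC_ALL=C sort -k1,1nr
}

# Usage:
#   git log --since="3 months ago" --name-only --pretty=format: | sed -e '/^$/d' -e 's|^[^/]*$|.|' -e 's|/[^/]*$||' | sort | uniq -c > recent.txt
#   git log --since="6 months ago" --until="3 months ago" --name-only --pretty=format: | sed -e '/^$/d' -e 's|^[^/]*$|.|' -e 's|/[^/]*$||' | sort | uniq -c > older.txt
#   activity_shift recent.txt older.txt
```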

Scope Limits

Analyze the last 6 months of history, capped at the most recent 2000 commits (--max-count=2000) if the window contains more
For very active repos, focus on the most recent 3 months for detailed analysis
Sample at most 20 directories for contributor concentration analysis
Use --since flags consistently to bound queries

What to Look For

Git history is especially good at revealing:

KCG-3 (Tribal knowledge): Single-contributor areas where knowledge is concentrated in one person
CAS-2 (Inconsistent code patterns): Pattern shifts over time visible in how the same things are done differently in old vs new code
SCX-4 (High coupling): Files that always change together despite being in different modules
RAF-1, RAF-2, RAF-5: Areas with frequent reverts, fix-up commits, or repeated changes suggest agent (or human) difficulty
KCG-5 (Stale documentation): Documentation files that haven't been updated even as the code they describe changed significantly
TCG-6 (Unowned content): Context or documentation files whose last meaningful edit is very old and whose historical authors are no longer active
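For KCG-5 and TCG-6, the underlying signals are "when was each file last touched" and "which of its historical authors are still active". A sketch with two hypothetical helpers; treating anyone who committed in the last 6 months as "active" is an assumption, and judging whether an edit was "meaningful" still needs manual review:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<unix time> <path>" for the newest commit touching each path.
# Input must be newest-first, e.g.  git log --format='commit %ct' --name-only
last_touch() {
  awk '/^commit [0-9]+$/ { t = $2; next }
       NF && !($0 in seen) { seen[$0] = 1; print t, $0 }'
}

# Hypothetical helper: print authors from file $1 (historical) who are absent
# from file $2 (currently active); both files hold one email per line.
inactive_authors() {
  awk 'FILENAME == ARGV[1] { active[$0] = 1; next }
       NF && !($0 in active) && !($0 in seen) { seen[$0] = 1; print }' "$2" "$1"
}

# Usage (placeholder paths):
#   git log --format='commit %ct' --name-only | last_touch > last-touch.txt
#   # then compare each doc's time with the newest time of the code it describes
#   git log --format='%ae' -- path/to/doc.md > doc-authors.txt
#   git log --since="6 months ago" --format='%ae' > active-authors.txt
#   inactive_authors doc-authors.txt active-authors.txt
```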

Output

Produce a JSON report conforming to the insight report schema. Save it to the path provided by the orchestrator (or to .tessl-insights-poc/reports/git-history.json when run standalone).

Set scope.metrics to include:

commits_analyzed: number of commits examined
authors_seen: number of unique authors (by email, %ae) in the analyzed window (denominator for the commit_authors_impacted hero stat)
branches_examined: number of branches checked
time_range_days: how many days of history covered
problem_commits: array of commit SHAs (short or full) that appear as evidence in this report's insights — i.e. the specific commits used to illustrate reverts, revert-reland cycles, oversized commits, drift commits, etc. Deduplicate across insights. Include every SHA you cite in evidence.
commit_authors_impacted: number of distinct authors (by email, %ae) who wrote the commits listed in problem_commits. Derive this by running git log -1 --format='%ae' <sha> for each SHA and counting unique values. This is the numerator for the commit_authors_impacted hero stat in the synthesized report.
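The commit_authors_impacted derivation can be scripted. A minimal sketch, assuming it runs inside the analyzed repository; the function name mirrors the metric but is otherwise hypothetical. Because the count is taken over distinct emails, citing the same commit by both its short and full SHA does not inflate it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of distinct author emails (%ae) across the given commits.
# Run inside the analyzed repository; pass every SHA listed in problem_commits.
commit_authors_impacted() {
  local sha
  for sha in "$@"; do
    # Fail loudly if a cited SHA does not resolve to a commit (typo, wrong repo).
    if ! git rev-parse --verify --quiet "${sha}^{commit}" >/dev/null; then
      echo "not a commit: $sha" >&2
      return 1
    fi
  done
  for sha in "$@"; do
    git log -1 --format='%ae' "$sha" --
  done | sort -u | wc -l | tr -d ' '
}

# Usage (placeholder SHAs):
#   commit_authors_impacted <sha1> <sha2> <sha3>
```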

The synthesizer uses problem_commits + commit_authors_impacted + authors_seen to populate summary.commit_authors_impacted in findings.json. Every insight that cites commits as evidence must use SHAs that also appear in scope.metrics.problem_commits — if you can't cite concrete SHAs for a "problem commits" style finding, the insight is too vague.

Validation before saving:

Verify all required metadata fields present
Confirm at least 8 insights with commit SHAs, file paths, or statistics as evidence
Check every insight has id, category, impact, effort, priority_score
Confirm scope.metrics.problem_commits contains every SHA cited in any insight's evidence and that scope.metrics.commit_authors_impacted matches the count of distinct authors for those SHAs

Mark data_source_exclusive: true for insights from temporal/change data (churn rates, co-change patterns, contributor concentration).