tessleng/agent-insight-experiment

Scan a repository to surface actionable findings about agent performance. Analyzes source code, git history, GitHub data, agent logs, and agent context, then synthesizes cross-referenced findings with targeted actions informed by Tessl product awareness. Supports incremental multi-developer contributions and produces a self-contained HTML report.

Insight Report Schema

Name: tessleng/agent-insight-experiment
Rating: 70.98 (1 reviews)
Author: tessleng

Every analyzer produces a single JSON report file conforming to this schema. The report is the primary output — a companion Markdown summary is optional but encouraged.

File Naming

Save the report as <data_source>-report.json in the experiment workspace directory. For example: source-code-report.json, git-history-report.json.
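
A minimal sketch of the naming rule in Python. It assumes the file name is derived from the `data_source` value by replacing underscores with hyphens; that matches the two examples above but is not stated explicitly by the schema. `report_path` and its `workspace` argument are hypothetical names used only for illustration.

```python
from pathlib import Path

# data_source values as listed in the schema's metadata block.
DATA_SOURCES = {
    "source_code", "git_history", "github_data",
    "agent_logs", "agent_context", "context_inventory",
}


def report_path(workspace: str, data_source: str) -> Path:
    """Return <workspace>/<data-source>-report.json (hypothetical helper)."""
    if data_source not in DATA_SOURCES:
        raise ValueError(f"unknown data_source: {data_source!r}")
    # Assumption: underscores become hyphens, as in source-code-report.json.
    return Path(workspace) / f"{data_source.replace('_', '-')}-report.json"
```

For example, `report_path("workspace", "git_history")` yields a path whose file name is `git-history-report.json`.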

ID Prefixes

Each data source uses a unique prefix for insight IDs:

Data Source	Prefix	Example
Source Code	SRC	SRC-001
Git History	GIT	GIT-001
GitHub Data	GH	GH-001
Agent Logs	LOG	LOG-001
Agent Context	CTX	CTX-001
Context Inventory	INV	INV-001
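
The prefix table can be sketched as a lookup. The keys below assume each row maps to the matching `data_source` value from the schema (for example, Source Code to `source_code`), and the three-digit zero padding is inferred from examples such as `SRC-001`. `insight_id` is a hypothetical helper name.

```python
# Prefix per data source, taken from the table above.
# Assumption: keys are the data_source values used in report metadata.
PREFIXES = {
    "source_code": "SRC",
    "git_history": "GIT",
    "github_data": "GH",
    "agent_logs": "LOG",
    "agent_context": "CTX",
    "context_inventory": "INV",
}


def insight_id(data_source: str, sequence: int) -> str:
    """Format an insight ID such as GH-001 (hypothetical helper)."""
    if sequence < 1:
        raise ValueError("sequence numbers start at 1")
    # Zero-pad to three digits; larger numbers simply use more digits.
    return f"{PREFIXES[data_source]}-{sequence:03d}"
```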

Schema

{
  "metadata": {
    "scan_id": "<string — shared across all reports in one scan run>",
    "data_source": "<source_code | git_history | github_data | agent_logs | agent_context | context_inventory>",
    "repository": "<string — org/repo-name or filesystem path>",
    "analysis_timestamp": "<ISO 8601 datetime>",
    "analyzer_model": "<string — model identifier>",
    "scope": {
      "description": "<string — what was analyzed, sampling strategy used, and any limitations>",
      "metrics": {},
      "time_range": "<optional string — date range for temporal data sources>"
    }
  },

  "context_inventory": "<optional object — only the `analyze-context-inventory` report includes this; see that skill's SKILL.md for shape>",

  "executive_summary": "<string — 2-3 paragraphs covering: what was analyzed and the most significant findings>",

  "summary_statistics": {
    "total_insights": "<number>",
    "by_category": {
      "KCG": 0, "CAS": 0, "SCX": 0, "RAF": 0, "TCG": 0
    },
    "by_impact": {
      "critical": 0, "high": 0, "medium": 0, "low": 0
    },
    "by_effort": {
      "trivial": 0, "low": 0, "medium": 0, "high": 0
    },
    "top_quick_wins": ["<array of insight IDs with highest priority_score>"]
  },

  "insights": ["<array of Insight objects — see below>"]
}
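
As a rough structural check, the following Python sketch reports which top-level and `metadata` keys are absent from a loaded report. It assumes every field not labeled optional in the schema above is required; the schema does not say so explicitly, and `missing_keys` is a hypothetical helper rather than part of any analyzer.

```python
# Assumption: fields not marked "optional" in the schema are required.
REQUIRED_TOP_LEVEL = (
    "metadata", "executive_summary", "summary_statistics", "insights",
)
REQUIRED_METADATA = (
    "scan_id", "data_source", "repository",
    "analysis_timestamp", "analyzer_model", "scope",
)


def missing_keys(report: dict) -> list[str]:
    """List required keys that the report lacks (hypothetical helper)."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in report]
    metadata = report.get("metadata", {})
    missing += [
        f"metadata.{key}" for key in REQUIRED_METADATA if key not in metadata
    ]
    return missing
```

An empty list means the report has the expected outline; it says nothing about whether the values themselves are valid.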

Insight Object

{
  "id": "<PREFIX-NNN>",
  "category": "<KCG | CAS | SCX | RAF | TCG>",
  "subcategory": "<e.g., KCG-1, CAS-2>",
  "title": "<string — short, specific, descriptive>",
  "description": "<string — detailed explanation: what the issue is, why it matters for agent performance, and how it manifests>",

  "evidence": [
    {
      "type": "<file_reference | code_snippet | git_log | pr_comment | ci_output | agent_conversation | config_file | review_comment | statistical>",
      "location": "<string — file path, PR URL/number, commit SHA, session ID, etc.>",
      "detail": "<string — what this evidence shows>",
      "snippet": "<optional string — relevant code or text excerpt>"
    }
  ],

  "impact": {
    "score": "<1-10 integer>",
    "level": "<critical | high | medium | low>",
    "reasoning": "<string — why this score: frequency, blast radius, severity>"
  },

  "effort": {
    "score": "<1-10 integer (1 = trivial, 10 = massive rewrite)>",
    "level": "<trivial | low | medium | high>",
    "reasoning": "<string — what the fix involves and roughly how long it would take>"
  },

  "priority_score": "<number — impact.score / effort.score, higher is better>",

  "affected_areas": ["<array of file paths or directory patterns>"],
  "confidence": "<high | medium | low>",
  "data_source_exclusive": "<boolean — could this insight ONLY have been discovered from this data source?>"
}
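
The `priority_score` formula and the `summary_statistics` counts can be derived mechanically from the insight list. The sketch below is one way to do that in Python, not the analyzers' actual code. The `top_n` cutoff of 3 is an assumption, since the schema does not say how many IDs `top_quick_wins` should hold, and `summarize` is a hypothetical helper name.

```python
from collections import Counter

CATEGORIES = ("KCG", "CAS", "SCX", "RAF", "TCG")
IMPACT_LEVELS = ("critical", "high", "medium", "low")
EFFORT_LEVELS = ("trivial", "low", "medium", "high")


def priority_score(insight: dict) -> float:
    """impact.score / effort.score; effort.score is at least 1 by schema."""
    return insight["impact"]["score"] / insight["effort"]["score"]


def _tally(values, keys) -> dict:
    # Counter returns 0 for keys it has not seen, so every key is present.
    counts = Counter(values)
    return {key: counts[key] for key in keys}


def summarize(insights: list[dict], top_n: int = 3) -> dict:
    """Build a summary_statistics object (top_n is an assumed cutoff)."""
    ranked = sorted(insights, key=priority_score, reverse=True)
    return {
        "total_insights": len(insights),
        "by_category": _tally((i["category"] for i in insights), CATEGORIES),
        "by_impact": _tally((i["impact"]["level"] for i in insights), IMPACT_LEVELS),
        "by_effort": _tally((i["effort"]["level"] for i in insights), EFFORT_LEVELS),
        "top_quick_wins": [i["id"] for i in ranked[:top_n]],
    }
```

An insight with impact 8 and effort 2 scores 4.0 and ranks ahead of one with impact 9 and effort 9, which scores 1.0.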

Scope Metrics by Data Source

The scope.metrics object uses keys appropriate to each data source:

Data Source	Typical Metrics
Source Code	`files_examined`, `directories_traversed`, `total_loc_sampled`
Git History	`commits_analyzed`, `authors_seen`, `branches_examined`, `problem_commits`, `commit_authors_impacted`
GitHub Data	`prs_reviewed`, `ci_runs_examined`, `issues_checked`, `review_comments_read`
Agent Logs	`sessions_parsed`, `conversations_analyzed`, `tool_calls_examined`, `tools_represented`, `sessions_with_frustration_signals`
Agent Context	`context_files_found`, `rules_analyzed`, `skills_inventoried`
Context Inventory	`files_inventoried`, `by_category` (map of category → count), `link_edges` (count of follow-links)

Hero-stat metrics

A few scope.metrics fields are used by the synthesizer to populate the hero stats shown at the top of the report. See the individual skill docs for exact definitions, but briefly:

git_history → problem_commits (array of SHAs cited as evidence) and commit_authors_impacted (distinct authors of those SHAs). The synthesizer uses these to compute summary.commit_authors_impacted in findings.json; authors_seen supplies the denominator.
agent_logs → sessions_with_frustration_signals (distinct sessions showing frustration or correction signals). The synthesizer sums this value across all contributors' reports to produce summary.sessions_with_frustration_signals; sessions_parsed supplies the denominator.

Analyzers are responsible for these numbers — synthesis never re-derives them from free-text evidence.
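
To illustrate the agent_logs hero stat, here is a hedged Python sketch of the summation across contributors' reports. The synthesizer's real implementation is not shown in this document; summing `sessions_parsed` the same way to form the denominator is an assumption, and `frustration_hero_stat` is a hypothetical name.

```python
def frustration_hero_stat(reports: list[dict]) -> dict:
    """Sum analyzer-supplied session counts over agent_logs reports."""
    flagged = 0
    parsed = 0
    for report in reports:
        metadata = report["metadata"]
        if metadata["data_source"] != "agent_logs":
            continue  # other data sources do not carry these metrics
        metrics = metadata["scope"]["metrics"]
        # Use the analyzer's numbers as-is; never re-derive from evidence.
        flagged += metrics.get("sessions_with_frustration_signals", 0)
        # Assumption: the denominator is summed across reports as well.
        parsed += metrics.get("sessions_parsed", 0)
    return {
        "sessions_with_frustration_signals": flagged,
        "sessions_parsed": parsed,
    }
```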

Scoring Guide

Impact (1-10)

Score	Level	Meaning
9-10	Critical	Affects nearly every agent task; causes frequent, severe failures
7-8	High	Affects many tasks or causes significant failures in important areas
4-6	Medium	Affects some tasks or causes moderate confusion
1-3	Low	Affects few tasks or causes minor inefficiency

Effort (1-10)

Score	Level	Meaning
1-2	Trivial	A few minutes: add a comment, create a short context file, flip a config
3-4	Low	Under an hour: write documentation, add a rule, create a simple skill
5-7	Medium	Hours to a day: refactor a module, write comprehensive docs, add tests
8-10	High	Days+: major refactoring, architectural changes, large-scale cleanup
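
The two score tables translate directly into lookup functions. This Python sketch uses the thresholds from the tables above; the function names are hypothetical and are not part of the schema.

```python
def _check(score: int) -> None:
    # Both scales are 1-10 integers per the Insight object definition.
    if not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError(f"score must be an integer from 1 to 10, got {score!r}")


def impact_level(score: int) -> str:
    """Map an impact score to its level: 9-10, 7-8, 4-6, 1-3."""
    _check(score)
    if score >= 9:
        return "critical"
    if score >= 7:
        return "high"
    if score >= 4:
        return "medium"
    return "low"


def effort_level(score: int) -> str:
    """Map an effort score to its level: 1-2, 3-4, 5-7, 8-10."""
    _check(score)
    if score <= 2:
        return "trivial"
    if score <= 4:
        return "low"
    if score <= 7:
        return "medium"
    return "high"
```

Deriving `level` from `score` this way keeps the two fields of `impact` and `effort` from drifting apart within a report.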

Confidence

High: Multiple pieces of clear evidence; the issue is unambiguous
Medium: Evidence supports the finding but there could be context you're missing
Low: Inference based on limited data; flagged as worth investigating
