tessleng/agent-insight-experiment

Scan a repository to surface actionable findings about agent performance. Analyzes source code, git history, GitHub data, agent logs, and agent context, then synthesizes cross-referenced findings with targeted actions informed by Tessl product awareness. Supports incremental multi-developer contributions and produces a self-contained HTML report.

Findings Schema

Name: tessleng/agent-insight-experiment
Rating: 70.98 (1 review)
Author: tessleng

The synthesized output produced by synthesize-insights. This is the product-facing schema — it combines cross-referenced findings from all data source reports into a prioritized list with an inline action per finding.

File Naming

Save as findings.json in the .tessl-insights-poc/ directory at the repository root.

Schema

{
  "metadata": {
    "scan_id": "<string — shared across all reports in one scan>",
    "repository": "<string — org/repo-name or filesystem path>",
    "generated_at": "<ISO 8601 datetime>",
    "data_sources_used": [
      {
        "source": "<source_code | git_history | github_data | agent_logs | agent_context | context_inventory>",
        "report_file": "<relative path to the source report, e.g. reports/source-code.json>",
        "contributor": "<optional string — username for agent log reports>",
        "tools": "<optional array of strings — agent harnesses represented in this report; only set for source=agent_logs. Allowed values: \"claude_code\", \"cursor\", plus any other harness slug (e.g. \"copilot\", \"gemini_cli\"). Almost always length 1 — a single contributor typically drives one harness per project. The array form is kept for the rare case where one contributor's logs span multiple harnesses.>",
        "findings_contributed": "<number — how many findings this report contributed to>"
      }
    ]
  },

  "context_inventory": "<object — passed through VERBATIM from reports/context-inventory.json (never re-derived). REQUIRED when reports/context-inventory.json exists with a context_inventory block; omit only if the analyzer report is absent. Carries the per-file catalogue rendered by the HTML report's Context Inventory section as a filterable file tree. See skills/analyze-context-inventory/SKILL.md for shape.>",

  "overall_score": {
    "level": "<blocked | constrained | productive>",
    "reasoning": "<string — 1-2 sentences citing the specific finding IDs (e.g. F-001, F-003) that drove the classification>"
  },

  "executive_summary": "<string — short markdown prose, 2–3 sentences (~60 words max), plain-English state of affairs. No APEX category codes (KCG/CAS/SCX/RAF/TCG), no finding IDs (F-001, …), no severity-count tables. Describe the shape of the issues in the reader's own words — not the taxonomy.>",

  "summary": {
    "total_findings": "<number>",
    "by_severity": {
      "critical": 0,
      "high": 0,
      "medium": 0,
      "low": 0
    },
    "by_effort_size": {
      "pebble": 0,
      "stone": 0,
      "rock": 0,
      "boulder": 0
    },
    "top_categories": ["<array of top 2-3 APEX category codes by finding count>"],

    "commit_authors_impacted": {
      "value": "<number — copied from the git-history report's scope.metrics.commit_authors_impacted (distinct authors of the commits cited as evidence)>",
      "total": "<number — copied from the git-history report's scope.metrics.authors_seen (total distinct authors in the analysed window)>"
    },
    "sessions_with_frustration_signals": {
      "value": "<number — sum across all agent-logs reports of scope.metrics.sessions_with_frustration_signals>",
      "total": "<number — sum across all agent-logs reports of scope.metrics.sessions_parsed>"
    }
  },

  "findings": ["<array of Finding objects — see below>"]
}

Both overall_score and executive_summary are required outputs — every synthesis run must produce them. Renderers (like regenerate-report) may tolerate older findings.json files that predate these fields, but any new synthesis output must include them.
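
The count fields of summary can be derived mechanically from the findings array. The following is a minimal Python sketch, not the skill's actual implementation. It assumes by_severity is tallied from impact.level and by_effort_size from action.effort_size, which the schema implies but does not state, and it takes "top" to mean the three most frequent categories. build_summary is a hypothetical helper name.

```python
from collections import Counter


def build_summary(findings):
    """Derive the count fields of `summary` from a list of Finding dicts."""
    by_severity = {"critical": 0, "high": 0, "medium": 0, "low": 0}
    by_effort_size = {"pebble": 0, "stone": 0, "rock": 0, "boulder": 0}
    category_counts = Counter()

    for finding in findings:
        # Assumption: severity buckets mirror impact.level.
        by_severity[finding["impact"]["level"]] += 1
        # Assumption: effort buckets mirror action.effort_size.
        by_effort_size[finding["action"]["effort_size"]] += 1
        category_counts[finding["category"]] += 1

    # Assumption: keep at most three categories; the schema says "top 2-3"
    # and does not define how ties are broken.
    top_categories = [code for code, _ in category_counts.most_common(3)]

    return {
        "total_findings": len(findings),
        "by_severity": by_severity,
        "by_effort_size": by_effort_size,
        "top_categories": top_categories,
    }
```

The optional hero stats are not produced here; they come from the analyzer reports rather than from the findings themselves.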

Overall Score

A single-word assessment of how well the repository supports coding agents. See skills/synthesize-insights/SKILL.md for the decision rubric.

Level	Meaning
`blocked`	Agents produce incorrect output with high confidence. Context is actively reducing performance rather than supporting it.
`constrained`	Agents handle routine tasks well but struggle with anything requiring organisation-specific knowledge. Context exists but remains incomplete or inconsistent.
`productive`	Agents handle the majority of tasks correctly. Context is well-structured, consistent, and directly actionable.

overall_score.reasoning should cite the specific finding IDs that drove the classification so the reader can jump straight to the evidence.

Executive Summary

A short markdown paragraph (max ~100 words) giving the reader a quick read on the scan before they drill into findings. It should loosely follow this shape without rigidly templating it:

Opening stat sentence (how many findings, severity breakdown, top 2-3 categories)
A sentence on what agents can and can't do today, grounded in the findings
One or two concrete standout examples (reference finding IDs or evidence)
A closing sentence reinforcing the overall score

Example (~90 words):

30 failure patterns were identified across 8 insights. Of these, 4 are critical and 16 are high severity, concentrated in tooling gaps (10 findings), knowledge gaps (9 findings), and recurrent agent failures (8 findings). Agents can complete routine, single-app tasks but systematically encounter friction on cross-app workflows, convention enforcement, and validation steps. The primary observability tool causes repeated agent thrashing, and 48% of merged PRs bypassed formal approval. Context documentation is incomplete in key areas, with no agent-facing guidance for Temporal proxy patterns or Oso Cloud Polar DSL.

Hero Stats

Two optional summary stats are surfaced at the top of the report to give an agent-enablement lead a quick read on reach and pain:

Stat	Meaning
`summary.commit_authors_impacted`	How much of the team is touching the problem areas called out by `git-history` findings. `value` = `git-history` report's `scope.metrics.commit_authors_impacted` (distinct authors of the commits cited as evidence). `total` = `git-history` report's `scope.metrics.authors_seen` (distinct authors in the analysis window).
`summary.sessions_with_frustration_signals`	How often agent sessions show correction / frustration signals (wrong output, revert, "stop guessing", RAF-6 style patterns). `value` and `total` are summed across all contributors' agent-logs reports.

Both fields are optional. Omit the field entirely if the underlying data source was not part of this scan:

Omit commit_authors_impacted when no git_history report contributed.
Omit sessions_with_frustration_signals when no agent_logs reports contributed.

Shape: both stats sit flat on summary (e.g. summary.commit_authors_impacted) with only value and total. They are not wrapped in a hero_stats object or any other grouping, and they do not carry additional fields like description. The HTML report template reads them at the flat path — any nesting causes the tiles to silently disappear from the report.

The numerator and denominator both come from the analyzer skills — synthesis never re-derives them from free-text evidence. See skills/synthesize-insights/SKILL.md for how the aggregation is done and references/insight-report-schema.md for the underlying scope.metrics contract.
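
The copy-and-sum rules above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the aggregation code from synthesize-insights: build_hero_stats is a hypothetical name, and each report is assumed to be an already-parsed dict exposing the scope.metrics fields named above.

```python
def build_hero_stats(git_history_report, agent_logs_reports):
    """Return the hero stats to merge flat into `summary`.

    git_history_report: parsed git-history report dict, or None if absent.
    agent_logs_reports: list of parsed agent-logs report dicts (may be empty).
    """
    stats = {}

    if git_history_report is not None:
        metrics = git_history_report["scope"]["metrics"]
        # Copied as-is from the analyzer; never re-derived from evidence text.
        stats["commit_authors_impacted"] = {
            "value": metrics["commit_authors_impacted"],
            "total": metrics["authors_seen"],
        }

    if agent_logs_reports:
        all_metrics = [r["scope"]["metrics"] for r in agent_logs_reports]
        # Summed across every contributor's agent-logs report.
        stats["sessions_with_frustration_signals"] = {
            "value": sum(m["sessions_with_frustration_signals"] for m in all_metrics),
            "total": sum(m["sessions_parsed"] for m in all_metrics),
        }

    # A stat whose data source is missing is omitted entirely.
    return stats
```

Merging the result with `summary.update(build_hero_stats(...))` keeps both stats at the flat path the HTML template reads, with no wrapper object.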

Finding Object

Each finding carries its own inline action — the recommended fix for that specific finding.

{
  "id": "F-<NNN>",
  "category": "<KCG | CAS | SCX | RAF | TCG>",
  "subcategory": "<e.g., KCG-1, CAS-2>",
  "title": "<string — short, specific, descriptive>",
  "description": "<string — what the issue is, why it matters for agent performance, and how it manifests>",

  "evidence": [
    {
      "type": "<file_reference | code_snippet | git_log | pr_comment | ci_output | agent_conversation | config_file | review_comment | statistical>",
      "location": "<string — file path, PR URL/number, commit SHA, session ID, etc.>",
      "detail": "<string — what this evidence shows>",
      "snippet": "<optional string — relevant code or text excerpt>",
      "source": "<source_code | git_history | github_data | agent_logs | agent_context | context_inventory>"
    }
  ],

  "impact": {
    "score": "<1-10 integer>",
    "level": "<critical | high | medium | low>",
    "reasoning": "<string — why this score: frequency, blast radius, severity>"
  },

  "effort": {
    "score": "<1-10 integer (1 = trivial, 10 = massive rewrite)>",
    "level": "<trivial | low | medium | high>",
    "reasoning": "<string — what the fix involves>"
  },

  "priority_score": "<number — impact.score / effort.score, higher is better>",
  "confidence": "<high | medium | low>",
  "data_sources": ["<array of source identifiers that contributed to this finding>"],

  "action": {
    "title": "<string — concise imperative description of what to do>",
    "description": "<string — detailed explanation of the action and why it helps>",
    "type": "<create_context | update_code | add_skill | add_rule | refactor | add_tests | add_docs | configure_tools | create_plugin | update_plugin | remove_plugin>",
    "effort_size": "<pebble | stone | rock | boulder>",
    "example_fix": "<optional string — concrete example of what the fix looks like>"
  }
}
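
As a worked illustration of the priority_score field, here is a small Python sketch. The function name is hypothetical, and the range check is an added safeguard based on the 1-10 integer ranges given in the schema; rounding of the result is not specified by the schema, so none is applied.

```python
def priority_score(finding):
    """Compute impact.score / effort.score for one Finding dict."""
    impact = finding["impact"]["score"]
    effort = finding["effort"]["score"]
    for name, score in (("impact", impact), ("effort", effort)):
        # Both scores are documented as integers in the range 1-10,
        # so the division below can never divide by zero.
        if not isinstance(score, int) or not 1 <= score <= 10:
            raise ValueError(f"{name}.score must be an integer from 1 to 10")
    return impact / effort
```

A high-impact, low-effort finding (for example impact 8, effort 2) scores 4.0, while the reverse (impact 2, effort 8) scores 0.25, which is why higher is better.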

Action Types

Type	When to use
`create_context`	Add a new context file (AGENTS.md, .cursor/rules, CLAUDE.md)
`update_code`	Fix or improve application code directly
`add_skill`	Add a skill to an existing plugin or context file
`add_rule`	Add a rule to an existing plugin or context file
`refactor`	Restructure code to reduce complexity or improve patterns
`add_tests`	Add or improve test coverage
`add_docs`	Add or improve documentation (README, JSDoc, etc.)
`configure_tools`	Set up or configure MCP servers, extensions, or other tools
`create_plugin`	Create a new Tessl plugin to package skills, rules, and docs for reuse
`update_plugin`	Update an installed plugin's configuration, version, or content
`remove_plugin`	Remove an installed plugin that is causing harm, conflicts, or is redundant

See tessl-product-context.md for guidance on when to recommend plugin-related actions vs simpler alternatives.

Rock Sizing Guide

Actions use rock sizing to communicate effort at a glance:

Size	Effort Score	Meaning
Pebble	1-2	A few minutes: add a comment, flip a config, create a short context file
Stone	3-4	Under an hour: write documentation, add a rule, create a simple skill
Rock	5-7	Hours to a day: refactor a module, write comprehensive docs, add tests
Boulder	8-10	Days+: major refactoring, architectural changes, large-scale cleanup
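
The table above maps directly onto a lookup from effort.score to action.effort_size. A minimal Python sketch follows; effort_size_for is a hypothetical helper name, and the lowercase return values match the effort_size enum in the Finding object.

```python
def effort_size_for(effort_score):
    """Map an effort score (1-10) to its rock size per the sizing table."""
    if not 1 <= effort_score <= 10:
        raise ValueError("effort score must be between 1 and 10")
    if effort_score <= 2:
        return "pebble"   # a few minutes
    if effort_score <= 4:
        return "stone"    # under an hour
    if effort_score <= 7:
        return "rock"     # hours to a day
    return "boulder"      # days or more
```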

Finding ID Assignment

Findings in the synthesized output use the F- prefix (e.g., F-001, F-002) with sequential numbering sorted by priority_score descending. The original per-source IDs (SRC-001, GIT-003, etc.) are not carried forward — provenance is tracked via the data_sources array and evidence[].source fields.
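
The numbering rule can be sketched in Python as below. This is an illustration only: assign_finding_ids is a hypothetical name, and because the source does not say how ties in priority_score are broken, the sketch simply keeps tied findings in their input order.

```python
def assign_finding_ids(findings):
    """Return findings sorted by priority_score (descending) with F-NNN ids."""
    # sorted() is stable, so findings with equal priority_score keep
    # their original relative order (an assumption; ties are unspecified).
    ordered = sorted(findings, key=lambda f: f["priority_score"], reverse=True)
    # Copy each dict so the caller's per-source findings are left untouched.
    return [
        {**finding, "id": f"F-{position:03d}"}
        for position, finding in enumerate(ordered, start=1)
    ]
```

Any per-source ID already present under the "id" key is overwritten, in line with the rule that original IDs are not carried forward.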

Scoring Guide

Impact (1-10)

Score	Level	Meaning
9-10	Critical	Affects nearly every agent task; causes frequent, severe failures
7-8	High	Affects many tasks or causes significant failures in important areas
4-6	Medium	Affects some tasks or causes moderate confusion
1-3	Low	Affects few tasks or causes minor inefficiency

Effort (1-10)

Score	Level	Meaning
1-2	Trivial	A few minutes: add a comment, create a short context file, flip a config
3-4	Low	Under an hour: write documentation, add a rule, create a simple skill
5-7	Medium	Hours to a day: refactor a module, write comprehensive docs, add tests
8-10	High	Days+: major refactoring, architectural changes, large-scale cleanup
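
The two score tables above can be expressed as simple range lookups. The Python sketch below is illustrative; the function names are hypothetical, and the lowercase strings correspond to the impact.level and effort.level enums in the Finding object.

```python
def impact_level(score):
    """Map an impact score (1-10) to its level per the Impact table."""
    if not 1 <= score <= 10:
        raise ValueError("impact score must be between 1 and 10")
    if score >= 9:
        return "critical"
    if score >= 7:
        return "high"
    if score >= 4:
        return "medium"
    return "low"


def effort_level(score):
    """Map an effort score (1-10) to its level per the Effort table."""
    if not 1 <= score <= 10:
        raise ValueError("effort score must be between 1 and 10")
    if score <= 2:
        return "trivial"
    if score <= 4:
        return "low"
    if score <= 7:
        return "medium"
    return "high"
```

Note that the scales run in opposite directions: a high impact level is a reason to act, while a high effort level is a cost.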

Confidence

High: Multiple data sources corroborate the finding, or a single source provides unambiguous evidence
Medium: Evidence supports the finding, but relevant context may be missing
Low: Inference from limited data; flagged as worth investigating

HTML Report

In addition to the JSON, produce a standalone HTML report (report.html) by reading the report template and injecting the serialized findings.json into the /*FINDINGS_JSON*/ marker. See the template for details on the injection mechanism.
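
The template itself defines the real injection mechanism, so the following Python sketch is only one plausible reading of it. It assumes the marker appears as literal text inside a script block and is replaced wholesale by the serialized JSON; inject_findings is a hypothetical name, and the "</" escaping is an added precaution rather than something the source describes.

```python
import json

MARKER = "/*FINDINGS_JSON*/"


def inject_findings(template_html, findings):
    """Return the template with the marker replaced by serialized findings."""
    if MARKER not in template_html:
        raise ValueError(f"template does not contain the {MARKER} marker")
    payload = json.dumps(findings)
    # Precaution (assumption): escape "</" so a string such as "</script>"
    # inside a finding cannot terminate the enclosing script element.
    # "\/" is a valid escape in both JSON and JavaScript string literals.
    payload = payload.replace("</", "<\\/")
    return template_html.replace(MARKER, payload)
```

In use, the caller would read the template file, load findings.json with json.load, pass both to the function, and write the result to report.html; the file locations are left to the caller because the source does not fix the template path.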