tessleng/agent-insight-experiment

Scan a repository to surface actionable findings about agent performance. Analyzes source code, git history, GitHub data, agent logs, and agent context, then synthesizes cross-referenced findings with targeted actions informed by Tessl product awareness. Supports incremental multi-developer contributions and produces a self-contained HTML report.


Findings Schema

This schema describes the synthesized output produced by synthesize-insights. It is the product-facing schema: it combines cross-referenced findings from all data source reports into a prioritized list with an inline action per finding.

File Naming

Save as findings.json in the .tessl-insights-poc/ directory at the repository root.
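
As a minimal sketch of the naming rule above, the output location can be derived from the repository root. The helper names findings_path and save_findings are hypothetical and not part of the skill; only the directory and file names come from this document.

```python
import json
from pathlib import Path

# Directory and file names as stated in this schema document.
OUTPUT_DIR = ".tessl-insights-poc"
OUTPUT_FILE = "findings.json"


def findings_path(repo_root):
    """Return <repo_root>/.tessl-insights-poc/findings.json (hypothetical helper)."""
    return Path(repo_root) / OUTPUT_DIR / OUTPUT_FILE


def save_findings(repo_root, findings_doc):
    """Serialize the findings document to the expected location and return the path."""
    path = findings_path(repo_root)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the directory on first run
    path.write_text(json.dumps(findings_doc, indent=2), encoding="utf-8")
    return path
```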

Schema

{
  "metadata": {
    "scan_id": "<string — shared across all reports in one scan>",
    "repository": "<string — org/repo-name or filesystem path>",
    "generated_at": "<ISO 8601 datetime>",
    "data_sources_used": [
      {
        "source": "<source_code | git_history | github_data | agent_logs | agent_context | context_inventory>",
        "report_file": "<relative path to the source report, e.g. reports/source-code.json>",
        "contributor": "<optional string — username for agent log reports>",
        "tools": "<optional array of strings — agent harnesses represented in this report; only set for source=agent_logs. Allowed values: \"claude_code\", \"cursor\", plus any other harness slug (e.g. \"copilot\", \"gemini_cli\"). Almost always length 1 — a single contributor typically drives one harness per project. The array form is kept for the rare case where one contributor's logs span multiple harnesses.>",
        "findings_contributed": "<number — how many findings this report contributed to>"
      }
    ]
  },

  "context_inventory": "<object — passed through VERBATIM from reports/context-inventory.json (never re-derived). REQUIRED when reports/context-inventory.json exists with a context_inventory block; omit only if the analyzer report is absent. Carries the per-file catalogue rendered by the HTML report's Context Inventory section as a filterable file tree. See skills/analyze-context-inventory/SKILL.md for shape.>",

  "overall_score": {
    "level": "<blocked | constrained | productive>",
    "reasoning": "<string — 1-2 sentences citing the specific finding IDs (e.g. F-001, F-003) that drove the classification>"
  },

  "executive_summary": "<string — short markdown prose, 2–3 sentences (~60 words max), plain-English state of affairs. No APEX category codes (KCG/CAS/SCX/RAF/TCG), no finding IDs (F-001, …), no severity-count tables. Describe the shape of the issues in the reader's own words — not the taxonomy.>",

  "summary": {
    "total_findings": "<number>",
    "by_severity": {
      "critical": 0,
      "high": 0,
      "medium": 0,
      "low": 0
    },
    "by_effort_size": {
      "pebble": 0,
      "stone": 0,
      "rock": 0,
      "boulder": 0
    },
    "top_categories": ["<array of top 2-3 APEX category codes by finding count>"],

    "commit_authors_impacted": {
      "value": "<number — copied from the git-history report's scope.metrics.commit_authors_impacted (distinct authors of the commits cited as evidence)>",
      "total": "<number — copied from the git-history report's scope.metrics.authors_seen (total distinct authors in the analysed window)>"
    },
    "sessions_with_frustration_signals": {
      "value": "<number — sum across all agent-logs reports of scope.metrics.sessions_with_frustration_signals>",
      "total": "<number — sum across all agent-logs reports of scope.metrics.sessions_parsed>"
    }
  },

  "findings": ["<array of Finding objects — see below>"]
}
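
The counting fields in summary can be derived mechanically from the findings array. The sketch below is illustrative only: the function name build_summary_counts is hypothetical, and it assumes that by_severity counts each finding's impact.level and that by_effort_size counts action.effort_size, which this schema implies but does not state outright.

```python
from collections import Counter

SEVERITIES = ("critical", "high", "medium", "low")
EFFORT_SIZES = ("pebble", "stone", "rock", "boulder")


def build_summary_counts(findings, top_n=3):
    """Tally findings by severity, effort size, and APEX category (hypothetical helper)."""
    by_severity = {level: 0 for level in SEVERITIES}
    by_effort_size = {size: 0 for size in EFFORT_SIZES}
    categories = Counter()
    for finding in findings:
        # Assumption: "severity" in the summary means impact.level.
        by_severity[finding["impact"]["level"]] += 1
        by_effort_size[finding["action"]["effort_size"]] += 1
        categories[finding["category"]] += 1
    return {
        "total_findings": len(findings),
        "by_severity": by_severity,
        "by_effort_size": by_effort_size,
        # Category codes ordered by finding count, most frequent first.
        "top_categories": [code for code, _ in categories.most_common(top_n)],
    }
```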

Both overall_score and executive_summary are required outputs — every synthesis run must produce them. Renderers (like regenerate-report) may tolerate older findings.json files that predate these fields, but any new synthesis output must include them.
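
A synthesis run could guard this requirement with a small check before writing the file. This is a sketch under assumptions, not the skill's actual validation: the function name check_required_outputs and the wording of the messages are hypothetical.

```python
ALLOWED_LEVELS = {"blocked", "constrained", "productive"}


def check_required_outputs(doc):
    """Return a list of problems with the two required fields (empty list means OK)."""
    problems = []

    score = doc.get("overall_score")
    if not isinstance(score, dict):
        problems.append("overall_score is missing")
    else:
        if score.get("level") not in ALLOWED_LEVELS:
            problems.append("overall_score.level must be blocked, constrained, or productive")
        if not score.get("reasoning"):
            problems.append("overall_score.reasoning is missing")

    summary = doc.get("executive_summary")
    if not isinstance(summary, str) or not summary.strip():
        problems.append("executive_summary is missing")

    return problems
```

A renderer that tolerates older files would skip this check; a new synthesis run would treat a non-empty result as a failure.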

Overall Score

A single-word assessment of how well the repository supports coding agents. See skills/synthesize-insights/SKILL.md for the decision rubric.

| Level | Meaning |
| --- | --- |
| blocked | Agents produce incorrect output with high confidence. Context is actively reducing performance rather than supporting it. |
| constrained | Agents handle routine tasks well but struggle with anything requiring organisation-specific knowledge. Context exists but remains incomplete or inconsistent. |
| productive | Agents handle the majority of tasks correctly. Context is well-structured, consistent, and directly actionable. |

overall_score.reasoning should cite the specific finding IDs that drove the classification so the reader can jump straight to the evidence.

Executive Summary

A short markdown paragraph (max ~100 words) giving the reader a quick read on the scan before they drill into findings. It should loosely follow this shape without rigidly templating it:

  1. Opening stat sentence (how many findings, severity breakdown, top 2-3 categories)
  2. A sentence on what agents can and can't do today, grounded in the findings
  3. One or two concrete standout examples (reference finding IDs or evidence)
  4. A closing sentence reinforcing the overall score

Example (~90 words):

30 failure patterns were identified across 8 insights. Of these, 4 are critical and 16 are high severity, concentrated in tooling gaps (10 findings), knowledge gaps (9 findings), and recurrent agent failures (8 findings). Agents can complete routine, single-app tasks but systematically encounter friction on cross-app workflows, convention enforcement, and validation steps. The primary observability tool causes repeated agent thrashing, and 48% of merged PRs bypassed formal approval. Context documentation is incomplete in key areas, with no agent-facing guidance for Temporal proxy patterns or Oso Cloud Polar DSL.

Hero Stats

Two optional summary stats are surfaced at the top of the report to give an agent-enablement lead a quick read on reach and pain:

| Stat | Meaning |
| --- | --- |
| summary.commit_authors_impacted | How much of the team is touching the problem areas called out by git-history findings. value = git-history report's scope.metrics.commit_authors_impacted (distinct authors of the commits cited as evidence). total = git-history report's scope.metrics.authors_seen (distinct authors in the analysis window). |
| summary.sessions_with_frustration_signals | How often agent sessions show correction / frustration signals (wrong output, revert, "stop guessing", RAF-6 style patterns). value and total are summed across all contributors' agent-logs reports. |

Both fields are optional. Omit the field entirely if the underlying data source was not part of this scan:

  • Omit commit_authors_impacted when no git_history report contributed.
  • Omit sessions_with_frustration_signals when no agent_logs reports contributed.

Shape: both stats sit flat on summary (e.g. summary.commit_authors_impacted) with only value and total. They are not wrapped in a hero_stats object or any other grouping, and they do not carry additional fields like description. The HTML report template reads them at the flat path — any nesting causes the tiles to silently disappear from the report.

The numerator and denominator both come from the analyzer skills — synthesis never re-derives them from free-text evidence. See skills/synthesize-insights/SKILL.md for how the aggregation is done and references/insight-report-schema.md for the underlying scope.metrics contract.
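
The copy-and-sum rules above can be sketched as follows. The function name build_hero_stats and the in-memory report dictionaries are assumptions for illustration; the scope.metrics field names are the ones named in this section, and the actual aggregation is described in skills/synthesize-insights/SKILL.md.

```python
def _metrics(report):
    """Return the scope.metrics block of an analyzer report, or an empty dict."""
    return report.get("scope", {}).get("metrics", {})


def build_hero_stats(git_history_report=None, agent_logs_reports=()):
    """Build the flat hero-stat entries to merge into summary (hypothetical helper)."""
    stats = {}

    # Omit the field entirely when no git_history report contributed.
    if git_history_report is not None:
        m = _metrics(git_history_report)
        stats["commit_authors_impacted"] = {
            "value": m["commit_authors_impacted"],
            "total": m["authors_seen"],
        }

    # Omit the field entirely when no agent_logs reports contributed.
    if agent_logs_reports:
        stats["sessions_with_frustration_signals"] = {
            "value": sum(_metrics(r)["sessions_with_frustration_signals"] for r in agent_logs_reports),
            "total": sum(_metrics(r)["sessions_parsed"] for r in agent_logs_reports),
        }

    return stats
```

Merging the result with summary.update(build_hero_stats(...)) keeps both stats flat on summary, with only value and total, rather than nesting them under a wrapper object.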

Finding Object

Each finding carries its own inline action — the recommended fix for that specific finding.

{
  "id": "F-<NNN>",
  "category": "<KCG | CAS | SCX | RAF | TCG>",
  "subcategory": "<e.g., KCG-1, CAS-2>",
  "title": "<string — short, specific, descriptive>",
  "description": "<string — what the issue is, why it matters for agent performance, and how it manifests>",

  "evidence": [
    {
      "type": "<file_reference | code_snippet | git_log | pr_comment | ci_output | agent_conversation | config_file | review_comment | statistical>",
      "location": "<string — file path, PR URL/number, commit SHA, session ID, etc.>",
      "detail": "<string — what this evidence shows>",
      "snippet": "<optional string — relevant code or text excerpt>",
      "source": "<source_code | git_history | github_data | agent_logs | agent_context | context_inventory>"
    }
  ],

  "impact": {
    "score": "<1-10 integer>",
    "level": "<critical | high | medium | low>",
    "reasoning": "<string — why this score: frequency, blast radius, severity>"
  },

  "effort": {
    "score": "<1-10 integer (1 = trivial, 10 = massive rewrite)>",
    "level": "<trivial | low | medium | high>",
    "reasoning": "<string — what the fix involves>"
  },

  "priority_score": "<number — impact.score / effort.score, higher is better>",
  "confidence": "<high | medium | low>",
  "data_sources": ["<array of source identifiers that contributed to this finding>"],

  "action": {
    "title": "<string — concise imperative description of what to do>",
    "description": "<string — detailed explanation of the action and why it helps>",
    "type": "<create_context | update_code | add_skill | add_rule | refactor | add_tests | add_docs | configure_tools | create_plugin | update_plugin | remove_plugin>",
    "effort_size": "<pebble | stone | rock | boulder>",
    "example_fix": "<optional string — concrete example of what the fix looks like>"
  }
}

Action Types

| Type | When to use |
| --- | --- |
| create_context | Add a new context file (AGENTS.md, .cursor/rules, CLAUDE.md) |
| update_code | Fix or improve application code directly |
| add_skill | Add a skill to an existing plugin or context file |
| add_rule | Add a rule to an existing plugin or context file |
| refactor | Restructure code to reduce complexity or improve patterns |
| add_tests | Add or improve test coverage |
| add_docs | Add or improve documentation (README, JSDoc, etc.) |
| configure_tools | Set up or configure MCP servers, extensions, or other tools |
| create_plugin | Create a new Tessl plugin to package skills, rules, and docs for reuse |
| update_plugin | Update an installed plugin's configuration, version, or content |
| remove_plugin | Remove an installed plugin that is causing harm, conflicts, or is redundant |

See tessl-product-context.md for guidance on when to recommend plugin-related actions vs simpler alternatives.

Rock Sizing Guide

Actions use rock sizing to communicate effort at a glance:

| Size | Effort Score | Meaning |
| --- | --- | --- |
| Pebble | 1-2 | A few minutes: add a comment, flip a config, create a short context file |
| Stone | 3-4 | Under an hour: write documentation, add a rule, create a simple skill |
| Rock | 5-7 | Hours to a day: refactor a module, write comprehensive docs, add tests |
| Boulder | 8-10 | Days+: major refactoring, architectural changes, large-scale cleanup |
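
The sizing bands translate directly into a lookup from effort.score to action.effort_size. A minimal sketch, with the hypothetical function name effort_size_for:

```python
def effort_size_for(effort_score):
    """Map an effort score (1-10) to a rock size using the bands in the sizing guide."""
    if not isinstance(effort_score, int) or not 1 <= effort_score <= 10:
        raise ValueError("effort score must be an integer from 1 to 10")
    if effort_score <= 2:
        return "pebble"
    if effort_score <= 4:
        return "stone"
    if effort_score <= 7:
        return "rock"
    return "boulder"
```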

Finding ID Assignment

Findings in the synthesized output use the F- prefix (e.g., F-001, F-002) with sequential numbering sorted by priority_score descending. The original per-source IDs (SRC-001, GIT-003, etc.) are not carried forward — provenance is tracked via the data_sources array and evidence[].source fields.
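
One way to implement this assignment is to compute priority_score as impact.score / effort.score, sort descending, and number the results. The sketch below makes two assumptions not stated in the schema: ties keep their input order, and the score is stored unrounded. The function name assign_finding_ids is hypothetical.

```python
def assign_finding_ids(findings):
    """Return copies of the findings ranked by priority_score with F-NNN ids."""
    ranked = []
    for finding in findings:
        scored = dict(finding)  # shallow copy so the caller's input is not mutated
        scored["priority_score"] = finding["impact"]["score"] / finding["effort"]["score"]
        ranked.append(scored)

    # Python's sort is stable, so equal scores keep their original relative order.
    ranked.sort(key=lambda f: f["priority_score"], reverse=True)

    for position, finding in enumerate(ranked, start=1):
        # Overwrites any per-source id (SRC-001, GIT-003, ...), which is not carried forward.
        finding["id"] = f"F-{position:03d}"
    return ranked
```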

Scoring Guide

Impact (1-10)

| Score | Level | Meaning |
| --- | --- | --- |
| 9-10 | Critical | Affects nearly every agent task; causes frequent, severe failures |
| 7-8 | High | Affects many tasks or causes significant failures in important areas |
| 4-6 | Medium | Affects some tasks or causes moderate confusion |
| 1-3 | Low | Affects few tasks or causes minor inefficiency |

Effort (1-10)

| Score | Level | Meaning |
| --- | --- | --- |
| 1-2 | Trivial | A few minutes: add a comment, create a short context file, flip a config |
| 3-4 | Low | Under an hour: write documentation, add a rule, create a simple skill |
| 5-7 | Medium | Hours to a day: refactor a module, write comprehensive docs, add tests |
| 8-10 | High | Days+: major refactoring, architectural changes, large-scale cleanup |
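
The impact and effort bands can be kept consistent with their scores by deriving each level from the number instead of setting it by hand. This is an illustrative sketch; the names impact_level_for and effort_level_for are hypothetical.

```python
# (lowest score in band, level), ordered from the highest band down.
IMPACT_BANDS = ((9, "critical"), (7, "high"), (4, "medium"), (1, "low"))
EFFORT_BANDS = ((8, "high"), (5, "medium"), (3, "low"), (1, "trivial"))


def _level_for(score, bands):
    """Return the level of the first band whose floor the score reaches."""
    if not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError("score must be an integer from 1 to 10")
    for floor, level in bands:
        if score >= floor:
            return level


def impact_level_for(score):
    """Map impact.score to impact.level."""
    return _level_for(score, IMPACT_BANDS)


def effort_level_for(score):
    """Map effort.score to effort.level."""
    return _level_for(score, EFFORT_BANDS)
```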

Confidence

  • High: Multiple data sources corroborate, or single source with unambiguous evidence
  • Medium: Evidence supports the finding but there could be context missing
  • Low: Inference based on limited data; flagged as worth investigating

HTML Report

In addition to the JSON, produce a standalone HTML report (report.html) by reading the report template and injecting the serialized findings.json into the /*FINDINGS_JSON*/ marker. See the template for details on the injection mechanism.
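
A rough sketch of the injection step follows. The real mechanism is defined by the report template, which is not reproduced here, so treat the details as assumptions: that the marker appears exactly once, that it sits inside a script block where a JSON literal is valid JavaScript, and that escaping "</" is desirable so a string value cannot close the script tag early. The function name inject_findings is hypothetical.

```python
import json

MARKER = "/*FINDINGS_JSON*/"


def inject_findings(template_html, findings_doc):
    """Replace the marker in the template with the serialized findings document."""
    if template_html.count(MARKER) != 1:
        raise ValueError("expected exactly one FINDINGS_JSON marker in the template")
    # "<\/" is a valid JSON escape for "</" and prevents an embedded "</script>"
    # string from terminating the surrounding script element.
    payload = json.dumps(findings_doc).replace("</", "<\\/")
    return template_html.replace(MARKER, payload)
```

Because the payload is embedded in the page itself, the resulting report.html needs no companion findings.json to render, which is what makes it standalone.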
