Scan a repository to surface actionable findings about agent performance. Analyzes source code, git history, GitHub data, agent logs, and agent context, then synthesizes cross-referenced findings with targeted actions informed by Tessl product awareness. Supports incremental multi-developer contributions and produces a self-contained HTML report.
The synthesized output produced by synthesize-insights. This is the product-facing schema — it combines cross-referenced findings from all data source reports into a prioritized list with an inline action per finding.
Save as findings.json in the .tessl-insights-poc/ directory at the repository root.
{
"metadata": {
"scan_id": "<string — shared across all reports in one scan>",
"repository": "<string — org/repo-name or filesystem path>",
"generated_at": "<ISO 8601 datetime>",
"data_sources_used": [
{
"source": "<source_code | git_history | github_data | agent_logs | agent_context | context_inventory>",
"report_file": "<relative path to the source report, e.g. reports/source-code.json>",
"contributor": "<optional string — username for agent log reports>",
"tools": "<optional array of strings — agent harnesses represented in this report; only set for source=agent_logs. Allowed values: \"claude_code\", \"cursor\", plus any other harness slug (e.g. \"copilot\", \"gemini_cli\"). Almost always length 1 — a single contributor typically drives one harness per project. The array form is kept for the rare case where one contributor's logs span multiple harnesses.>",
"findings_contributed": "<number — how many findings this report contributed to>"
}
]
},
"context_inventory": "<object — passed through VERBATIM from reports/context-inventory.json (never re-derived). REQUIRED when reports/context-inventory.json exists with a context_inventory block; omit only if the analyzer report is absent. Carries the per-file catalogue rendered by the HTML report's Context Inventory section as a filterable file tree. See skills/analyze-context-inventory/SKILL.md for shape.>",
"overall_score": {
"level": "<blocked | constrained | productive>",
"reasoning": "<string — 1-2 sentences citing the specific finding IDs (e.g. F-001, F-003) that drove the classification>"
},
"executive_summary": "<string — short markdown prose, 2–3 sentences (~60 words max), plain-English state of affairs. No APEX category codes (KCG/CAS/SCX/RAF/TCG), no finding IDs (F-001, …), no severity-count tables. Describe the shape of the issues in the reader's own words — not the taxonomy.>",
"summary": {
"total_findings": "<number>",
"by_severity": {
"critical": 0,
"high": 0,
"medium": 0,
"low": 0
},
"by_effort_size": {
"pebble": 0,
"stone": 0,
"rock": 0,
"boulder": 0
},
"top_categories": ["<array of top 2-3 APEX category codes by finding count>"],
"commit_authors_impacted": {
"value": "<number — copied from the git-history report's scope.metrics.commit_authors_impacted (distinct authors of the commits cited as evidence)>",
"total": "<number — copied from the git-history report's scope.metrics.authors_seen (total distinct authors in the analysed window)>"
},
"sessions_with_frustration_signals": {
"value": "<number — sum across all agent-logs reports of scope.metrics.sessions_with_frustration_signals>",
"total": "<number — sum across all agent-logs reports of scope.metrics.sessions_parsed>"
}
},
"findings": ["<array of Finding objects — see below>"]
}

Both overall_score and executive_summary are required outputs: every synthesis run must produce them. Renderers (like regenerate-report) may tolerate older findings.json files that predate these fields, but any new synthesis output must include them.
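A minimal sketch of how a renderer or CI step might enforce the two required fields before accepting a new findings.json. The function name `missing_required_fields` is hypothetical and not part of the synthesize-insights skill; it only illustrates the rule stated above.

```python
# Hypothetical helper: the two keys come from the schema above,
# the function itself is an illustration only.
REQUIRED_TOP_LEVEL = ("overall_score", "executive_summary")


def missing_required_fields(findings_doc: dict) -> list[str]:
    """Return the required top-level keys that are absent or empty."""
    return [key for key in REQUIRED_TOP_LEVEL if not findings_doc.get(key)]
```

A renderer that has to tolerate older files could log the returned list as a warning, while a synthesis step would more likely treat a non-empty list as a failure.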
overall_score.level is a single-word assessment of how well the repository supports coding agents. See skills/synthesize-insights/SKILL.md for the decision rubric.
| Level | Meaning |
|---|---|
| blocked | Agents produce incorrect output with high confidence. Context is actively reducing performance rather than supporting it. |
| constrained | Agents handle routine tasks well but struggle with anything requiring organisation-specific knowledge. Context exists but remains incomplete or inconsistent. |
| productive | Agents handle the majority of tasks correctly. Context is well-structured, consistent, and directly actionable. |
overall_score.reasoning should cite the specific finding IDs that drove the classification so the reader can jump straight to the evidence.
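One way to check this rule mechanically is sketched below. It assumes finding IDs are always three digits (following the F-<NNN> pattern in the schema); `check_overall_score` is a hypothetical name, not an existing function in the skill.

```python
import re

ALLOWED_LEVELS = {"blocked", "constrained", "productive"}
# Assumption: IDs follow the F-<NNN> pattern with exactly three digits.
FINDING_ID = re.compile(r"\bF-\d{3}\b")


def check_overall_score(overall_score: dict, findings: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the block looks valid."""
    problems = []
    level = overall_score.get("level")
    if level not in ALLOWED_LEVELS:
        problems.append(f"unknown level: {level!r}")

    known_ids = {finding["id"] for finding in findings}
    cited = FINDING_ID.findall(overall_score.get("reasoning", ""))
    if not cited:
        problems.append("reasoning cites no finding IDs")
    for finding_id in cited:
        if finding_id not in known_ids:
            problems.append(f"reasoning cites unknown finding {finding_id}")
    return problems
```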
executive_summary is a short markdown paragraph (max ~100 words) giving the reader a quick read on the scan before they drill into findings. It should loosely follow this shape without rigidly templating it:
Example (~90 words):
30 failure patterns were identified across 8 insights. Of these, 4 are critical and 16 are high severity, concentrated in tooling gaps (10 findings), knowledge gaps (9 findings), and recurrent agent failures (8 findings). Agents can complete routine, single-app tasks but systematically encounter friction on cross-app workflows, convention enforcement, and validation steps. The primary observability tool causes repeated agent thrashing, and 48% of merged PRs bypassed formal approval. Context documentation is incomplete in key areas, with no agent-facing guidance for Temporal proxy patterns or Oso Cloud Polar DSL.
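The schema's executive_summary field forbids APEX category codes and finding IDs. A rough lint for those two constraints could look like the sketch below; `lint_executive_summary` is a hypothetical helper, and the regular expressions are assumptions about how codes and IDs appear in prose.

```python
import re

# Assumption: codes appear either bare (RAF) or with a numeric suffix (RAF-6).
APEX_CODE = re.compile(r"\b(?:KCG|CAS|SCX|RAF|TCG)(?:-\d+)?\b")
# Assumption: finding IDs appear as F- followed by digits.
FINDING_ID = re.compile(r"\bF-\d+\b")


def lint_executive_summary(text: str) -> list[str]:
    """Flag taxonomy codes and finding IDs that should not appear in the summary."""
    problems = []
    for code in APEX_CODE.findall(text):
        problems.append(f"contains APEX category code: {code}")
    for finding_id in FINDING_ID.findall(text):
        problems.append(f"contains finding ID: {finding_id}")
    return problems
```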
Two optional summary stats are surfaced at the top of the report to give an agent-enablement lead a quick read on reach and pain:
| Stat | Meaning |
|---|---|
| summary.commit_authors_impacted | How much of the team is touching the problem areas called out by git-history findings. value = git-history report's scope.metrics.commit_authors_impacted (distinct authors of the commits cited as evidence). total = git-history report's scope.metrics.authors_seen (distinct authors in the analysis window). |
| summary.sessions_with_frustration_signals | How often agent sessions show correction / frustration signals (wrong output, revert, "stop guessing", RAF-6 style patterns). value and total are summed across all contributors' agent-logs reports. |
Both fields are optional. Omit the field entirely if the underlying data source was not part of this scan:
- commit_authors_impacted when no git_history report contributed.
- sessions_with_frustration_signals when no agent_logs reports contributed.

Shape: both stats sit flat on summary (e.g. summary.commit_authors_impacted) with only value and total. They are not wrapped in a hero_stats object or any other grouping, and they do not carry additional fields like description. The HTML report template reads them at the flat path; any nesting causes the tiles to silently disappear from the report.
The numerator and denominator both come from the analyzer skills — synthesis never re-derives them from free-text evidence. See skills/synthesize-insights/SKILL.md for how the aggregation is done and references/insight-report-schema.md for the underlying scope.metrics contract.
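The copy-and-sum behaviour described above can be sketched as follows. This is an illustration only: `build_summary_stats` is a hypothetical name, and the input reports are assumed to be already-parsed dicts exposing the scope.metrics keys named in this section. The authoritative aggregation lives in skills/synthesize-insights/SKILL.md.

```python
def build_summary_stats(git_history_report, agent_logs_reports):
    """Build the flat, optional stats that sit directly on `summary`.

    git_history_report: parsed git-history report dict, or None if absent.
    agent_logs_reports: list of parsed agent-logs report dicts (may be empty).
    """
    stats = {}
    if git_history_report is not None:
        metrics = git_history_report["scope"]["metrics"]
        # Copied as-is; never re-derived from free-text evidence.
        stats["commit_authors_impacted"] = {
            "value": metrics["commit_authors_impacted"],
            "total": metrics["authors_seen"],
        }
    if agent_logs_reports:
        all_metrics = [r["scope"]["metrics"] for r in agent_logs_reports]
        stats["sessions_with_frustration_signals"] = {
            "value": sum(m["sessions_with_frustration_signals"] for m in all_metrics),
            "total": sum(m["sessions_parsed"] for m in all_metrics),
        }
    # Keys for missing data sources are omitted entirely, not set to null.
    return stats
```

Merging the result with `summary.update(stats)` keeps both stats at the flat path the report template reads.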
Each finding carries its own inline action — the recommended fix for that specific finding.
{
"id": "F-<NNN>",
"category": "<KCG | CAS | SCX | RAF | TCG>",
"subcategory": "<e.g., KCG-1, CAS-2>",
"title": "<string — short, specific, descriptive>",
"description": "<string — what the issue is, why it matters for agent performance, and how it manifests>",
"evidence": [
{
"type": "<file_reference | code_snippet | git_log | pr_comment | ci_output | agent_conversation | config_file | review_comment | statistical>",
"location": "<string — file path, PR URL/number, commit SHA, session ID, etc.>",
"detail": "<string — what this evidence shows>",
"snippet": "<optional string — relevant code or text excerpt>",
"source": "<source_code | git_history | github_data | agent_logs | agent_context | context_inventory>"
}
],
"impact": {
"score": "<1-10 integer>",
"level": "<critical | high | medium | low>",
"reasoning": "<string — why this score: frequency, blast radius, severity>"
},
"effort": {
"score": "<1-10 integer (1 = trivial, 10 = massive rewrite)>",
"level": "<trivial | low | medium | high>",
"reasoning": "<string — what the fix involves>"
},
"priority_score": "<number — impact.score / effort.score, higher is better>",
"confidence": "<high | medium | low>",
"data_sources": ["<array of source identifiers that contributed to this finding>"],
"action": {
"title": "<string — concise imperative description of what to do>",
"description": "<string — detailed explanation of the action and why it helps>",
"type": "<create_context | update_code | add_skill | add_rule | refactor | add_tests | add_docs | configure_tools | create_plugin | update_plugin | remove_plugin>",
"effort_size": "<pebble | stone | rock | boulder>",
"example_fix": "<optional string — concrete example of what the fix looks like>"
}
}

| Type | When to use |
|---|---|
| create_context | Add a new context file (AGENTS.md, .cursor/rules, CLAUDE.md) |
| update_code | Fix or improve application code directly |
| add_skill | Add a skill to an existing plugin or context file |
| add_rule | Add a rule to an existing plugin or context file |
| refactor | Restructure code to reduce complexity or improve patterns |
| add_tests | Add or improve test coverage |
| add_docs | Add or improve documentation (README, JSDoc, etc.) |
| configure_tools | Set up or configure MCP servers, extensions, or other tools |
| create_plugin | Create a new Tessl plugin to package skills, rules, and docs for reuse |
| update_plugin | Update an installed plugin's configuration, version, or content |
| remove_plugin | Remove an installed plugin that is causing harm, conflicts, or is redundant |
See tessl-product-context.md for guidance on when to recommend plugin-related actions vs simpler alternatives.
Actions use rock sizing to communicate effort at a glance:
| Size | Effort Score | Meaning |
|---|---|---|
| Pebble | 1-2 | A few minutes: add a comment, flip a config, create a short context file |
| Stone | 3-4 | Under an hour: write documentation, add a rule, create a simple skill |
| Rock | 5-7 | Hours to a day: refactor a module, write comprehensive docs, add tests |
| Boulder | 8-10 | Days+: major refactoring, architectural changes, large-scale cleanup |
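The bands in the table translate directly into a lookup. The sketch below also tallies summary.by_effort_size from each finding's action.effort_size. Both function names are hypothetical, and the choice to reject scores outside 1-10 is an assumption.

```python
from collections import Counter

EFFORT_SIZES = ("pebble", "stone", "rock", "boulder")


def effort_size(effort_score: int) -> str:
    """Map an effort score (1-10) to its rock size using the bands above."""
    if not 1 <= effort_score <= 10:
        raise ValueError(f"effort score out of range: {effort_score}")
    if effort_score <= 2:
        return "pebble"
    if effort_score <= 4:
        return "stone"
    if effort_score <= 7:
        return "rock"
    return "boulder"


def tally_effort_sizes(findings: list[dict]) -> dict[str, int]:
    """Count findings per size, always emitting all four keys."""
    counts = Counter(f["action"]["effort_size"] for f in findings)
    return {size: counts.get(size, 0) for size in EFFORT_SIZES}
```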
Findings in the synthesized output use the F- prefix (e.g., F-001, F-002) with sequential numbering sorted by priority_score descending. The original per-source IDs (SRC-001, GIT-003, etc.) are not carried forward — provenance is tracked via the data_sources array and evidence[].source fields.
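A minimal sketch of the ranking and renumbering step, assuming each merged finding already carries impact.score and effort.score. `assign_finding_ids` is a hypothetical name. Tie-breaking is not specified in this document, so the sketch simply keeps input order for equal scores, and it leaves priority_score unrounded.

```python
def assign_finding_ids(findings: list[dict]) -> list[dict]:
    """Return new finding dicts sorted by priority_score with F-NNN IDs."""

    def priority(finding: dict) -> float:
        return finding["impact"]["score"] / finding["effort"]["score"]

    # sorted() is stable, so ties keep their input order (an assumption,
    # not a rule stated by the schema).
    ranked = sorted(findings, key=priority, reverse=True)
    return [
        {**finding, "priority_score": priority(finding), "id": f"F-{n:03d}"}
        for n, finding in enumerate(ranked, start=1)
    ]
```

Any per-source ID already present under `id` (SRC-001, GIT-003, and so on) is overwritten, which matches the rule that original IDs are not carried forward.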
| Impact Score | Level | Meaning |
|---|---|---|
| 9-10 | Critical | Affects nearly every agent task; causes frequent, severe failures |
| 7-8 | High | Affects many tasks or causes significant failures in important areas |
| 4-6 | Medium | Affects some tasks or causes moderate confusion |
| 1-3 | Low | Affects few tasks or causes minor inefficiency |
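The impact bands can be expressed as a small lookup, sketched below. It assumes summary.by_severity is a count of findings per impact.level, which the shared key names (critical, high, medium, low) suggest but which this document does not state outright. Both function names are hypothetical.

```python
from collections import Counter

SEVERITIES = ("critical", "high", "medium", "low")


def impact_level(impact_score: int) -> str:
    """Map an impact score (1-10) to its level using the bands above."""
    if not 1 <= impact_score <= 10:
        raise ValueError(f"impact score out of range: {impact_score}")
    if impact_score >= 9:
        return "critical"
    if impact_score >= 7:
        return "high"
    if impact_score >= 4:
        return "medium"
    return "low"


def tally_severity(findings: list[dict]) -> dict[str, int]:
    """Assumed derivation of summary.by_severity from impact.level."""
    counts = Counter(f["impact"]["level"] for f in findings)
    return {level: counts.get(level, 0) for level in SEVERITIES}
```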
| Effort Score | Level | Meaning |
|---|---|---|
| 1-2 | Trivial | A few minutes: add a comment, create a short context file, flip a config |
| 3-4 | Low | Under an hour: write documentation, add a rule, create a simple skill |
| 5-7 | Medium | Hours to a day: refactor a module, write comprehensive docs, add tests |
| 8-10 | High | Days+: major refactoring, architectural changes, large-scale cleanup |
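Because effort.level and action.effort_size are both banded on the same effort score, a finding can be checked for internal consistency. The sketch below is one possible check; the helper names are hypothetical and the schema itself does not mandate such a validation step.

```python
# (upper bound of band, effort.level, action.effort_size)
EFFORT_BANDS = [
    (2, "trivial", "pebble"),
    (4, "low", "stone"),
    (7, "medium", "rock"),
    (10, "high", "boulder"),
]


def expected_effort_labels(score: int) -> tuple[str, str]:
    """Return the (level, size) pair implied by an effort score of 1-10."""
    if not 1 <= score <= 10:
        raise ValueError(f"effort score out of range: {score}")
    for upper, level, size in EFFORT_BANDS:
        if score <= upper:
            return level, size
    raise AssertionError("unreachable: score already range-checked")


def effort_mismatches(finding: dict) -> list[str]:
    """List the labels on a finding that disagree with its effort score."""
    level, size = expected_effort_labels(finding["effort"]["score"])
    problems = []
    if finding["effort"]["level"] != level:
        problems.append(f"effort.level should be {level}")
    if finding["action"]["effort_size"] != size:
        problems.append(f"action.effort_size should be {size}")
    return problems
```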
In addition to the JSON, produce a standalone HTML report (report.html) by reading the report template and injecting the serialized findings.json into the /*FINDINGS_JSON*/ marker. See the template for details on the injection mechanism.
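The injection step might look roughly like the sketch below. The real mechanism is defined by the report template, which is not shown here, so treat this as an assumption-laden illustration: `inject_findings` is a hypothetical name, the marker is assumed to appear exactly once, and the `</` escaping is an added precaution for JSON embedded in a script element rather than something this document specifies.

```python
import json

MARKER = "/*FINDINGS_JSON*/"


def inject_findings(template_html: str, findings_doc: dict) -> str:
    """Replace the marker in the template with the serialized findings."""
    if template_html.count(MARKER) != 1:
        raise ValueError("expected exactly one /*FINDINGS_JSON*/ marker")
    # Escape "</" so a string value such as "</script>" cannot close the
    # surrounding script element; "<\/" is still valid JSON.
    payload = json.dumps(findings_doc).replace("</", "<\\/")
    return template_html.replace(MARKER, payload)
```

Writing the returned string to report.html alongside findings.json would then yield the self-contained report.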