tessleng/agent-insight-experiment

Scan a repository to surface actionable findings about agent performance. Analyzes source code, git history, GitHub data, agent logs, and agent context, then synthesizes cross-referenced findings with targeted actions informed by Tessl product awareness. Supports incremental multi-developer contributions and produces a self-contained HTML report.

skills/synthesize-insights/SKILL.md

name: synthesize-insights
description: Synthesize cross-referenced findings from multiple data source reports into a prioritized list of findings with inline actions and an HTML report. Use when all analyzer reports are ready and you need to produce the final synthesized output, when re-running synthesis after new agent log contributions, or when asked to update the insight scan results.

Synthesize Insights

Cross-reference individual data source reports, validate findings, deduplicate into a unified findings list, generate actions via subagent, and produce the final findings.json and report.html.

You have full access to all data sources for validation and cross-referencing.

Before You Start

Read the shared reference files:

  • APEX taxonomy — insight categories
  • Insight report schema — individual report format
  • Findings schema — the output format you must produce

Resolving reference paths: All reference links in this skill use relative paths (../../references/...) that work when read from the tile directory. If those paths do not resolve (e.g. when activated via a .claude/skills/ symlink), find the shared references at .tessl/tiles/*/agent-insight-experiment/references/ relative to the repository root. This applies to every ../../references/ link in this file, including the report template and product context referenced in later steps.

Inputs

Read all reports from .tessl-insights-poc/reports/:

  1. source-code.json (prefix SRC)
  2. git-history.json (prefix GIT)
  3. github-data.json (prefix GH)
  4. agent-context.json (prefix CTX)
  5. agent-logs/*.json — all files in this directory (prefix LOG); there may be multiple contributor reports
  6. context-inventory.json (prefix INV) — also carries a top-level context_inventory block that is passed through to findings.json unchanged

If any reports are missing, proceed with what's available and note the gap.
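
The loading rule above can be sketched in Python. This is a minimal sketch, not the skill's actual loader: the helper name `load_reports` and the returned `(loaded, missing)` shape are assumptions made for illustration; only the file names and prefixes come from the list above.

```python
import json
from pathlib import Path

# Prefix -> file name, as listed in the Inputs section.
SINGLE_REPORTS = {
    "SRC": "source-code.json",
    "GIT": "git-history.json",
    "GH": "github-data.json",
    "CTX": "agent-context.json",
    "INV": "context-inventory.json",
}


def load_reports(reports_dir):
    """Return (loaded, missing): parsed reports keyed by prefix, plus gaps to note.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    root = Path(reports_dir)
    loaded, missing = {}, []
    for prefix, name in SINGLE_REPORTS.items():
        path = root / name
        if path.is_file():
            loaded[prefix] = json.loads(path.read_text())
        else:
            missing.append(name)
    # agent-logs may hold several contributor reports, so keep them as a list.
    logs_dir = root / "agent-logs"
    log_files = sorted(logs_dir.glob("*.json")) if logs_dir.is_dir() else []
    if log_files:
        loaded["LOG"] = [json.loads(p.read_text()) for p in log_files]
    else:
        missing.append("agent-logs/*.json")
    return loaded, missing
```

Calling `load_reports(".tessl-insights-poc/reports")` would then give the parsed reports plus a list of gaps to record in `notes`.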

Analysis Strategy

Step 1: Catalog All Insights

Build a unified inventory from all reports: original ID, category, subcategory, source, impact/effort/priority, affected areas, and title. This gives you the full picture before cross-referencing.
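
One way to picture the inventory is as a flat list of rows, one per insight. The sketch below assumes each report keeps its insights under an `insights` key with fields named `id`, `category`, `subcategory`, `impact`, `effort`, `priority_score`, `affected_areas`, and `title`; those names are assumptions here, so check the insight report schema for the real ones.

```python
def catalog_insights(reports):
    """Flatten {source_name: report_dict} into one list of inventory rows.

    Field names are assumed for illustration; the insight report schema
    is the authority on what each report actually contains.
    """
    rows = []
    for source, report in reports.items():
        for insight in report.get("insights", []):  # "insights" key is assumed
            rows.append({
                "original_id": insight.get("id"),
                "source": source,
                "category": insight.get("category"),
                "subcategory": insight.get("subcategory"),
                "impact": insight.get("impact"),
                "effort": insight.get("effort"),
                "priority": insight.get("priority_score"),
                "affected_areas": insight.get("affected_areas", []),
                "title": insight.get("title"),
            })
    return rows
```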

Step 2: Cross-Reference and Deduplicate

Match insights across sources that describe the same underlying issue. Signals:

  • Overlapping affected_areas
  • Same subcategory with similar descriptions
  • One describes a cause, another the symptom

When multiple sources corroborate the same finding:

  • Merge into a single finding
  • Combine evidence from all sources (each evidence entry retains its source field)
  • Boost confidence — corroborated findings should generally be high confidence
  • Use the highest impact score among the originals
  • Track all contributing data sources in the data_sources array

Be rigorous: two insights in the same directory aren't automatically about the same issue.
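
The merge rules above can be sketched as a small function. It only applies once you have already judged that the insights describe the same issue; deciding that is the rigorous part and is not automated here. The function name and the `source`, `impact`, `confidence`, and `evidence` field names are assumptions for illustration.

```python
def merge_corroborated(insights):
    """Merge insights already judged to describe the same underlying issue.

    Hypothetical helper: input field names are assumed, not taken from the schema.
    """
    if not insights:
        raise ValueError("need at least one insight to merge")
    # Use the highest impact score among the originals.
    lead = max(insights, key=lambda i: i.get("impact", 0))
    sources = sorted({i["source"] for i in insights})
    evidence = []
    for insight in insights:
        for entry in insight.get("evidence", []):
            # Each evidence entry keeps a source field for provenance.
            evidence.append({**entry, "source": entry.get("source", insight["source"])})
    areas = sorted({a for i in insights for a in i.get("affected_areas", [])})
    return {
        "title": lead["title"],
        "impact": lead.get("impact", 0),
        # Corroboration by more than one source boosts confidence.
        "confidence": "high" if len(sources) > 1 else lead.get("confidence", "medium"),
        "data_sources": sources,
        "affected_areas": areas,
        "evidence": evidence,
    }
```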

Step 3: Validate Claims

Spot-check at least 2-3 insights from each source using full data access:

# Verify source code claims
cat <path-from-evidence>

# Verify git history claims
git log --since="6 months ago" --name-only --pretty=format: -- <path> | sort | uniq -c | sort -rn | head -5

# Verify GitHub claims
gh pr view <pr-number> --json title,body,reviewDecision,comments

# Verify agent log claims
head -50 <session-file-path>

# Verify context claims
cat <context-file-path>

Drop or downgrade findings that don't hold up to validation. Do not include low-confidence findings unless they are clearly worth investigating — prefer a smaller, more trustworthy set of findings over volume.

Step 4: Assign Finding IDs

Assign new F-NNN IDs sorted by priority_score descending (F-001 is the highest priority finding). The original per-source IDs (SRC-001, GIT-003, etc.) are not carried forward — provenance is tracked via data_sources and evidence[].source.
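
A minimal sketch of the numbering rule, assuming each finding is a dict with a numeric `priority_score`. The helper name is hypothetical; ties keep their input order because Python's sort is stable.

```python
def assign_finding_ids(findings):
    """Return new finding dicts with F-NNN ids, highest priority_score first."""
    ordered = sorted(findings, key=lambda f: f["priority_score"], reverse=True)
    # Overwrite any per-source id; provenance lives in data_sources and evidence.
    return [{**f, "id": f"F-{n:03d}"} for n, f in enumerate(ordered, start=1)]
```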

Step 5: Generate Actions via Subagent

The findings from Step 4 do not yet have actions — action generation is handled by a dedicated subskill with Tessl product awareness.

  1. Write the findings (without action fields) to a temp file:
FINDINGS_TEMP="$(mktemp)"
# Write the findings JSON array to $FINDINGS_TEMP
  2. Launch the generate-actions skill as a subagent, providing:

    • The path to the temp file containing findings
    • The workspace path (so it can read the reference files)
    • Instructions to read the Tessl product context and findings schema for context
  3. Receive back the findings with action populated on each one

  4. If subagents are not available, generate actions inline by reading the Tessl product context and findings schema yourself, then following the strategy in the generate-actions skill
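
If you are driving this step from Python rather than the shell, the temp-file handoff might look like the sketch below. The function name is hypothetical, and how the subagent is launched is left out because it depends on the harness.

```python
import json
import tempfile


def write_findings_for_actions(findings):
    """Write findings, minus any action field, to a temp file and return its path."""
    stripped = [{k: v for k, v in f.items() if k != "action"} for f in findings]
    # delete=False so the subagent can still read the file after we close it.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", delete=False, encoding="utf-8"
    ) as tmp:
        json.dump(stripped, tmp, indent=2)
        return tmp.name
```

The caller is responsible for deleting the file once the subagent has returned.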

Step 6: Compute Summary

Fill in the summary object:

  • total_findings count
  • by_severity breakdown
  • by_effort_size breakdown (computed from each finding's action.effort_size)
  • top_categories — the 2-3 APEX categories with the most findings
  • commit_authors_impacted — hero stat sourced from the git-history report (see below)
  • sessions_with_frustration_signals — hero stat summed across agent-logs reports (see below)
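
The count-based fields in this list can be computed mechanically. The sketch below assumes each finding carries `severity`, `category`, and `action.effort_size`; the helper name and the `top_n` parameter are illustrative. The two hero stats are deliberately left out because they come from report metadata, not from the findings.

```python
from collections import Counter


def compute_summary(findings, top_n=3):
    """Build the count-based summary fields from the final findings list."""
    by_severity = Counter(f["severity"] for f in findings)
    by_effort = Counter(f["action"]["effort_size"] for f in findings)
    categories = Counter(f["category"] for f in findings)
    return {
        "total_findings": len(findings),
        "by_severity": dict(by_severity),
        "by_effort_size": dict(by_effort),
        # The APEX categories with the most findings, most frequent first.
        "top_categories": [name for name, _ in categories.most_common(top_n)],
    }
```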

Hero stats

Both stats are optional. Omit the whole field if the underlying data source did not contribute to this scan. Never re-derive them from free-text evidence — always read them from the per-report metadata.scope.metrics.

Preflight: verify the analyzers emitted what you need. Run these before computing the stats — if a check fails, it means the analyzer is out of date relative to its own schema, and you should omit the corresponding hero stat rather than papering over it. Do not surface analyzer-drift warnings in the executive summary (the summary is user-facing and should stay high-level); if you want to leave a methodology note, put it in overall_score.reasoning instead.

# Only if a git-history report exists (skip otherwise):
# jq -e reflects only the last output value, so test both fields in one boolean expression:
jq -e '.metadata.scope.metrics | (.commit_authors_impacted != null) and (.authors_seen != null)' \
  .tessl-insights-poc/reports/git-history.json >/dev/null \
  && echo "git-history hero-stat metrics OK" \
  || echo "WARN: git-history report missing commit_authors_impacted and/or authors_seen — omit summary.commit_authors_impacted"

# Only if agent-logs reports exist (skip otherwise):
for f in .tessl-insights-poc/reports/agent-logs/*.json; do
  [ -f "$f" ] || continue
  # jq -e reflects only the last output value, so test both fields in one boolean expression:
  jq -e '.metadata.scope.metrics | (.sessions_with_frustration_signals != null) and (.sessions_parsed != null)' "$f" >/dev/null \
    && echo "agent-logs hero-stat metrics OK: $(basename "$f")" \
    || echo "WARN: $(basename "$f") missing sessions_with_frustration_signals and/or sessions_parsed — treat its numerator as 0 (still include its sessions_parsed in the denominator)"
done

These checks catch analyzer-vs-synthesizer drift at synthesis time — the class of bug where an analyzer changes its output shape and the HTML report silently loses tiles at render time. If either check warns, silently omit the affected stat (the template already hides missing tiles); keep it out of the executive summary.

Shape matters — emit these as flat keys directly on summary. Do not wrap them in a hero_stats object or any other grouping — the findings schema, HTML template, and validation checklist all expect the flat shape below. Past synthesis runs have hallucinated a summary.hero_stats.* wrapper; if you do that, the tiles silently disappear from the HTML report.

// ✅ CORRECT — flat on summary
"summary": {
  "commit_authors_impacted": { "value": 13, "total": 48 },
  "sessions_with_frustration_signals": { "value": 10, "total": 181 }
}

// ❌ WRONG — nested; template won't render these
"summary": {
  "hero_stats": {
    "commit_authors_impacted": { "value": 13, "total": 48 }
  }
}

Only value and total are schema fields. Do not add description or other keys.

commit_authors_impacted — only emit when a git-history report was ingested:

"commit_authors_impacted": {
  "value": <git-history report's metadata.scope.metrics.commit_authors_impacted>,
  "total": <git-history report's metadata.scope.metrics.authors_seen>
}

Both numbers come directly from the analyzer — analyze-git-history is responsible for computing commit_authors_impacted (distinct authors across its problem_commits list) and authors_seen (denominator). Do not re-derive either from evidence text or by re-running git log. If an older git-history report is missing either field, omit the entire hero stat silently (the template hides missing tiles); do not mention the gap in the executive summary.

sessions_with_frustration_signals — only emit when at least one agent-logs report was ingested:

"sessions_with_frustration_signals": {
  "value": <sum of metadata.scope.metrics.sessions_with_frustration_signals across all agent-logs reports>,
  "total": <sum of metadata.scope.metrics.sessions_parsed across all agent-logs reports>
}

If an older agent-logs report is missing sessions_with_frustration_signals, treat it as 0 for the numerator but still include its sessions_parsed in the denominator. Do not mention the drift in the executive summary — keep that paragraph high-level.
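
The two hero-stat rules can be sketched together as follows. The function names are hypothetical. Treating a missing `sessions_parsed` as 0 is an assumption of this sketch, since the text above only covers a missing numerator.

```python
def metrics_of(report):
    """Return metadata.scope.metrics, or {} when any level is absent."""
    return ((report.get("metadata") or {}).get("scope") or {}).get("metrics") or {}


def commit_authors_impacted(git_report):
    """Return the stat, or None when it should be omitted entirely."""
    if git_report is None:
        return None
    m = metrics_of(git_report)
    value, total = m.get("commit_authors_impacted"), m.get("authors_seen")
    if value is None or total is None:
        return None  # older report: omit the whole stat, never half-populate it
    return {"value": value, "total": total}


def sessions_with_frustration_signals(log_reports):
    """Sum numerator and denominator across all agent-logs reports."""
    if not log_reports:
        return None
    value = total = 0
    for report in log_reports:
        m = metrics_of(report)
        value += m.get("sessions_with_frustration_signals") or 0  # missing counts as 0
        total += m.get("sessions_parsed") or 0  # assumption: missing counts as 0
    return {"value": value, "total": total}


def hero_stats(git_report, log_reports):
    """Return only the stats that apply, as flat keys ready to merge into summary."""
    stats = {
        "commit_authors_impacted": commit_authors_impacted(git_report),
        "sessions_with_frustration_signals": sessions_with_frustration_signals(log_reports),
    }
    return {k: v for k, v in stats.items() if v is not None}
```

Merging the result with `summary.update(hero_stats(...))` keeps the keys flat on `summary`, which is the shape the template expects.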

Fill in metadata.data_sources_used with each source report, its path, optional contributor name (for agent logs), and how many findings it contributed to.

For every agent-logs entry, also populate tools, an array of agent harness slugs represented in that contributor's report. Read this from the per-report metadata.scope.metrics.tools_represented field. Normalize the values to lowercase slugs (e.g. "Claude Code" → "claude_code", "Cursor" → "cursor"). If the field is missing on an older report, infer from the log paths recorded in the report (Cursor logs under ~/.cursor/... → "cursor"; Claude Code logs under ~/.claude/... → "claude_code"). Do not set tools on non-agent-logs entries.
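
A small sketch of the slug normalization and the path-based fallback. The function names are hypothetical, and because the text does not say where log paths are recorded in a report, the sketch takes them as a separate `log_paths` argument.

```python
import re


def tool_slug(name):
    """Lowercase a harness name and join its words with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def infer_tool_from_path(path):
    """Fallback for older reports: guess the harness from a log path."""
    if "/.cursor/" in path:
        return "cursor"
    if "/.claude/" in path:
        return "claude_code"
    return None


def tools_for_report(report, log_paths=()):
    """Return sorted tool slugs for one agent-logs report."""
    metrics = ((report.get("metadata") or {}).get("scope") or {}).get("metrics") or {}
    declared = metrics.get("tools_represented")
    if declared:
        slugs = {tool_slug(name) for name in declared}
    else:
        slugs = {infer_tool_from_path(p) for p in log_paths} - {None}
    return sorted(slugs)
```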

Step 7: Assess Overall Score

Classify the repository as blocked, constrained, or productive using the rubric below. Apply it in order — the first matching level wins.

blocked — agents are actively misled or actively dangerous. Any one of:

  • ≥1 critical RAF finding corroborated by agent logs and at least one other data source
  • ≥3 critical findings overall (any category)
  • A primary tool or workflow (observability, build, deploy, scaffolding) produces outputs that cause repeated agent thrashing — specifically, a critical TCG finding corroborated by agent logs
  • A critical KCG-7 (undocumented dangers) finding where agent logs show the agent actually executed the destructive or risky operation — an unguarded-risk footgun that has already fired at least once

constrained — the expected default for most real repos. All of:

  • Does not meet blocked criteria
  • ≥1 high severity finding in KCG, CAS, or TCG, or ≥1 corroborated RAF finding at high severity

productive — all of:

  • No critical findings
  • No corroborated RAF findings (at any severity)
  • Findings are predominantly medium or low severity

Write the result as:

"overall_score": {
  "level": "constrained",
  "reasoning": "Classified constrained because F-001 (KCG, high) and F-004 (TCG, high) show organisation-specific context gaps, but no critical findings cross the blocked threshold."
}

The reasoning must cite the specific finding IDs that pushed the classification one way or the other (≤2 sentences). If the repo is close to the next level in either direction, mention that in the reasoning so future scans can track drift.
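
The rubric's "first matching level wins" order can be sketched as a function. This is an illustration under several assumptions, not the skill's implementation:

  • Findings carry `severity`, `category`, `subcategory`, and `data_sources`, and agent logs appear in `data_sources` as "agent-logs".
  • "Corroborated" means two or more data sources.
  • "Predominantly" means more than half.
  • `risky_operation_executed` is a hypothetical flag you would set by hand after reading the logs.
  • Falling back to constrained when no level matches is a choice of this sketch, based on constrained being the expected default.

```python
def classify(findings):
    """Apply the blocked / constrained / productive rubric in order."""
    def sources(f):
        return set(f.get("data_sources", []))

    def in_logs(f):
        return "agent-logs" in sources(f)  # source name is an assumption

    def corroborated(f):
        return len(sources(f)) >= 2  # assumption: two or more sources

    critical = [f for f in findings if f.get("severity") == "critical"]
    high = [f for f in findings if f.get("severity") == "high"]

    blocked = (
        any(f.get("category") == "RAF" and in_logs(f) and corroborated(f) for f in critical)
        or len(critical) >= 3
        or any(f.get("category") == "TCG" and in_logs(f) for f in critical)
        # Hypothetical flag: the logs show the risky operation actually ran.
        or any(f.get("subcategory") == "KCG-7" and f.get("risky_operation_executed")
               for f in critical)
    )
    if blocked:
        return "blocked"

    if (any(f.get("category") in {"KCG", "CAS", "TCG"} for f in high)
            or any(f.get("category") == "RAF" and corroborated(f) for f in high)):
        return "constrained"

    mild = sum(1 for f in findings if f.get("severity") in {"medium", "low"})
    productive = (
        not critical
        and not any(f.get("category") == "RAF" and corroborated(f) for f in findings)
        and (not findings or mild * 2 > len(findings))  # assumption: more than half
    )
    # Nothing matched cleanly: fall back to the expected default (sketch's choice).
    return "productive" if productive else "constrained"
```

The function returns only the level; the reasoning text, including the finding IDs that decided it, still has to be written by hand.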

Step 8: Write the Executive Summary

Write a short markdown paragraph to the executive_summary field — 2–3 sentences, ~60 words max. It's a plain-English state of affairs the reader sees before any details, a 15-second read that sets the scene for the overall score band.

Do not use internal codes or IDs in the prose. Specifically:

  • No APEX category codes (KCG, CAS, SCX, RAF, TCG) — these are not surfaced next to the summary in the HTML, so they read as jargon. Describe the shape of the problem in plain words instead (e.g. "missing onboarding docs and conflicting conventions", not "KCG and CAS findings").
  • No finding IDs (F-001, F-004, …) — the summary sits above the findings list and the IDs only make sense once the reader has scrolled. Describe the issue itself (e.g. "agents repeatedly fail tests in the payments module", not "F-003 shows a critical RAF issue").
  • No severity-count tables in prose (e.g. avoid "3 high, 7 medium") — the stats row already shows this.

Loose shape to aim for:

  1. One sentence on the overall state of agent-readiness, in the reader's own language.
  2. One sentence on the single most pressing theme, described concretely (a named tool, workflow, or observed behaviour — not a code).
  3. An optional closing sentence that echoes the overall_score.level in natural prose.

Keep it concrete and human — prefer named tools, workflows, or observed behaviours over generic phrasing like "various issues were found". If the reader can't act on a detail at the exec-summary level, it belongs in a finding, not here.

See the findings schema for an example.
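
The mechanical parts of these rules (length, codes, IDs, severity counts) lend themselves to a quick lint before you finalize. The sketch below is a hypothetical helper; it cannot judge tone or concreteness, and its regular expressions are approximations.

```python
import re

APEX_CODES = ("KCG", "CAS", "SCX", "RAF", "TCG")


def lint_executive_summary(text, max_words=60):
    """Return a list of rule violations; an empty list means the checks passed."""
    problems = []
    word_count = len(text.split())
    if word_count > max_words:
        problems.append(f"too long: {word_count} words (max {max_words})")
    codes = [c for c in APEX_CODES if re.search(rf"\b{c}\b", text)]
    if codes:
        problems.append("APEX category codes present: " + ", ".join(codes))
    if re.search(r"\bF-\d{3}\b", text):
        problems.append("finding IDs present")
    # Catches phrases like "3 high" or "7 medium".
    if re.search(r"\b\d+\s+(critical|high|medium|low)\b", text, re.IGNORECASE):
        problems.append("severity counts present")
    return problems
```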

Step 9: Pass Through Context Inventory

This step is required whenever reports/context-inventory.json exists and contains a top-level context_inventory object — it is not an optional polish step. The HTML report's Context Inventory section reads directly from findings.json.context_inventory.files; if you omit it, the section renders an empty state and the reader loses the file-tree view of the agent context surface.

Copy the block verbatim. Do not merge, trim, or re-derive — the inventory is the analyze-context-inventory skill's output.

# One-liner to extract the block:
jq '.context_inventory' .tessl-insights-poc/reports/context-inventory.json

Shape of what you're copying (abbreviated — see skills/analyze-context-inventory/SKILL.md for the full schema):

"context_inventory": {
  "files": [
    { "path": "...", "category": "entry_point | always_on_rule | hook | skill | mcp", "agents": ["claude", "cursor"], "usefulness": "high | medium | low", "purpose": "..." }
  ],
  "summary": { ... }
}

Verify after writing findings.json:

# Must print the same number on both sides:
jq '.context_inventory.files | length' .tessl-insights-poc/findings.json
jq '.context_inventory.files | length' .tessl-insights-poc/reports/context-inventory.json

If reports/context-inventory.json is absent or has no context_inventory block, omit the field — the HTML template already renders an empty state for the Context Inventory section, so there is no need to call this out in the executive summary.
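
When findings.json is assembled in Python, the pass-through rule reduces to a verbatim copy or an omission. A minimal sketch, with a hypothetical function name:

```python
import copy


def pass_through_context_inventory(findings_doc, inventory_report):
    """Copy context_inventory verbatim, or leave the field out entirely."""
    block = (inventory_report or {}).get("context_inventory")
    if isinstance(block, dict):
        # Deep copy so later edits to findings_doc cannot alter the source report.
        findings_doc["context_inventory"] = copy.deepcopy(block)
    else:
        findings_doc.pop("context_inventory", None)
    return findings_doc
```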

Step 10: Generate HTML Report

Read the HTML report template.

The template contains a JSON injection marker:

const DATA = /*FINDINGS_JSON*/{}/*END_FINDINGS_JSON*/;

Replace the {} between the markers with the serialized contents of findings.json.

Critical: escape </ before injecting. The findings JSON is embedded inside a <script> tag, and the browser's HTML parser treats any literal </script> it sees as the end of the script block — even when it appears inside a JSON string value. If any finding's description, snippet, example_fix, or other text contains </script> (or any </ followed by a tag name the HTML parser recognizes), the page breaks mid-data and the report renders blank. This class of bug is silent — findings.json still validates, the template is still correct, and the output just fails to load without a console error.

Replace </ with <\/ in the serialized JSON string before injecting. The <\/ form is still valid JSON (the \/ is an allowed escape for /) and still valid JavaScript when evaluated, but the HTML parser no longer sees a closer. Do not modify findings.json on disk — escape only the copy you're about to embed.

Minimal reference implementation:

from pathlib import Path

template = Path("tiles/agent-insight-experiment/references/report-template.html").read_text()
findings = Path(".tessl-insights-poc/findings.json").read_text()
# Escape only the embedded copy; findings.json on disk stays untouched.
safe = findings.replace("</", "<\\/")
start, end = "/*FINDINGS_JSON*/", "/*END_FINDINGS_JSON*/"
# Keep both markers; replace only the {} placeholder between them.
i = template.index(start) + len(start)
j = template.index(end)
out = template[:i] + safe.strip() + template[j:]
Path(".tessl-insights-poc/report.html").write_text(out)

Write the result as .tessl-insights-poc/report.html.

Important: Copy the template exactly — do not modify the HTML, CSS, or JS. Only replace the {} placeholder with the (escaped) JSON data.

Sanity check after writing — the embedded blob must contain zero </script> strings:

python3 -c "
from pathlib import Path
h = Path('.tessl-insights-poc/report.html').read_text()
i = h.index('/*FINDINGS_JSON*/') + len('/*FINDINGS_JSON*/')
j = h.rindex('/*END_FINDINGS_JSON*/')
print('embedded </script> count (must be 0):', h[i:j].count('</script>'))
"
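
To see why the escape is safe, you can check it in isolation. The sketch below uses a made-up payload and a hypothetical helper name, and relies only on the standard `json` module: the escaped text contains no `</`, yet parses back to the same data.

```python
import json


def escape_for_script(serialized_json):
    """Escape every '</' so the HTML parser never sees a closing tag."""
    return serialized_json.replace("</", "<\\/")


# Made-up payload containing the strings that would break the page.
payload = {"snippet": "<script>alert(1)</script>", "note": "a </div> b"}
raw = json.dumps(payload)
safe = escape_for_script(raw)

# No closer survives, and the JSON still round-trips to identical data.
assert "</" not in safe
assert json.loads(safe) == payload
```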

Output

Produce two files in .tessl-insights-poc/:

1. findings.json

Structured output per the findings schema. The top-level JSON object must contain the following keys:

| Key | Required when |
| --- | --- |
| scan_id, repository, schema_version | always |
| scan_started_at, scan_completed_at | always |
| overall_score | always |
| executive_summary | always |
| summary | always (including by_severity, by_effort_size, top_categories; plus commit_authors_impacted / sessions_with_frustration_signals flat on summary when their source reports contributed) |
| metadata | always (including data_sources_used) |
| findings | always |
| context_inventory | when reports/context-inventory.json exists with a context_inventory block (see Step 9) |
| notes | optional; free-text notes about validations, drops, or caveats |

2. report.html

The HTML template with findings.json injected. Verify it opens correctly and that the stats row and Context Inventory section render as expected:

open ".tessl-insights-poc/report.html"

Validation Checklist

Before finalizing:

  • Every insight from every source report is accounted for (merged, included, or explicitly dropped with reason)
  • Cross-references are genuine (not just same-area coincidences)
  • At least 2-3 insights per source spot-checked against actual data
  • Dropped insights have clear reasoning
  • Every finding has an action with all required fields (title, description, type, effort_size)
  • summary counts are accurate
  • Hero stats are flat: summary.commit_authors_impacted and summary.sessions_with_frustration_signals sit directly on summary, never wrapped in summary.hero_stats or any other grouping object. Each, if present, has only value and total (both ≥ 0, value ≤ total), or is omitted entirely; never partially populated.
  • Context inventory passed through: if reports/context-inventory.json has a context_inventory block, findings.json.context_inventory.files.length equals reports/context-inventory.json's context_inventory.files.length. Verify with the jq snippet in Step 9.
  • overall_score.level is one of blocked | constrained | productive and reasoning cites specific finding IDs
  • executive_summary is populated, ~60 words max (2–3 sentences), plain-English state of affairs, and contains no APEX category codes (KCG, CAS, SCX, RAF, TCG), no finding IDs (F-001, …), and no severity-count prose
  • HTML opens correctly in a browser, the stats row shows the hero-stat tiles (when applicable), and the Context Inventory section lists files (when applicable)
  • The embedded findings blob contains zero </script> strings (see the sanity-check snippet in Step 10). A non-zero count means the </ → <\/ escape was skipped and the report will render blank
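
The hero-stat item in this checklist is easy to get wrong by eye, so a small check may help. The sketch below is a hypothetical helper that tests only the shape rules stated above: flat keys, exactly `value` and `total`, and 0 ≤ value ≤ total.

```python
HERO_KEYS = ("commit_authors_impacted", "sessions_with_frustration_signals")


def check_hero_stats(summary):
    """Return a list of shape errors; an empty list means the stats look valid."""
    errors = []
    if "hero_stats" in summary:
        errors.append("hero stats must be flat on summary, not under hero_stats")
    for key in HERO_KEYS:
        if key not in summary:
            continue  # omitting a stat entirely is allowed
        stat = summary[key]
        if not isinstance(stat, dict) or set(stat) != {"value", "total"}:
            errors.append(f"{key}: must contain exactly value and total")
            continue
        value, total = stat["value"], stat["total"]
        if not (isinstance(value, int) and isinstance(total, int)):
            errors.append(f"{key}: value and total must be integers")
        elif not 0 <= value <= total:
            errors.append(f"{key}: need 0 <= value <= total")
    return errors
```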
