tessleng/agent-insight-experiment

Scan a repository to surface actionable findings about agent performance. Analyzes source code, git history, GitHub data, agent logs, and agent context, then synthesizes cross-referenced findings with targeted actions informed by Tessl product awareness. Supports incremental multi-developer contributions and produces a self-contained HTML report.

Quality: 88% (Does it follow best practices?)

Impact: no eval scenarios have been run

Security: Advisory (suggest reviewing before use)

name: analyze-agent-logs
description: Analyze agent conversation logs from Cursor, Claude Code, or other AI coding tools to identify patterns that affect agent performance. Discovers log locations automatically, parses conversations for repeated failures, frustration signals, excessive iteration, and tool misuse. Use when running an insight scan's agent log analysis phase, or when you want to understand what agent conversation history reveals about performance bottlenecks in a repository.

Analyze Agent Logs for Agent Performance Insights

Rating: 70.98 (1 review)
Author: tessleng

Examine agent conversation histories to find patterns where agents repeatedly struggle, users express frustration, or tasks require excessive iteration.

Scope: Focus on agent conversation logs and related metadata.

Before You Start

Read these reference files:

The APEX taxonomy, which defines the insight categories
The insight report schema, which defines the exact report structure
The log discovery guide, which explains how to find and parse agent logs

Resolving reference paths: The shared reference links above use relative paths (../../references/...) that work when this skill is read from its tile directory. If those paths do not resolve (e.g. when activated via a .claude/skills/ symlink), find the shared references at .tessl/tiles/*/agent-insight-experiment/references/ relative to the repository root. The log discovery guide (./references/log-discovery.md) is local to this skill and should resolve via the symlink.
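
A minimal sketch of that fallback, assuming a POSIX shell. The function name resolve_shared_refs and its two arguments (this skill's directory and the repository root) are hypothetical; it tries the tile-relative ../../references path first and only then globs .tessl/tiles/*/agent-insight-experiment/references/.

```shell
# Hypothetical helper: print the absolute path of the shared references
# directory, or fail if neither known location exists.
#   $1  directory containing this skill (where SKILL.md lives)
#   $2  repository root
# The body runs in a subshell, so cd and exit do not affect the caller.
resolve_shared_refs() (
  skill_dir="$1"
  repo_root="$2"
  # 1. Tile-relative path: works when the skill is read from its tile directory.
  if [ -d "$skill_dir/../../references" ]; then
    cd "$skill_dir/../../references" && pwd
    exit
  fi
  # 2. Fallback for symlinked activation (e.g. via .claude/skills/).
  for candidate in "$repo_root"/.tessl/tiles/*/agent-insight-experiment/references; do
    # An unmatched glob is left as literal text, so check for a real directory.
    if [ -d "$candidate" ]; then
      cd "$candidate" && pwd
      exit
    fi
  done
  echo "shared references not found" >&2
  exit 1
)
```

Usage, with SKILL_DIR as a placeholder for wherever this skill was loaded from: `REFS_DIR="$(resolve_shared_refs "$SKILL_DIR" "$(git rev-parse --show-toplevel)")"`.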

Your report prefix is LOG (e.g., LOG-001, LOG-002).

Log Discovery

Agent logs live in different locations depending on the tool. The goal is to find conversation transcripts for the current project/repository.

Option A: Use Audit-Logs Plugin (Preferred)

If the try-tessl/agent-quality plugin is installed, use its collection and normalization scripts — they handle multi-tool log discovery and produce a consistent format:

# Check if audit-logs is installed
AUDIT_SCRIPTS="$(find "$(pwd)/.tessl/tiles" "$HOME/.tessl/tiles" -path "*/audit-logs/skills/audit-logs/scripts/collect_logs.py" -print -quit 2>/dev/null)"

if [ -n "$AUDIT_SCRIPTS" ]; then
  SCRIPTS_DIR="$(dirname "$AUDIT_SCRIPTS")"
  # Collect and normalize logs for this project
  uv run python3 "$SCRIPTS_DIR/collect_logs.py" --project-dir "$(pwd)"
  uv run python3 "$SCRIPTS_DIR/normalize_logs.py" --project-dir "$(pwd)"
fi

Option B: Manual Discovery

If audit-logs isn't available, discover logs directly. See the log discovery guide for detailed paths and formats. The key locations to check:

Cursor: ~/.cursor/projects/*/agent-transcripts/*.jsonl
Claude Code: ~/.claude/projects/*/conversations/ or ~/.claude/conversations/

Match project directories by looking for path fragments that match the current repo name or path.
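
One way to apply that matching rule, sketched under the assumption that each tool keeps one directory per project and that the directory name embeds the repo name. find_project_log_dirs is a hypothetical helper, not part of any tool's CLI.

```shell
# Hypothetical helper: list project directories under a tool's log root whose
# names contain the given repository name, one per line, sorted.
#   $1  log root, e.g. "$HOME/.cursor/projects"
#   $2  repository name fragment, e.g. "$(basename "$(pwd)")"
# Prints nothing (and succeeds) if the log root does not exist.
find_project_log_dirs() (
  log_root="$1"
  repo_name="$2"
  [ -d "$log_root" ] || exit 0
  find "$log_root" -mindepth 1 -maxdepth 1 -type d -name "*${repo_name}*" | sort
)
```

Run it once per tool root. Substring matching can produce false positives when one repo name contains another (for example api and api-gateway), so review the list before parsing.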

Determining Available Logs

Before diving into analysis, inventory what's available:

# How many sessions exist for this project?
# What date range do they cover?
# Which tools were used (Cursor, Claude Code, other)?

Report these numbers in your scope metrics.
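
The inventory questions above can be partly answered with find. This sketch assumes each *.jsonl file is one session and uses file modification time as a stand-in for session date, so it reports a recent-window count rather than an exact date range. inventory_sessions and its output format are hypothetical.

```shell
# Hypothetical inventory helper: count session files for one tool.
#   $1  tool slug (e.g. cursor, claude_code)
#   $2  directory holding that tool's transcripts for this project
#   $3  recency window in days (default 30)
# Assumes one *.jsonl file per session; mtime approximates the session date.
inventory_sessions() (
  tool="$1"; dir="$2"; days="${3:-30}"
  total=0; recent=0
  if [ -d "$dir" ]; then
    total=$(find "$dir" -type f -name '*.jsonl' | wc -l | tr -d ' ')
    recent=$(find "$dir" -type f -name '*.jsonl' -mtime "-$days" | wc -l | tr -d ' ')
  fi
  printf '%s total=%s last_%sd=%s\n' "$tool" "$total" "$days" "$recent"
)
```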

Analysis Strategy

Step 1: Session Sampling

If there are many sessions (>30), prioritize:

Most recent sessions (last 2-4 weeks)
Longest sessions (often indicate complex or frustrating tasks)
Sessions with many tool calls (indicate iteration-heavy tasks)

For smaller sets, analyze all available sessions.
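
For the "longest sessions" criterion, line count is a cheap proxy for session length in JSONL transcripts. A sketch, with longest_sessions as a hypothetical helper; recency and tool-call counts would need separate passes.

```shell
# Hypothetical helper: print the N longest session files, longest first.
#   $1     how many sessions to keep
#   $2...  session files to rank
longest_sessions() (
  n="$1"; shift
  tab=$(printf '\t')
  for f in "$@"; do
    # Emit "<line count><TAB><path>" so sort can rank numerically on field 1.
    printf '%s\t%s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
  done | sort -t "$tab" -k1,1nr | head -n "$n" | cut -f2
)
```

Usage, with PROJECT_DIR as a placeholder: `longest_sessions 10 "$PROJECT_DIR"/agent-transcripts/*.jsonl`.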

Step 2: Failure Pattern Identification

Scan conversations for these signals:

Signal	What to look for
Agent errors	Tool call failures, error messages, exceptions in output
Correction cycles	User messages containing "no", "wrong", "undo", "revert", "try again"
Repeated attempts	Same tool call or file edit appearing 3+ times
Task abandonment	Conversations ending without clear completion
Wrong approach	User redirecting: "instead", "don't use", "use X not Y"

For each failure pattern, note: task attempted, what went wrong, files/modules involved, whether it was eventually resolved.
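
A sketch of the correction-cycle check from the table. It assumes user messages have already been extracted to a plain-text file with one message per line (grepping raw JSONL would also match agent output); count_correction_messages is a hypothetical name. The pattern only matches whole words, so "know" is not counted as "no".

```shell
# Hypothetical helper: count user messages containing a correction keyword.
#   $1  plain-text file with one user message per line
# The bracket expressions reject a letter, digit, or underscore on either side
# of the keyword. grep -c prints 0 but exits 1 when nothing matches, so
# "|| true" keeps a zero count from looking like a failure.
count_correction_messages() {
  grep -c -i -E '(^|[^[:alnum:]_])(no|wrong|undo|revert|try again)([^[:alnum:]_]|$)' "$1" || true
}
```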

Step 3: Frustration Signal Detection

Frustration level	Indicators
Mild correction	"No, use X not Y", "That's not right"
Escalating	Shorter messages, more directive language, "just do X"
Repeated instruction	User explaining the same thing 2+ times in one session
Abandonment	Session ends abruptly mid-task
Manual takeover	"I'll do it myself", user switching to manual edits

These are the highest-impact insights: they expose real pain points that no other data source can reveal.
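
The "repeated instruction" signal can be approximated by normalizing user messages and counting exact repeats. This is a simplification that misses paraphrased repeats; repeated_messages and its one-message-per-line input file are assumptions.

```shell
# Hypothetical helper: print normalized user messages that occur at least
# MIN times (default 2) in one session.
#   $1  plain-text file with one user message per line
#   $2  minimum repeat count
# Normalization: lowercase, collapse whitespace, trim, drop empty lines.
repeated_messages() (
  file="$1"; min="${2:-2}"
  tr '[:upper:]' '[:lower:]' < "$file" \
    | sed -E 's/[[:space:]]+/ /g; s/^ //; s/ $//' \
    | grep -v '^$' \
    | sort | uniq -c \
    | awk -v min="$min" '$1 >= min { $1 = ""; sub(/^ /, ""); print }'
)
```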

Step 4: Tool Usage Analysis

Which tools are used most frequently?
Inefficiency patterns: reading the same file repeatedly, excessive search calls, many failed tool invocations
Navigation difficulty: many search attempts before finding the right file
Missing tool usage: tools available but never invoked
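
A sketch for the frequency and repeated-read checks, assuming tool calls were first extracted to a tab-separated file with one tool_name<TAB>target line per call. That file layout and both helper names are hypothetical; pass whatever name the harness actually uses for its read tool.

```shell
# Hypothetical helper: print "tool<TAB>count" for every tool, most used first.
#   $1  TSV file with one "tool_name<TAB>target" line per tool call
tool_frequency() {
  cut -f1 "$1" | sort | uniq -c | sort -k1,1nr -k2 | awk '{ print $2 "\t" $1 }'
}

# Hypothetical helper: print "target<TAB>count" for targets that one tool
# was called on at least MIN times (e.g. the same file read repeatedly).
#   $1  TSV file as above
#   $2  tool name to inspect
#   $3  minimum count (default 3)
repeated_calls() {
  awk -F '\t' -v tool="$2" -v min="${3:-3}" '$1 == tool { n[$2]++ }
    END { for (t in n) if (n[t] >= min) print t "\t" n[t] }' "$1" | sort
}
```

For example, `repeated_calls calls.tsv Read 3` lists files read three or more times, with Read standing in for the harness's read tool.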

Step 5: Topic and Area Clustering

Group findings by:

Code area: Which directories/modules appear most in failed sessions?
Task type: What kinds of tasks cause the most trouble? (new features, bug fixes, refactoring, tests, configuration)
Root cause: Why did the agent fail? (missing knowledge, wrong approach, tool limitation, unclear instructions)
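
For the code-area grouping, a rough sketch that counts how often each area appears among file paths touched in failed sessions. Treating the first DEPTH directory components as an "area" is an assumption to adjust per repository, and area_counts is a hypothetical helper.

```shell
# Hypothetical helper: print "count<TAB>area", most frequent first.
#   $1  file with one repo-relative path per line (from failed sessions)
#   $2  how many leading directory components form an area (default 2)
# Paths with no directory component (e.g. README.md) are skipped, and the
# file name itself is never included in the area.
area_counts() {
  awk -F/ -v depth="${2:-2}" 'NF > 1 {
      area = $1
      for (i = 2; i <= depth && i < NF; i++) area = area "/" $i
      n[area]++
    }
    END { for (a in n) print n[a] "\t" a }' "$1" | sort -k1,1nr -k2
}
```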

Step 6: Success Patterns

Also note what goes well — areas where the agent completes tasks smoothly. This helps calibrate: if the agent handles service A perfectly but always struggles with service B, that's a strong comparative signal.
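
To make that comparison concrete, failure rates per area can be tabulated from a per-session outcome label. The area<TAB>ok|fail input is something you would assemble while reading sessions, not an existing format, and failure_rates is a hypothetical helper.

```shell
# Hypothetical helper: print "area<TAB>sessions<TAB>failed<TAB>failure rate".
#   $1  TSV file with one "area<TAB>outcome" line per session,
#       where outcome is "ok" or "fail"
failure_rates() {
  awk -F '\t' '{ total[$1]++; if ($2 == "fail") fail[$1]++ }
    END {
      for (a in total)
        printf "%s\t%d\t%d\t%.0f%%\n", a, total[a], fail[a], 100 * fail[a] / total[a]
    }' "$1" | sort
}
```

An area at 0% next to one at 67% is the kind of contrast described above.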

Scope Limits

Parse up to 50 sessions in detail
For very large session files (>10K lines), sample representative sections (beginning, middle, end)
Focus on the most recent 30 days of sessions unless asked for broader coverage
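
A sketch of the beginning/middle/end sampling for oversized session files. The 10K-line threshold comes from the limits above; the default of 200 lines per section and the name sample_sections are assumptions.

```shell
# Hypothetical helper: print a whole file if it is small, otherwise N lines
# from its beginning, middle, and end.
#   $1  session file
#   $2  lines per section (default 200)
#   $3  size threshold in lines (default 10000)
sample_sections() (
  file="$1"; n="${2:-200}"; threshold="${3:-10000}"
  total=$(wc -l < "$file" | tr -d ' ')
  if [ "$total" -le "$threshold" ]; then
    cat "$file"
    exit 0
  fi
  # First line of an N-line window centred in the file.
  mid_start=$(( (total - n) / 2 + 1 ))
  head -n "$n" "$file"
  sed -n "${mid_start},$(( mid_start + n - 1 ))p" "$file"
  tail -n "$n" "$file"
)
```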

What to Look For

Agent logs are uniquely powerful for revealing:

RAF-1 (Repeated area mistakes): The same modules causing trouble across sessions
RAF-3 (Persistent pattern errors): The agent making the same type of mistake repeatedly
RAF-4 (Excessive iteration): Tasks that take many attempts
RAF-6 (User frustration signals): User corrections and frustration in conversations
KCG-1, KCG-2 (Knowledge gaps): The agent clearly not knowing about internal APIs or conventions
KCG-7 (Undocumented dangers): Sessions where the agent runs destructive or risky operations because nothing warned it away
TCG-1, TCG-3 (Missing context/skills): Situations where context or a skill would have prevented the failure

Output

Produce a JSON report conforming to the insight report schema.

If an output_file path is provided (e.g. by the orchestrator), save to that exact path. Do not compute your own filename — use the path as given.

If running standalone (no output_file provided), resolve the contributor filename (caller may pass username= to override):

# Explicit input wins over the git/whoami fallback.
USERNAME="${USERNAME_INPUT:-}"
if [ -z "$USERNAME" ]; then
  USERNAME=$(git config user.name 2>/dev/null \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[[:space:]]+/-/g; s/[^a-z0-9._-]//g; s/^[-.]+|[-.]+$//g')
fi
[ -z "$USERNAME" ] && USERNAME=$(whoami | tr '[:upper:]' '[:lower:]' | sed -E 's/[^a-z0-9._-]//g')
[ -z "$USERNAME" ] && USERNAME="unknown-user"
DATE=$(date +%Y%m%d)

Default standalone output path: .tessl-insights-poc/reports/agent-logs/${USERNAME}-${DATE}.json.

Set scope.metrics to include:

sessions_parsed: total number of sessions analyzed (denominator for the sessions_with_frustration_signals hero stat)
conversations_analyzed: total conversations (some sessions may have multiple)
tool_calls_examined: approximate number of tool calls reviewed
tools_represented: array of agent-harness slugs present in the analyzed sessions. Use lowercase, underscore-separated slugs so downstream tools can consume them directly — "claude_code", "cursor", "copilot", "gemini_cli", etc. Include every harness that contributed at least one session (e.g. ["claude_code", "cursor"] if both are mixed). This value is propagated through to the synthesized report as data_sources_used[].tools.
sessions_with_frustration_signals: number of distinct parsed sessions exhibiting at least one frustration or correction signal. A session counts if it contains any of the following: a user message with clear frustration language (e.g. "wrong", "no", "stop guessing", "ugh", "that's not", "undo", "revert"); a manual takeover ("I'll do it myself"); the same instruction repeated 2+ times; or the user redirecting the agent more than once in the session. This is a session-level count, not a per-signal count. Use conservative judgement: borderline cases (a single polite "no, do X instead") do not count. The denominator is sessions_parsed; do not emit a ratio. Always emit this field, even if the count is 0.
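
A minimal sketch that prints the scope.metrics object from caller-supplied counts. scope_metrics_json is a hypothetical helper, and it does no JSON escaping, which is only safe because harness slugs are lowercase identifiers such as claude_code.

```shell
# Hypothetical helper: print the scope.metrics JSON object on one line.
#   $1  sessions_parsed
#   $2  conversations_analyzed
#   $3  tool_calls_examined
#   $4  sessions_with_frustration_signals
#   $5...  harness slugs for tools_represented
scope_metrics_json() (
  sessions="$1"; conversations="$2"; tool_calls="$3"; frustrated="$4"
  shift 4
  tools=""
  for slug in "$@"; do
    # Join quoted slugs with ", " (no separator before the first one).
    tools="${tools:+$tools, }\"$slug\""
  done
  printf '{"sessions_parsed": %d, "conversations_analyzed": %d, "tool_calls_examined": %d, "tools_represented": [%s], "sessions_with_frustration_signals": %d}\n' \
    "$sessions" "$conversations" "$tool_calls" "$tools" "$frustrated"
)
```

For example, `scope_metrics_json 12 14 480 5 claude_code cursor`, where the numbers are placeholders.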

Validation before saving:

Verify that all required metadata fields are present
Confirm the report contains at least 8 insights that reference specific sessions
Do NOT paste raw conversation content — summarize interaction patterns
Redact any sensitive information (API keys, credentials, personal data)
Confirm scope.metrics.sessions_with_frustration_signals ≤ scope.metrics.sessions_parsed and that every session counted towards it is referenced by at least one RAF-category insight in this report
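
The numeric half of that last check can be automated before saving. This sketch only verifies that both counts are non-negative integers and that sessions_with_frustration_signals does not exceed sessions_parsed; confirming that every counted session appears in a RAF insight still means reading the report. check_frustration_count is a hypothetical name.

```shell
# Hypothetical pre-save check. Exits non-zero with a message on stderr if
# either value is not a non-negative integer or if frustrated > parsed.
#   $1  sessions_parsed
#   $2  sessions_with_frustration_signals
check_frustration_count() (
  parsed="$1"; frustrated="$2"
  for v in "$parsed" "$frustrated"; do
    case "$v" in
      ''|*[!0-9]*) echo "metrics must be non-negative integers" >&2; exit 1 ;;
    esac
  done
  if [ "$frustrated" -gt "$parsed" ]; then
    echo "sessions_with_frustration_signals ($frustrated) > sessions_parsed ($parsed)" >&2
    exit 1
  fi
)
```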

Mark data_source_exclusive: true for insights requiring actual behavioral observation (frustration signals, iteration patterns, tool misuse).