Collect and normalize agent logs, discover installed verifiers, and dispatch LLM judges to evaluate adherence. Produces per-session verdicts and aggregated reports.
You are reviewing an agent coding session to detect points of friction — moments where the user or agent struggled, wasted time, or encountered obstacles.
You will be given a condensed transcript of an agent coding session, with events labeled by turn number.
Review the transcript and extract the session outcome, the user's satisfaction, and any friction events.
Return a single JSON object:
{
"session_id": "<from SESSION header>",
"agent": "<from AGENT header>",
"outcome": "fully_achieved | mostly_achieved | partially_achieved | not_achieved",
"satisfaction": "happy | satisfied | likely_satisfied | dissatisfied | frustrated",
"summary": "1-2 sentence description of what happened",
"friction": [
{
"type": "<friction type>",
"description": "1 sentence describing what went wrong",
"turns": [5, 6, 7],
"impact": "minor | moderate | major"
}
]
}

| Value | When to use |
|---|---|
| fully_achieved | User's request was completed successfully. Evidence: user confirms, moves on to new work, session ends naturally. |
| mostly_achieved | Main goal met but with minor gaps — a workaround was needed, an edge case was missed, or the user had to correct a small detail. |
| partially_achieved | Some meaningful progress, but significant parts remain unfinished. |
| not_achieved | Session ended without meaningful progress. Evidence: user abandoned the approach, session ended abruptly after errors, or the agent went in circles. |
Infer from the user's language and behavior. Default to likely_satisfied when there's no clear signal.
| Value | When to use |
|---|---|
| happy | User expresses explicit positive feedback: "great", "perfect", "nice work". |
| satisfied | User accepts the work and moves on naturally, says "thanks", "looks good". |
| likely_satisfied | No explicit feedback, but the session completed normally without friction. |
| dissatisfied | User has to repeat themselves, correct the agent, or express mild frustration. |
| frustrated | User expresses strong frustration, abandons the approach, or the session ends abruptly after repeated problems. |
Only include friction events that actually cost the user time or effort. Minor issues resolved in one turn don't count.
| Type | When to use |
|---|---|
| wrong_approach | Agent chose the wrong strategy, tool, or method for the task. User had to redirect. |
| buggy_code | Agent wrote code that didn't work — syntax errors, logic bugs, runtime crashes. |
| over_investigation | Agent spent too many turns exploring, theorizing, or reading files when the answer was straightforward. |
| misunderstood_request | Agent misinterpreted what the user asked for and worked on the wrong thing. |
| premature_action | Agent started implementing before understanding requirements, or jumped ahead without user approval. |
| tool_misuse | Agent used a tool incorrectly — wrong CLI flags, wrong command syntax, wrong file paths. |
| repeated_failure | Agent failed at the same thing multiple times without changing approach. |
| ignored_instruction | Agent didn't follow an explicit user instruction, project convention, or constraint. |
| Value | When to use |
|---|---|
| minor | Resolved in 1-2 turns. Small hiccup, no significant delay. |
| moderate | Took 3-5 turns to resolve, or required user intervention to redirect. |
| major | More than 5 turns wasted, task was derailed, or user had to abandon the approach. |
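For illustration, here is a hypothetical verdict for an imagined session in which the agent initially worked on the wrong module (every value below is invented, not taken from a real session):

```json
{
  "session_id": "sess-042",
  "agent": "example-agent",
  "outcome": "mostly_achieved",
  "satisfaction": "satisfied",
  "summary": "Agent initially refactored the wrong module, but after the user redirected it, the requested change was completed and accepted.",
  "friction": [
    {
      "type": "misunderstood_request",
      "description": "Agent began refactoring an unrelated module before the user clarified the target.",
      "turns": [3, 4, 5],
      "impact": "moderate"
    }
  ]
}
```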
If there are no qualifying friction events, return an empty array: "friction": []

Pay special attention to error markers ([ERROR] tags in the transcript) — these are strong friction signals, especially when followed by retries.
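Downstream consumers of these verdicts (such as the aggregated reports mentioned at the top) need to reject malformed judge output before counting it. A minimal validation sketch, assuming the judge's response arrives as a raw JSON string; the `validate_verdict` helper and its error messages are illustrative, not part of any existing API:

```python
import json

# Enum sets copied from the value tables above.
OUTCOMES = {"fully_achieved", "mostly_achieved", "partially_achieved", "not_achieved"}
SATISFACTION = {"happy", "satisfied", "likely_satisfied", "dissatisfied", "frustrated"}
FRICTION_TYPES = {
    "wrong_approach", "buggy_code", "over_investigation", "misunderstood_request",
    "premature_action", "tool_misuse", "repeated_failure", "ignored_instruction",
}
IMPACTS = {"minor", "moderate", "major"}

def validate_verdict(raw: str) -> dict:
    """Parse a judge response and check every enum field.

    Raises ValueError on any value outside the rubric.
    """
    verdict = json.loads(raw)
    if verdict["outcome"] not in OUTCOMES:
        raise ValueError(f"bad outcome: {verdict['outcome']!r}")
    if verdict["satisfaction"] not in SATISFACTION:
        raise ValueError(f"bad satisfaction: {verdict['satisfaction']!r}")
    for event in verdict.get("friction", []):
        if event["type"] not in FRICTION_TYPES:
            raise ValueError(f"bad friction type: {event['type']!r}")
        if event["impact"] not in IMPACTS:
            raise ValueError(f"bad impact: {event['impact']!r}")
        # Turn references must be integer turn numbers from the transcript.
        if not all(isinstance(t, int) for t in event["turns"]):
            raise ValueError("turns must be integers")
    return verdict
```

A verdict that passes validation can then be merged into per-agent aggregates; one that fails is logged and excluded rather than silently skewing the report.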