tessl-labs/audit-logs

Collect and normalize agent logs, discover installed verifiers, and dispatch LLM judges to evaluate adherence. Produces per-session verdicts and aggregated reports.

skills/friction-review/references/friction-prompt.md

Friction Review Prompt

You are reviewing an agent coding session to detect points of friction — moments where the user or agent struggled, wasted time, or encountered obstacles.

Your Task

You will be given a condensed transcript of an agent coding session, with events labeled by turn number.

Review the transcript and extract:

  1. Outcome — did the user's request get completed?
  2. Satisfaction — how did the user feel?
  3. Friction events — specific moments where things went wrong

Output Schema

Return a single JSON object:

```json
{
  "session_id": "<from SESSION header>",
  "agent": "<from AGENT header>",
  "outcome": "fully_achieved | mostly_achieved | partially_achieved | not_achieved",
  "satisfaction": "happy | satisfied | likely_satisfied | dissatisfied | frustrated",
  "summary": "1-2 sentence description of what happened",
  "friction": [
    {
      "type": "<friction type>",
      "description": "1 sentence describing what went wrong",
      "turns": [5, 6, 7],
      "impact": "minor | moderate | major"
    }
  ]
}
```
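The schema above can be checked mechanically. Here is a minimal validator sketch, assuming verdicts arrive as raw JSON strings; the enum sets are copied from the schema, but the function name and structure are illustrative, not part of the tile.

```python
import json

# Enum values copied verbatim from the output schema
OUTCOMES = {"fully_achieved", "mostly_achieved", "partially_achieved", "not_achieved"}
SATISFACTION = {"happy", "satisfied", "likely_satisfied", "dissatisfied", "frustrated"}
FRICTION_TYPES = {
    "wrong_approach", "buggy_code", "over_investigation", "misunderstood_request",
    "premature_action", "tool_misuse", "repeated_failure", "ignored_instruction",
}
IMPACTS = {"minor", "moderate", "major"}

def validate_verdict(raw: str) -> dict:
    """Parse a judge response and check every enum field against the schema."""
    verdict = json.loads(raw)
    assert verdict["outcome"] in OUTCOMES, verdict["outcome"]
    assert verdict["satisfaction"] in SATISFACTION, verdict["satisfaction"]
    for event in verdict.get("friction", []):
        assert event["type"] in FRICTION_TYPES, event["type"]
        assert event["impact"] in IMPACTS, event["impact"]
        # Turn references must be integers so they can be traced to the transcript
        assert all(isinstance(t, int) for t in event["turns"])
    return verdict
```

Rejecting invented enum values at this stage enforces rule 4 below without trusting the model to self-police.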

Rubric

outcome — Did the user's request get completed?

| Value | When to use |
| --- | --- |
| fully_achieved | User's request was completed successfully. Evidence: user confirms, moves on to new work, session ends naturally. |
| mostly_achieved | Main goal met but with minor gaps — a workaround was needed, an edge case was missed, or the user had to correct a small detail. |
| partially_achieved | Some meaningful progress but significant parts remain unfinished. |
| not_achieved | Session ended without meaningful progress. Evidence: user abandoned the approach, session ended abruptly after errors, or the agent went in circles. |

satisfaction — How did the user feel about the session?

Infer from the user's language and behavior. Default to likely_satisfied when there's no clear signal.

| Value | When to use |
| --- | --- |
| happy | User expresses explicit positive feedback: "great", "perfect", "nice work" |
| satisfied | User accepts the work and moves on naturally, says "thanks", "looks good" |
| likely_satisfied | No explicit feedback but session completed normally without friction |
| dissatisfied | User has to repeat themselves, correct the agent, or express mild frustration |
| frustrated | User expresses strong frustration, abandons the approach, or session ends abruptly after repeated problems |
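The inference the table describes can be sketched as a simple signal scan. The phrase lists below are assumptions drawn from the table's examples, not an exhaustive taxonomy, and a real judge would weigh context rather than match substrings.

```python
# Illustrative phrase lists; taken from the rubric's examples, not exhaustive
POSITIVE = ("great", "perfect", "nice work")
ACCEPT = ("thanks", "looks good")
CORRECTION = ("no,", "that's wrong", "i meant")

def infer_satisfaction(user_messages: list[str]) -> str:
    """Rough first-pass satisfaction guess from user message text."""
    text = " ".join(m.lower() for m in user_messages)
    if any(p in text for p in POSITIVE):
        return "happy"
    if any(p in text for p in ACCEPT):
        return "satisfied"
    if any(p in text for p in CORRECTION):
        return "dissatisfied"
    # Default when there's no clear signal, per the rubric
    return "likely_satisfied"
```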

friction types — What caused problems?

Only include friction events that actually cost the user time or effort. Minor issues resolved in one turn don't count.

| Type | When to use |
| --- | --- |
| wrong_approach | Agent chose the wrong strategy, tool, or method for the task. User had to redirect. |
| buggy_code | Agent wrote code that didn't work — syntax errors, logic bugs, runtime crashes. |
| over_investigation | Agent spent too many turns exploring, theorizing, or reading files when the answer was straightforward. |
| misunderstood_request | Agent misinterpreted what the user asked for and worked on the wrong thing. |
| premature_action | Agent started implementing before understanding requirements, or jumped ahead without user approval. |
| tool_misuse | Agent used a tool incorrectly — wrong CLI flags, wrong command syntax, wrong file paths. |
| repeated_failure | Agent failed at the same thing multiple times without changing approach. |
| ignored_instruction | Agent didn't follow an explicit user instruction, project convention, or constraint. |

impact — How much time/effort was wasted?

| Value | When to use |
| --- | --- |
| minor | Resolved in 1-2 turns. Small hiccup, no significant delay. |
| moderate | Took 3-5 turns to resolve, or required user intervention to redirect. |
| major | More than 5 turns wasted, task was derailed, or user had to abandon the approach. |
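The turn thresholds in this table translate directly into code. A minimal sketch, assuming turns wasted has already been counted and derailment is a separate boolean signal (both names are hypothetical):

```python
def classify_impact(turns_wasted: int, derailed: bool = False) -> str:
    """Map turns wasted to the rubric's impact scale.

    Thresholds come straight from the table: >5 turns (or derailment)
    is major, 3-5 is moderate, 1-2 is minor.
    """
    if derailed or turns_wasted > 5:
        return "major"
    if turns_wasted >= 3:
        return "moderate"
    return "minor"
```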

Important Rules

  1. Return ONLY a JSON object. No commentary before or after.
  2. If a session has no friction events, use an empty array: "friction": []
  3. Be conservative with friction: only flag events where something clearly went wrong and cost time
  4. Use the exact enum values from the schema — don't invent new types
  5. Include turn numbers so friction events can be traced back to the transcript
  6. For very short sessions (<5 turns), be honest about uncertainty in the summary
  7. Look for errors ([ERROR] tags in the transcript) — these are strong friction signals, especially when followed by retries
  8. Look for user corrections — when the user says "no", "that's wrong", "I meant...", this indicates friction
  9. Look for agent backtracking — when the agent says "let me try a different approach" or repeats similar tool calls, this may indicate struggle
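Rule 1 asks for bare JSON, but a dispatcher consuming these verdicts may still want to tolerate a stray markdown fence. A defensive extraction sketch (the function name is hypothetical):

```python
import json
import re

def extract_verdict(response: str) -> dict:
    """Pull the JSON verdict out of a judge response.

    Accepts bare JSON per rule 1, but also unwraps a ```json fence
    if the model added one despite the instruction.
    """
    text = response.strip()
    match = re.search(r"```(?:json)?\s*(\{.*\})\s*```", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)
```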
