CtrlK
BlogDocsLog inGet started
Tessl Logo

honeybadge/virtui

automatically control and record tui application sessions from the terminal

93

4.45x
Quality

94%

Does it follow best practices?

Impact

89%

4.45x

Average score across 3 eval scenarios

SecuritybySnyk

Risky

Do not use without reviewing

Overview
Quality
Evals
Security
Files

criteria.jsonevals/scenario-3/

{
  "context": "Tests whether the agent correctly uses wait strategies (choosing the right one for the scenario), increases timeout for long-running operations, uses screen_hash for change detection, and handles timeouts by taking a screenshot to diagnose state.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Uses --json flag",
      "description": "All virtui commands that return data include --json or -j",
      "max_score": 6
    },
    {
      "name": "Daemon check and start",
      "description": "Script checks daemon status and starts it if not running before creating sessions",
      "max_score": 6
    },
    {
      "name": "Appropriate wait strategy",
      "description": "Uses --wait \"BUILD COMPLETE\" or --wait-regex with a relevant pattern (or pipeline wait with text condition) rather than polling with sleeps alone",
      "max_score": 15
    },
    {
      "name": "Timeout increased",
      "description": "The wait or exec command uses --timeout with a value greater than 30000 (e.g. 120000) to accommodate a potentially long-running build",
      "max_score": 12
    },
    {
      "name": "Screenshot on timeout",
      "description": "On a timeout condition (caught via exit code or error output), script runs `virtui screenshot` to capture the current terminal state",
      "max_score": 15
    },
    {
      "name": "Error state captured",
      "description": "The screen content captured on timeout is written to `error_state.txt`",
      "max_score": 8
    },
    {
      "name": "screen_hash used",
      "description": "Script references or logs the screen_hash value from screenshot output (rather than only comparing full screen_text) for change detection or logging",
      "max_score": 10
    },
    {
      "name": "Retryable check",
      "description": "Script or README references checking the `retryable` field from error output before deciding to retry",
      "max_score": 8
    },
    {
      "name": "Session killed",
      "description": "Session is killed with `virtui kill` after work is done (success or failure path)",
      "max_score": 8
    },
    {
      "name": "Daemon stopped",
      "description": "Script runs `virtui daemon stop` at the end",
      "max_score": 5
    },
    {
      "name": "Build report written",
      "description": "build_report.txt is written and contains captured screen output on the success path",
      "max_score": 7
    }
  ]
}

evals

tile.json