Break down goals into multiple tasks and coordinate execution with gates and recovery. Based on Claw Code's agentic harness.
92
90%
Does it follow best practices?
Impact
100%
1.09x
Average score across 3 eval scenarios
Passed
No known issues
Transient category present: 100%, 100%
Permission category present: 100%, 100%
Invalid input category present: 100%, 100%
Unrecoverable category present: 100%, 100%
Transient retry limit: 100%, 100%
Permission escalation signal: 100%, 100%
Unrecoverable escalation: 90%, 100%
Checkpoint / no-re-run of completed stages: 100%, 100%
Recovery resumes from failed task: 100%, 100%
At least four failure categories in docs: 100%, 100%
Transient wait before retry: 100%, 100%
Task count in range: 0%, 100%
Dependency notation used: 100%, 100%
Tasks are concrete and testable: 100%, 100%
Risky step flagged: 100%, 100%
Gate approval present: 100%, 100%
Log has timestamps: 100%, 100%
Log has event type labels: 100%, 100%
PLAN_CREATED event logged: 50%, 100%
Progress format per task: 20%, 100%
Execution complete event: 100%, 100%
Tasks in ordered sequence: 100%, 100%
Plan lists tasks separately: 100%, 100%
Plan flags review required: 100%, 100%
GATE_APPROVED event logged: 100%, 100%
PLAN_CREATED event logged: 100%, 100%
Progress format [Task N/M]: 50%, 100%
Timestamps in every log entry: 100%, 100%
Event type labels present: 100%, 100%
TASK_STARTED and TASK_COMPLETED logged per task: 100%, 100%
EXECUTION_COMPLETE event logged: 100%, 100%
Failure classified in log: 100%, 100%
Outcome report matches log: 100%, 100%
Tasks executed in plan order: 100%, 100%
No mid-execution task additions without log note: 100%, 100%