benpiper-workspace/planning-execution-harness

Break down goals into multiple tasks and coordinate execution with gates and recovery. Based on Claw Code's agentic harness.

92

1.09x
Quality: 90% (Does it follow best practices?)

Impact: 100%, 1.09x (Average score across 3 eval scenarios)

Security by Snyk: Passed (No known issues)

Evaluation results

100%

1%

Build a Resilient Data Sync Pipeline

Criteria                                          Without context  With context
Transient category present                        100%             100%
Permission category present                       100%             100%
Invalid input category present                    100%             100%
Unrecoverable category present                    100%             100%
Transient retry limit                             100%             100%
Permission escalation signal                      100%             100%
Unrecoverable escalation                          90%              100%
Checkpoint / no-re-run of completed stages        100%             100%
Recovery resumes from failed task                 100%             100%
At least four failure categories in docs          100%             100%
Transient wait before retry                       100%             100%
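
The criteria above name four failure categories, a retry limit and wait for transient failures, escalation for permission and unrecoverable failures, and checkpointed recovery. A minimal Python sketch of how those pieces could fit together follows. It is an illustration only, not the harness's actual implementation: `FailureCategory`, `StageError`, `run_pipeline`, the retry limit of 3, the 1-second wait, and the choice to halt on invalid input are all assumptions.

```python
import time
from enum import Enum


class FailureCategory(Enum):
    # The four categories named in the criteria.
    TRANSIENT = "transient"
    PERMISSION = "permission"
    INVALID_INPUT = "invalid_input"
    UNRECOVERABLE = "unrecoverable"


class StageError(Exception):
    """Hypothetical exception carrying the category of a failed stage."""

    def __init__(self, category, message=""):
        super().__init__(message)
        self.category = category


MAX_TRANSIENT_RETRIES = 3  # assumed limit; the source does not state one
RETRY_WAIT_SECONDS = 1.0   # assumed wait before each retry

# Assumption: these categories are handed to a human instead of retried.
ESCALATE = {FailureCategory.PERMISSION, FailureCategory.UNRECOVERABLE}


def run_pipeline(stages, completed=None, sleep=time.sleep):
    """Run (name, callable) stages in order, skipping checkpointed ones."""
    completed = set() if completed is None else completed
    for name, stage in stages:
        if name in completed:
            continue  # checkpoint: never re-run a completed stage
        retries = 0
        while True:
            try:
                stage()
            except StageError as err:
                transient = err.category is FailureCategory.TRANSIENT
                if transient and retries < MAX_TRANSIENT_RETRIES:
                    retries += 1
                    sleep(RETRY_WAIT_SECONDS)  # wait before retrying
                    continue
                status = "escalated" if err.category in ESCALATE else "halted"
                return {"status": status, "failed_stage": name,
                        "category": err.category, "completed": completed}
            completed.add(name)
            break
    return {"status": "complete", "failed_stage": None,
            "category": None, "completed": completed}
```

Passing the returned `completed` set back into `run_pipeline` resumes from the failed stage, so earlier stages are not executed again.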

100%

21%

Automate a Production Release Pipeline

Criteria                                          Without context  With context
Task count in range                               0%               100%
Dependency notation used                          100%             100%
Tasks are concrete and testable                   100%             100%
Risky step flagged                                100%             100%
Gate approval present                             100%             100%
Log has timestamps                                100%             100%
Log has event type labels                         100%             100%
PLAN_CREATED event logged                         50%              100%
Progress format per task                          20%              100%
Execution complete event                          100%             100%
Tasks in ordered sequence                         100%             100%

100%

5%

Legacy Code Audit and Remediation

Criteria                                          Without context  With context
Plan lists tasks separately                       100%             100%
Plan flags review required                        100%             100%
GATE_APPROVED event logged                        100%             100%
PLAN_CREATED event logged                         100%             100%
Progress format [Task N/M]                        50%              100%
Timestamps in every log entry                     100%             100%
Event type labels present                         100%             100%
TASK_STARTED and TASK_COMPLETED logged per task   100%             100%
EXECUTION_COMPLETE event logged                   100%             100%
Failure classified in log                         100%             100%
Outcome report matches log                        100%             100%
Tasks executed in plan order                      100%             100%
No mid-execution task additions without log note  100%             100%
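
The logging criteria above (a timestamp and an event type label on every entry, the PLAN_CREATED, GATE_APPROVED, TASK_STARTED, TASK_COMPLETED and EXECUTION_COMPLETE events, and the [Task N/M] progress format) can be illustrated with a small Python sketch. The function names `log_event` and `execute_plan`, the entry layout, and the behavior of stopping when the gate is not approved are assumptions made for illustration; the page does not show the harness's real log format.

```python
from datetime import datetime, timezone


def log_event(log, event_type, detail=""):
    """Append one entry: ISO 8601 UTC timestamp, event type label, detail."""
    timestamp = datetime.now(timezone.utc).isoformat()
    entry = f"{timestamp} {event_type} {detail}".rstrip()
    log.append(entry)
    return entry


def execute_plan(tasks, gate_approved=True):
    """Run (name, callable) tasks in plan order and return the event log."""
    log = []
    total = len(tasks)
    log_event(log, "PLAN_CREATED", f"{total} tasks")
    if not gate_approved:
        return log  # assumption: nothing runs until the gate is approved
    log_event(log, "GATE_APPROVED")
    for index, (name, action) in enumerate(tasks, start=1):
        progress = f"[Task {index}/{total}]"  # progress format from the criteria
        log_event(log, "TASK_STARTED", f"{progress} {name}")
        action()
        log_event(log, "TASK_COMPLETED", f"{progress} {name}")
    log_event(log, "EXECUTION_COMPLETE")
    return log
```

Calling `execute_plan([("audit", run_audit), ("remediate", run_fix)])`, where `run_audit` and `run_fix` are hypothetical callables, yields one entry per event in plan order.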

Evaluated
Agent: Claude
Model: Claude Sonnet 4.6
