Name: benpiper-workspace/planning-execution-harness
Rating: 92 (1 review)
Author: benpiper-workspace

Break down goals into multiple tasks and coordinate execution with gates and recovery. Based on Claw Code's agentic harness.

Quality: 90% (Does it follow best practices?)
Impact: 100%, 1.09x (average score across 3 eval scenarios)

Security: Passed. No known issues.

Evaluation results

100%

Build a Resilient Data Sync Pipeline

Criteria (columns: Without context, With context)

Transient category present: 100%
Permission category present: 100%
Invalid input category present: 100%
Unrecoverable category present: 100%
Transient retry limit: 100%
Permission escalation signal: 100%
Unrecoverable escalation: 90% without context, 100% with context
Checkpoint / no-re-run of completed stages: 100%
Recovery resumes from failed task: 100%
At least four failure categories in docs: 100%
Transient wait before retry: 100%
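
The failure-handling criteria above (four failure categories, a retry limit with a wait for transient errors, escalation for the rest, and checkpointing so completed stages are not re-run) can be illustrated with a short sketch. It is not the harness's own code: the names `FailureCategory`, `classify`, and `run_stages`, the exception-to-category mapping, and the default retry limit and wait are all assumptions made for illustration.

```python
import time
from enum import Enum


class FailureCategory(Enum):
    TRANSIENT = "transient"
    PERMISSION = "permission"
    INVALID_INPUT = "invalid_input"
    UNRECOVERABLE = "unrecoverable"


def classify(exc):
    """Map an exception to a category (this mapping is an assumption)."""
    if isinstance(exc, PermissionError):
        return FailureCategory.PERMISSION
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return FailureCategory.TRANSIENT
    if isinstance(exc, ValueError):
        return FailureCategory.INVALID_INPUT
    return FailureCategory.UNRECOVERABLE


def run_stages(stages, completed, max_retries=3, wait_seconds=1.0, sleep=time.sleep):
    """Run (name, callable) stages in order, skipping names in `completed`.

    Returns (status, failed_stage_name, category).
    """
    for name, stage in stages:
        if name in completed:  # checkpoint: never re-run a finished stage
            continue
        failures = 0
        while True:
            try:
                stage()
                completed.add(name)  # record the checkpoint
                break
            except Exception as exc:
                category = classify(exc)
                failures += 1
                if category is FailureCategory.TRANSIENT and failures <= max_retries:
                    sleep(wait_seconds)  # wait before retrying
                    continue
                # Non-transient failures, or transient ones past the limit,
                # are handed back to the caller for escalation.
                return ("escalated", name, category)
    return ("complete", None, None)
```

Passing the same `completed` set to a later call resumes from the stage that failed, because every stage recorded in the set is skipped.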

21%

Automate a Production Release Pipeline

Criteria (columns: Without context, With context)

Task count in range: 100%
Dependency notation used: 100%
Tasks are concrete and testable: 100%
Risky step flagged: 100%
Gate approval present: 100%
Log has timestamps: 100%
Log has event type labels: 100%
PLAN_CREATED event logged: 50% without context, 100% with context
Progress format per task: 20% without context, 100% with context
Execution complete event: 100%
Tasks in ordered sequence: 100%
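
Several of the planning criteria above (dependency notation, a flagged risky step, gate approval, ordered tasks) suggest a plan structure along the following lines. This is only a sketch under stated assumptions: the `Task` fields, `execution_order`, and `gate` are hypothetical names, and the skill's real plan format is not shown in this listing.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Task:
    id: str
    description: str
    depends_on: list = field(default_factory=list)  # ids of prerequisite tasks
    risky: bool = False  # risky steps need explicit approval at the gate


def execution_order(tasks):
    """Return task ids so that every task comes after its dependencies."""
    graph = {t.id: set(t.depends_on) for t in tasks}
    return list(TopologicalSorter(graph).static_order())


def gate(tasks, approve):
    """Return True only if `approve(task)` accepts every risky task."""
    return all(approve(t) for t in tasks if t.risky)
```

For example, a three-step release plan in which only the deploy step is flagged as risky:

```python
plan = [
    Task("deploy", "Deploy to production", ["test"], risky=True),
    Task("build", "Build the release artifact"),
    Task("test", "Run the test suite", ["build"]),
]
order = execution_order(plan)            # dependencies come first
approved = gate(plan, lambda t: True)    # stand-in for a human approval step
```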

Legacy Code Audit and Remediation

Criteria (columns: Without context, With context)

Plan lists tasks separately: 100%
Plan flags review required: 100%
GATE_APPROVED event logged: 100%
PLAN_CREATED event logged: 100%
Progress format [Task N/M]: 50% without context, 100% with context
Timestamps in every log entry: 100%
Event type labels present: 100%
TASK_STARTED and TASK_COMPLETED logged per task: 100%
EXECUTION_COMPLETE event logged: 100%
Failure classified in log: 100%
Outcome report matches log: 100%
Tasks executed in plan order: 100%
No mid-execution task additions without log note: 100%
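
The logging criteria above name five event types (PLAN_CREATED, GATE_APPROVED, TASK_STARTED, TASK_COMPLETED, EXECUTION_COMPLETE), a timestamp on every entry, and a [Task N/M] progress format. A minimal sketch of a log that would satisfy them is shown below. The line layout (ISO 8601 timestamp, event type, message) and the helper names `log_event` and `run_plan` are assumptions, since the listing does not show the skill's actual log format.

```python
from datetime import datetime, timezone


def log_event(log, event_type, message="", now=None):
    """Append one 'timestamp EVENT_TYPE message' line and return it."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    line = f"{ts} {event_type} {message}".rstrip()
    log.append(line)
    return line


def run_plan(task_names, log, now=None):
    """Log the full event sequence for a plan whose tasks all succeed."""
    total = len(task_names)
    log_event(log, "PLAN_CREATED", f"{total} tasks", now)
    log_event(log, "GATE_APPROVED", "", now)
    for i, name in enumerate(task_names, start=1):
        progress = f"[Task {i}/{total}]"
        log_event(log, "TASK_STARTED", f"{progress} {name}", now)
        # The task's real work would run here.
        log_event(log, "TASK_COMPLETED", f"{progress} {name}", now)
    log_event(log, "EXECUTION_COMPLETE", f"{total}/{total} tasks", now)
    return log
```

The optional `now` argument exists only so the timestamp can be fixed in tests; in normal use each entry takes the current UTC time.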

Evaluated: 2 months ago
Agent: Claude
Model: Claude Sonnet 4.6
