Break down goals into multiple tasks and coordinate execution with gates and recovery. Based on Claw Code's agentic harness.
Enable any LLM to reliably orchestrate multi-step processes by separating planning from execution, enforcing approval gates before action, and recovering intelligently from failures.
Problem solved: LLM agents tend to fail in one of three ways: they skip planning (act immediately and make mistakes), plan without gating (no approval before risky actions), or stop at the first failure (no recovery strategy).
This spec's solution: a structured workflow of planning → approval → execution → recovery → observability.
What the system receives:
Constraints:
What the system produces:
| Stage | Output | Example |
|---|---|---|
| Planning | Ordered task list with dependencies | "Task 1: Analyze\nTask 2: Design (depends on 1)\nTask 3: Implement (depends on 2)" |
| Gate | Approval decision + any modifications | "APPROVED" or "APPROVED_MODIFIED: reorder Task 2 and 3" |
| Execution | Task completion log with progress | "[Task 1/3] ✓ Completed\n[Task 2/3] ✓ Completed" |
| Recovery | Recovery event + outcome | "RECOVERY_APPLIED: retry_transient → success" |
| Final | Execution summary + outcomes | "3/3 tasks completed. 0 failures." |
The system enforces two separate phases:
This prevents the "plan one thing, do another" problem.
User Goal
↓
[1. PLAN] → Decompose into ordered tasks
↓
[2. GATE] → BLOCKS execution until approved
↓
[3. EXECUTE] → Run tasks in order
↓ (if failure)
[4. RECOVER] → Classify + apply recovery recipe
↓
[5. LOG] → Record all state changes
↓
Outcome

Stage Independence: Each stage has a single responsibility. Stages can be implemented separately in different languages/systems.
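The flow above can be sketched as a small driver that takes each stage as a separate callable, which is one way to keep the stages independent. This is a minimal illustration, not the harness's actual implementation: `RunState`, `run_pipeline`, and the callable signatures are hypothetical names chosen for the example, and the event strings are the ones used elsewhere in this spec.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunState:
    """Hypothetical container for what one run produced."""
    goal: str
    tasks: List[str] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)
    events: List[str] = field(default_factory=list)


def run_pipeline(
    goal: str,
    plan: Callable[[str], List[str]],
    gate: Callable[[List[str]], bool],
    execute: Callable[[str], None],
    recover: Callable[[str, Exception], bool],
) -> RunState:
    """Plan, block on the gate, then execute tasks in plan order."""
    state = RunState(goal=goal)

    # 1. PLAN: decompose the goal into ordered tasks.
    state.tasks = plan(goal)
    state.events.append("PLAN_CREATED")

    # 2. GATE: nothing below runs unless the gate approves.
    state.events.append("GATE_APPROVAL_REQUESTED")
    if not gate(state.tasks):
        state.events.append("GATE_REJECTED")
        return state
    state.events.append("GATE_APPROVED")

    # 3. EXECUTE in order; 4. RECOVER on failure; 5. LOG every change.
    for task in state.tasks:
        state.events.append("TASK_STARTED")
        try:
            execute(task)
        except Exception as exc:
            state.events.append("TASK_FAILED")
            # Assumption: recover() returns True only if the task
            # ended up succeeding after recovery was applied.
            if not recover(task, exc):
                return state
        state.completed.append(task)
        state.events.append("TASK_COMPLETED")

    state.events.append("EXECUTION_COMPLETE")
    return state
```

Because every stage is passed in, the planner, gate, executor, and recovery policy can each be swapped or tested in isolation.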
Decision 1: Gates Are Mandatory
Decision 2: Failures Are Classified
Decision 3: Execution Follows Plan Exactly
Decision 4: Everything Is Logged
Input: User goal
Output: Ordered task list with dependencies
LLM responsibility: Break goal into concrete, testable, ordered tasks
Task Requirements:
| Requirement | Definition | Pass | Fail |
|---|---|---|---|
| Concrete | Specific action, not vague | "Remove unused imports" | "Clean code" |
| Testable | Pass/fail is objective | "Response < 200ms" | "Make it faster" |
| Ordered | Execution sequence clear | Task 2 (depends on 1) | Task 1, 2, 3 |
| Sized | Not too big/small | 3-7 tasks | 1 task or 50 tasks |
Dependency Notation:
Task 1: Base functionality
Task 2: Feature (depends on Task 1)
Task 3: Parallel work (depends on Task 1)
Task 4: Integration (depends on Task 2, Task 3)

Failure Modes:
Success Criteria:
Emit Event: PLAN_CREATED { task_count, dependencies }
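As an illustration of the dependency notation above, the plan can be held as a mapping from each task to the tasks it depends on, then checked for unknown dependencies and cycles before a `PLAN_CREATED` event is emitted. This is a sketch under assumptions: the function names and the event payload layout are hypothetical, and it relies on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Mirrors the notation above: task -> set of tasks it depends on.
PLAN: Dict[str, Set[str]] = {
    "Task 1": set(),
    "Task 2": {"Task 1"},
    "Task 3": {"Task 1"},
    "Task 4": {"Task 2", "Task 3"},
}


def execution_order(plan: Dict[str, Set[str]]) -> List[str]:
    """Return one valid execution order, or raise ValueError for a bad plan."""
    known = set(plan)
    for task, deps in plan.items():
        missing = deps - known
        if missing:
            raise ValueError(f"{task} depends on unknown tasks: {sorted(missing)}")
    try:
        return list(TopologicalSorter(plan).static_order())
    except CycleError as exc:
        # graphlib reports the detected cycle as the second args element.
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


def plan_created_event(plan: Dict[str, Set[str]]) -> dict:
    """Build a PLAN_CREATED payload (field layout is an assumption)."""
    return {
        "event": "PLAN_CREATED",
        "task_count": len(plan),
        "dependencies": {task: sorted(deps) for task, deps in plan.items()},
    }
```

Tasks 2 and 3 both depend only on Task 1, so either may come first in the returned order; Task 4 always comes last.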
Input: Proposed plan
Output: Approval decision (approve/modify/reject)
Human/Policy responsibility: Review plan before execution
Gate Actions:
| Action | When | Example |
|---|---|---|
| APPROVE | Plan looks correct | "Yes, proceed" |
| APPROVE_WITH_MODIFICATIONS | Reorder/add/remove tasks | "Approved, but skip Task 2" |
| ASK_FOR_CLARIFICATION | Don't understand | "What does 'sync' mean?" |
| REJECT | Wrong approach | "No, try different strategy" |
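One way to make the gate block execution is to have the executor accept only a task list that has passed through a gate decision. The sketch below assumes the four actions in the table map onto an enum; `GateDecision` and `plan_to_execute` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class GateAction(Enum):
    APPROVE = "APPROVE"
    APPROVE_WITH_MODIFICATIONS = "APPROVE_WITH_MODIFICATIONS"
    ASK_FOR_CLARIFICATION = "ASK_FOR_CLARIFICATION"
    REJECT = "REJECT"


@dataclass
class GateDecision:
    action: GateAction
    modified_plan: Optional[List[str]] = None  # required for modified approvals
    note: str = ""


def plan_to_execute(plan: List[str], decision: GateDecision) -> Optional[List[str]]:
    """Return the task list cleared for execution, or None if still blocked."""
    if decision.action is GateAction.APPROVE:
        return list(plan)
    if decision.action is GateAction.APPROVE_WITH_MODIFICATIONS:
        if decision.modified_plan is None:
            raise ValueError("a modified approval must include the modified plan")
        return list(decision.modified_plan)
    # ASK_FOR_CLARIFICATION and REJECT both leave execution blocked.
    return None
```

For example, "Approved, but skip Task 2" becomes a `GateDecision` whose `modified_plan` omits Task 2, and only that list reaches the executor.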
Gate Requirements:
Approval Signals (what counts as approval):
Rejection/Modification Signals:
Failure Modes:
Emit Events:
GATE_APPROVAL_REQUESTED { plan_id }
GATE_APPROVED { modifications }
GATE_REJECTED { reason }

Input: Approved plan + working environment
Output: Task completions and failures
LLM responsibility: Execute tasks in order, follow the plan
Execution Rules:
TASK_COMPLETED or TASK_FAILED

Progress Format (MANDATORY for every task):
Use clear notation showing N/M progress with completion symbols. Format is flexible; choose what fits your task:
Required elements:
Acceptable formats:
[Task 1/5] ✓ Check database: Performance acceptable (2s query)
Step 2/5: Check API limits (✗ failed: 429 Rate limit)
Step 2/5 - Retry: Check API limits (✓ completed: 1000 req/hour)
Item 3/5 [COMPLETED] Update config: Deployment settings validated

All formats above are valid because they show:
Critical requirement: Every task MUST show clear N/M progress + completion symbol. Choose any format that includes both elements.
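The critical requirement above (N/M progress plus a completion symbol) is simple enough to check mechanically. The following is a rough sketch: the helper names are hypothetical, and the accepted marker set is an assumption drawn from the examples above, with `[FAILED]` added by analogy to `[COMPLETED]`.

```python
import re

# N/M progress such as "1/5"; the marker set below is an assumption.
PROGRESS = re.compile(r"\b(\d+)/(\d+)\b")
MARKERS = ("✓", "✗", "[COMPLETED]", "[FAILED]")


def format_progress(n: int, total: int, name: str, ok: bool, detail: str) -> str:
    """Render one progress line in the bracketed style shown above."""
    symbol = "✓" if ok else "✗"
    return f"[Task {n}/{total}] {symbol} {name}: {detail}"


def check_progress_line(line: str) -> bool:
    """True if the line shows N/M progress (1 <= N <= M) and a completion marker."""
    match = PROGRESS.search(line)
    if match is None:
        return False
    n, total = int(match.group(1)), int(match.group(2))
    return 1 <= n <= total and any(marker in line for marker in MARKERS)
```

All four example lines above pass this check, while a line such as "Checked the database" does not.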
Failure Modes:
Emit Events:
TASK_STARTED { task_id, task_name }
TASK_COMPLETED { task_id, result }
TASK_FAILED { task_id, error }

Input: A task failure
Output: Recovery action or escalation
Policy responsibility: Define recovery recipes per failure type
Failure Classification:
| Failure Type | Characteristics | Recovery | Max Retries |
|---|---|---|---|
| Transient | Temporary, will likely succeed if retried | Retry with backoff (5s, 30s) | 2 |
| Permission | Access denied, needs approval/credentials | Ask human for approval or credentials | 1 (after user provides input) |
| Invalid Input | Data is malformed | Ask human for correction | 1 (after user provides input) |
| Logic Error | Code/approach is wrong | Fix approach and retry | 1 |
| Unrecoverable | Task no longer makes sense | Skip task or abort plan | 0 |
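The classification table can be encoded as data so that the retry budget is looked up rather than hard-coded at each call site. In this sketch the enum and the `classify` helper are hypothetical. The detection substrings are the ones this spec's example recipes use for network timeouts and permission errors, and every other failure type is left undetected on purpose, so an unmatched message returns `None` and should be escalated.

```python
from enum import Enum
from typing import Optional


class FailureType(Enum):
    TRANSIENT = "transient"
    PERMISSION = "permission"
    INVALID_INPUT = "invalid_input"
    LOGIC_ERROR = "logic_error"
    UNRECOVERABLE = "unrecoverable"


# Max retries per failure type, copied from the table above.
MAX_RETRIES = {
    FailureType.TRANSIENT: 2,
    FailureType.PERMISSION: 1,      # after the user provides input
    FailureType.INVALID_INPUT: 1,   # after the user provides input
    FailureType.LOGIC_ERROR: 1,
    FailureType.UNRECOVERABLE: 0,
}

# Illustrative detection rules; a real deployment would define its own.
DETECTION = [
    (FailureType.TRANSIENT, ("connection timeout", "no response after")),
    (FailureType.PERMISSION, ("403 forbidden", "401 unauthorized")),
]


def classify(error_message: str) -> Optional[FailureType]:
    """Return the matching failure type, or None for an unknown error."""
    text = error_message.lower()
    for failure_type, needles in DETECTION:
        if any(needle in text for needle in needles):
            return failure_type
    return None  # unknown: do not guess, escalate to a human
```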
Recovery Recipe Structure:
Failure Type: [name]
Detection: [how to identify this failure]
Recovery Steps:
1. [action]
2. [action]
3. [escalation policy if still failing]
Max Attempts: [number]
Escalation: [what happens if max attempts exceeded]

Example Recipe: Network Timeout
Failure Type: Transient (Network Timeout)
Detection: "Connection timeout" or "No response after 30s"
Recovery Steps:
1. Wait 5 seconds
2. Retry the request
3. If it still fails, wait 30s and retry again
4. If it still fails, emit RECOVERY_ESCALATION
Max Attempts: 2
Escalation: Ask user: "Network unstable. Retry, skip, or abort?"
- Retry: continue from step 1
- Skip: skip this task, continue with next
- Abort: halt plan execution

Example Recipe: Permission Denied
Failure Type: Permission
Detection: "403 Forbidden" or "401 Unauthorized"
Recovery Steps:
1. Emit PERMISSION_REQUIRED event
2. STOP execution. Do NOT ask user to provide credentials during execution.
3. Instruct user to set credentials via secure channels (environment variables,
config files, credential vaults) BEFORE execution resumes
4. Ask: "Credentials configured? Retry, or skip/abort task?"
Max Attempts: 1 (after user configures credentials via secure means)
Escalation: User decides (skip or abort)
SECURITY NOTE: Never request, log, or handle credentials as text input during
execution. Credentials must come from secure sources (env vars, vaults, config
files) that are not exposed in logs or output.

Failure Modes:
Emit Events:
FAILURE_DETECTED { task_id, error_message }
FAILURE_CLASSIFIED { failure_type }
RECOVERY_APPLIED { recipe_name, outcome }
RECOVERY_ESCALATION { reason, user_decision }

Input: All system state changes
Output: Complete, immutable event log
System responsibility: Record everything
What Gets Logged:
| Event | When | Format |
|---|---|---|
| PLAN_CREATED | Plan is generated | { task_count, dependencies } |
| GATE_APPROVAL_REQUESTED | Plan ready for review | { plan_id } |
| GATE_APPROVED | Approval given | { modifications_if_any } |
| TASK_STARTED | Task begins execution | { task_id, task_name } |
| TASK_COMPLETED | Task succeeds | { task_id, result } |
| TASK_FAILED | Task fails | { task_id, error_message } |
| FAILURE_CLASSIFIED | Error is categorized | { failure_type } |
| RECOVERY_APPLIED | Recovery strategy executed | { recipe_name, outcome } |
| EXECUTION_COMPLETE | All tasks done | { summary: completed, failed, skipped } |
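A minimal append-only log for the events in this table might look like the sketch below. The `EventLog` class and its methods are hypothetical; the entry layout follows this spec's Log Entry Format (timestamp, event, task fields, details). Storing each entry as serialized JSON is one simple way to approximate immutability in memory, and a real system would more likely write to durable storage.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


class EventLog:
    """Append-only in-memory event log (illustrative only)."""

    def __init__(self) -> None:
        self._lines: List[str] = []  # serialized entries, never edited in place

    def emit(self, event: str, task_id: Optional[str] = None,
             task_name: Optional[str] = None, **details: Any) -> Dict[str, Any]:
        """Append one entry and return a copy of what was recorded."""
        entry: Dict[str, Any] = {
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "event": event,
        }
        if task_id is not None:
            entry["task_id"] = task_id
        if task_name is not None:
            entry["task_name"] = task_name
        entry["details"] = details
        self._lines.append(json.dumps(entry))
        return entry

    def entries(self) -> List[Dict[str, Any]]:
        """Return fresh copies, so callers cannot alter the stored log."""
        return [json.loads(line) for line in self._lines]

    def count(self, event: str) -> int:
        """Number of recorded entries with the given event name."""
        return sum(1 for entry in self.entries() if entry["event"] == event)
```

An `EXECUTION_COMPLETE` summary can then be derived from the log itself, for example by counting `TASK_COMPLETED` and `TASK_FAILED` entries.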
Log Entry Format:
{
"timestamp": "2024-01-15T14:23:45Z",
"event": "TASK_COMPLETED",
"task_id": "task_1",
"task_name": "Analyze input",
"details": { "result": "5 requirements identified" }
}

Log Properties:
Why Logging Matters:
Principle 1: Classify Before Recovering. Never retry blindly. Identify the failure type first, then apply the matching recovery recipe.
Principle 2: Stop on Unknown Errors. If the error type is not covered by a recovery recipe, escalate to a human.
Principle 3: Finite Retries. Each recovery recipe specifies a maximum number of attempts, which prevents infinite loops.
Principle 4: User Decides on Escalation. When recovery is exhausted, ask the user whether to retry, skip, or abort, and respect their decision.
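Taken together, the four principles suggest a recovery loop roughly like the one below. It is a sketch under stated assumptions: `run_with_recovery`, the string labels, and the callable parameters are hypothetical; only transient failures are retried automatically; and the default 5 s and 30 s delays follow the Network Timeout example recipe. The `sleep` parameter is injectable so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable, Optional, Sequence, Tuple


def run_with_recovery(
    task: Callable[[], Any],
    classify: Callable[[str], Optional[str]],
    ask_user: Callable[[str], str],
    backoff: Sequence[float] = (5, 30),
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, Any]:
    """Run task(); return ("completed", result), ("skip", None) or ("abort", None)."""
    retries = 0
    while True:
        try:
            return ("completed", task())
        except Exception as exc:
            # Principle 1: classify before choosing a recovery.
            failure_type = classify(str(exc))

            # Principle 3: retry only while the finite budget lasts.
            if failure_type == "transient" and retries < len(backoff):
                sleep(backoff[retries])
                retries += 1
                continue

            # Principle 2: unknown or unhandled types are escalated.
            if failure_type == "transient":
                reason = "retries exhausted"
            else:
                reason = f"no automatic recovery for failure type: {failure_type}"

            # Principle 4: the user decides: "retry", "skip", or "abort".
            decision = ask_user(reason)
            if decision != "retry":
                return (decision, None)
            retries = 0  # user asked to retry: restart the recipe
```

With the default backoff, a task that keeps timing out is attempted three times in total (the first attempt plus two retries) before the user is asked.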
These are patterns teams can customize:
Transient Error Policy:
Permission Error Policy:
Logic Error Policy:
The event log is the single source of truth. Agents can:
External systems can:
This pattern is designed for task orchestration but has implications for credential management:
✓ Safe Use Cases:
⚠ Caution Required:
✗ Not Recommended: