task-orchestrator

tessl install github:jdrhyne/agent-skills --skill task-orchestrator

github.com/jdrhyne/agent-skills

Autonomous multi-agent task orchestration with dependency analysis, parallel tmux/Codex execution, and self-healing heartbeat monitoring. Use for large projects with multiple issues/tasks that need coordinated parallel execution.

Review Score: 82%
Validation Score: 12/16
Implementation Score: 77%
Activation Score: 85%

Task Orchestrator

Autonomous orchestration of multi-agent builds using tmux + Codex with self-healing monitoring.

Load the senior-engineering skill alongside this one for engineering principles.

Core Concepts

1. Task Manifest

A JSON file defining all tasks, their dependencies, files touched, and status.

{
  "project": "project-name",
  "repo": "owner/repo",
  "workdir": "/path/to/worktrees",
  "created": "2026-01-17T00:00:00Z",
  "model": "gpt-5.2-codex",
  "modelTier": "high",
  "phases": [
    {
      "name": "Phase 1: Critical",
      "tasks": [
        {
          "id": "t1",
          "issue": 1,
          "title": "Fix X",
          "files": ["src/foo.js"],
          "dependsOn": [],
          "status": "pending",
          "worktree": null,
          "tmuxSession": null,
          "startedAt": null,
          "lastProgress": null,
          "completedAt": null,
          "prNumber": null
        }
      ]
    }
  ]
}

2. Dependency Rules

  • Same file = sequential: tasks touching the same file must run in order or be merged
  • Different files = parallel: independent tasks can run simultaneously
  • Explicit depends = wait: the dependsOn array enforces ordering
  • Phase gates: the next phase waits for the current phase to complete
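
The same-file rule can be checked mechanically before anything is launched. The following is a minimal sketch, not part of the skill itself: the function name find_file_conflicts and the "task_id file" input format are assumptions made for illustration. It prints every file that more than one task touches, so those tasks can be serialized.

```bash
#!/usr/bin/env bash
# find_file_conflicts: read "task_id file" pairs on stdin and print
# "file: taskA taskB ..." for every file touched by more than one task.
# The input format is an assumption made for this sketch.
find_file_conflicts() {
  awk '
    { tasks[$2] = (tasks[$2] == "" ? $1 : tasks[$2] " " $1); count[$2]++ }
    END { for (f in count) if (count[f] > 1) print f ": " tasks[f] }
  ' | sort
}

# Pairs taken from Phase 2 of the security framework example in this document
find_file_conflicts <<'EOF'
t4 kill_switch.js
t4 container_executor.js
t5 kill_switch.js
t6 ceo_root_manager.js
t7 container_executor.js
t8 container_executor.js
t8 egress_proxy.js
EOF
```

Any task pair reported here should either carry a dependsOn entry or be merged into a single task.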

3. Execution Model

  • Each task gets its own git worktree (isolated branch)
  • Each task runs in its own tmux session
  • Use Codex with --yolo for autonomous execution
  • Model: GPT-5.2-codex high (configurable)

Setup Commands

Initialize Orchestration

# 1. Create working directory
WORKDIR="${TMPDIR:-/tmp}/orchestrator-$(date +%s)"
mkdir -p "$WORKDIR"

# 2. Clone repo for worktrees
git clone https://github.com/OWNER/REPO.git "$WORKDIR/repo"
cd "$WORKDIR/repo"

# 3. Choose the tmux socket path (tmux creates the socket on first use)
SOCKET="$WORKDIR/orchestrator.sock"

# 4. Initialize manifest (unquoted EOF so the shell fills in the paths and timestamp)
cat > "$WORKDIR/manifest.json" << EOF
{
  "project": "PROJECT_NAME",
  "repo": "OWNER/REPO",
  "workdir": "$WORKDIR",
  "socket": "$SOCKET",
  "created": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "model": "gpt-5.2-codex",
  "modelTier": "high",
  "phases": []
}
EOF

Analyze GitHub Issues for Dependencies

# Fetch all open issues
gh issue list --repo OWNER/REPO --state open --json number,title,body,labels > issues.json

# Group by files mentioned in issue body
# Tasks touching same files should serialize
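
Grouping by files needs a way to pull file mentions out of an issue body. A rough sketch of that step, assuming file paths appear literally in the text: the function name extract_files and the extension list are hypothetical and should be adapted to the repository.

```bash
#!/usr/bin/env bash
# extract_files: print the unique file paths mentioned in the text on stdin.
# The extension list is an assumption; adjust it to the repository.
extract_files() {
  grep -oE '[A-Za-z0-9_./-]+\.(js|ts|py|go|rs|json|md)' | sort -u
}

# Example with a literal issue body; in practice the text could come from
# something like: gh issue view N --repo OWNER/REPO --json body --jq .body
echo 'Bug in src/foo.js and lib/bar.ts; see also src/foo.js.' | extract_files
```

This is a heuristic only: issues that describe a file without naming its path still need manual review.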

Create Worktrees

# For each task, create isolated worktree
cd "$WORKDIR/repo"
git worktree add -b fix/issue-N "$WORKDIR/task-tN" main

Launch Tmux Sessions

SOCKET="$WORKDIR/orchestrator.sock"

# Create session for task
tmux -S "$SOCKET" new-session -d -s "task-tN"

# Launch Codex (uses gpt-5.2-codex with reasoning_effort=high from ~/.codex/config.toml)
# Note: Model config is in ~/.codex/config.toml, not CLI flag
tmux -S "$SOCKET" send-keys -t "task-tN" \
  "cd $WORKDIR/task-tN && codex --yolo 'Fix issue #N: DESCRIPTION. Run tests, commit with good message, push to origin.'" Enter

Monitoring & Self-Healing

Progress Check Script

#!/bin/bash
# check_progress.sh - Run via heartbeat

WORKDIR="$1"
SOCKET="$WORKDIR/orchestrator.sock"
MANIFEST="$WORKDIR/manifest.json"
STALL_THRESHOLD_MINS=20

check_session() {
  local session="$1"
  local task_id="$2"
  
  # Capture recent output
  local output=$(tmux -S "$SOCKET" capture-pane -p -t "$session" -S -50 2>/dev/null)
  
  # Check for completion indicators
  if echo "$output" | grep -qE "(All tests passed|Successfully pushed|❯ $)"; then
    echo "DONE:$task_id"
    return 0
  fi
  
  # Check for errors
  if echo "$output" | grep -qiE "(error:|failed:|FATAL|panic)"; then
    echo "ERROR:$task_id"
    return 1
  fi
  
  # Check for stall (prompt waiting for input)
  if echo "$output" | grep -qE "(\? |Continue\?|y/n|Press any key)"; then
    echo "STUCK:$task_id:waiting_for_input"
    return 2
  fi
  
  echo "RUNNING:$task_id"
  return 0
}

# Check all active sessions
for session in $(tmux -S "$SOCKET" list-sessions -F "#{session_name}" 2>/dev/null); do
  check_session "$session" "$session"
done

Self-Healing Actions

When a task is stuck, the orchestrator should:

  1. Waiting for input → Send appropriate response

    tmux -S "$SOCKET" send-keys -t "$session" "y" Enter
  2. Error/failure → Capture logs, analyze, retry with fixes

    # Capture error context
    mkdir -p "$WORKDIR/logs"
    ERROR_LOG="$WORKDIR/logs/$task_id-error.log"
    tmux -S "$SOCKET" capture-pane -p -t "$session" -S -100 > "$ERROR_LOG"
    
    # Kill and restart with error context
    tmux -S "$SOCKET" kill-session -t "$session"
    tmux -S "$SOCKET" new-session -d -s "$session"
    # The log tail is expanded by the shell inside the pane, so quotes in the log cannot break this command
    tmux -S "$SOCKET" send-keys -t "$session" \
      "cd $WORKDIR/$task_id && codex --model gpt-5.2-codex-high --yolo \"Previous attempt failed with: \$(tail -20 $ERROR_LOG). Fix the issue and retry.\"" Enter
  3. No progress for 20+ mins → Nudge or restart

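    # Check git log for recent commits (%ct is the commit time as a Unix timestamp)
    cd "$WORKDIR/$task_id"
    LAST_COMMIT_TS=$(git log -1 --format="%ct" 2>/dev/null || echo 0)
    IDLE_MINS=$(( ($(date +%s) - LAST_COMMIT_TS) / 60 ))
    
    # If no commits within the threshold, flag the task for a nudge or restart
    if [ "$IDLE_MINS" -ge "${STALL_THRESHOLD_MINS:-20}" ]; then
      echo "STUCK:$task_id:no_progress_${IDLE_MINS}m"
    fi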

Heartbeat Cron Setup

# Add to cron (every 15 minutes)
cron action:add job:{
  "label": "orchestrator-heartbeat",
  "schedule": "*/15 * * * *",
  "prompt": "Check orchestration progress at WORKDIR. Read manifest, check all tmux sessions, self-heal any stuck tasks, advance to next phase if current is complete. Do NOT ping human - fix issues yourself."
}

Workflow: Full Orchestration Run

Step 1: Analyze & Plan

# 1. Fetch issues
gh issue list --repo OWNER/REPO --state open --json number,title,body > /tmp/issues.json

# 2. Analyze for dependencies (files mentioned, explicit deps)
# Group into phases:
# - Phase 1: Critical/blocking issues (no deps)
# - Phase 2: High priority (may depend on Phase 1)
# - Phase 3: Medium/low (depends on earlier phases)

# 3. Within each phase, identify:
# - Parallel batch: Different files, no deps → run simultaneously
# - Serial batch: Same files or explicit deps → run in order

Step 2: Create Manifest

Write manifest.json with all tasks, dependencies, file mappings.

Step 3: Launch Phase 1

# Create worktrees for Phase 1 tasks
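# Read "id issue" pairs for Phase 1 from the manifest (assumes jq is installed)
cd "$WORKDIR/repo"
jq -r '.phases[0].tasks[] | "\(.id) \(.issue)"' "$WORKDIR/manifest.json" |
while read -r id issue; do
  git worktree add -b "fix/issue-$issue" "$WORKDIR/task-$id" main
done

# Launch tmux sessions for the parallel batch: tasks whose dependsOn is empty.
# PROMPT must be set per task before this point.
jq -r '.phases[0].tasks[] | select((.dependsOn // []) | length == 0) | .id' "$WORKDIR/manifest.json" |
while read -r id; do
  tmux -S "$SOCKET" new-session -d -s "task-$id"
  tmux -S "$SOCKET" send-keys -t "task-$id" \
    "cd $WORKDIR/task-$id && codex --model gpt-5.2-codex-high --yolo '$PROMPT'" Enter
done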

Step 4: Monitor & Self-Heal

Heartbeat checks every 15 mins:

  1. Poll all sessions
  2. Update manifest with progress
  3. Self-heal stuck tasks
  4. When all Phase N tasks complete → launch Phase N+1
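
The phase gate in item 4 reduces to one check over the task statuses of the current phase. A minimal sketch, assuming that complete, pr_open and merged all count as finished; that reading of the status values, and the function name phase_complete, are assumptions rather than something the skill specifies.

```bash
#!/usr/bin/env bash
# phase_complete: succeed only if every status passed as an argument counts
# as finished. Treating complete, pr_open and merged as finished is an
# assumption of this sketch.
phase_complete() {
  local status
  [ "$#" -gt 0 ] || return 1   # an empty phase is not treated as complete
  for status in "$@"; do
    case "$status" in
      complete|pr_open|merged) ;;   # finished, keep checking
      *) return 1 ;;                # anything else blocks the gate
    esac
  done
  return 0
}

# Example: one task is still running, so the next phase must wait
if phase_complete complete running merged; then
  echo "launch next phase"
else
  echo "phase still in progress"
fi
```

With jq available, the statuses could be read from the manifest, for example `phase_complete $(jq -r '.phases[0].tasks[].status' "$WORKDIR/manifest.json")`.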

Step 5: Create PRs

# When task completes successfully
cd "$WORKDIR/task-$id"
git push -u origin "fix/issue-$issue"
gh pr create --repo OWNER/REPO \
  --head "fix/issue-$issue" \
  --title "fix: Issue #$issue - $TITLE" \
  --body "Closes #$issue

## Changes
[Auto-generated by Codex orchestrator]

## Testing
- [ ] Unit tests pass
- [ ] Manual verification"

Step 6: Cleanup

# After all PRs merged or work complete
tmux -S "$SOCKET" kill-server
cd "$WORKDIR/repo"
for dir in "$WORKDIR"/task-*; do
  [ -d "$dir" ] || continue   # skip if no worktrees match the glob
  git worktree remove --force "$dir"
done
cd / && rm -rf "$WORKDIR"

Manifest Status Values

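| Status   | Meaning                         |
|----------|---------------------------------|
| pending  | Not started yet                 |
| blocked  | Waiting on dependency           |
| running  | Codex session active            |
| stuck    | Needs intervention (auto-heal)  |
| error    | Failed, needs retry             |
| complete | Done, ready for PR              |
| pr_open  | PR created                      |
| merged   | PR merged                       |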
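
The lines printed by check_progress.sh (DONE, ERROR, STUCK, RUNNING) map naturally onto these status values. The sketch below shows one possible mapping; the function name status_for_result and the mapping itself are assumptions, since the skill does not spell out how heartbeat results are written back to the manifest.

```bash
#!/usr/bin/env bash
# status_for_result: translate one check_progress.sh output line
# (for example "STUCK:task-t2:waiting_for_input") into a manifest status.
# The mapping itself is an assumption of this sketch.
status_for_result() {
  case "${1%%:*}" in
    DONE)    echo "complete" ;;
    ERROR)   echo "error" ;;
    STUCK)   echo "stuck" ;;
    RUNNING) echo "running" ;;
    *)       return 1 ;;   # unknown line, leave the manifest untouched
  esac
}

status_for_result "STUCK:task-t2:waiting_for_input"   # prints: stuck
```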

Example: Security Framework Orchestration

{
  "project": "nuri-security-framework",
  "repo": "jdrhyne/nuri-security-framework",
  "phases": [
    {
      "name": "Phase 1: Critical",
      "tasks": [
        {"id": "t1", "issue": 1, "files": ["ceo_root_manager.js"], "dependsOn": []},
        {"id": "t2", "issue": 2, "files": ["ceo_root_manager.js"], "dependsOn": ["t1"]},
        {"id": "t3", "issue": 3, "files": ["workspace_validator.js"], "dependsOn": []}
      ]
    },
    {
      "name": "Phase 2: High",
      "tasks": [
        {"id": "t4", "issue": 4, "files": ["kill_switch.js", "container_executor.js"], "dependsOn": []},
        {"id": "t5", "issue": 5, "files": ["kill_switch.js"], "dependsOn": ["t4"]},
        {"id": "t6", "issue": 6, "files": ["ceo_root_manager.js"], "dependsOn": ["t2"]},
        {"id": "t7", "issue": 7, "files": ["container_executor.js"], "dependsOn": []},
        {"id": "t8", "issue": 8, "files": ["container_executor.js", "egress_proxy.js"], "dependsOn": ["t7"]}
      ]
    }
  ]
}

Parallel execution in Phase 1:

  • t1 and t3 run in parallel (different files)
  • t2 waits for t1 (same file)

Parallel execution in Phase 2:

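  • t4 and t6 can start together (different files; t6's dependency t2 finished in Phase 1)
  • t7 has no explicit dependency, but it shares container_executor.js with t4, so the same-file rule calls for running it after t4 or merging the two
  • t5 waits for t4, t8 waits for t7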
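
Which tasks may start at a given moment can be derived from the dependsOn entries alone. The following sketch does that for Phase 2 of the example; the function name ready_tasks and the "task_id deps" line format are hypothetical. It considers explicit dependencies only and does not apply the same-file rule.

```bash
#!/usr/bin/env bash
# ready_tasks: read "task_id deps" lines on stdin, where deps is a
# comma-separated list or "-" for none, and print every task that is not yet
# done and whose dependencies all appear in the arguments (the done tasks).
# The line format is an assumption made for this sketch.
ready_tasks() {
  local done_list task deps dep ok
  done_list=" $* "
  while read -r task deps; do
    case "$done_list" in *" $task "*) continue ;; esac   # already finished
    ok=1
    if [ "$deps" != "-" ]; then
      for dep in ${deps//,/ }; do
        case "$done_list" in
          *" $dep "*) ;;      # dependency satisfied
          *) ok=0 ;;          # still waiting on this dependency
        esac
      done
    fi
    if [ "$ok" -eq 1 ]; then echo "$task"; fi
  done
}

# Phase 2 of the example, with Phase 1 (t1, t2, t3) already complete
ready_tasks t1 t2 t3 <<'EOF'
t4 -
t5 t4
t6 t2
t7 -
t8 t7
EOF
```

Run as shown, this prints t4, t6 and t7, one per line. Calling it again with t4 added to the arguments releases t5.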

Tips

  1. Always use GPT-5.2-codex high for complex work: --model gpt-5.2-codex-high
  2. Clear prompts — Include issue number, description, expected outcome, test instructions
  3. Atomic commits — Tell Codex to commit after each logical change
  4. Push early — Push to remote branch so progress isn't lost if session dies
  5. Checkpoint logs — Capture tmux output periodically to files
  6. Phase gates — Don't start Phase N+1 until Phase N is 100% complete
  7. Self-heal aggressively — If stuck >10 mins, intervene automatically
  8. Browser relay limits — If CDP automation is blocked, use iframe batch scraping or manual browser steps
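
Tips 2 to 4 can be folded into a reusable prompt template. This is only a sketch: the function name build_prompt, its parameters and the exact wording are assumptions, not the prompt the skill ships with.

```bash
#!/usr/bin/env bash
# build_prompt: assemble a worker prompt from an issue number, a short
# description and the test command. Wording and parameters are hypothetical.
build_prompt() {
  local issue="$1" description="$2" test_cmd="$3"
  printf 'Fix issue #%s: %s. Expected outcome: the issue is resolved and "%s" passes. Commit after each logical change with a clear message, then push to origin.' \
    "$issue" "$description" "$test_cmd"
}

PROMPT=$(build_prompt 12 "Handle null config" "npm test")
echo "$PROMPT"
```

Because the launch commands in this document wrap the prompt in single quotes, descriptions containing a single quote need escaping before they are passed to tmux send-keys.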

Integration with Other Skills

  • senior-engineering: Load for build principles and quality gates
  • coding-agent: Reference for Codex CLI patterns
  • github: Use for PR creation, issue management

Lessons Learned (2026-01-17)

Codex Sandbox Limitations

When using codex exec --full-auto, the sandbox imposes two limits:

  • No network access: git push fails with "Could not resolve host"
  • Limited filesystem: can't write to paths like ~/nuri_workspace

Heartbeat Detection Improvements

The heartbeat should check for:

  1. Shell prompt idle: if the tmux pane shows username@hostname path %, the worker is done
  2. Unpushed commits: git log @{u}.. --oneline shows commits not on remote
  3. Push failures: look for "Could not resolve host" in output

When detected, the orchestrator (not the worker) should:

  1. Push the commit from outside the sandbox
  2. Create the PR via gh pr create
  3. Update manifest and notify

Recommended Pattern

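# In heartbeat, for each task (WORKDIR and SOCKET as set during setup):
cd "$WORKDIR/task-tN"
# The worker is done when the last non-empty pane line is an idle shell prompt (ends in %, $ or ❯)
if tmux -S "$SOCKET" capture-pane -p -t "task-tN" | grep -v '^[[:space:]]*$' | tail -n 1 \
    | grep -qE '(%|\$|❯)[[:space:]]*$'; then
  # Worker finished, check for unpushed work.
  # --not --remotes also covers a branch that was never pushed, where @{u} is undefined.
  if git log HEAD --not --remotes --oneline | grep -q .; then
    git push -u origin HEAD
    gh pr create --title "$(git log --format=%s -1)" --body "Closes #N" --base main
  fi
fi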