
joelclaw-system-check

Run a comprehensive health check of the joelclaw system — k8s cluster, worker, Inngest, Redis, Typesense/OTEL, tests, TypeScript, repo sync, memory pipeline, pi-tools, git config, active loops, disk, stale tests. Outputs a 1-10 score with per-component breakdown. Use when: 'system health', 'health check', 'is everything working', 'system status', 'how's the system', 'check everything', or at session start to orient.

joelclaw System Health Check

Run scripts/health.sh for a full system health report with a 1-10 score and per-component breakdown:

~/Code/joelhooks/joelclaw/skills/joelclaw-system-check/scripts/health.sh

What It Checks (16 components)

| Check | What | Green (10) | Yellow (5-7) | Red (1-3) |
|---|---|---|---|---|
| k8s cluster | pods in joelclaw namespace | 4/4 Running, 0 restarts | partial pods | no pods |
| pds | AT Proto PDS on :2583 | version + collections | pod running, port-forward down | pod not running |
| worker | system-bus on :3111 | 16+ functions | responding, low count | down |
| inngest server | :8288 reachable | responding | | down |
| redis/gateway | Redis + gateway session queues | connected, low pending queue | connected, backlog rising | unavailable |
| typesense/otel | Typesense health + OTEL query path | healthy + queryable | healthy, query degraded | unavailable |
| tests | bun test in system-bus | 0 fail | | failures |
| tsc | tsc --noEmit | clean | | type errors |
| repo sync | monorepo HEAD vs origin/main | in sync | ahead/behind | repo unavailable |
| memory pipeline | joelclaw inngest memory-health | healthy checks | degraded checks | failing checks |
| pi-tools | extension deps installed | all 3 deps | | missing |
| git config | user.name + email set | set | | missing |
| active loops | joelclaw loop list | queryable | query degraded | unavailable |
| gogcli | Google Workspace auth | account authed, token valid | token stored, no password | not configured |
| disk | free space + loop tmp | <80% used | | >80% |
| stale tests | __tests__/ + acceptance tests | clean | | present |
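
The colour bands above can be rolled up into a single number. The sketch below is hypothetical: it is not taken from scripts/health.sh, and both function names, the midpoint values for yellow (6) and red (2), and the plain average are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical scoring sketch; the real aggregation lives in scripts/health.sh.

# Map a status colour to a point value. Green is 10 per the table;
# 6 and 2 are ASSUMED midpoints of the 5-7 and 1-3 bands.
status_points() {
  case "$1" in
    green)  echo 10 ;;
    yellow) echo 6 ;;
    red)    echo 2 ;;
    *)      echo "unknown status: $1" >&2; return 1 ;;
  esac
}

# Average the points of every status passed as an argument,
# rounded to the nearest integer.
overall_score() {
  local total=0 count=0 points status
  for status in "$@"; do
    points=$(status_points "$status") || return 1
    total=$((total + points))
    count=$((count + 1))
  done
  [ "$count" -gt 0 ] || { echo "no statuses given" >&2; return 1; }
  echo $(( (total + count / 2) / count ))
}
```

For example, `overall_score green green yellow red` prints 7 under these assumed weights.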

When to Run

  • Session start — orient on system state before doing work
  • After loops complete — verify nothing broke
  • After infra changes — k8s, worker, Redis config
  • When something feels off — quick triage

Fixing Common Issues

Repo drift: cd ~/Code/joelhooks/joelclaw && git fetch origin && git status -sb

pi-tools broken: cd ~/.pi/agent/git/github.com/joelhooks/pi-tools && bun add @sinclair/typebox @mariozechner/pi-coding-agent @mariozechner/pi-tui @mariozechner/pi-ai

PDS unreachable: kubectl port-forward -n joelclaw svc/bluesky-pds 2583:3000 & (or if pod down: kubectl rollout restart deployment/bluesky-pds -n joelclaw)

Worker down: joelclaw inngest restart-worker --register

Stale tests: rm -rf ~/Code/joelhooks/joelclaw/packages/system-bus/__tests__/ && find ~/Code/joelhooks/joelclaw/packages/system-bus/src -name "*.acceptance.test.ts" -delete

Loop tmp bloat: rm -rf /tmp/agent-loop/loop-*/ (only when no loops are running)
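
Because the loop tmp cleanup is only safe when no loops are running, it may help to wrap it in a guard. This is a minimal sketch, not part of the skill: `clean_loop_tmp` is a hypothetical helper, and the caller is assumed to supply the number of active loops (for instance after inspecting `joelclaw loop list`, whose output format is not documented here).

```shell
#!/usr/bin/env bash
# Hypothetical guard around the loop tmp cleanup.
# Usage: clean_loop_tmp <active-loop-count> [base-dir]
clean_loop_tmp() {
  local active="$1" base="${2:-/tmp/agent-loop}"
  # Refuse to delete anything while loops are still running.
  if [ "$active" -ne 0 ]; then
    echo "skipping cleanup: $active loop(s) still running" >&2
    return 1
  fi
  # Nothing to clean if the base directory does not exist.
  [ -d "$base" ] || return 0
  # Remove only top-level loop-* directories; everything else is left alone.
  find "$base" -mindepth 1 -maxdepth 1 -type d -name 'loop-*' -exec rm -rf {} +
}
```

Calling `clean_loop_tmp 0` targets the default /tmp/agent-loop directory; any non-zero count leaves it untouched and returns a non-zero status.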

Inngest Hung-Run Quick Triage

When a run appears stuck after its first step, inspect the run trace:

joelclaw run <run-id>

If the trace shows a Finalization failure with "Unable to reach SDK URL":

  1. Verify registration/health: joelclaw inngest status

  2. Verify function is present where expected: joelclaw functions | rg -i "manifest-archive|<function-name>"

  3. Check for stale app registrations in Inngest UI/API and remove stale SDK URLs.

  4. Consider handler blocking, not just network failure: review recent step code for filesystem, Redis, or subprocess calls that block before the step responds.
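
The command-line parts of this triage (the run trace plus steps 1 and 2) can be chained in one helper. This is a sketch under assumptions: `triage_run` is a hypothetical name, it only invokes the `joelclaw` subcommands shown above, and it searches for a single function name rather than the combined pattern. Steps 3 and 4 remain manual.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; it only chains the commands listed above.
# Usage: triage_run <run-id> <function-name>
triage_run() {
  if [ "$#" -ne 2 ]; then
    echo "usage: triage_run <run-id> <function-name>" >&2
    return 2
  fi
  local run_id="$1" fn_name="$2"

  echo "== run trace =="
  joelclaw run "$run_id"

  echo "== registration/health =="
  joelclaw inngest status

  echo "== function lookup =="
  # rg exits non-zero when nothing matches, so report that explicitly.
  joelclaw functions | rg -i "$fn_name" || echo "function '$fn_name' not listed"
}
```

A call such as `triage_run <run-id> manifest-archive` prints the three sections in order, so a missing registration or function is visible in one pass.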

Repository
joelhooks/joelclaw