Run a comprehensive health check of the joelclaw system: k8s cluster, AT Proto PDS, worker, Inngest, Redis/gateway, Typesense/OTEL, tests, TypeScript, repo sync, memory pipeline, pi-tools, git config, active loops, gogcli auth, disk, and stale tests. Outputs a 1-10 score with a per-component breakdown. Use when: 'system health', 'health check', 'is everything working', 'system status', 'how's the system', 'check everything', or at session start to orient.
Run `scripts/health.sh` for a full system health report with a 1-10 score:

`~/Code/joelhooks/joelclaw/skills/joelclaw-system-check/scripts/health.sh`

| Check | What | Green (10) | Yellow (5-7) | Red (1-3) |
|---|---|---|---|---|
| k8s cluster | pods in joelclaw namespace | 4/4 Running, 0 restarts | partial pods | no pods |
| pds | AT Proto PDS on :2583 | version + collections | pod running, port-forward down | pod not running |
| worker | system-bus on :3111 | 16+ functions | responding, low count | down |
| inngest server | :8288 reachable | responding | — | down |
| redis/gateway | Redis + gateway session queues | connected, low pending queue | connected, backlog rising | unavailable |
| typesense/otel | Typesense health + OTEL query path | healthy + queryable | healthy, query degraded | unavailable |
| tests | bun test in system-bus | 0 fail | — | failures |
| tsc | tsc --noEmit | clean | — | type errors |
| repo sync | monorepo HEAD vs origin/main | in sync | ahead/behind | repo unavailable |
| memory pipeline | joelclaw inngest memory-health | healthy checks | degraded checks | failing checks |
| pi-tools | extension deps installed | all 3 deps | — | missing |
| git config | user.name + email set | set | — | missing |
| active loops | joelclaw loop list | queryable | query degraded | unavailable |
| gogcli | Google Workspace auth | account authed, token valid | token stored, no password | not configured |
| disk | free space + loop tmp | <80% used | — | >80% |
| stale tests | __tests__/ + acceptance tests | clean | — | present |
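The table defines a score band for each check but does not say how the bands combine into the overall 1-10 score. Below is a minimal sketch of one plausible aggregation, assuming each check reports `green`, `yellow`, or `red`, scored at 10, 6 (midpoint of 5-7), and 2 (midpoint of 1-3), then averaged with round-half-up. The function names are hypothetical, and `health.sh` may weight checks differently or use finer-grained per-check scores.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical aggregation, not the actual logic in health.sh.
# Assumes each check yields "green", "yellow", or "red".
set -euo pipefail

# Map a status to a representative score from the table's bands:
# green = 10, yellow = midpoint of 5-7, red = midpoint of 1-3.
status_score() {
  case "$1" in
    green)  echo 10 ;;
    yellow) echo 6 ;;
    red)    echo 2 ;;
    *)      echo "unknown status: $1" >&2; return 1 ;;
  esac
}

# Average the per-check scores, rounding half up to an integer.
overall_score() {
  local total=0 count=0 s
  for status in "$@"; do
    s=$(status_score "$status") || return 1
    total=$((total + s))
    count=$((count + 1))
  done
  if (( count == 0 )); then
    echo "no checks given" >&2
    return 1
  fi
  # floor(total/count + 1/2) using integer arithmetic only
  echo $(( (2 * total + count) / (2 * count) ))
}
```

For example, `overall_score green green yellow red` prints `7` (28 / 4). Using band midpoints keeps a single red check visible in the total without letting it zero out an otherwise healthy system.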
Common fixes:

- Repo drift: `cd ~/Code/joelhooks/joelclaw && git fetch origin && git status -sb`
- pi-tools broken: `cd ~/.pi/agent/git/github.com/joelhooks/pi-tools && bun add @sinclair/typebox @mariozechner/pi-coding-agent @mariozechner/pi-tui @mariozechner/pi-ai`
- PDS unreachable: `kubectl port-forward -n joelclaw svc/bluesky-pds 2583:3000 &` (if the pod is down: `kubectl rollout restart deployment/bluesky-pds -n joelclaw`)
- Worker down: `joelclaw inngest restart-worker --register`
- Stale tests: `rm -rf ~/Code/joelhooks/joelclaw/packages/system-bus/__tests__/ && find ~/Code/joelhooks/joelclaw/packages/system-bus/src -name "*.acceptance.test.ts" -delete`
- Loop tmp bloat: `rm -rf /tmp/agent-loop/loop-*/` (only when no loops are running)
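The stale-tests one-liner deletes immediately. A preview-first variant can list what it would remove and delete only when asked. This is a sketch: `stale_tests` and its `--apply` flag are hypothetical, and the default path simply mirrors the one-liner.

```bash
#!/usr/bin/env bash
# Sketch: preview-first version of the "Stale tests" fix.
# stale_tests and --apply are hypothetical; paths mirror the one-liner.
set -euo pipefail

stale_tests() {
  local root="${1:-$HOME/Code/joelhooks/joelclaw/packages/system-bus}"
  local mode="${2:-}"
  local -a targets=()

  # Stale artifact 1: the whole __tests__/ directory.
  [[ -d "$root/__tests__" ]] && targets+=("$root/__tests__")

  # Stale artifact 2: any *.acceptance.test.ts file under src/.
  if [[ -d "$root/src" ]]; then
    while IFS= read -r -d '' f; do
      targets+=("$f")
    done < <(find "$root/src" -name '*.acceptance.test.ts' -print0)
  fi

  if (( ${#targets[@]} == 0 )); then
    echo "clean"
    return 0
  fi

  # Always list the targets; only delete them when --apply is given.
  printf '%s\n' "${targets[@]}"
  if [[ "$mode" == "--apply" ]]; then
    rm -rf -- "${targets[@]}"
  fi
}
```

Run `stale_tests` to preview, then `stale_tests "$HOME/Code/joelhooks/joelclaw/packages/system-bus" --apply` once the list looks right. Printing `clean` on an empty result matches the Green column of the stale-tests row.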
When a run appears stuck after its first step:

1. Inspect the run trace: `joelclaw run <run-id>`
2. If the trace shows a finalization failure with "Unable to reach SDK URL":
   - Verify registration and health: `joelclaw inngest status`
   - Verify the function is registered where expected: `joelclaw functions | rg -i "manifest-archive|<function-name>"`
   - Check for stale app registrations in the Inngest UI/API and remove stale SDK URLs.
   - Do not assume the cause is purely network: review recent step code for filesystem, Redis, or subprocess calls that block before the step responds.
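The branch in step 2 can be sketched as a small helper that reads trace text on stdin and prints the next triage commands. This assumes the trace output (for example from `joelclaw run <run-id>`) contains the error string verbatim; `triage_trace` is a hypothetical helper, not a joelclaw command.

```bash
#!/usr/bin/env bash
# Sketch: map a stuck-run trace to the next triage steps.
# triage_trace is hypothetical; it only greps for the quoted error string.
set -euo pipefail

triage_trace() {
  local fn="${1:-<function-name>}"
  local trace
  trace=$(cat)  # whole trace from stdin

  if grep -q 'Unable to reach SDK URL' <<<"$trace"; then
    echo "sdk-unreachable"
    echo "  joelclaw inngest status"
    echo "  joelclaw functions | rg -i \"$fn\""
    echo "  then: remove stale SDK URLs; review step code for blocking fs/Redis/subprocess calls"
  else
    echo "other"
    echo "  inspect the failing step in the trace"
  fi
}
```

Usage, assuming `joelclaw run` writes its trace to stdout: `joelclaw run <run-id> | triage_trace manifest-archive`.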