Implement and verify joelclaw observability on every change so failures cannot stay silent. Use when adding/updating Inngest functions, gateway channels, webhook providers, APIs, workers, or any pipeline step. Enforces canonical OTEL contract, storage path, and verification gates. Triggers on: 'o11y', 'observability', 'logging', 'otel', 'instrument this', 'silent failure', 'add telemetry', 'log this function'.
Prevent silent failure by default. Observability is not optional polish: it is part of done.
Canonical files:

- `packages/system-bus/src/observability/otel-event.ts`
- `packages/system-bus/src/observability/emit.ts`
- `packages/system-bus/src/observability/store.ts`

Rules:

- In system-bus code, emit with `emitOtelEvent` or `emitMeasuredOtelEvent`.
- In gateway code, emit with `emitGatewayOtel`.
- Other producers send events through `POST /observability/emit` (`packages/system-bus/src/serve.ts`), not ad-hoc writes.
- Do not use `console.log` as primary observability. Keep structured events as the source of truth.
- Put variable, high-cardinality detail in `metadata`, not in facet fields (`source`, `component`, `level`, `success`).
- Every failure path emits `success: false` with a meaningful `error`.
- In Inngest functions, emit inside `step.run(...)` to avoid replay duplication after resume.

Event fields:

- `source`: subsystem (worker, gateway, webhook, memory, verification, etc.)
- `component`: stable module/service name (`check-system-health`, `redis-channel`, `observe`)
- `action`: stable dotted action (`system.health.checked`, `events.immediate_telegram`)
- `metadata`: request IDs, deployment IDs, function IDs, session IDs, payload identifiers
- `duration_ms`: include for timed operations

Use event-per-hop (wide event style): one context-rich event for each major boundary/operation, not scattered string logs. Carry the correlation IDs for each hop in `metadata`.

Levels: `debug`/`info` for normal activity, `warn` for degraded but recoverable, `error`/`fatal` for failures.

For full checklists and command recipes, read `references/implementation-checklist.md`.
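The rules and fields above can be sketched as a type plus a check. This is a minimal sketch under stated assumptions: `OtelEvent`, `contractViolations`, `DOTTED_ACTION`, and `measured` are hypothetical names, not the real contents of `otel-event.ts` or `emit.ts`. The action pattern is inferred from the example actions, and `measured` only approximates what a measured emit such as `emitMeasuredOtelEvent` might do.

```typescript
// Hypothetical sketch of the event contract; NOT the real otel-event.ts.
type OtelLevel = "debug" | "info" | "warn" | "error" | "fatal";

interface OtelEvent {
  level: OtelLevel;
  source: string; // subsystem: worker, gateway, webhook, memory, ...
  component: string; // stable module/service name
  action: string; // stable dotted action, e.g. "system.health.checked"
  success: boolean;
  error?: string; // expected whenever success is false
  duration_ms?: number; // set for timed operations
  metadata: Record<string, unknown>; // request/session/deployment IDs live here
}

// Assumed naming rule, inferred from the examples: lowercase dotted segments.
const DOTTED_ACTION = /^[a-z0-9_-]+(\.[a-z0-9_-]+)+$/;

// Returns contract violations; an empty array means the event passes.
function contractViolations(e: OtelEvent): string[] {
  const problems: string[] = [];
  if (!DOTTED_ACTION.test(e.action)) {
    problems.push("action must be a stable dotted name");
  }
  if (!e.success && !e.error) {
    problems.push("failure without a meaningful error");
  }
  if (!e.success && (e.level === "debug" || e.level === "info")) {
    problems.push("failure logged at debug/info");
  }
  return problems;
}

// Hypothetical timing wrapper: runs fn once, emits exactly one event with
// duration_ms, marks failures with success: false, and rethrows them.
async function measured<T>(
  base: Omit<OtelEvent, "success" | "error" | "duration_ms">,
  fn: () => Promise<T>,
  emit: (e: OtelEvent) => void,
): Promise<T> {
  const started = Date.now();
  let result: T;
  try {
    result = await fn();
  } catch (err) {
    emit({
      ...base,
      level: "error",
      success: false,
      error: err instanceof Error ? err.message : String(err),
      duration_ms: Date.now() - started,
    });
    throw err;
  }
  // Emit outside the try so a failing sink is not misreported as an fn failure.
  emit({ ...base, success: true, duration_ms: Date.now() - started });
  return result;
}
```

Rethrowing after the failure event keeps the caller's error handling (and Inngest retries) intact, so observability never swallows the failure it reports.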
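Worker (system-bus) example:

```ts
import { emitMeasuredOtelEvent } from "../../observability/emit";

// Runs inside an Inngest function handler: `event` is the triggering event and
// `runSync` is the operation being measured, both from the surrounding code.
await emitMeasuredOtelEvent(
  {
    level: "info",
    source: "worker",
    component: "content-sync",
    action: "content_sync.run",
    metadata: { trigger: event.name },
  },
  async () => {
    await runSync();
  }
);
```

Gateway example:

```ts
import { emitGatewayOtel } from "../observability";

// Failure path: success is false and error carries a stable, meaningful code.
// `sessionId` and `queueDepth` come from the surrounding channel code.
await emitGatewayOtel({
  level: "error",
  component: "redis-channel",
  action: "events.immediate_telegram",
  success: false,
  error: "telegram_send_failed",
  metadata: { sessionId, queueDepth },
});
```

Verification gates (a change is not done until these pass):

- Run the end-to-end probe (`scripts/otel-smoke.sh`).
- Confirm `joelclaw otel list` and `joelclaw otel stats` show expected behavior.
- Confirm new events carry the expected `source`, `component`, and `action`.

Finalization triage: use this when step code appears to run but runs remain `RUNNING`/`CANCELLED` with `Finalization` errors.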
```bash
joelclaw run <run-id>
```

Look for `errors.Finalization.stack` containing `Unable to reach SDK URL`.
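If you script this check, a small predicate can encode it. This is a sketch under assumptions: `RunRecord` and `looksLikeFinalizationFailure` are hypothetical names, and the object shape only mirrors the `errors.Finalization.stack` path and the statuses described above. It assumes you have already parsed the run output into an object, so verify the shape against real `joelclaw run` output first.

```typescript
// Hypothetical shape: only the fields this check reads.
interface RunRecord {
  status: string; // e.g. "RUNNING", "CANCELLED", "COMPLETED"
  errors?: { Finalization?: { stack?: string } };
}

// True when a run is stuck in RUNNING/CANCELLED and its finalization
// stack reports the SDK URL error.
function looksLikeFinalizationFailure(run: RunRecord): boolean {
  const stuck = run.status === "RUNNING" || run.status === "CANCELLED";
  const stack = run.errors?.Finalization?.stack ?? "";
  return stuck && stack.includes("Unable to reach SDK URL");
}
```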
Then check Inngest status and recent worker and error logs:

```bash
joelclaw inngest status
joelclaw logs worker --lines 200
joelclaw logs errors --lines 200
```

If an action that should emit once (for example `manifest.archive.prereqs-passed`) appears hundreds of times in one run window, move that emit into its own `step.run`.
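Why an emit outside `step.run` multiplies can be shown with a toy model of memoized step execution. This is not the Inngest SDK and it simplifies real scheduling; `Step`, `StepInterrupt`, and `execute` are hypothetical names. It captures the relevant behavior: the handler replays from the top on every invocation, completed steps return their memoized result, and only code inside a step runs once.

```typescript
// Toy model of Inngest-style memoized execution (NOT the Inngest SDK).
type Step = { run<T>(id: string, fn: () => T): T };

// Signals "a new step ran; end this invocation and replay later".
class StepInterrupt extends Error {}

// Replays `handler` until it completes; returns the number of invocations.
function execute(handler: (step: Step) => void): number {
  const memo = new Map<string, unknown>();
  let invocations = 0;
  for (;;) {
    invocations++;
    const step: Step = {
      run<T>(id: string, fn: () => T): T {
        if (memo.has(id)) return memo.get(id) as T; // replayed: skip fn
        memo.set(id, fn()); // first time: run fn exactly once
        throw new StepInterrupt(); // one new step per invocation
      },
    };
    try {
      handler(step);
      return invocations; // every step was memoized, handler finished
    } catch (e) {
      if (!(e instanceof StepInterrupt)) throw e;
    }
  }
}
```

With three steps, the handler body runs four times, so a bare emit at the top of the handler fires four times while a step-wrapped emit fires once. Real functions with more steps and resumes amplify this further.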
To search for that action over the last hour:

```bash
joelclaw otel search "manifest.archive.prereqs-passed" --hours 1
```

Treat `Unable to reach SDK URL` as an ambiguous symptom. It can indicate ingress problems, but in practice it also appears when a function handler blocks on local IO or dependencies long enough that finalization cannot complete.
Use `scripts/otel-smoke.sh` for a fast end-to-end probe:

```bash
./skills/o11y-logging/scripts/otel-smoke.sh verification o11y-skill probe.emit
```

Key files:

- `packages/system-bus/src/observability/otel-event.ts`
- `packages/system-bus/src/observability/emit.ts`
- `packages/system-bus/src/observability/store.ts`
- `packages/system-bus/src/serve.ts`
- `packages/gateway/src/observability.ts`
- `packages/system-bus/src/inngest/functions/check-system-health.ts`
- `packages/cli/src/commands/otel.ts`
- `apps/web/app/api/otel/route.ts`