Database architecture skills, docs, and rules for high-demand multi-tenant commerce platforms (PostgreSQL source of truth, Neo4j as derived GraphRAG projection, transactional outbox, RLS-based tenant isolation). Includes live schema introspection workflow via Postgres MCP.
This is the canonical outbox implementation for the platform. Skills cite this file rather than re-deriving the pattern. Copy it as-is for new projects; deviate only with an ADR.
```sql
CREATE TABLE outbox_messages (
    id             BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id      BIGINT NOT NULL,
    aggregate_type TEXT NOT NULL,         -- 'Order', 'Product'
    aggregate_id   BIGINT NOT NULL,       -- the affected row
    event_type     TEXT NOT NULL,         -- 'OrderConfirmed'
    event_version  SMALLINT NOT NULL DEFAULT 1,
    payload        JSONB NOT NULL,
    occurred_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    available_at   TIMESTAMPTZ NOT NULL DEFAULT now(),  -- for delayed/retry
    processed_at   TIMESTAMPTZ,
    attempts       SMALLINT NOT NULL DEFAULT 0,
    last_error     TEXT,
    trace_id       TEXT
);

-- Cheap polling: partial index on the pending set
CREATE INDEX outbox_messages_pending_idx
    ON outbox_messages (available_at, id)
    WHERE processed_at IS NULL;

-- Optional: tenant-aware lookup for debugging
CREATE INDEX outbox_messages_tenant_aggregate_idx
    ON outbox_messages (tenant_id, aggregate_type, aggregate_id);
```

When the table grows beyond 50M rows, partition by month on occurred_at.
```sql
BEGIN;

UPDATE sales_orders
   SET status = 'confirmed',
       confirmed_at = now(),
       version = version + 1
 WHERE id = $1
   AND version = $2;

INSERT INTO outbox_messages (
    tenant_id, aggregate_type, aggregate_id, event_type, payload, trace_id
) VALUES (
    $tenant_id, 'Order', $order_id, 'OrderConfirmed', $payload::jsonb, $trace_id
);

COMMIT;
```

Either both rows are persisted or neither is. No external publish happens inside the transaction.
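The atomicity guarantee is easy to verify in miniature. Below is a hedged sketch using Python's stdlib `sqlite3` with a deliberately simplified schema (no JSONB, no identity columns); the table and column names mirror the DDL above, everything else — including the simulated crash — is illustrative:

```python
import json
import sqlite3

# Simplified in-memory schema: enough to demonstrate that the state change
# and the outbox insert commit together or not at all.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_orders (id INTEGER PRIMARY KEY, status TEXT, version INTEGER)")
conn.execute("CREATE TABLE outbox_messages (id INTEGER PRIMARY KEY AUTOINCREMENT,"
             " tenant_id INTEGER, aggregate_type TEXT, aggregate_id INTEGER,"
             " event_type TEXT, payload TEXT)")
conn.execute("INSERT INTO sales_orders VALUES (1, 'pending', 1)")
conn.commit()

def confirm_order(conn, tenant_id, order_id, expected_version, crash_before_outbox=False):
    try:
        with conn:  # one transaction: commits on success, rolls back on exception
            cur = conn.execute(
                "UPDATE sales_orders SET status = 'confirmed', version = version + 1 "
                "WHERE id = ? AND version = ?", (order_id, expected_version))
            if cur.rowcount != 1:
                raise RuntimeError("optimistic lock failure")
            if crash_before_outbox:
                raise RuntimeError("simulated crash after UPDATE, before outbox INSERT")
            conn.execute(
                "INSERT INTO outbox_messages (tenant_id, aggregate_type, aggregate_id,"
                " event_type, payload) VALUES (?, 'Order', ?, 'OrderConfirmed', ?)",
                (tenant_id, order_id, json.dumps({"order_id": order_id})))
        return True
    except RuntimeError:
        return False

# Crash mid-transaction: the UPDATE is rolled back along with it.
assert not confirm_order(conn, 42, 1, expected_version=1, crash_before_outbox=True)
assert conn.execute("SELECT status, version FROM sales_orders WHERE id = 1").fetchone() == ("pending", 1)

# Retry succeeds: both rows are persisted together.
assert confirm_order(conn, 42, 1, expected_version=1)
assert conn.execute("SELECT COUNT(*) FROM outbox_messages").fetchone()[0] == 1
```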
Multiple relay workers can run concurrently. `FOR UPDATE SKIP LOCKED` lets them claim disjoint sets without contention.
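The claiming semantics can be shown with a toy in-memory model (purely illustrative — in Postgres the "lock set" is the row locks held by each relay's open transaction):

```python
# Toy model of SKIP LOCKED claiming: each worker takes the oldest unclaimed
# pending ids and skips rows another worker already holds, so concurrent
# relays never pick up the same message.
pending = list(range(1, 11))  # message ids with processed_at IS NULL
locked = set()                # rows currently claimed by some worker

def claim(limit):
    batch = [i for i in pending if i not in locked][:limit]  # SKIP LOCKED
    locked.update(batch)
    return batch

a = claim(4)  # worker A
b = claim(4)  # worker B, running concurrently
assert a == [1, 2, 3, 4] and b == [5, 6, 7, 8]
assert not set(a) & set(b)  # disjoint batches, no contention
```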
```sql
WITH claimed AS (
    SELECT id
    FROM outbox_messages
    WHERE processed_at IS NULL
      AND available_at <= now()
    ORDER BY id
    LIMIT 100
    FOR UPDATE SKIP LOCKED
)
SELECT m.*
FROM outbox_messages m
JOIN claimed c ON c.id = m.id;
```

The relay then publishes each message to the event bus. On success:
```sql
UPDATE outbox_messages
SET processed_at = now()
WHERE id = ANY($1::bigint[]);
```

On failure (transient):
```sql
UPDATE outbox_messages
SET attempts = attempts + 1,
    available_at = now() + (LEAST(attempts + 1, 8) * INTERVAL '30 seconds'),
    last_error = $error
WHERE id = $id;
```

On permanent failure (attempts > 10), move to DLQ:
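For reference, the retry delay that the `available_at` expression produces, mirrored as a plain function (the function name is illustrative):

```python
from datetime import timedelta

def retry_delay(attempts: int) -> timedelta:
    # Mirrors the SQL: LEAST(attempts + 1, 8) * INTERVAL '30 seconds'.
    # Linear backoff in 30s steps, capped at 8 * 30s = 4 minutes.
    return timedelta(seconds=min(attempts + 1, 8) * 30)

assert retry_delay(0) == timedelta(seconds=30)                 # first retry: 30s
assert retry_delay(3) == timedelta(minutes=2)                  # grows linearly
assert retry_delay(7) == retry_delay(20) == timedelta(minutes=4)  # capped
```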
```sql
WITH moved AS (
    DELETE FROM outbox_messages WHERE id = $id RETURNING *
)
INSERT INTO outbox_messages_dlq SELECT * FROM moved;
```

After 7 days, processed messages move to outbox_messages_archive, partitioned by month, retained for 90 days unless compliance requires longer.
```sql
CREATE TABLE outbox_messages_archive (
    LIKE outbox_messages INCLUDING DEFAULTS INCLUDING CONSTRAINTS
) PARTITION BY RANGE (occurred_at);

-- Initial partition
CREATE TABLE outbox_messages_archive_202605
    PARTITION OF outbox_messages_archive
    FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');
```

Hourly archival job (idempotent):
```sql
WITH moved AS (
    DELETE FROM outbox_messages
    WHERE processed_at IS NOT NULL
      AND processed_at < now() - INTERVAL '7 days'
    RETURNING *
)
INSERT INTO outbox_messages_archive SELECT * FROM moved;
```

Drop archive partitions older than 90 days monthly.
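Partition maintenance (creating next month's partition, dropping expired ones) is easy to script. A sketch that generates the monthly DDL, assuming the `outbox_messages_archive_YYYYMM` naming convention shown above (the helper itself is hypothetical):

```python
def monthly_partition_ddl(year: int, month: int) -> str:
    # Emits the CREATE TABLE ... PARTITION OF statement for one calendar
    # month, matching the outbox_messages_archive_YYYYMM naming convention.
    start = f"{year:04d}-{month:02d}-01"
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    end = f"{next_year:04d}-{next_month:02d}-01"
    return (
        f"CREATE TABLE outbox_messages_archive_{year:04d}{month:02d}\n"
        f"    PARTITION OF outbox_messages_archive\n"
        f"    FOR VALUES FROM ('{start}') TO ('{end}');"
    )

# Matches the initial partition above, and handles the year rollover.
assert "FOR VALUES FROM ('2026-05-01') TO ('2026-06-01')" in monthly_partition_ddl(2026, 5)
assert "FOR VALUES FROM ('2026-12-01') TO ('2027-01-01')" in monthly_partition_ddl(2026, 12)
```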
Consumers track their position to enable replay and ensure idempotency.
```sql
CREATE TABLE consumer_offsets (
    consumer_name     TEXT NOT NULL,
    partition_key     TEXT NOT NULL DEFAULT '',
    last_processed_id BIGINT NOT NULL,
    last_processed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (consumer_name, partition_key)
);
```

A consumer reads:
```sql
SELECT *
FROM outbox_messages
WHERE id > (
    SELECT last_processed_id
    FROM consumer_offsets
    WHERE consumer_name = $name AND partition_key = $partition
)
  AND processed_at IS NOT NULL
ORDER BY id
LIMIT $batch_size;
```

After processing, update the offset in the same transaction as the consumer's state change.
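Why this makes replay safe can be sketched in memory: because the consumer's state change and the offset advance land together, reprocessing an already-delivered batch is a no-op. All names below (`projector`, `orders_confirmed`) are illustrative:

```python
# (consumer_name, partition_key) -> last_processed_id, as in consumer_offsets
offsets = {("projector", ""): 0}
state = {"orders_confirmed": 0}  # the consumer's own derived state

def consume(messages):
    key = ("projector", "")
    for msg in sorted(messages, key=lambda m: m["id"]):
        if msg["id"] <= offsets[key]:
            continue                    # already seen: skip on replay
        state["orders_confirmed"] += 1  # consumer's state change ...
        offsets[key] = msg["id"]        # ... and offset advance, together

batch = [{"id": 1, "event_type": "OrderConfirmed"},
         {"id": 2, "event_type": "OrderConfirmed"}]
consume(batch)
consume(batch)  # replayed delivery: idempotent, no double-processing
assert state["orders_confirmed"] == 2
assert offsets[("projector", "")] == 2
```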
Required metrics:

- `outbox_lag_seconds` (gauge) — `now() - MIN(occurred_at)` over rows `WHERE processed_at IS NULL`
- `outbox_throughput` (counter) — messages processed per second
- `outbox_failures` (counter) — by `event_type`
- `outbox_dlq_depth` (gauge) — `COUNT(*) FROM outbox_messages_dlq`

Required alerts:

- any message reaching `attempts > 10`

Common pitfalls:

- Polling without `FOR UPDATE SKIP LOCKED`. Concurrent relays will double-process or contend on the same rows.
- Setting `processed_at` before the publish ack returns. If the publish fails after the update, the event is lost.

docs
skills

- adr-drafting
- commerce-database-architecture
- graph-rag-boundary-review
- mermaid-diagram-review
- outbox-and-eventing-design
- postgres-schema-introspection
- schema-evolution-workflow