Database architecture skills, docs, and rules for high-demand multi-tenant commerce platforms (PostgreSQL source of truth, Neo4j as derived GraphRAG projection, transactional outbox, RLS-based tenant isolation). Includes live schema introspection workflow via Postgres MCP.
The canonical zero-downtime migration pattern for production. Skills cite this file when proposing schema changes against a live database.
The four phases — Expand, Backfill, Cutover, Contract — must be deployed separately, not as a single migration. Each phase is its own PR, its own deploy, its own observability window.
### Worked example: renaming `catalog_products.price` to `catalog_products.list_price`

A naive `ALTER TABLE catalog_products RENAME COLUMN price TO list_price` takes only a brief lock, but it requires the application to flip from `price` to `list_price` atomically, which is impossible to coordinate across multiple service instances. Instead, four phases:
### Phase 1: Expand

```sql
-- Migration: V0123__add_list_price_to_products.sql
ALTER TABLE catalog_products
  ADD COLUMN list_price NUMERIC(12, 2);
```

- **Table:** `catalog_products`
- **Lock:** < 50ms (a column add with no default and no constraint is fast in PG ≥ 11)
- **App behavior:** still reads and writes `price`, ignores `list_price`. The new column is NULL.
- **Rollback:** `ALTER TABLE catalog_products DROP COLUMN list_price;`
- **Rollout:** Deploy. Observe. Wait for any in-flight long transactions to drain.
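Even a brief AccessExclusiveLock can queue behind a long-running transaction and stall all traffic to the table. A sketch of the safeguard this file recommends under "anti-patterns" (fail fast via `lock_timeout`), applied to the expand migration:

```sql
BEGIN;
-- Fail fast instead of queueing behind a long transaction;
-- SET LOCAL scopes the timeout to this transaction only.
SET LOCAL lock_timeout = '5s';
ALTER TABLE catalog_products
  ADD COLUMN list_price NUMERIC(12, 2);
COMMIT;
```

If the statement times out, retry later; nothing has changed and the migration remains safe to re-run.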
### Phase 2: Backfill

```sql
-- Migration: V0124__backfill_list_price_from_price.sql
-- Run in batches of 5,000 rows, throttled
WITH batch AS (
  SELECT id
  FROM catalog_products
  WHERE list_price IS NULL
    AND price IS NOT NULL
  ORDER BY id
  LIMIT 5000
  FOR UPDATE SKIP LOCKED
)
UPDATE catalog_products p
SET list_price = p.price
FROM batch b
WHERE p.id = b.id;
```

Run a script that loops:

```bash
while true; do
  ROWS=$(psql -tA -c "<the UPDATE above returning row count>")
  if [ "$ROWS" -eq 0 ]; then break; fi
  echo "backfilled $ROWS rows"
  sleep 0.5  # throttle to limit replication lag
done
```

- **App behavior:** still writes `price` only; `list_price` catches up via re-running the batch script (idempotent thanks to `WHERE list_price IS NULL`).
- **Rollback:** `UPDATE ... SET list_price = NULL`, but typically you'd just continue forward to Phase 1's rollback.
- **Observe:** replication lag. Pause between batches if lag exceeds threshold.
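The pause-on-lag guidance above can be sketched as a small driver loop. `run_batch` and `replication_lag_seconds` are hypothetical callables supplied by the operator; in practice they would execute the batched UPDATE and query `pg_stat_replication` respectively:

```python
import time


def backfill(run_batch, replication_lag_seconds, max_lag=5.0, throttle=0.5):
    """Repeatedly run one backfill batch until no rows remain.

    run_batch() -> int: executes one batched UPDATE, returns rows updated.
    replication_lag_seconds() -> float: current replica lag in seconds.
    Returns the total number of rows backfilled.
    """
    total = 0
    while True:
        # Pause while replicas are behind instead of piling on more writes.
        while replication_lag_seconds() > max_lag:
            time.sleep(1.0)
        rows = run_batch()
        if rows == 0:  # idempotent: the UPDATE's WHERE list_price IS NULL
            return total
        total += rows
        time.sleep(throttle)  # throttle between batches
```

Because the batch query filters on `list_price IS NULL`, the loop can be killed and restarted at any point without double-writing.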
### Phase 3: Cutover

Application changes (deployed independently per service). For writes:

```diff
- product.price = body.price
+ product.list_price = body.price
+ product.price = body.price  # dual-write during cutover
```

For reads:

```diff
- return { price: product.price }
+ return { price: product.list_price ?? product.price }
```

Old clients that still write `price` continue to work. Wait until all service instances have deployed the new code. Use a feature flag if a cleaner rollout is needed:

```python
if feature.enabled('use_list_price'):
    write_both_during_cutover(value)  # dual-write: price and list_price
else:
    write_price(value)                # legacy single-column path
```

### Phase 4: Contract

```sql
-- Migration: V0125__make_list_price_required.sql
ALTER TABLE catalog_products
  ALTER COLUMN list_price SET NOT NULL;

-- Migration: V0126__drop_old_price_column.sql
ALTER TABLE catalog_products
  DROP COLUMN price;
```

- **Lock:** `SET NOT NULL` requires a full table scan in PG < 12; in PG ≥ 12 it can use an existing `CHECK (list_price IS NOT NULL)` constraint added `NOT VALID` and validated separately to avoid the lock.
- **App behavior:** writes to `price` will fail. Only run after the cutover deploy is fully rolled out and burned in.
- **Rollback:** `ALTER TABLE ... ADD COLUMN price ...` and re-backfill from `list_price`. Costly. The contract phase is a one-way door; only ship when confident.

### Pattern: adding a NOT NULL column

| Phase | Step |
|---|---|
| 1. Expand | `ADD COLUMN x INT;` (nullable, no default) |
| 2. Backfill | `UPDATE ... SET x = ... WHERE x IS NULL;` (batched) |
| 3. Cutover | App writes `x` on every insert/update |
| 4. Contract | `ALTER ... SET NOT NULL;` + add default for safety |
### Pattern: changing a column type

| Phase | Step |
|---|---|
| 1. Expand | `ADD COLUMN x_new NEW_TYPE;` |
| 2. Backfill | `UPDATE ... SET x_new = x::NEW_TYPE;` |
| 3. Cutover | Dual-write `x` and `x_new` |
| 4. Contract | Drop `x`, rename `x_new` to `x` |
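As a concrete instance of the table above, consider widening a hypothetical `orders.quantity` column from INT to BIGINT (the table and column names are illustrative, not from this repo):

```sql
-- 1. Expand
ALTER TABLE orders ADD COLUMN quantity_new BIGINT;

-- 2. Backfill (batched with LIMIT + FOR UPDATE SKIP LOCKED in practice,
--    exactly as in Phase 2 of the worked example)
UPDATE orders SET quantity_new = quantity::BIGINT WHERE quantity_new IS NULL;

-- 3. Cutover: application dual-writes quantity and quantity_new

-- 4. Contract
ALTER TABLE orders DROP COLUMN quantity;
ALTER TABLE orders RENAME COLUMN quantity_new TO quantity;
```

The contract step's `DROP` + `RENAME` both take only brief locks; the expensive work all happened in the backfill.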
### Pattern: creating an index

```sql
CREATE INDEX CONCURRENTLY idx_name ON table_name (cols);
```

`CONCURRENTLY` avoids blocking writes. If the build fails (e.g. it leaves an invalid index), drop it and retry. Never use plain `CREATE INDEX` on a production table over 1M rows.
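A failed `CONCURRENTLY` build leaves the index behind marked invalid. One way to find and clear such leftovers before retrying (a sketch; `idx_name` stands in for whatever the query reports):

```sql
-- List invalid indexes left behind by failed concurrent builds
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;

-- Drop without blocking writes, then re-run CREATE INDEX CONCURRENTLY
DROP INDEX CONCURRENTLY idx_name;
```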
### Pattern: renaming a table

| Phase | Step |
|---|---|
| 1. Expand | `CREATE VIEW old_name AS SELECT * FROM new_name;` (or vice versa with renamed table) |
| 2. Cutover | App switches to new name |
| 3. Contract | Drop the view |
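Sketched in SQL, doing the rename and the compatibility view in one transaction so there is no window where neither name resolves:

```sql
BEGIN;
ALTER TABLE old_name RENAME TO new_name;
CREATE VIEW old_name AS SELECT * FROM new_name;  -- old name keeps working
COMMIT;

-- Phase 3, after all services reference new_name:
DROP VIEW old_name;
```

Simple single-table views are automatically updatable in PG ≥ 9.3, so writes through the old name continue to work during the cutover.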
### Pattern: adding a foreign key

```sql
-- Phase 1: add as NOT VALID (no table scan; brief lock only)
ALTER TABLE child_table
  ADD CONSTRAINT child_parent_fk
  FOREIGN KEY (parent_id) REFERENCES parent_table (id)
  NOT VALID;

-- Phase 2: validate (scans the table, but without an exclusive lock)
ALTER TABLE child_table VALIDATE CONSTRAINT child_parent_fk;
```

### Lock reference

| Operation | Lock | Notes |
|---|---|---|
| `ADD COLUMN` (nullable, no default) | AccessExclusiveLock, brief | Safe |
| `ADD COLUMN` (with non-volatile default, PG ≥ 11) | AccessExclusiveLock, brief | Safe |
| `ADD COLUMN` (with volatile default) | AccessExclusiveLock, full rewrite | Avoid |
| `DROP COLUMN` | AccessExclusiveLock, brief | Logical only; reclaim space later |
| `ALTER COLUMN ... TYPE` (compatible) | AccessExclusiveLock, brief | Check PG docs per type |
| `ALTER COLUMN ... TYPE` (incompatible) | AccessExclusiveLock, full rewrite | Avoid; use the four-phase pattern |
| `SET NOT NULL` (PG < 12) | AccessExclusiveLock, full scan | Use NOT VALID + VALIDATE in PG 12+ |
| `CREATE INDEX` | ShareLock, blocks writes | Use CONCURRENTLY |
| `CREATE INDEX CONCURRENTLY` | ShareUpdateExclusiveLock | Safe; can fail on duplicates |
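The `SET NOT NULL` row above, spelled out for PG ≥ 12 using the worked example's column (a sketch; the constraint name is arbitrary):

```sql
-- 1. Add the check without scanning (brief lock only)
ALTER TABLE catalog_products
  ADD CONSTRAINT list_price_not_null
  CHECK (list_price IS NOT NULL) NOT VALID;

-- 2. Validate with a table scan that does not block writes
ALTER TABLE catalog_products VALIDATE CONSTRAINT list_price_not_null;

-- 3. SET NOT NULL skips its own scan: the validated check already proves it
ALTER TABLE catalog_products
  ALTER COLUMN list_price SET NOT NULL;

-- 4. The check constraint is now redundant
ALTER TABLE catalog_products DROP CONSTRAINT list_price_not_null;
```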
### Anti-patterns

- **`CREATE INDEX` without `CONCURRENTLY` on production tables.** Blocks writes for the duration of the build.
- **`ADD COLUMN` with a volatile default.** Forces a full table rewrite under AccessExclusiveLock.
- **`ALTER COLUMN ... TYPE` between incompatible types.** Forces a full table rewrite. Use the four-phase pattern.
- **Skipping `lock_timeout`.** Always run `SET lock_timeout = '5s';` before a DDL statement so it fails fast instead of queueing behind a long transaction.