CtrlK
BlogDocsLog inGet started
Tessl Logo

g14wxz/commerce-database-architect

Database architecture skills, docs, and rules for high-demand multi-tenant commerce platforms (PostgreSQL source of truth, Neo4j as derived GraphRAG projection, transactional outbox, RLS-based tenant isolation). Includes live schema introspection workflow via Postgres MCP.

97

Quality

97%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

SecuritybySnyk

Passed

No known issues

Overview
Quality
Evals
Security
Files

expand-contract-migration.mddocs/reference-snippets/

Expand / Contract Migration Template

The canonical zero-downtime migration pattern for production. Skills cite this file when proposing schema changes against a live database.

The four phases — Expand, Backfill, Cutover, Contract — must be deployed separately, not as a single migration. Each phase is its own PR, its own deploy, its own observability window.


Worked example: rename catalog_products.price to catalog_products.list_price

A naive ALTER TABLE catalog_products RENAME COLUMN price TO list_price takes a brief lock, but it requires the application to flip from price to list_price atomically — impossible across multiple service instances.

Instead, four phases:


Phase 1 — Expand (additive, online, reversible)

-- Migration: V0123__add_list_price_to_products.sql
ALTER TABLE catalog_products
  ADD COLUMN list_price NUMERIC(12, 2);
  • Lock impact: AccessExclusiveLock on catalog_products, < 50ms (column add with no default and no constraint is fast in PG ≥ 11).
  • Backward compatibility: old code reads/writes price, ignores list_price. New column is NULL.
  • Rollback: ALTER TABLE catalog_products DROP COLUMN list_price;

Deploy. Observe. Wait for any in-flight long transactions to drain.


Phase 2 — Backfill (batched, throttled, idempotent)

-- Migration: V0124__backfill_list_price_from_price.sql
-- Run in batches of 5,000 rows, throttled

WITH batch AS (
  SELECT id
    FROM catalog_products
   WHERE list_price IS NULL
     AND price IS NOT NULL
   ORDER BY id
   LIMIT 5000
   FOR UPDATE SKIP LOCKED
)
UPDATE catalog_products p
   SET list_price = p.price
  FROM batch b
 WHERE p.id = b.id;

Run a script that loops:

while true; do
  ROWS=$(psql -tA -c "<the UPDATE above returning row count>")
  if [ "$ROWS" -eq 0 ]; then break; fi
  echo "backfilled $ROWS rows"
  sleep 0.5      # throttle to limit replication lag
done
  • Lock impact: RowExclusiveLock per batch, < 100ms each.
  • Backward compatibility: writes from old code go to price only; list_price catches up via re-running the batch script (idempotent WHERE list_price IS NULL).
  • Rollback: UPDATE is reversible by UPDATE ... SET list_price = NULL, but typically you'd just continue forward to Phase 1's rollback.

Observe replication lag. Pause between batches if lag exceeds threshold.


Phase 3 — Cutover (application reads/writes new column, old still works)

Application changes (deployed independently per service):

- product.price = body.price
+ product.list_price = body.price
+ product.price = body.price          # dual-write during cutover

For reads:

- return { price: product.price }
+ return { price: product.list_price ?? product.price }
  • Backward compatibility: during the cutover window, both columns are populated on writes. Old service instances still using price continue to work.
  • Rollback: revert the application deploy. The DB schema is unchanged.

Wait until all service instances have deployed the new code. Use a feature flag if a cleaner rollout is needed:

if feature.enabled('use_list_price'):
    write_list_price(value)
else:
    write_price(value)
write_both_during_cutover(value)

Phase 4 — Contract (enforce, drop old surface)

-- Migration: V0125__make_list_price_required.sql
ALTER TABLE catalog_products
  ALTER COLUMN list_price SET NOT NULL;

-- Migration: V0126__drop_old_price_column.sql
ALTER TABLE catalog_products
  DROP COLUMN price;
  • Lock impact: SET NOT NULL requires a full table scan in PG < 12; in PG ≥ 12 it can use an existing CHECK (list_price IS NOT NULL) NOT VALID validated separately to avoid the lock.
  • Backward compatibility: none — old code reading price will fail. Only run after the cutover deploy is fully rolled out and burned in.
  • Rollback: ALTER TABLE ... ADD COLUMN price ... and re-backfill from list_price. Costly. The contract phase is a one-way door — only ship when confident.

Phases for other change types

Adding a NOT NULL column with a default

PhaseStep
1. ExpandADD COLUMN x INT; (nullable, no default)
2. BackfillUPDATE ... SET x = ... WHERE x IS NULL; (batched)
3. CutoverApp writes x on every insert/update
4. ContractALTER ... SET NOT NULL; + add default for safety

Changing a column type

PhaseStep
1. ExpandADD COLUMN x_new NEW_TYPE;
2. BackfillUPDATE ... SET x_new = x::NEW_TYPE;
3. CutoverDual-write x and x_new
4. ContractDrop x, rename x_new to x

Adding an index

CREATE INDEX CONCURRENTLY idx_name ON table_name (cols);

CONCURRENTLY avoids blocking writes. If the build fails (e.g. invalid index), drop and retry. Never use plain CREATE INDEX on a production table over 1M rows.

Renaming a table

PhaseStep
1. ExpandCREATE VIEW old_name AS SELECT * FROM new_name; (or vice versa with renamed table)
2. CutoverApp switches to new name
3. ContractDrop the view

Adding a foreign key

-- Phase 1: add as NOT VALID (no immediate scan, no lock)
ALTER TABLE child_table
  ADD CONSTRAINT child_parent_fk
    FOREIGN KEY (parent_id) REFERENCES parent_table (id)
    NOT VALID;

-- Phase 2: validate (scan, no exclusive lock)
ALTER TABLE child_table VALIDATE CONSTRAINT child_parent_fk;

Lock impact reference

OperationLockNotes
ADD COLUMN (nullable, no default)AccessExclusiveLock, briefSafe
ADD COLUMN (with non-volatile default, PG ≥ 11)AccessExclusiveLock, briefSafe
ADD COLUMN (with volatile default)AccessExclusiveLock, full rewriteAvoid
DROP COLUMNAccessExclusiveLock, briefLogical only; reclaim space later
ALTER COLUMN ... TYPE (compatible)AccessExclusiveLock, briefCheck PG docs per type
ALTER COLUMN ... TYPE (incompatible)AccessExclusiveLock, full rewriteAvoid — use 4-phase
SET NOT NULL (PG < 12)AccessExclusiveLock, full scanUse NOT VALID + VALIDATE in PG 12+
CREATE INDEXShareLock, blocks writesUse CONCURRENTLY
CREATE INDEX CONCURRENTLYShareUpdateExclusiveLockSafe; can fail on duplicates

Common mistakes (don't)

  • Skipping the backfill phase because "the column is nullable, who cares". When the column becomes required, you'll be stuck.
  • Doing all four phases in one migration. The cutover phase requires application code that doesn't exist until you deploy it. Migrations and deploys must be ordered correctly.
  • Using CREATE INDEX without CONCURRENTLY on production tables. Blocks writes for the duration of the build.
  • Using a volatile default in ADD COLUMN. Forces a full table rewrite under AccessExclusiveLock.
  • Running ALTER COLUMN ... TYPE between incompatible types. Forces a full table rewrite. Use the four-phase pattern.
  • Forgetting lock_timeout. Always set SET lock_timeout = '5s'; before a DDL statement so it fails fast instead of queueing behind a long transaction.

docs

reference-snippets

expand-contract-migration.md

idempotency-keys.md

inventory-movements.md

optimistic-concurrency.md

outbox-pattern.md

rls-templates.md

event-catalog.md

index.md

tile.json