CtrlK
BlogDocsLog inGet started
Tessl Logo

alonso-skills/postgres

Comprehensive PostgreSQL reference for developers and DBAs covering versions 14–18. Use whenever the user asks about PostgreSQL syntax, DDL/DML/DQL, joins, LATERAL, CTEs, window functions, GROUPING SETS, DISTINCT ON, RETURNING, ON CONFLICT, PL/pgSQL, functions, procedures, triggers, views, materialized views, indexes (B-tree/GIN/GiST/BRIN/Hash/Bloom), MVCC, VACUUM, autovacuum, WAL, TOAST, partitioning, replication (streaming/logical), backup, PITR, HA (Patroni/repmgr), pgBouncer, EXPLAIN ANALYZE, RLS, roles, extensions (pgvector, PostGIS, TimescaleDB, Citus, pg_trgm, pg_cron), JSON/JSONB, full-text search, UUID, timestamptz, COPY, system catalogs, collations, large objects, cursors, GUC, or any Postgres administration, performance, security, replication, backup, or recovery topic.

94

1.36x
Quality

94%

Does it follow best practices?

Impact

94%

1.36x

Average score across 3 eval scenarios

SecuritybySnyk

Passed

No known issues

Overview
Quality
Evals
Security
Files

73-streaming-replication.mdreferences/

Physical Streaming Replication

[!WARNING] PG12 watershed recovery.conf removed in PG12. Server refuses to start if file exists. Standby identity now via standby.signal (presence file in $PGDATA) + standby GUCs in postgresql.conf. Configurations carried forward from PG≤11 fail at startup. Verbatim PG12 release-note: "recovery.conf is no longer used, and the server will not start if that file exists. recovery.signal and standby.signal files are now used to switch into non-primary mode. The trigger_file setting has been renamed to promote_trigger_file. The standby_mode setting has been removed."1

[!WARNING] PG16 watershed promote_trigger_file GUC removed. Use pg_ctl promote or pg_promote() instead.2

Table of Contents

When to Use This Reference

Pick this file for: setting up physical streaming replication primary+standby, choosing synchronous_commit level, picking synchronous_standby_names quorum vs priority mode, deciding hot_standby_feedback on/off, sizing max_standby_*_delay, planning controlled switchover, recovering from query-cancel cascades on standby, monitoring lag via pg_stat_replication.

NOT this file for: logical replication (→ 74-logical-replication.md), replication slot mechanics deep dive (→ 75-replication-slots.md), failover orchestration via Patroni/repmgr (→ 77-standby-failover.md, 78-ha-architectures.md, 79-patroni.md), pg_rewind for diverged former primaries (→ 89-pg-rewind.md), backup/PITR (→ 84-backup-physical-pitr.md).

Mental Model — Five Rules

  1. Physical streaming = byte-for-byte WAL ship. Standby applies primary's WAL records. Same binary. Standby identical to primary at byte level. Cannot replicate selectively (no per-table, no row filter — that's logical replication).

  2. Standby identity = standby.signal file + primary_conninfo GUC since PG12. Before PG12: recovery.conf. After PG12: empty standby.signal file in $PGDATA + postgresql.conf (or postgresql.auto.conf) with primary_conninfo='host=primary port=5432 user=replicator ...'. Server refuses startup if recovery.conf exists.1

  3. synchronous_commit is per-transaction durability dial. Five levels: off / local / remote_write / on (default) / remote_apply. Orthogonal to synchronous_standby_names — names list says which standbys count; level says how far WAL must travel before COMMIT returns.

  4. synchronous_standby_names has three syntaxes. Simple list (priority, first N), FIRST N (s1,s2,s3) (priority — first N must ack), ANY N (s1,s2,s3) (quorum — any N must ack). Empty string = async.

  5. hot_standby_feedback = on is bidirectional trade-off. Standby tells primary its xmin horizon. Primary delays vacuum to keep tuples visible to standby. Long-running standby query → primary table bloat. Off → standby query cancelled when primary vacuums tuples it still needs.

Decision Matrix

NeedUseDefaultProduction valueAvoid
Standby identitystandby.signal file (PG12+)nonetouch $PGDATA/standby.signalrecovery.conf (removed PG12)
Connect to primaryprimary_conninfo GUCemptyfull libpq conninfo with replication userhard-coded credentials in postgresql.conf (use service file)
Bound primary WAL retentionprimary_slot_name + max_slot_wal_keep_sizenone / -1named slot + max_slot_wal_keep_size = 64GBunbounded slot — primary disk fills
Block standby query cancel due to vacuumhot_standby_feedback = onoffon for query-heavy standbysleaving on with abandoned standby — primary bloats
Allow standby readshot_standby = onononoff only for hot-replay-only standby
Sync replication, single standby must acksynchronous_standby_names = 'standby1' + synchronous_commit = onempty / onnamed standby + onremote_apply for OLTP (latency punishment)
Sync replication, quorum (any N of M)ANY 2 (s1,s2,s3)n/aquorum mode for HA across AZspriority mode when AZ failure shouldn't block COMMIT
Per-transaction async commitSET LOCAL synchronous_commit = offn/ahigh-volume ingest in known-loseable transactionscluster-wide off on transactional workload
Cascading replicationstandby with max_wal_senders > 0 + downstream primary_conninfo pointing at itn/afor geographic distributionwhen primary can serve all standbys directly
Long-running standby reports without primary bloathot_standby_feedback = off + raise max_standby_streaming_delay30s5min for reporting standbysinfinite (-1) on transactional standby — query cancels appear at random
Apply WAL with delay (PITR safety net)recovery_min_apply_delay = '1h'01h for human-error rollback windowhigh values without monitoring slot lag
Promote standbypg_promote() (PG12+)n/aSELECT pg_promote(true, 60)pg_ctl promote from cron — race conditions

Three smell signals:

  • pg_stat_replication.replay_lag always NULL → standby idle, nothing to compare. Send dummy WAL via heartbeat-insert on primary. Not a bug.
  • pg_stat_replication.state = 'catchup' for hours → standby behind. Slot retention working but standby can't keep up. Check standby disk I/O, max_wal_senders, network.
  • Standby queries silently cancelled with ERROR: canceling statement due to conflict with recoverymax_standby_*_delay too low for workload, hot_standby_feedback = off. Pick one mitigation.

Mechanics

Setup Topology

Physical streaming replication = one primary + N standbys. Standby connects to primary as replication user, walreceiver process consumes WAL, startup process applies. Primary may also archive WAL (decoupled from streaming). Standbys may cascade.

┌──────────────────┐
                   │   Primary (rw)   │
                   │                  │
                   │ walsender ──────►│ standby1 (read-only)
                   │ walsender ──────►│ standby2 (read-only)
                   │                  │   │
                   │ archiver ──► WAL │   ▼
                   │   archive  store │ walsender ──► standby3 (cascading)
                   └──────────────────┘

Primary needs:

  • wal_level = replica (or logical)
  • max_wal_senders >= N + slack
  • max_replication_slots >= N
  • replication-user role (CREATE ROLE rep REPLICATION LOGIN PASSWORD 'x')
  • pg_hba.conf rule allowing replication connections

Standby needs:

  • base backup from primary (pg_basebackup -R writes both standby.signal and primary_conninfo)
  • standby.signal file in $PGDATA
  • primary_conninfo GUC pointing at primary
  • optional: primary_slot_name for bounded WAL retention

Standby Identity Files (PG12+)

Two presence files (empty content; their existence triggers behavior):

FileEffectUsed when
standby.signalServer starts in standby mode. Streams + applies WAL forever. Promotion ends standby mode.Long-running replicas
recovery.signalServer starts in recovery mode. Applies WAL until recovery_target_* reached, then promotes.Point-in-time recovery

Verbatim from recovery-config.html: "In releases prior to PostgreSQL 12, recovery configuration was specified in a separate recovery.conf file."3

Both files: empty content. PostgreSQL checks for existence at startup. Removing standby.signal mid-life requires restart (file is read at startup, not periodically).

pg_basebackup -R writes standby.signal automatically, plus appends primary_conninfo to postgresql.auto.conf.

primary_conninfo and primary_slot_name

primary_conninfo (sighup-context, since PG12 was here) = libpq conninfo string. Standby's walreceiver uses it to connect.

# postgresql.auto.conf (or postgresql.conf)
primary_conninfo = 'host=primary.example.com port=5432 user=replicator
                    sslmode=verify-full sslrootcert=/etc/postgres/ca.crt
                    application_name=standby1 channel_binding=require
                    options=''-c statement_timeout=0'''
primary_slot_name = 'standby1_slot'

Three operational rules:

  1. application_name in primary_conninfo is what shows up in pg_stat_replication.application_name AND what synchronous_standby_names matches against. Pick stable names.

  2. Use libpq service file (~/.pg_service.conf) or PGSERVICE to avoid embedding password literally. Cross-reference 48-authentication-pg-hba.md.

  3. primary_slot_name ties standby to a named replication slot on primary. Without it, primary may recycle WAL standby still needs (controlled by wal_keep_size on primary). With it, slot retains WAL until standby acknowledges — but unbounded retention if standby stalls. Cap with max_slot_wal_keep_size. Cross-reference 75-replication-slots.md.

wal_level Required Settings

wal_level (postmaster-context — restart required) controls WAL detail:

ValueWhat's loggedStreaming replication?Logical replication?
minimalJust enough for crash recoveryNO (standby cannot attach)NO
replica (default)Enough for streaming + base backupYESNO
logicalReplica + per-row info for logical decodingYESYES

Pick replica for physical streaming. Pick logical if planning to also run logical replication / CDC. Logical implies replica.

hot_standby (Read Queries on Standby)

hot_standby = on (default since PG10) on standby = allow read-only queries during recovery. off = standby is replay-only, no client queries.

Standby restrictions (verbatim from hot-standby.html):

  • No INSERT/UPDATE/DELETE/MERGE
  • No DDL
  • No SERIALIZABLE isolation (verbatim "Serializable transactions are not allowed on hot standby servers")4
  • pg_replication_origin_* functions disabled
  • LISTEN/NOTIFY disabled
  • Temp tables: only via pg_temp namespace allowed since PG14

hot_standby_feedback (Block Primary Vacuum)

hot_standby_feedback = on (sighup-context) makes standby's walreceiver send its xmin horizon back to primary. Primary delays autovacuum from removing tuples standby still references.

Trade-off:

DirectionIf onIf off (default)
Standby queryWill not be cancelled by recovery conflict on vacuum-removed tuplesCancelled with ERROR: canceling statement due to conflict with recovery after max_standby_streaming_delay
PrimaryAutovacuum delayed → table bloat growsAutovacuum runs unrestricted
Failure modeAbandoned/stuck standby keeps primary's xmin horizon back forever → unbounded bloatRandom query cancels on standby under heavy primary write load

Recommendation: on for reporting/analytics standbys (queries are long-running, bloat acceptable). off for HA-only standbys (short queries, prefer cancellation over primary bloat).

synchronous_commit Durability Dial

synchronous_commit (user-context — set per session, transaction, or cluster) controls how far WAL must travel before COMMIT returns success.

ValueWhere data is when COMMIT returnsLatencyDurability
offIn primary's WAL buffer (asynchronous flush)LowestLoss window: up to wal_writer_delay × 3 (~600ms default). Crash-safe but recent commits may vanish. NOT corruption — bounded loss.
localFlushed to primary's local disk (fsync)LowSurvives primary crash. Does NOT wait for any standby.
remote_writePrimary's disk + standby has received WAL into memoryMediumSurvives primary crash if at least one sync standby alive. NOT durable on standby's disk.
on (default)Primary's disk + standby's disk (flushed)HigherSurvives both primary and standby crash.
remote_applyPrimary's disk + standby's disk + standby has REPLAYED the WALHighestReads on standby see this commit immediately. Required for read-after-write on standby.

Per-transaction override:

BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO event_log (payload) SELECT payload FROM staging;
COMMIT;

Cluster-wide via ALTER SYSTEM or postgresql.conf. Per-role via ALTER ROLE webapp SET synchronous_commit = on;. Cross-reference 46-roles-privileges.md.

local is useful when you have synchronous standbys configured but a specific batch transaction can tolerate primary-only durability — bypasses sync wait without compromising primary fsync.

synchronous_standby_names (Quorum vs Priority)

synchronous_standby_names (sighup-context) = list of standbys that must ack before sync synchronous_commit levels return.

Three syntaxes:

# Empty string (default) — async, no standby is sync
synchronous_standby_names = ''

# Simple list = priority mode, FIRST 1 implicit
synchronous_standby_names = 'standby1, standby2, standby3'

# Priority mode explicit — first N from list must ack
synchronous_standby_names = 'FIRST 2 (standby1, standby2, standby3)'

# Quorum mode — ANY N of the listed must ack
synchronous_standby_names = 'ANY 2 (standby1, standby2, standby3)'

# Wildcard — any standby with matching application_name
synchronous_standby_names = 'ANY 1 (*)'
ModeSemanticsWhen to use
Empty stringAsync — no standby blocks COMMITBest throughput, no commit-time HA guarantee
Simple list / FIRST NPriority — must hear from first N in order. Falls through list if a higher-priority standby disconnects.Hierarchy with preferred sync (e.g., same-AZ over remote-AZ)
ANY NQuorum — any N of M acks suffice. Tolerates failure of (M − N) standbys.Multi-AZ HA where any AZ failure should not block COMMIT

Names matched against pg_stat_replication.application_name. application_name set via primary_conninfo's application_name=... parameter.

Operational rules:

  • Sync standby that disconnects → COMMIT blocks until either (a) standby reconnects, (b) timeout via wal_sender_timeout, (c) operator removes it from list and reloads. No automatic fallback to async — explicit policy.
  • pg_stat_replication.sync_state shows current status: async / potential (in list but not currently sync) / sync (currently sync) / quorum (member of quorum group).

max_standby_archive_delay / max_standby_streaming_delay

When standby's WAL apply needs to remove tuples a running standby query references, conflict resolution:

GUCDefaultApplies to
max_standby_archive_delay30sWAL replayed from archive (restore_command)
max_standby_streaming_delay30sWAL replayed from streaming connection

Three behaviors:

  • -1 = wait forever. Standby query never cancelled. WAL apply blocks indefinitely. Lag accumulates.
  • 0 = no delay. Cancel queries immediately on conflict. Maximum apply throughput.
  • positive value = wait this long, then cancel.

Combine with hot_standby_feedback: feedback prevents the conflict from arising; max_standby_*_delay controls behavior when conflict already arose.

Recovery Target GUCs

When recovery.signal present (point-in-time recovery, NOT regular standby), recovery target controls when to stop replay and promote.

GUCFormatEffect
recovery_target'immediate'Stop at first consistent point after base backup
recovery_target_timetimestampStop at first commit after this time
recovery_target_xidXIDStop at this transaction
recovery_target_lsnLSNStop at this WAL position
recovery_target_namestringStop at named restore point (set via pg_create_restore_point() on primary before crash)
recovery_target_timeline'latest' (PG12+ default), 'current', or specific TLIWhich timeline to follow
recovery_target_actionpause (default) / promote / shutdownWhat to do at target
recovery_target_inclusivetrue (default) / falseInclude target xact in replay or stop just before

Verbatim PG12: "Cause recovery to advance to the latest timeline by default ... recovery_target_timeline now defaults to latest. Previously, it defaulted to current."5

Verbatim PG12: "Do not allow multiple conflicting recovery_target* specifications ... only allow one of recovery_target, recovery_target_lsn, recovery_target_name, recovery_target_time, and recovery_target_xid."6

Cross-reference 84-backup-physical-pitr.md for full PITR walkthrough.

Cascading Replication

Standby may serve as upstream for downstream standby. Reduces load on primary, useful for geographic distribution.

Setup:

  • Cascading source standby needs max_wal_senders > 0 (already required for any standby that receives traffic).
  • Downstream standby's primary_conninfo points at cascading-source standby instead of original primary.
  • Downstream sees same WAL stream eventually. Promotion of cascading source promotes downstream's source — but downstream itself needs its own promotion to become a primary.

Constraints:

  • Cascading source must have wal_level >= replica.
  • Synchronous replication only via direct standby of primary — cascaded standbys are always async with respect to primary's COMMIT.

Promotion (pg_promote())

pg_promote(wait boolean DEFAULT true, wait_seconds integer DEFAULT 60) (PG12+) ends recovery, promotes to primary.7

SELECT pg_promote(true, 60);  -- wait up to 60s for promotion to complete

Returns true if promoted, false if timeout. Standby becomes primary, can accept writes. New timeline ID created.

Pre-PG12 alternatives (still work):

  • pg_ctl promote -D $PGDATA
  • Touch promote_trigger_file (REMOVED in PG16)2

Recovery-pause alternative:

SELECT pg_wal_replay_pause();    -- pause apply (does not promote)
SELECT pg_get_wal_replay_pause_state();  -- 'not paused' / 'pause requested' / 'paused' (PG14+)
SELECT pg_wal_replay_resume();   -- resume apply

Verbatim PG14: "Add function pg_get_wal_replay_pause_state() to report the recovery state ... It gives more detailed information than pg_is_wal_replay_paused(), which still exists."8

After promotion: timeline file (<timeline>.history) records the divergence. Old primary becomes diverged — re-attaching requires pg_rewind or fresh pg_basebackup. Cross-reference 89-pg-rewind.md.

pg_stat_replication View

Run on primary to inspect downstream standbys.

ColumnMeaning
pidPrimary-side walsender backend PID
usesysid, usenameReplication role on primary
application_nameStandby's identifier (matched by synchronous_standby_names)
client_addr, client_hostname, client_portNetwork endpoint of standby
backend_startWhen walsender started
backend_xminStandby's xmin horizon (NULL if hot_standby_feedback=off)
statestartup / catchup / streaming / backup / stopping
sent_lsnWAL position last sent
write_lsnWAL position standby has written to OS
flush_lsnWAL position standby has flushed to disk
replay_lsnWAL position standby has applied
write_lag / flush_lag / replay_lagTime interval — primary's commit timestamp to standby's stage
sync_priorityPosition in priority list (0 = async, otherwise from synchronous_standby_names)
sync_stateasync / potential / sync / quorum
reply_timeLast time standby replied

Lag interpretation: pg_wal_lsn_diff(primary_lsn, standby_replay_lsn) returns bytes. replay_lag returns interval — but only meaningful when standby is actively replaying. Idle standby: replay_lag is NULL despite sent_lsn = replay_lsn.

Per-Version Timeline

VersionStreaming-replication changes
PG12recovery.conf removed; standby.signal / recovery.signal model; recovery_target_timeline default latest; pg_promote(); wal_sender_timeout per-connection; pg_copy_physical_replication_slot() / pg_copy_logical_replication_slot(); max_wal_senders no longer counts against max_connections. All verbatim quotes captured in mechanics sections.157
PG13max_slot_wal_keep_size (cap WAL retained by stuck slot). Cross-reference 75-replication-slots.md.
PG14restore_command reloadable on SIGHUP; log_recovery_conflict_waits GUC; pg_get_wal_replay_pause_state() returns three states; in_hot_standby server parameter; recovery_init_sync_method=syncfs (Linux); pg_xact_commit_timestamp_origin(); pg_stat_replication_slots view; WalReceiverExit wait event. Verbatim release-note quotes captured in mechanics.9108111213
PG15LZ4 + Zstandard server-side base-backup compression; checkpointer + bgwriter run during crash recovery; recovery_prefetch GUC; archive_library GUC (alternative to archive_command); IDENTIFY_SYSTEM no longer required before START_REPLICATION. Verbatim quotes captured.1415161718
PG16promote_trigger_file removed (use pg_promote()); vacuum_defer_cleanup_age removed (use hot_standby_feedback + slots); logical decoding allowed on standbys (cross-reference 76-logical-decoding.md). Verbatim quotes captured.21920
PG17pg_basebackup --incremental + pg_combinebackup (cross-reference 84-backup-physical-pitr.md); WAL summarization via summarize_wal + wal_summary_keep_time; pg_replication_slots.invalidation_reason + inactive_since columns; pg_sync_replication_slots() for failover slot sync; sync_replication_slots GUC (standby side); synchronized_standby_slots GUC (primary side, names physical standbys that must catch up before logical subscribers see the WAL); pg_createsubscriber tool to create logical replica from physical standby; system identifier in backup manifest; dbname in pg_basebackup -R output. Verbatim quotes captured.2122232425262728
PG18idle_replication_slot_timeout (auto-invalidate inactive slots); max_active_replication_origins (separates origin count from max_replication_slots); pg_recvlogical --enable-failover + --enable-two-phase synonym. Verbatim quotes captured.293031

Examples / Recipes

Recipe 1 — Baseline production primary + standby

Primary postgresql.conf:

# WAL + replication
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
wal_keep_size = 0           # rely on slots, not wal_keep_size
max_slot_wal_keep_size = 64GB

# Synchronous (one named standby must ack)
synchronous_standby_names = 'standby1'
synchronous_commit = on

# Crash safety
fsync = on
full_page_writes = on

# Archive (independent of streaming)
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'

# Logging
log_replication_commands = on

Primary pg_hba.conf:

# TYPE  DATABASE      USER          ADDRESS          METHOD
hostssl replication   replicator    10.0.0.0/8       scram-sha-256
host    all           all           0.0.0.0/0        reject

Replication user on primary:

CREATE ROLE replicator REPLICATION LOGIN PASSWORD '<scram-hash>';
SELECT pg_create_physical_replication_slot('standby1_slot');

Standby setup (run on standby host with empty $PGDATA):

sudo -u postgres pg_basebackup \
    -h primary.example.com -p 5432 -U replicator \
    -D $PGDATA -X stream -P -R -S standby1_slot

# pg_basebackup -R wrote standby.signal + appended primary_conninfo to postgresql.auto.conf

Standby postgresql.conf additions:

hot_standby = on
hot_standby_feedback = off       # HA standby — prefer query cancel over primary bloat
max_standby_streaming_delay = 30s
max_standby_archive_delay = 30s

# application_name set via primary_conninfo (in postgresql.auto.conf)
# Match against synchronous_standby_names = 'standby1'

Verify on primary:

SELECT application_name, state, sync_state,
       pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) AS lag_bytes,
       replay_lag
FROM pg_stat_replication;

Recipe 2 — Quorum sync across three AZs

Primary in AZ-A, two standbys (standby_az_b, standby_az_c) in AZ-B and AZ-C. Want any 1 of 2 to ack — survive single-AZ failure without blocking COMMITs.

synchronous_standby_names = 'ANY 1 (standby_az_b, standby_az_c)'
synchronous_commit = on

Each standby's primary_conninfo sets application_name=standby_az_b (or _c).

Verify quorum behavior:

SELECT application_name, sync_state FROM pg_stat_replication;
-- Expect: both rows show sync_state = 'quorum'

Stop one standby, verify primary still accepts writes (the other ack satisfies quorum). Stop both, verify primary blocks (no quorum possible).

Recipe 3 — Per-role async commit for batch jobs

Cluster default synchronous_commit = on (sync standby blocks). Batch ETL role can tolerate primary-only durability for known-resumable jobs.

ALTER ROLE batch SET synchronous_commit = local;  -- skip sync standby wait
ALTER ROLE webapp SET synchronous_commit = on;    -- explicit (default)
ALTER ROLE reporter SET default_transaction_read_only = on;  -- belt-and-braces

Cross-reference 46-roles-privileges.md for per-role baseline pattern.

Recipe 4 — Reporting standby with hot_standby_feedback

Long-running analytic queries on standby. Want no random query-cancel errors. Accept primary bloat trade-off.

Standby postgresql.conf:

hot_standby = on
hot_standby_feedback = on
max_standby_streaming_delay = 5min  # belt-and-braces; feedback should prevent the conflict
max_standby_archive_delay = 5min

Monitor primary bloat after enabling. If pg_stat_user_tables.n_dead_tup grows unbounded on hot tables, standby has stuck transaction. Diagnose:

-- Run on primary
SELECT application_name, backend_xmin, age(backend_xmin) AS xmin_age
FROM pg_stat_replication
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

If xmin_age grows continuously, standby has long-running transaction. Either kill it on standby or accept bloat.

Cross-reference 27-mvcc-internals.md for xmin horizon.

Recipe 5 — Safe controlled switchover (planned promotion)

Primary becomes secondary, standby becomes primary. Zero data loss. Manual orchestration (use Patroni/repmgr in production — cross-reference 78-ha-architectures.md).

# Step 1: On primary — stop application traffic, force checkpoint
psql -h primary -c "CHECKPOINT;"

# Step 2: On primary — stop cleanly (waits for standby to catch up)
sudo systemctl stop postgresql

# Step 3: On standby — verify caught up to old primary's last LSN
psql -h standby -c "SELECT pg_last_wal_replay_lsn();"

# Step 4: On standby — promote
psql -h standby -c "SELECT pg_promote(true, 60);"

# Step 5: Verify standby is now primary
psql -h standby -c "SELECT pg_is_in_recovery();"  -- should be false

# Step 6: Reconfigure old primary as standby of new primary
# Either pg_rewind (if wal_log_hints or data_checksums enabled)
# or fresh pg_basebackup

# Step 7: Update application connection string to point at new primary

Cross-reference 89-pg-rewind.md for re-attaching old primary.

Recipe 6 — recovery_min_apply_delay for human-error rollback

Apply WAL with 1-hour delay. Bug-induced data loss can be reverted by promoting standby before delayed WAL applies.

Standby postgresql.conf:

recovery_min_apply_delay = 1h

Standby still receives WAL immediately (network) but applies it 1 hour late. If primary corrupts data at 10:00, you have until 11:00 to promote standby and stop replay.

Trade-offs:

  • Standby data is always 1 hour stale.
  • WAL accumulates on standby disk (1 hour worth).
  • During the delay window, RPO is 0 but RTO is "promote + delete remaining WAL" = minutes.
  • Slot retention on primary still measures "WAL written" not "WAL applied" — slot does NOT block primary from recycling.

Recipe 7 — Diagnose stuck state = 'catchup'

Standby connected but not caught up. Common when standby was offline for a while.

-- Run on primary
SELECT application_name, state,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn))   AS sent_behind,
       pg_size_pretty(pg_wal_lsn_diff(sent_lsn, write_lsn))               AS write_behind,
       pg_size_pretty(pg_wal_lsn_diff(write_lsn, flush_lsn))              AS flush_behind,
       pg_size_pretty(pg_wal_lsn_diff(flush_lsn, replay_lsn))             AS replay_behind
FROM pg_stat_replication
WHERE state = 'catchup';

Bottleneck identification:

  • High sent_behind → primary CPU-bound or max_wal_senders saturated.
  • High write_behind → standby network or OS write cache slow.
  • High flush_behind → standby disk fsync slow.
  • High replay_behind → standby CPU-bound on apply (typically: long transactions on standby blocking apply, or recovery_min_apply_delay set).

Recipe 8 — Failover slot sync (PG17+)

Logical replication slots normally only exist on primary. After failover, subscribers must rebuild slots from scratch (data loss). PG17 introduces failover slots.

Primary:

SELECT pg_create_logical_replication_slot('app_slot', 'pgoutput', false, true);
-- 4th arg = failover = true

Standby postgresql.conf (PG17+):

sync_replication_slots = on

Primary postgresql.conf (PG17+):

synchronized_standby_slots = 'standby1'

Verbatim PG17: "Allow specification of physical standbys that must be synchronized before they are visible to subscribers."26

After promotion, standby's slot is in sync with subscribers' last-known LSN. Subscribers reconnect to new primary without restart-from-scratch.

Cross-reference 74-logical-replication.md and 75-replication-slots.md.

Recipe 9 — Cascading replication for geographic distribution

Primary in US-East, hub-standby in US-West, downstream standbys in US-West local DCs.

Hub-standby postgresql.conf:

hot_standby = on
max_wal_senders = 5
max_replication_slots = 5
# primary_conninfo points at primary (US-East)

Downstream standby postgresql.conf:

hot_standby = on
# primary_conninfo points at hub-standby (US-West) — NOT at primary
primary_conninfo = 'host=hub-standby.us-west.example.com user=replicator ...'

Bandwidth saving: primary ships WAL once across US-East→US-West. Hub then redistributes locally.

Caveat: cascading standbys are always async with respect to primary's COMMIT. Primary's synchronous_standby_names cannot include cascaded standbys.

Recipe 10 — Enable channel binding for replication

PG14 added SCRAM channel binding for client-side. Apply to replication connections too.

Primary pg_hba.conf:

hostssl replication replicator 10.0.0.0/8 scram-sha-256

Standby primary_conninfo:

primary_conninfo = 'host=primary.example.com port=5432 user=replicator
                    sslmode=verify-full sslrootcert=/etc/ssl/ca.crt
                    channel_binding=require
                    application_name=standby1'

Cross-reference 49-tls-ssl.md for full TLS hardening.

Recipe 11 — Detect query-cancel cascade

Standby logs flooded with ERROR: canceling statement due to conflict with recovery. Means max_standby_streaming_delay too low for query workload.

-- Run on standby
SELECT count(*) AS cancels_today
FROM pg_stat_database
WHERE datname = current_database();
-- conflicts column was added in PG14

Three mitigations:

  1. Raise max_standby_streaming_delay to e.g., 5min. Trade off: lag can grow during heavy primary writes.
  2. Enable hot_standby_feedback = on. Trade off: primary bloat.
  3. Move queries to a dedicated reporting standby with both #1 and #2 enabled.

Recipe 12 — Force WAL switch + verify standby ack

For scripted operations that need confirmation a checkpoint reached the standby:

-- On primary
SELECT pg_switch_wal();

-- Wait for standby to catch up
DO $$
DECLARE
    target_lsn pg_lsn := pg_current_wal_lsn();
    standby_lsn pg_lsn;
BEGIN
    LOOP
        SELECT replay_lsn INTO standby_lsn
        FROM pg_stat_replication WHERE application_name = 'standby1';

        EXIT WHEN standby_lsn >= target_lsn;
        PERFORM pg_sleep(0.1);
    END LOOP;
    RAISE NOTICE 'standby1 caught up to %', target_lsn;
END$$;

Recipe 13 — Emergency demotion (force-pause apply)

Standby is about to apply destructive WAL. Pause replay immediately.

-- On standby
SELECT pg_wal_replay_pause();

-- Verify
SELECT pg_get_wal_replay_pause_state();  -- 'paused'

-- After investigation, either resume or promote
SELECT pg_wal_replay_resume();           -- continue
-- OR
SELECT pg_promote(true, 60);             -- promote, abandon remaining WAL

pg_get_wal_replay_pause_state() PG14+ returns three values: not paused, pause requested, paused. Pre-PG14 use pg_is_wal_replay_paused() (boolean only).

Gotchas / Anti-patterns

  1. recovery.conf carried forward from PG≤11 prevents PG12+ startup. Server refuses to start with verbatim error referencing the file. Delete it and migrate settings to postgresql.conf + standby.signal.1

  2. promote_trigger_file removed in PG16. Custom failover scripts that touched a trigger file silently no-op on PG16+. Use pg_promote() or pg_ctl promote.2

  3. synchronous_standby_names empty string ≠ no sync — sync is OFF. A common misconfiguration is to set synchronous_commit = on cluster-wide thinking it enables sync replication. Without synchronous_standby_names populated, every commit is "sync to primary disk only" — no standby is involved.

  4. Sync standby disconnect blocks all writes. No automatic fallback to async. COMMIT hangs forever (or until wal_sender_timeout declares standby dead). Operator must explicitly remove from synchronous_standby_names and reload to unblock.

  5. hot_standby_feedback = on + abandoned standby = unbounded primary bloat. If standby is offline but slot is retained, primary's xmin horizon is held back by the last reported standby xmin. Vacuum cannot remove dead tuples. Combine with max_slot_wal_keep_size AND monitor pg_stat_replication for missing standbys.

  6. max_standby_streaming_delay = -1 is "wait forever" — apply lag grows unbounded. Standby reads block primary's commits if you also have sync replication. Pick a finite value or use hot_standby_feedback.

  7. max_standby_streaming_delay = 0 does not mean "fail fast on any conflict" — it means "cancel as soon as conflict arises". WAL apply does not wait. If your reporting workload has even brief queries, you'll see frequent cancels.

  8. primary_slot_name without max_slot_wal_keep_size = unbounded primary disk usage. Stuck/dead standby retains WAL forever. Combine with max_slot_wal_keep_size (PG13+) to cap.32 Cross-reference 75-replication-slots.md.

  9. wal_keep_size is a fallback for slot-less replication. With slots, set wal_keep_size = 0 and let slots manage retention. Both at once = double-counting.

  10. application_name in primary_conninfo must be unique per standby. Duplicate names → synchronous_standby_names matches whichever connects first; behavior undefined for the duplicate. Also breaks pg_stat_replication row identification.

  11. Synchronous replication does NOT replicate atomically — it replicates on COMMIT. If primary crashes mid-transaction, the in-progress writes are not on the standby. After failover, the in-progress transaction is rolled back as if it never started. This is correct behavior, but subtle.

  12. SERIALIZABLE isolation forbidden on standby. Verbatim from hot-standby.html: "Serializable transactions are not allowed on hot standby servers."4 Use REPEATABLE READ instead.

  13. LISTEN/NOTIFY does not propagate to standby. A NOTIFY issued on primary does not fire LISTEN handlers on standby connections. Standby applications must connect to primary for notifications.

  14. Temp tables not allowed on standby pre-PG14. PG14 relaxed: temp tables allowed via pg_temp namespace. Older versions: any CREATE TEMP TABLE errors out.

  15. Standby does NOT serve transactions started on primary. Cannot start tx on primary, route SELECT to standby. synchronous_commit = remote_apply only ensures the COMMIT is visible on standby — does not transfer the transaction itself.

  16. pg_basebackup -R writes primary_conninfo to postgresql.auto.conf. Hand-editing postgresql.conf does NOT override (postgresql.auto.conf precedence is higher). Use ALTER SYSTEM SET primary_conninfo = '' to clear, then re-set.

  17. pg_promote() returns immediately if wait = false. Caller gets back false (because not yet complete) but promotion is in flight. Always check pg_is_in_recovery() to confirm.

  18. recovery_target_inclusive = true (default) means STOP AFTER target. If recovery_target_xid = 12345, replay includes XID 12345 and stops just after. Set inclusive = false to stop just before.

  19. Multiple recovery_target_* GUCs is an error since PG12. Verbatim: "only allow one of recovery_target, recovery_target_lsn, recovery_target_name, recovery_target_time, and recovery_target_xid."6

  20. recovery_target_timeline = 'latest' is the PG12+ default. Pre-PG12 default was 'current'. After failover, new timeline is created — old standbys following current would be stuck on the dead branch. latest follows the newly promoted timeline automatically.5

  21. pg_stat_replication.replay_lag is NULL on idle standby. The lag interval is computed from "primary commit timestamp" minus "standby reply at that LSN". If primary hasn't committed anything, there's nothing to compare. Insert a heartbeat row periodically if you need a non-NULL value.

  22. Cascaded standbys cannot satisfy primary's synchronous_standby_names. Primary's sync requirement is only satisfied by standbys connected directly to primary. Cascaded standbys are always async from primary's perspective.

  23. vacuum_defer_cleanup_age removed in PG16. Old advice "set this to defer vacuum until standby has caught up" no longer works. Use hot_standby_feedback + replication slots instead.19

See Also

Sources

Primary references:

Footnotes

  1. PG12 release notes. Verbatim: "Move recovery.conf settings into postgresql.conf (Masao Fujii, Simon Riggs, Abhijit Menon-Sen, Sergei Kornilov). recovery.conf is no longer used, and the server will not start if that file exists. recovery.signal and standby.signal files are now used to switch into non-primary mode. The trigger_file setting has been renamed to promote_trigger_file. The standby_mode setting has been removed." https://www.postgresql.org/docs/release/12.0/ 2 3 4

  2. PG16 release notes. Verbatim: "Remove server variable promote_trigger_file (Simon Riggs). This was used to promote a standby to primary, but is now more easily accomplished with pg_ctl promote or pg_promote()." https://www.postgresql.org/docs/release/16.0/ 2 3 4

  3. PostgreSQL 16 docs Appendix O.1 (Obsolete or Renamed Features). https://www.postgresql.org/docs/16/recovery-config.html

  4. PostgreSQL 16 docs §27.4 Hot Standby. https://www.postgresql.org/docs/16/hot-standby.html 2

  5. PG12 release notes. Verbatim: "Cause recovery to advance to the latest timeline by default (Peter Eisentraut). Specifically, recovery_target_timeline now defaults to latest. Previously, it defaulted to current." https://www.postgresql.org/docs/release/12.0/ 2 3

  6. PG12 release notes. Verbatim: "Do not allow multiple conflicting recovery_target* specifications (Peter Eisentraut). Specifically, only allow one of recovery_target, recovery_target_lsn, recovery_target_name, recovery_target_time, and recovery_target_xid." https://www.postgresql.org/docs/release/12.0/ 2

  7. PG12 release notes. Verbatim: "Add function pg_promote() to promote standbys to primaries (Laurenz Albe, Michaël Paquier). Previously, this operation was only possible by using pg_ctl or creating a trigger file." https://www.postgresql.org/docs/release/12.0/ 2

  8. PG14 release notes. Verbatim: "Add function pg_get_wal_replay_pause_state() to report the recovery state (Dilip Kumar). It gives more detailed information than pg_is_wal_replay_paused(), which still exists." https://www.postgresql.org/docs/release/14.0/ 2

  9. PG14 release notes. Verbatim: "Allow the restore_command setting to be changed during a server reload (Sergei Kornilov). You can also set restore_command to an empty string and reload to force recovery to only read from the pg_wal directory." https://www.postgresql.org/docs/release/14.0/

  10. PG14 release notes. Verbatim: "Add server parameter log_recovery_conflict_waits to report long recovery conflict wait times (Bertrand Drouvot, Masahiko Sawada)." https://www.postgresql.org/docs/release/14.0/

  11. PG14 release notes. Verbatim: "Add new read-only server parameter in_hot_standby (Haribabu Kommi, Greg Nancarrow, Tom Lane). This allows clients to easily detect whether they are connected to a hot standby server." https://www.postgresql.org/docs/release/14.0/

  12. PG14 release notes. Verbatim: "Allow file system sync at the start of crash recovery on Linux (Thomas Munro). By default, PostgreSQL opens and fsyncs each data file in the database cluster at the start of crash recovery. A new setting, recovery_init_sync_method=syncfs, instead syncs each filesystem used by the cluster. This allows for faster recovery on systems with many database files." https://www.postgresql.org/docs/release/14.0/

  13. PG14 release notes. Verbatim: "Add system view pg_stat_replication_slots to report replication slot activity (Masahiko Sawada, Amit Kapila, Vignesh C). The function pg_stat_reset_replication_slot() resets slot statistics." https://www.postgresql.org/docs/release/14.0/

  14. PG15 release notes. Verbatim: "Add support for LZ4 and Zstandard compression of server-side base backups (Jeevan Ladhe, Robert Haas)." https://www.postgresql.org/docs/release/15.0/

  15. PG15 release notes. Verbatim: "Run the checkpointer and bgwriter processes during crash recovery (Thomas Munro). This helps to speed up long crash recoveries." https://www.postgresql.org/docs/release/15.0/

  16. PG15 release notes. Verbatim: "Allow WAL processing to pre-fetch needed file contents (Thomas Munro). This is controlled by the server variable recovery_prefetch." https://www.postgresql.org/docs/release/15.0/

  17. PG15 release notes. Verbatim: "Allow archiving via loadable modules (Nathan Bossart). Previously, archiving was only done by calling shell commands. The new server variable archive_library can be set to specify a library to be called for archiving." https://www.postgresql.org/docs/release/15.0/

  18. PG15 release notes. Verbatim: "No longer require IDENTIFY_SYSTEM to be run before START_REPLICATION (Jeff Davis)." https://www.postgresql.org/docs/release/15.0/

  19. PG16 release notes. Verbatim: "Remove the server variable vacuum_defer_cleanup_age (Andres Freund). This has been unnecessary since hot_standby_feedback and replication slots were added." https://www.postgresql.org/docs/release/16.0/ 2

  20. PG16 release notes. Verbatim: "Allow logical decoding on standbys (Bertrand Drouvot, Andres Freund, Amit Khandekar). Snapshot WAL records are required for logical slot creation but cannot be created on standbys. To avoid delays, the new function pg_log_standby_snapshot() allows creation of such records." https://www.postgresql.org/docs/release/16.0/

  21. PG17 release notes. Verbatim: "Add support for incremental file system backup (Robert Haas, Jakub Wartak, Tomas Vondra). Incremental backups can be created using pg_basebackup's new --incremental option. The new application pg_combinebackup allows manipulation of base and incremental file system backups." https://www.postgresql.org/docs/release/17.0/

  22. PG17 release notes. Verbatim: "Allow the creation of WAL summarization files (Robert Haas, Nathan Bossart, Hubert Depesz Lubaczewski). These files record the block numbers that have changed within an LSN range and are useful for incremental file system backups. This is controlled by the server variables summarize_wal and wal_summary_keep_time, and introspected with pg_available_wal_summaries(), pg_wal_summary_contents(), and pg_get_wal_summarizer_state()." https://www.postgresql.org/docs/release/17.0/

  23. PG17 release notes. Verbatim: "Add column pg_replication_slots.invalidation_reason to report the reason for invalid slots (Shveta Malik, Bharath Rupireddy). Add column pg_replication_slots.inactive_since to report slot inactivity duration (Bharath Rupireddy)." https://www.postgresql.org/docs/release/17.0/

  24. PG17 release notes. Verbatim: "Add function pg_sync_replication_slots() to synchronize logical replication slots (Hou Zhijie, Shveta Malik, Ajin Cherian, Peter Eisentraut)." https://www.postgresql.org/docs/release/17.0/

  25. PG17 release notes. Verbatim: "Add server variable sync_replication_slots to enable failover logical slot synchronization (Shveta Malik, Hou Zhijie, Peter Smith)." https://www.postgresql.org/docs/release/17.0/

  26. PG17 release notes. Verbatim: "Allow specification of physical standbys that must be synchronized before they are visible to subscribers (Hou Zhijie, Shveta Malik). The new server variable is synchronized_standby_slots." https://www.postgresql.org/docs/release/17.0/ 2

  27. PG17 release notes. Verbatim: "Add application pg_createsubscriber to create a logical replica from a physical standby server (Euler Taveira)." https://www.postgresql.org/docs/release/17.0/

  28. PG17 release notes. Verbatim: "Allow connection string value dbname to be written when pg_basebackup writes connection information to postgresql.auto.conf (Vignesh C, Hayato Kuroda)." https://www.postgresql.org/docs/release/17.0/

  29. PG18 release notes. Verbatim: "Allow inactive replication slots to be automatically invalidated using server variable idle_replication_slot_timeout (Nisha Moond, Bharath Rupireddy)." https://www.postgresql.org/docs/release/18.0/

  30. PG18 release notes. Verbatim: "Add server variable max_active_replication_origins to control the maximum active replication origins (Euler Taveira). This was previously controlled by max_replication_slots, but this new setting allows a higher origin count in cases where fewer slots are required." https://www.postgresql.org/docs/release/18.0/

  31. PG18 release notes. Verbatim: "Add pg_recvlogical option --enable-failover to specify failover slots (Hayato Kuroda). Also add option --enable-two-phase as a synonym for --two-phase, and deprecate the latter." https://www.postgresql.org/docs/release/18.0/

  32. PG13 release notes. max_slot_wal_keep_size GUC introduced. https://www.postgresql.org/docs/release/13.0/

references

01-syntax-ddl.md

02-syntax-dql.md

03-syntax-dml.md

04-ctes.md

05-views.md

06-functions.md

07-procedures.md

08-plpgsql.md

09-procedural-languages.md

10-dynamic-sql.md

11-window-functions.md

12-aggregates-grouping.md

13-cursors-and-prepares.md

14-data-types-builtin.md

15-data-types-custom.md

16-arrays.md

17-json-jsonb.md

18-uuid-numeric-money.md

19-timestamp-timezones.md

20-text-search.md

21-hstore.md

22-indexes-overview.md

23-btree-indexes.md

24-gin-gist-indexes.md

25-brin-hash-spgist-bloom-indexes.md

26-index-maintenance.md

27-mvcc-internals.md

28-vacuum-autovacuum.md

29-transaction-id-wraparound.md

30-hot-updates.md

31-toast.md

32-buffer-manager.md

33-wal.md

34-checkpoints-bgwriter.md

35-partitioning.md

36-inheritance.md

37-constraints.md

38-foreign-keys-deep.md

39-triggers.md

40-event-triggers.md

41-transactions.md

42-isolation-levels.md

43-locking.md

44-advisory-locks.md

45-listen-notify.md

46-roles-privileges.md

47-row-level-security.md

48-authentication-pg-hba.md

49-tls-ssl.md

50-encryption-pgcrypto.md

51-pgaudit.md

52-rules-system.md

53-server-configuration.md

54-memory-tuning.md

55-statistics-planner.md

56-explain.md

57-pg-stat-statements.md

58-performance-diagnostics.md

59-planner-tuning.md

60-parallel-query.md

61-jit-compilation.md

62-tablespaces.md

63-internals-architecture.md

64-system-catalogs.md

65-collations-encoding.md

66-bulk-operations-copy.md

67-cli-tools.md

68-pgbench.md

69-extensions.md

70-fdw.md

71-large-objects.md

72-extension-development.md

73-streaming-replication.md

74-logical-replication.md

75-replication-slots.md

76-logical-decoding.md

77-standby-failover.md

78-ha-architectures.md

79-patroni.md

80-connection-pooling.md

81-pgbouncer.md

82-monitoring.md

83-backup-pg-dump.md

84-backup-physical-pitr.md

85-backup-tools.md

86-pg-upgrade.md

87-major-version-upgrade.md

88-corruption-recovery.md

89-pg-rewind.md

90-disaster-recovery.md

91-docker-postgres.md

92-kubernetes-operators.md

93-pg-trgm.md

94-pgvector.md

95-postgis.md

96-timescaledb.md

97-citus.md

98-pg-cron.md

99-pg-partman.md

100-pg-versions-features.md

101-managed-vs-baremetal.md

102-skill-cookbook.md

SKILL.md

tile.json