Comprehensive PostgreSQL reference for developers and DBAs covering versions 14–18. Use whenever the user asks about PostgreSQL syntax, DDL/DML/DQL, joins, LATERAL, CTEs, window functions, GROUPING SETS, DISTINCT ON, RETURNING, ON CONFLICT, PL/pgSQL, functions, procedures, triggers, views, materialized views, indexes (B-tree/GIN/GiST/BRIN/Hash/Bloom), MVCC, VACUUM, autovacuum, WAL, TOAST, partitioning, replication (streaming/logical), backup, PITR, HA (Patroni/repmgr), pgBouncer, EXPLAIN ANALYZE, RLS, roles, extensions (pgvector, PostGIS, TimescaleDB, Citus, pg_trgm, pg_cron), JSON/JSONB, full-text search, UUID, timestamptz, COPY, system catalogs, collations, large objects, cursors, GUC, or any Postgres administration, performance, security, replication, backup, or recovery topic.
94
94%
Does it follow best practices?
Impact
94%
1.36xAverage score across 3 eval scenarios
Passed
No known issues
Data-page checksums, pg_amcheck / amcheck, pg_checksums offline conversion, zero_damaged_pages, single-user mode (postgres --single), pg_resetwal (last-resort), and pg_dump as a corruption diagnostic. Covers PG14 → PG18 surface.
Pick this file when:
invalid page in block, could not read block, unexpected zero page at, invalid contrecord.SELECT against a table returns ERROR: invalid page in block N of relation base/....COUNT(*), missing rows after crash, replica mismatch).pg_upgrade or major-version migration.bt_index_check / bt_index_parent_check / verify_heapam / gin_index_check PG18+).pg_control is corrupted and the server won't start.Pick different file when:
84-backup-physical-pitr.md, 85-backup-tools.md, 90-disaster-recovery.md.89-pg-rewind.md.29-transaction-id-wraparound.md.28-vacuum-autovacuum.md.Five rules.
Rule 1 — Data checksums detect corruption; they do NOT repair it. When a page checksum mismatch is detected on read, the backend raises an error and aborts the transaction. The detection moves silent disk corruption into a loud, blocking failure — that's the design. Repair requires restoring the affected pages from a backup or replaying WAL. pg_checksums enables/disables/verifies cluster-wide; it cannot correct mismatches.
Rule 2 — Indexes corrupt independently of heap. Heap bytes can be intact while a B-tree index disagrees with the underlying rows (or vice versa). amcheck separates the two checks: bt_index_check / bt_index_parent_check for B-tree structure, verify_heapam (PG14+) for heap pages, gin_index_check (PG18+) for GIN. Routine bt_index_check runs under AccessShareLock — safe on a live primary. bt_index_parent_check upgrades to ShareLock and blocks writes.
Rule 3 — pg_resetwal is last-resort and destroys data integrity guarantees. Verbatim docs warning: "It should be used only as a last resort, when the server will not start due to such corruption. After running this command, it should be possible to start the server, but bear in mind that the database might contain inconsistent data due to partially-committed transactions. You should immediately dump your data, run initdb, and restore."1 Treat the post-pg_resetwal cluster as a one-shot dump source, not a production cluster.
Rule 4 — Single-user mode is for catalog rescue and wraparound emergencies, not interactive debugging. Verbatim: "The primary use for this mode is during bootstrapping by initdb. Sometimes it is used for debugging or disaster recovery; note that running a single-user server is not truly suitable for debugging the server, since no realistic interprocess communication and locking will happen."2 No background processing — no autovacuum, no checkpointer, no replication. Use it for VACUUM FREEZE when autovacuum can't start (wraparound), for REINDEX SYSTEM when a system catalog index is broken, and as a path into a cluster that won't accept multi-user connections.
Rule 5 — Test corruption hypothesis with pg_dump before invoking destructive tools. If a pg_dump of the entire cluster succeeds, the storage is internally consistent enough for logical extraction — corruption is likely localized. If pg_dump fails on a specific table or index, the failure narrows the investigation. Run pg_dump -Fc --no-blobs --schema-only first (catalog walk), then pg_dump -Fc (full extraction). pg_dump reads every visible row through MVCC; it surfaces page-level corruption as it goes.
[!WARNING]
pg_resetwalandzero_damaged_pagesdestroy data Both tools allow the cluster to keep running by skipping over corrupted bytes — but the bytes they skip are real rows.pg_resetwaldiscards uncommitted-but-WAL-only changes (in-flight transactions) and may reuse OIDs from committed-but-not-yet-checkpointed pages.zero_damaged_pageszeroes a damaged 8 KB page in memory destroying every tuple on it. Both produce a cluster suitable for onepg_dumpand oneinitdb+restore, nothing more. Take acp -a $PGDATAbyte-level copy BEFORE touching either.
[!WARNING] PG18: data checksums on by default Verbatim PG18 release-note: "Change initdb to default to enabling checksums. The new initdb option
--no-data-checksumsdisables checksums."3 Clusters initialized on PG18+ have checksums on out of the box.pg_upgradefrom a non-checksum cluster onto a PG18+ default-checksum new cluster fails the consistency-check phase — initialize the new cluster withinitdb --no-data-checksums, then runpg_checksums --enablepost-upgrade once the cluster is stopped.
13 rows mapping symptom / goal to tool.
| Symptom / goal | Tool / action | Notes |
|---|---|---|
ERROR: invalid page in block N on a query | amcheck → identify corrupt index vs heap; restore page from backup. | Cross-reference Recipe 4. |
| Suspect silent corruption (no error yet) | pg_amcheck --all --jobs=N weekly. | PG14+. |
| Suspect index corruption only | SELECT bt_index_check('myidx'::regclass); | AccessShareLock — safe in production. |
| Suspect index structure + parent/child invariants | SELECT bt_index_parent_check('myidx'::regclass); | ShareLock — blocks writes. |
| Suspect heap corruption | SELECT * FROM verify_heapam('mytable'::regclass); | PG14+. Returns one row per corruption. |
| Suspect GIN-index corruption | SELECT gin_index_check('myidx'::regclass); | PG18+. |
| Enable checksums on existing cluster | pg_checksums --enable -D $PGDATA | Cluster must be cleanly shut down. |
| Verify checksums on offline cluster | pg_checksums --check -D $PGDATA | Default mode. |
| Cluster will not start, WAL corrupt | pg_resetwal -D $PGDATA then dump+restore | Last resort. Take byte-level backup first. |
| Catalog-index corruption blocks startup | postgres --single + REINDEX SYSTEM <db> | Then restart normally. |
| Wraparound emergency, autovacuum can't run | postgres --single + VACUUM FREEZE | Cross-reference 29-transaction-id-wraparound.md. |
| Pre-upgrade integrity check | pg_amcheck --all, then pg_dump --schema-only | Run on a recent base-backup replica, not production. |
| Damaged page on one block, willing to lose rows | SET zero_damaged_pages = on; then VACUUM FULL table | Destructive — last resort before restore from backup. |
Three smell signals — wrong tool for the situation:
pg_resetwal -f as a first response to a startup error. Read the log first. Most "won't start" cases are recoverable without resetting WAL (config error, missing tablespace mount, exhausted disk, wraparound).zero_damaged_pages = on set cluster-wide as a "safety" default. This is a one-shot escape hatch, not a configuration. Setting it cluster-wide can mask further corruption and propagate zeroed pages to disk during the next checkpoint.bt_index_parent_check on every index in production. Acquires ShareLock — blocks writes. Use bt_index_check for routine; reserve --parent-check / bt_index_parent_check for after-the-fact incident investigation on a maintenance window.Per-page CRC-style checksum on data pages. Detects bit-rot, partial writes that escape full_page_writes, and bad RAM / disk I/O. Verbatim docs: "By default, data pages are not protected by checksums, but this can optionally be enabled for a cluster. When enabled, each data page includes a checksum that is updated when the page is written and verified each time the page is read."4
What's protected:
What's NOT protected (separate mechanisms):
Enabling:
initdb time: initdb --data-checksums (or -k)6. PG18+: on by default; opt out with --no-data-checksums3.pg_checksums --enable -D $PGDATA (server stopped)7.Reading the state:
SHOW data_checksums;Or from pg_controldata on a stopped cluster: Data page checksum version: 1 (enabled) vs 0 (disabled).
Failure observation: Checksum mismatches are reported to the client and counted in pg_stat_database.checksum_failures (and checksum_last_failure timestamp).
[!NOTE] PostgreSQL 12
pg_checksumsintroduced as offline--enable/--disable/--checkutility. Before PG12, the only way to flip checksums was to dump +initdb -k+ restore.
[!NOTE] PostgreSQL 18 Verbatim: "Change initdb to default to enabling checksums (Greg Sabino Mullane). The new initdb option
--no-data-checksumsdisables checksums."3pg_upgradecross-checksum-setting upgrade now requires explicit alignment.
Performance impact: Enabling checksums adds CPU work on every page read and write. Verbatim from initdb docs: "Enabling checksums may incur a noticeable performance penalty."6 Modern hardware (post-2018) handles it; benchmark on representative workload if uncertain.
pg_amcheck and amcheckamcheck is a contrib extension. pg_amcheck is a CLI wrapper (PG14+) that orchestrates amcheck calls across many relations in parallel.
Verbatim PG14 release-note: "Add command-line utility pg_amcheck to simplify running contrib/amcheck tests on many relations (Mark Dilger)."8
Verbatim PG14 release-note: "Allow amcheck to also check heap pages (Mark Dilger). Previously it only checked B-Tree index pages."9
Install:
CREATE EXTENSION amcheck;Three core functions (PG14+):
| Function | Lock | Purpose |
|---|---|---|
bt_index_check(index regclass, heapallindexed boolean) | AccessShareLock | Lightweight B-tree structural check. Safe in production. |
bt_index_parent_check(index regclass, heapallindexed boolean, rootdescend boolean) | ShareLock | Adds parent/child invariant checks. Blocks writes. Not on hot standby. |
verify_heapam(relation regclass, on_error_stop boolean, check_toast boolean, skip text, startblock bigint, endblock bigint) | AccessShareLock | Heap page check — PG14+. Returns one row per corruption: (blkno, offnum, attnum, msg). |
Verbatim docs on bt_index_check: "When a routine, lightweight test for corruption is required in a live production environment, using bt_index_check often provides the best trade-off between thoroughness of verification and limiting the impact on application performance and availability."10
Verbatim docs on bt_index_parent_check: "bt_index_parent_check can be thought of as a more thorough variant of bt_index_check: unlike bt_index_check, bt_index_parent_check also checks invariants that span parent/child relationships."10
heapallindexed: When true, verifies every visible heap tuple has a matching index entry. Catches index-misses (silent UPDATE-skipping bugs). Slower than structural check alone.
[!NOTE] PostgreSQL 15 Verbatim: "Allow amcheck to check sequences (Mark Dilger)."11 "Improve amcheck sanity checks for TOAST tables (Mark Dilger)."12
[!NOTE] PostgreSQL 17 Verbatim: "Allow amcheck to check for unique constraint violations using new option --checkunique (Anastasia Lubennikova, Pavel Borisov, Maxim Orlov)."13 Detects unique-index violations where index says "unique" but heap holds duplicate rows.
[!NOTE] PostgreSQL 18 Verbatim: "Add amcheck check function gin_index_check() to verify GIN indexes (Grigory Kryachko, Heikki Linnakangas, Andrey Borodin)."14 Closes the longstanding GIN-cant-be-checked gap.
pg_amcheck CLI key flags:
| Flag | Behavior |
|---|---|
-a, --all | All databases, all relations. |
-d, --database=PATTERN | Match databases. |
-D, --exclude-database=PATTERN | Exclude. |
-t, --table=PATTERN | Tables / matviews / sequences. |
-i, --index=PATTERN | Indexes only. |
-r, --relation=PATTERN | Tables OR indexes. |
-s, --schema=PATTERN | Schemas. |
-j, --jobs=N | Parallel connections. |
--parent-check | Use bt_index_parent_check (locks heavier). |
--rootdescend | Re-finds tuples via root scan; implies --parent-check. |
--heapallindexed | Verify every heap tuple is indexed. |
--no-dependent-indexes | Skip cascade-checking a table's indexes. |
--no-dependent-toast | Skip TOAST table on table check. |
--install-missing | Auto-create the amcheck extension. |
--checkunique | PG17+ — verify unique constraints. |
-P, --progress | Progress to stderr. |
-v, --verbose | More output. |
pg_checksums Offline ConversionVerbatim docs: "pg_checksums checks, enables or disables data checksums in a PostgreSQL cluster."7 "The server must be shut down cleanly before running pg_checksums."7
| Mode | Behavior |
|---|---|
--check (default) | Verify every page; exit nonzero on first checksum failure. |
--enable | Rewrite every page with its computed checksum, flip cluster-control flag on. |
--disable | Flip cluster-control flag off (no page rewrites). |
Critical caveats:
pg_ctl stop -m fast not -m immediate; pg_controldata must show Database cluster state: shut down).--enable is time-proportional to data size — every relation page gets read, checksummed, and rewritten. For TB clusters, plan multi-hour offline window.zero_damaged_pages and ignore_checksum_failureBoth are developer GUCs — destructive escape hatches.
Verbatim zero_damaged_pages docs: "Detection of a damaged page header normally causes PostgreSQL to report an error, aborting the current transaction. Setting zero_damaged_pages to on causes the system to instead report a warning, zero out the damaged page in memory, and continue processing. This behavior will destroy data, namely all the rows on the damaged page. However, it does allow you to get past the error and retrieve rows from any undamaged pages that might be present in the table."15
Verbatim ignore_checksum_failure docs: "Setting ignore_checksum_failure to on causes the system to ignore the failure (but still report a warning), and continue processing. This behavior may cause crashes, propagate or hide corruption, or other serious problems."16
Distinction:
zero_damaged_pages — fires on page-header corruption (the layout-byte sanity check). Zeroes the page in memory; subsequent checkpoint persists the zeroed page to disk.ignore_checksum_failure — fires only on checksum mismatch; only meaningful when data checksums are enabled. Reads the page anyway and lets the backend continue.Usage pattern (last-resort, destructive):
-- Session-local:
SET LOCAL zero_damaged_pages = on;
-- Force the damaged page to be loaded, zeroed, and (eventually) checkpointed:
VACUUM FULL mytable; -- or: SELECT * FROM mytable;
-- IMMEDIATELY pg_dump and restore. The zeroed page is now real.ignore_system_indexesBypasses system-catalog indexes (pg_class's indexes, pg_attribute's, etc.) when reading catalogs. Useful when a system index itself is corrupt and would otherwise block the backend from looking up table metadata.
-c ignore_system_indexes=on to postgres --single or postmaster -c ignore_system_indexes=on at startup.REINDEX SYSTEM <db> is required to clear the underlying issue.postgres --single)Verbatim docs: "The primary use for this mode is during bootstrapping by initdb. Sometimes it is used for debugging or disaster recovery; note that running a single-user server is not truly suitable for debugging the server, since no realistic interprocess communication and locking will happen."2 "The single-user mode server does not provide sophisticated line-editing features."2 "Single-user mode also does not do any background processing, such as automatic checkpoints or replication."2 "This user does not actually have to exist, so the single-user mode can be used to manually recover from certain kinds of accidental damage to the system catalogs."2
Invocation:
postgres --single -D /var/lib/postgresql/16/main mydbRules:
--single MUST be the first argument.-D or PGDATA.Newline = end of statement (unlike psql). Two ways to span lines:
\.-j flag: switch terminator to ;\n\n (semicolon, newline, blank line) so multi-line statements parse the way you'd expect from a script.Canonical uses:
VACUUM FREEZE; cross-reference 29-transaction-id-wraparound.md.REINDEX SYSTEM <dbname>; then exit and start the multi-user server normally.pg_class after a crash mid-CREATE.Cannot do in single-user mode:
LISTEN / NOTIFY (no IPC).walsender doesn't run.psql cannot connect either; you talk to the process via its own stdin/stdout.pg_resetwal Last-Resort RecoveryVerbatim docs: "clears the write-ahead log (WAL) and optionally resets some other control information stored in the pg_control file. This function is sometimes needed if these files have become corrupted. It should be used only as a last resort, when the server will not start due to such corruption."1
Verbatim: "After running this command, it should be possible to start the server, but bear in mind that the database might contain inconsistent data due to partially-committed transactions. You should immediately dump your data, run initdb, and restore. After restore, check for inconsistencies and repair as needed."1
Verbatim about -f / --force: "the recovered database must be treated with even more suspicion than usual: an immediate dump and restore is imperative. Do not execute any data-modifying operations in the database before you dump, as any such action is likely to make the corruption worse."1
When it's the right answer:
pg_control is unreadable or contains a checksum that fails.pg_wal/ are missing or truncated at the current segment.could not locate a valid checkpoint record.When it's NOT the right answer:
postgres --single + VACUUM FREEZE, not pg_resetwal.invalid page in block N of relation ... from one table — that's a relfile issue, not a WAL issue. Use amcheck + page surgery + restore from backup.Procedure (when truly necessary):
# 1. Stop the server. Confirm it's stopped:
pg_ctl -D $PGDATA stop -m immediate
pg_controldata $PGDATA | grep "Database cluster state"
# Expected: "Database cluster state: shut down" — if not, try -m fast first.
# 2. Take a byte-level backup:
cp -a $PGDATA /backup/cluster-pre-resetwal-$(date +%F)
# 3. Dry run — inspect what pg_resetwal would set:
pg_resetwal -n -D $PGDATA
# 4. Actually reset (only if dry run looks sane):
pg_resetwal -D $PGDATA
# 5. Start the cluster:
pg_ctl -D $PGDATA start
# 6. IMMEDIATELY: pg_dumpall > rescue.sql
# 7. initdb a fresh cluster, restore from rescue.sql.PG18+ also adds --char-signedness to pg_resetwal for cross-platform restoration scenarios — verbatim: "Add pg_resetwal option --char-signedness to change the default char signedness (Masahiko Sawada)."17
pg_dump as Corruption Diagnosticpg_dump reads every row through the executor. If a page is corrupt and the rows on it are visible to the dump snapshot, pg_dump either:
ERROR: invalid page in block N — narrows the corruption to a specific relation + block, ORThree-pass diagnostic:
# Pass 1 — fast: catalog walk + schema-only. Detects catalog-side issues.
pg_dump -d mydb --schema-only -Fc -f /dev/null 2> /tmp/dump-schema.log
# Pass 2 — full extraction, sequential. Detects heap + index corruption.
pg_dump -d mydb -Fc -f /dev/null 2> /tmp/dump-full.log
# Pass 3 — per-table parallel narrows the failing relation:
pg_dump -d mydb -Fd -j 8 -f /tmp/dump-parallel 2> /tmp/dump-parallel.logWhat pg_dump does NOT catch:
pg_dump reads heap directly via sequential scan; it never traverses indexes.pg_dump's snapshot view.That's why pg_amcheck (heap + index) is the canonical pre-upgrade audit, not pg_dump.
Page checksums and amcheck detect corruption that's already on disk. Preventing it requires correct WAL durability settings + reliable storage.
Verbatim docs on full_page_writes: "To guard against such failures, PostgreSQL periodically writes full page images to permanent WAL storage before modifying the actual page on disk. By doing this, during crash recovery PostgreSQL can restore partially-written pages from WAL."5
Verbatim docs on WAL CRC: "Each individual record in a WAL file is protected by a CRC-32C (32-bit) check that allows us to tell if record contents are correct. The CRC value is set when we write each WAL record and checked during crash recovery, archive recovery and replication."5
Don't disable for "performance":
fsync = off — crash-unsafe; cluster can corrupt on power loss.full_page_writes = off — only safe on filesystems that guarantee atomic 8 KB writes (ZFS, btrfs with the right options); cross-reference 33-wal.md.synchronous_commit = off — loses recent committed transactions on crash but does not corrupt the cluster.Cross-reference 33-wal.md and 34-checkpoints-bgwriter.md.
| Version | Corruption surface |
|---|---|
| PG14 | pg_amcheck CLI introduced (Mark Dilger)8. amcheck extended to check heap pages — verify_heapam (Mark Dilger)9. |
| PG15 | amcheck extended to sequences (Mark Dilger)11. amcheck improved TOAST sanity checks (Mark Dilger)12. |
| PG16 | No direct corruption-recovery surface changes. |
| PG17 | amcheck --checkunique (Anastasia Lubennikova, Pavel Borisov, Maxim Orlov)13 — detect unique-constraint violations where index claims uniqueness but heap has duplicates. |
| PG18 | gin_index_check() (Grigory Kryachko, Heikki Linnakangas, Andrey Borodin)14. initdb defaults to --data-checksums (Greg Sabino Mullane)3. pg_resetwal --char-signedness (Masahiko Sawada)17. |
pg_amcheck Weekly SweepProduction baseline. Runs as a pg_cron (or systemd) weekly job. Cross-reference 98-pg-cron.md.
pg_amcheck \
--all \
--jobs=4 \
--install-missing \
--progress \
--verbose \
2>&1 | tee /var/log/postgresql/amcheck-$(date +%F).log
# Exit code 0 = no corruption. Anything else = investigate.
# Use --parent-check for deep audit only during scheduled windows
# (it acquires ShareLock and blocks writes on each index).Alert when log contains the words corruption, inconsistent, or invalid.
Production-safe interactive investigation:
-- AccessShareLock — won't block writes:
SELECT bt_index_check('public.users_email_idx'::regclass, heapallindexed => true);
-- If this raises, the index is corrupt OR the heap has rows the index missed.
-- ShareLock — only if writes can pause:
SELECT bt_index_parent_check('public.users_email_idx'::regclass,
heapallindexed => true,
rootdescend => true);Fix path for B-tree:
REINDEX INDEX CONCURRENTLY public.users_email_idx;Cross-reference 26-index-maintenance.md.
-- PG14+ verify_heapam:
SELECT * FROM verify_heapam('public.orders'::regclass,
on_error_stop => false,
check_toast => true);
-- Returns one row per corruption finding:
-- blkno | offnum | attnum | msg
-- ------+--------+--------+-----------------------------------
-- 147 | 12 | 3 | toast value 23456789 not found in toast table
-- 147 | 14 | 0 | invalid lp_off + lp_len combinationFor a fast survey of all heap relations in a database:
SELECT n.nspname, c.relname,
v.blkno, v.offnum, v.attnum, v.msg
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
CROSS JOIN LATERAL verify_heapam(c.oid, on_error_stop => false) v
WHERE c.relkind IN ('r','m','S')
AND n.nspname NOT IN ('pg_catalog','information_schema')
ORDER BY n.nspname, c.relname;invalid page in block N of relationHit during query or VACUUM. Triage:
-- 1. Identify the relfile from the OID in the error message
-- (the path looks like `base/16384/24576`):
SELECT relname, pg_relation_filepath(oid)
FROM pg_class WHERE oid = 24576;
-- 2. Determine if it's an index or a heap:
SELECT relkind FROM pg_class WHERE oid = 24576;
-- 'r' = heap, 'i' = index, 't' = TOAST table, 'm' = matview.
-- 3a. If index: drop + recreate (lose nothing):
REINDEX INDEX CONCURRENTLY corrupt_idx;
-- 3b. If heap or TOAST: restore the file from a recent base backup
-- (cross-reference 84-backup-physical-pitr.md), then replay WAL.
-- 3c. If no backup AND can lose the rows on that page:
SET LOCAL zero_damaged_pages = on;
VACUUM FULL corrupt_table;
-- Zeros the damaged page in memory; next checkpoint persists it.
-- Dump and restore the whole cluster after this step.Cluster moving from data_checksums = off to on:
# 1. Stop the cluster cleanly:
pg_ctl -D $PGDATA stop -m fast
# 2. Confirm clean shutdown:
pg_controldata $PGDATA | grep "Database cluster state"
# Must show "shut down" — not "shut down in recovery" or "in production".
# 3. Dry-run check (default mode):
pg_checksums --check -D $PGDATA
# 4. Enable (rewrites every relation page — time proportional to data size):
pg_checksums --enable --progress -D $PGDATA
# 5. Verify the flip happened:
pg_controldata $PGDATA | grep checksum
# Expected: "Data page checksum version: 1"
# 6. Start the cluster:
pg_ctl -D $PGDATA startFor replicated clusters: do this on a stopped replica first, take a fresh pg_basebackup from the new-checksums replica, promote it, then re-sync the rest of the topology. Cross-reference 73-streaming-replication.md.
Database refuses connections, log says database is not accepting commands to avoid wraparound data loss. Cross-reference 29-transaction-id-wraparound.md.
# 1. Stop the cluster:
pg_ctl -D $PGDATA stop -m fast
# 2. Enter single-user mode in the affected database:
postgres --single -D $PGDATA mydb
# At the backend > prompt (newline terminates):
backend> VACUUM FREEZE;
# 3. After VACUUM completes, Ctrl-D to exit.
# 4. Start the cluster normally:
pg_ctl -D $PGDATA startIf multiple databases need freezing, run postgres --single once per database.
Catalog index corrupt — backend can't look up its own tables.
# 1. Stop the cluster:
pg_ctl -D $PGDATA stop -m fast
# 2. Start single-user with system-indexes ignored:
postgres --single -D $PGDATA -c ignore_system_indexes=on mydb
backend> REINDEX SYSTEM mydb;
# 3. Ctrl-D, restart normally.REINDEX SYSTEM rebuilds every system-catalog index in the database. Repeat in each affected database (and in template1 if pg_class etc. are involved).
pg_resetwal Last-ResortServer won't start; log shows could not locate a valid checkpoint record:
# 1. Confirm the server is stopped (no postmaster pid):
pg_ctl -D $PGDATA status
# 2. Byte-level backup of the data directory — ESSENTIAL:
cp -a $PGDATA /backup/cluster-pre-resetwal-$(date -u +%FT%H%M%SZ)
# 3. Dry run — see what pg_resetwal would set without writing:
pg_resetwal -n -D $PGDATA
# 4. Run for real:
pg_resetwal -D $PGDATA
# 5. Start:
pg_ctl -D $PGDATA start
# 6. IMMEDIATELY dump (do NOT do any writes first):
pg_dumpall > /tmp/rescue.sql
# 7. initdb a fresh cluster on new $PGDATA:
/usr/lib/postgresql/16/bin/initdb --data-checksums -D /var/lib/postgresql/16/new
# 8. Restore:
psql -d postgres -f /tmp/rescue.sql
# 9. Decommission the old data dir. Run pg_amcheck on the new one
# to confirm the restored cluster is clean.Never write to the post-pg_resetwal cluster before dumping — verbatim: "any such action is likely to make the corruption worse"1.
When a single block is damaged, no backup is available, and losing the rows on that page is acceptable:
-- 1. Confirm the block number from the error:
-- ERROR: invalid page in block 147 of relation base/16384/24576
-- 2. Identify the table:
SELECT relname FROM pg_class WHERE pg_relation_filepath(oid) = 'base/16384/24576';
-- => 'orders'
-- 3. Save anything still readable from the table:
\copy (SELECT * FROM orders) TO '/tmp/orders-survivors.csv' CSV HEADER
-- This may itself raise on block 147 — capture what comes out before.
-- 4. Set session-local zero_damaged_pages and force-read:
SET LOCAL zero_damaged_pages = on;
VACUUM FULL orders;
-- The damaged page is zeroed in memory and persisted at next checkpoint.
-- 5. Force checkpoint to persist the zeroed page (cross-reference 34):
CHECKPOINT;
-- 6. Run pg_amcheck on the cluster to confirm only this one
-- relation lost data, nothing else surfaced:
\q
pg_amcheck --all --jobs=4
-- 7. Plan a logical dump+restore (this cluster has a known-zeroed page).Before pg_upgrade or major-version migration. Cross-reference 86-pg-upgrade.md, 87-major-version-upgrade.md.
# 1. Take a fresh base backup, restore to a disposable host
# (don't audit on production directly; lock impact + risk).
pg_basebackup -h prod -D /tmp/audit -X stream -P
# 2. Start the disposable cluster on a non-default port.
# 3. Run pg_amcheck against everything:
pg_amcheck --all --jobs=$(nproc) --heapallindexed --parent-check --rootdescend \
--port=5433 2>&1 | tee /tmp/audit.log
# 4. (PG17+) verify uniqueness:
pg_amcheck --all --jobs=$(nproc) --checkunique --port=5433
# 5. Confirm exit 0; otherwise investigate corruption before upgrade.
# 6. Discard the disposable host.pg_stat_database.checksum_failuresCross-reference 82-monitoring.md.
SELECT datname, checksum_failures, checksum_last_failure
FROM pg_stat_database
WHERE checksum_failures > 0;Wire into Prometheus alert:
- alert: PostgresChecksumFailure
expr: pg_stat_database_checksum_failures > 0
for: 1m
labels:
severity: critical
annotations:
summary: "Postgres reported a data-page checksum failure on {{ $labels.datname }}"
runbook: "See references/88-corruption-recovery.md Recipe 4"verify_heapam HardeningCross-reference 74-logical-replication.md, 38-foreign-keys-deep.md.
verify_heapam works on any heap, but pre-corruption hardening benefits from PK existence (faster post-incident reconciliation).
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_constraint p ON p.conrelid = c.oid AND p.contype = 'p'
WHERE c.relkind = 'r'
AND n.nspname NOT IN ('pg_catalog','information_schema')
AND p.oid IS NULL
ORDER BY pg_total_relation_size(c.oid) DESC;When checksums on the primary disagree with a streaming replica. Cross-reference 73-streaming-replication.md.
-- On both nodes, on the suspect table:
SELECT bt_index_check('public.suspect_idx'::regclass, heapallindexed => true);
SELECT * FROM verify_heapam('public.suspect'::regclass);
-- Compare. Mismatch on replica only -> rebuild replica from primary
-- (pg_basebackup or pg_rewind, see 89-pg-rewind.md).
-- Mismatch on primary only -> primary disk fault; failover + rebuild old primary.23 gotchas. Roughly ordered most-leveraged first.
pg_resetwal as first response to a startup error. Read pg_log/*.log first. Most "won't start" cases are config errors, missing tablespace mounts, full disk, or wraparound — none need pg_resetwal. Verbatim docs: "It should be used only as a last resort"1.pg_resetwal or zero_damaged_pages. Both are destructive. cp -a $PGDATA /backup/... first, every time.zero_damaged_pages = on set in postgresql.conf cluster-wide. Session-local only. Cluster-wide masks future corruption and persists zeroed pages on subsequent checkpoint.pg_checksums --enable on a running cluster. Refuses to start AND would corrupt if it didn't. Verbatim: "The server must be shut down cleanly before running pg_checksums"7.pg_checksums --enable on a primary while standbys stream. Each standby gets corrupt pages from the primary's not-yet-fully-rewritten state. Always do the conversion on a stopped replica, take a base backup, then re-build the topology.bt_index_parent_check run cluster-wide on a production primary. Acquires ShareLock per index. Schedule for a maintenance window. Use bt_index_check for routine.bt_index_parent_check on a hot-standby replica. Cannot run on standbys — requires writes to update validation state. Use bt_index_check on standbys.verify_heapam because amcheck = "index checker". Incorrect assumption since PG14 — verify_heapam was added to amcheck in PG14 and covers heap pages, TOAST tables, sequences, and matviews, not just B-tree indexes9.pg_amcheck --parent-check + --rootdescend on every weekly sweep. Both elevate locks. Reserve for after-incident audits.pg_stat_database.checksum_failures. Wire it to alerting. Cross-reference 82-monitoring.md.pg_upgrade from non-checksum cluster. Verbatim: "pg_upgrade requires matching cluster checksum settings"3. New cluster needs initdb --no-data-checksums first; convert after with pg_checksums --enable while stopped.single-user mode used for "interactive debugging". Verbatim: "not truly suitable for debugging the server, since no realistic interprocess communication and locking will happen"2. No checkpointer, no autovacuum, no logical decoding. Use only for catalog rescue and wraparound.postgres --single with --single not first argument. Refuses to start. --single MUST be first; database name MUST be last.\ line-continuations or pass -j to switch to ;\n\n terminator.REINDEX SYSTEM against a multi-user-mode cluster while writes happen. Locks every system catalog index in turn — writers block. Either do it during a maintenance window OR in single-user mode where contention is trivially zero.pg_dump taken as proof of corruption-freeness. pg_dump reads heap directly via seq-scan; never traverses indexes. Index-only corruption (bt_index_check finds it, pg_dump doesn't) slips past. Run pg_amcheck for the complete picture.ignore_checksum_failure = on left enabled in postgresql.conf. Verbatim: "This behavior may cause crashes, propagate or hide corruption, or other serious problems"16. Session-local only; reset immediately.ignore_system_indexes = on left in postgresql.conf after the rescue session. System catalogs perform terribly without their indexes. REINDEX SYSTEM and remove the GUC.pg_resetwal -f used to bypass an integrity warning the tool actively raised. The -f flag escalates the warning — verbatim: "the recovered database must be treated with even more suspicion than usual"1. Use only when even pg_control is unreadable.gin_index_check on pre-PG18 cluster. Function doesn't exist before PG1814. For GIN on PG14-17, the only recourse is REINDEX on suspicion.amcheck --checkunique on pre-PG17 cluster. Flag added in PG1713. Pre-PG17, write your own SQL: SELECT key, count(*) FROM tbl GROUP BY key HAVING count(*) > 1.pg_amcheck sweep — only checking after a customer reports missing data. Latency-to-detection becomes weeks. Schedule weekly (Recipe 1) at minimum; daily on storage you don't trust.26-index-maintenance.md — REINDEX, CONCURRENTLY rebuild for corrupt indexes27-mvcc-internals.md — xmin/xmax visibility, why pg_dump doesn't see every row28-vacuum-autovacuum.md — VACUUM FREEZE invoked from single-user mode29-transaction-id-wraparound.md — wraparound emergency requiring single-user mode32-buffer-manager.md — how data pages move buffer ↔ disk; relation to checksums33-wal.md — WAL durability, full_page_writes, CRC-32C per record34-checkpoints-bgwriter.md — checkpoint persistence of zeroed pages63-internals-architecture.md — postmaster + background workers in single-user mode64-system-catalogs.md — pg_class, pg_attribute queries that diagnose corruption69-extensions.md — installing amcheck73-streaming-replication.md — primary↔replica checksum mismatch diagnosis82-monitoring.md — pg_stat_database.checksum_failures alert83-backup-pg-dump.md — pg_dump as diagnostic + extraction84-backup-physical-pitr.md — restoring individual relfiles from base backup85-backup-tools.md — pgBackRest / Barman / WAL-G for per-relation restore86-pg-upgrade.md — pre-upgrade integrity audit87-major-version-upgrade.md — upgrade strategies, PG18 default-checksum trap89-pg-rewind.md — re-attach diverged former primary90-disaster-recovery.md — DR runbook, RPO/RTO98-pg-cron.md — scheduling the weekly pg_amcheck sweep101-managed-vs-baremetal.md — managed services usually disable --single and pg_resetwalPostgreSQL 16 — pg_resetwal reference — https://www.postgresql.org/docs/16/app-pgresetwal.html. "pg_resetwal clears the write-ahead log (WAL) and optionally resets some other control information stored in the pg_control file. This function is sometimes needed if these files have become corrupted. It should be used only as a last resort, when the server will not start due to such corruption." "After running this command, it should be possible to start the server, but bear in mind that the database might contain inconsistent data due to partially-committed transactions. You should immediately dump your data, run initdb, and restore. After restore, check for inconsistencies and repair as needed." "the recovered database must be treated with even more suspicion than usual: an immediate dump and restore is imperative. Do not execute any data-modifying operations in the database before you dump, as any such action is likely to make the corruption worse." ↩ ↩2 ↩3 ↩4 ↩5 ↩6 ↩7
PostgreSQL 16 — postgres reference — https://www.postgresql.org/docs/16/app-postgres.html. "The primary use for this mode is during bootstrapping by initdb. Sometimes it is used for debugging or disaster recovery; note that running a single-user server is not truly suitable for debugging the server, since no realistic interprocess communication and locking will happen." "Single-user mode also does not do any background processing, such as automatic checkpoints or replication." "This user does not actually have to exist, so the single-user mode can be used to manually recover from certain kinds of accidental damage to the system catalogs." ↩ ↩2 ↩3 ↩4 ↩5 ↩6
PostgreSQL 18 Release Notes. "Change initdb to default to enabling checksums (Greg Sabino Mullane). The new initdb option --no-data-checksums disables checksums." Migration note: "pg_upgrade requires matching cluster checksum settings, so this new option can be useful to upgrade non-checksum old clusters." ↩ ↩2 ↩3 ↩4 ↩5
PostgreSQL 16 — Chapter 30.2 Data Checksums — https://www.postgresql.org/docs/16/checksums.html. "By default, data pages are not protected by checksums, but this can optionally be enabled for a cluster. When enabled, each data page includes a checksum that is updated when the page is written and verified each time the page is read." ↩
PostgreSQL 16 — WAL reliability — https://www.postgresql.org/docs/16/wal-reliability.html. "To guard against such failures, PostgreSQL periodically writes full page images to permanent WAL storage before modifying the actual page on disk. By doing this, during crash recovery PostgreSQL can restore partially-written pages from WAL." "Each individual record in a WAL file is protected by a CRC-32C (32-bit) check that allows us to tell if record contents are correct. The CRC value is set when we write each WAL record and checked during crash recovery, archive recovery and replication." ↩ ↩2 ↩3
PostgreSQL 16 — initdb reference — https://www.postgresql.org/docs/16/app-initdb.html. "Use checksums on data pages to help detect corruption by the I/O system that would otherwise be silent. Enabling checksums may incur a noticeable performance penalty." ↩ ↩2
PostgreSQL 16 — pg_checksums reference — https://www.postgresql.org/docs/16/app-pgchecksums.html. "pg_checksums checks, enables or disables data checksums in a PostgreSQL cluster." "The server must be shut down cleanly before running pg_checksums." "Enabling checksums in a large cluster can potentially take a long time. During this operation, the cluster or other programs that write to the data directory must not be started or else data loss may occur." ↩ ↩2 ↩3 ↩4 ↩5
PostgreSQL 14 Release Notes — https://www.postgresql.org/docs/release/14.0/. "Add command-line utility pg_amcheck to simplify running contrib/amcheck tests on many relations (Mark Dilger)." ↩ ↩2
PostgreSQL 14 Release Notes. "Allow amcheck to also check heap pages (Mark Dilger). Previously it only checked B-Tree index pages." ↩ ↩2 ↩3
PostgreSQL 16 — amcheck — https://www.postgresql.org/docs/16/amcheck.html. "The amcheck module provides functions that allow you to verify the logical consistency of the structure of relations." "When a routine, lightweight test for corruption is required in a live production environment, using bt_index_check often provides the best trade-off between thoroughness of verification and limiting the impact on application performance and availability." "bt_index_parent_check can be thought of as a more thorough variant of bt_index_check: unlike bt_index_check, bt_index_parent_check also checks invariants that span parent/child relationships." ↩ ↩2
PostgreSQL 15 Release Notes — https://www.postgresql.org/docs/release/15.0/. "Allow amcheck to check sequences (Mark Dilger)." ↩ ↩2
PostgreSQL 15 Release Notes. "Improve amcheck sanity checks for TOAST tables (Mark Dilger)." ↩ ↩2
PostgreSQL 17 Release Notes — https://www.postgresql.org/docs/release/17.0/. "Allow amcheck to check for unique constraint violations using new option --checkunique (Anastasia Lubennikova, Pavel Borisov, Maxim Orlov)." ↩ ↩2 ↩3
PostgreSQL 18 Release Notes — https://www.postgresql.org/docs/release/18.0/. "Add amcheck check function gin_index_check() to verify GIN indexes (Grigory Kryachko, Heikki Linnakangas, Andrey Borodin)." ↩ ↩2 ↩3
PostgreSQL 16 — runtime-config-developer — https://www.postgresql.org/docs/16/runtime-config-developer.html. "Detection of a damaged page header normally causes PostgreSQL to report an error, aborting the current transaction. Setting zero_damaged_pages to on causes the system to instead report a warning, zero out the damaged page in memory, and continue processing. This behavior will destroy data, namely all the rows on the damaged page. However, it does allow you to get past the error and retrieve rows from any undamaged pages that might be present in the table." ↩
PostgreSQL 16 — runtime-config-developer. "Setting ignore_checksum_failure to on causes the system to ignore the failure (but still report a warning), and continue processing. This behavior may cause crashes, propagate or hide corruption, or other serious problems." ↩ ↩2
PostgreSQL 18 Release Notes. "Add pg_resetwal option --char-signedness to change the default char signedness (Masahiko Sawada)." ↩ ↩2
evals
references