CtrlK
BlogDocsLog inGet started
Tessl Logo

alonso-skills/postgres

Comprehensive PostgreSQL reference for developers and DBAs covering versions 14–18. Use whenever the user asks about PostgreSQL syntax, DDL/DML/DQL, joins, LATERAL, CTEs, window functions, GROUPING SETS, DISTINCT ON, RETURNING, ON CONFLICT, PL/pgSQL, functions, procedures, triggers, views, materialized views, indexes (B-tree/GIN/GiST/BRIN/Hash/Bloom), MVCC, VACUUM, autovacuum, WAL, TOAST, partitioning, replication (streaming/logical), backup, PITR, HA (Patroni/repmgr), pgBouncer, EXPLAIN ANALYZE, RLS, roles, extensions (pgvector, PostGIS, TimescaleDB, Citus, pg_trgm, pg_cron), JSON/JSONB, full-text search, UUID, timestamptz, COPY, system catalogs, collations, large objects, cursors, GUC, or any Postgres administration, performance, security, replication, backup, or recovery topic.

94

1.36x
Quality

94%

Does it follow best practices?

Impact

94%

1.36x

Average score across 3 eval scenarios

SecuritybySnyk

Passed

No known issues

Overview
Quality
Evals
Security
Files

57-pg-stat-statements.mdreferences/

pg_stat_statements — Per-Query Workload Statistics

pg_stat_statements is the canonical contrib extension for per-query aggregated execution and I/O statistics. Where pg_stat_activity shows what is running right now and EXPLAIN ANALYZE shows one execution of one statement, pg_stat_statements shows which query patterns dominate the cluster's workload over time. It is the single most-leveraged observability tool a working DBA can install.

Table of Contents

When to Use This Reference

Reach here when:

  • You need to find the queries consuming the most cumulative time on a cluster.
  • You need to attribute I/O, WAL volume, JIT cost, or temp-file usage to specific queries.
  • You are tuning pg_stat_statements.max, track, track_planning, or track_utility.
  • You are upgrading PG versions and need to know whether queryid values, column names, or the reset function signature changed.
  • You need to combine pg_stat_statements.queryid with pg_stat_activity.query_id or EXPLAIN VERBOSE's query identifier to correlate aggregate stats with a live or planned execution.

If your question is interpret one specific slow query's plan, see 56-explain.md. If the question is what is happening right now, see 58-performance-diagnostics.md. If the question is which planner GUCs to tune for the workload these stats reveal, see 59-planner-tuning.md.

Mental Model

Five rules drive almost every interaction with pg_stat_statements:

  1. It is an extension and requires shared_preload_libraries. Setting the GUC, restarting the server, then CREATE EXTENSION pg_stat_statements in the target database is the canonical sequence. The verbatim docs requirement: "The module must be loaded by adding pg_stat_statements to shared_preload_libraries in postgresql.conf, because it requires additional shared memory. This means that a server restart is needed to add or remove the module."1 ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements' alone is not enough — without the restart no statistics are collected.

  2. Queries are normalized by a post-parse-analysis jumble. Constants are stripped and replaced with placeholders ($1, $2, …) so SELECT * FROM t WHERE id = 5 and SELECT * FROM t WHERE id = 47 collapse to one row keyed by a single queryid. The verbatim rule: "Since the queryid hash value is computed on the post-parse-analysis representation of the queries, the opposite is also possible: queries with identical texts might appear as separate entries, if they have different meanings as a result of factors such as different search_path settings."2 Plus: "it is not safe to assume that queryid will be stable across major versions of PostgreSQL."3

  3. The view is bounded by pg_stat_statements.max (default 5000) and evicts least-frequently-executed entries. Verbatim: "If more distinct statements than that are observed, information about the least-executed statements is discarded."4 The dealloc counter in pg_stat_statements_info rises every time an entry is discarded; non-zero dealloc means your buffer is too small for your workload.

  4. track_planning is off by default. The plan-time columns (plans, total_plan_time, min/max/mean/stddev_plan_time) exist in PG13+ but report zero unless pg_stat_statements.track_planning = on. Verbatim warning: "Enabling this parameter may incur a noticeable performance penalty, especially when statements with identical query structure are executed by many concurrent connections which compete to update a small number of pg_stat_statements entries."5

  5. save = on (default) persists across clean shutdown; pg_stat_statements_reset() does not survive restart unless saved. Verbatim: "pg_stat_statements.save specifies whether to save statement statistics across server shutdowns. If it is off then statistics are not saved at shutdown nor reloaded at server start. The default value is on."6 Note: a crash (not a clean stop) loses all stats regardless of save. After pg_upgrade the queryid hash is not portable and stats start from zero on the new cluster.

[!WARNING] queryid changes silently across major versions A query that has queryid -1234567890123 on PG16 will have a different queryid on PG17 even with identical text. Dashboards that pin to a queryid (Datadog, Grafana, pganalyze) must be rebuilt after pg_upgrade. Re-attribute by query text or regexp_replace(query, '\$\d+', '?', 'g'). The verbatim docs quote "it is not safe to assume that queryid will be stable across major versions of PostgreSQL"3 is the headline upgrade-time misconception-defeater.

Decision Matrix

You want to …Order by / filterAvoidWhy
Find queries consuming most cumulative timeORDER BY total_exec_time DESCORDER BY mean_exec_time DESCMean misses high-frequency moderate-cost queries that dominate aggregate load
Find p99-tail outliersORDER BY max_exec_time DESC filter calls > 100mean_exec_time alonemean averages out tail latency; max_exec_time shows the worst single execution
Find queries spilling to temp filesWHERE temp_blks_written > 0 ORDER BY temp_blks_written DESCguessing from work_memTemp-block I/O is the direct evidence of work_mem undersizing — cross-reference 54-memory-tuning.md Recipe 5
Find queries emitting most WALORDER BY wal_bytes DESC filter wal_records > 0guessing from INSERT/UPDATE/DELETE textwal_bytes is the direct measurement; surfaces hot indexed columns and HOT failures — cross-reference 30-hot-updates.md
Find queries doing most disk readsORDER BY shared_blks_read DESCshared_blks_hit alone_read is buffer misses (disk); _hit is buffer hits (cache) — _read reveals working-set vs shared_buffers mismatch
Find queries where planning dominates executionWHERE total_plan_time > total_exec_time AND track_planning=ontracking planning by defaulttrack_planning=off by default; turn on only when investigating planning overhead, cross-reference Recipe 8
Diagnose dealloc pressureSELECT dealloc FROM pg_stat_statements_infoguessing max settingNon-zero dealloc means buffer too small; raise pg_stat_statements.max and restart
Find JIT-impacted queries (PG15+)WHERE jit_functions > 0 ORDER BY jit_generation_time DESCglobal JIT-off as the first fixJIT columns are per-query; identify the offenders before disabling cluster-wide — cross-reference 61-jit-compilation.md
Reset stats for one query without resetting allpg_stat_statements_reset(userid=0, dbid=0, queryid=<id>)pg_stat_statements_reset() no-argsThree-arg form is per-query; no-args resets everything
Reset only minmax tails (PG17+)pg_stat_statements_reset(0, 0, 0, true)full reset to clear outliersminmax_only=true PG17+ clears just min/max_plan_time and min/max_exec_time7
Correlate aggregate stats with a running queryjoin pg_stat_activity.query_id = pg_stat_statements.queryidmatching by SQL textcompute_query_id = auto (default) populates pg_stat_activity.query_id when pg_stat_statements loaded8
Compare before/after a deploymentsnapshot, change, compare deltas via stats_since columns (PG17+)full reset between deploysPG17+ stats_since lets you compute deltas without resetting9

Three smell signals to act on before any deep query analysis:

  • pg_stat_statements_info.dealloc > 0 — buffer is evicting; raise pg_stat_statements.max (default 5000 is undersized for any cluster with thousands of distinct queries from ORMs or framework code).
  • track_planning = off AND you are investigating slow planning — turn on, wait a few minutes, then re-query. Plan-time columns are zero until track_planning = on.
  • One queryid has calls = 1 and dominates total_exec_time — likely a one-off pg_dump or REINDEX or VACUUM FULL that should be filtered out of dashboards via WHERE calls > <threshold>.

Syntax / Mechanics

Installation

Three-step sequence with explicit restart:

-- 1. Add to shared_preload_libraries (postgresql.conf or ALTER SYSTEM)
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

-- 2. Restart cluster (pg_reload_conf() does NOT suffice — postmaster context)
--    pg_ctl restart  OR  systemctl restart postgresql

-- 3. Per-database: register the extension
CREATE EXTENSION pg_stat_statements;

Verify installation:

SELECT name, setting FROM pg_settings WHERE name = 'shared_preload_libraries';
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_stat_statements';
SELECT count(*) FROM pg_stat_statements;  -- should return a number, not error

Verbatim docs note on access: "For security reasons, only superusers and roles with privileges of the pg_read_all_stats role are allowed to see the SQL text and queryid of queries executed by other users. Other users can see the statistics, however, if the view has been installed in their database."10

Non-superusers reading the view see their own query text + queryids; other users' rows have query = '<insufficient privilege>'. Grant pg_read_all_stats to monitoring roles — cross-reference 46-roles-privileges.md Recipe 8.

[!NOTE] PostgreSQL 13 pg_stat_statements.track_planning GUC, the plans / total_plan_time / min_plan_time / max_plan_time / mean_plan_time / stddev_plan_time columns, and the wal_records / wal_fpi / wal_bytes WAL counters all landed in PG13. The PG13 release notes wording: "Allow EXPLAIN, auto_explain, autovacuum, and pg_stat_statements to track WAL usage statistics"11 establishes the WAL columns.

The pg_stat_statements View

The view exposes one row per (userid, dbid, queryid, toplevel) combination. PG16 column catalog (42 columns) — version-introduced annotations inline:

ColumnTypeWhat it is
useridoidOID of the user who executed the statement
dbidoidOID of the database in which the statement was executed
toplevelboolWhether the call was top-level (vs nested in a function) — PG14+12
queryidbigintHash code identifying the normalized statement
querytextRepresentative text of the normalized statement
plansbigintNumber of times the statement was planned (zero unless track_planning=on) — PG13+
total_plan_timedouble precisionTotal time spent planning, in ms — PG13+
min_plan_timedouble precisionMinimum planning time, in ms — PG13+
max_plan_timedouble precisionMaximum planning time, in ms — PG13+
mean_plan_timedouble precisionMean planning time, in ms — PG13+
stddev_plan_timedouble precisionStandard deviation of planning time, in ms — PG13+
callsbigintNumber of times the statement was executed
total_exec_timedouble precisionTotal time spent executing, in ms
min_exec_timedouble precisionMinimum execution time, in ms
max_exec_timedouble precisionMaximum execution time, in ms
mean_exec_timedouble precisionMean execution time, in ms
stddev_exec_timedouble precisionStandard deviation of execution time, in ms
rowsbigintTotal number of rows retrieved or affected
shared_blks_hitbigintTotal shared-buffer block hits
shared_blks_readbigintTotal shared-buffer block reads (i.e., disk reads)
shared_blks_dirtiedbigintTotal shared blocks dirtied
shared_blks_writtenbigintTotal shared blocks written (e.g., by backend due to full buffers)
local_blks_hitbigintLocal block hits (temp tables/indexes)
local_blks_readbigintLocal block reads
local_blks_dirtiedbigintLocal blocks dirtied
local_blks_writtenbigintLocal blocks written
temp_blks_readbigintTemp file blocks read (work_mem spills)
temp_blks_writtenbigintTemp file blocks written
blk_read_timedouble precisionTime spent reading from shared blocks (renamed shared_blk_read_time in PG17+)13
blk_write_timedouble precisionTime spent writing shared blocks (renamed shared_blk_write_time in PG17+)
temp_blk_read_timedouble precisionTime spent reading temp blocks — PG15+14
temp_blk_write_timedouble precisionTime spent writing temp blocks — PG15+
wal_recordsbigintWAL records emitted — PG13+
wal_fpibigintFull-page images written — PG13+
wal_bytesnumericBytes of WAL emitted — PG13+
jit_functionsbigintNumber of JIT-compiled functions — PG15+15
jit_generation_timedouble precisionTotal time spent generating JIT code, ms — PG15+
jit_inlining_countbigintNumber of times inlining was performed — PG15+
jit_inlining_timedouble precisionTotal inlining time, ms — PG15+
jit_optimization_countbigintNumber of times optimization was performed — PG15+
jit_optimization_timedouble precisionTotal optimization time, ms — PG15+
jit_emission_countbigintNumber of times JIT emission ran — PG15+
jit_emission_timedouble precisionTotal emission time, ms — PG15+

[!NOTE] PostgreSQL 17 Three significant column changes. (1) blk_read_timeshared_blk_read_time and blk_write_timeshared_blk_write_time (verbatim release note: "Rename pg_stat_statements columns blk_read_time and blk_write_time to shared_blk_read_time and shared_blk_write_time"13). (2) New columns local_blk_read_time and local_blk_write_time (verbatim: "Add pg_stat_statements columns to report local block I/O timings"16). (3) New columns stats_since and minmax_stats_since for delta computation (verbatim: "Add pg_stat_statements columns stats_since and minmax_stats_since to show last-reset times"9). (4) New jit_deform_count and jit_deform_time columns (verbatim: "Add JIT deform_counter details to pg_stat_statements"17). Monitoring queries written for PG16 must be rewritten — any blk_read_time reference returns "column does not exist" on PG17+.

[!NOTE] PostgreSQL 18 Two new columns: parallel_workers_to_launch and parallel_workers_launched (verbatim: "Add pg_stat_statements columns to report parallel activity"18), plus wal_buffers_full (verbatim: "Add pg_stat_statements.wal_buffers_full to report full WAL buffers"19). Plus three normalization additions: (a) CREATE TABLE AS and DECLARE queries are now tracked and assigned queryids (verbatim: "Allow the queries of CREATE TABLE AS and DECLARE to be tracked by pg_stat_statements"20); (b) SET statement values parameterized (verbatim: "Allow the parameterization of SET values in pg_stat_statements"21).

The pg_stat_statements_info View

Single-row view exposing meta-statistics about the extension itself:

ColumnTypeWhat it is
deallocbigintNumber of times least-executed entries have been evicted because pg_stat_statements.max was exceeded22
stats_resettimestamp with time zoneWhen all stats were last reset (via pg_stat_statements_reset() no-args)

dealloc is the canonical capacity-pressure signal. If it grows over a period of normal operations, your buffer is too small. Verbatim docs: "Total number of times pg_stat_statements entries about the least-executed statements were deallocated because more distinct statements than pg_stat_statements.max were observed."22

queryid and Normalization

The queryid is a 64-bit signed hash computed on the post-parse-analysis tree of a query. Constants are folded out; WHERE id = 5 and WHERE id = 47 produce identical queryids. The representative query text shown stores $1, $2, … placeholders.

Two rules that surprise operators:

Search path affects queryid. Same SQL text with different search_path produces different queryids because the planner resolves table names through the path, and the resolved OIDs feed the jumble. Verbatim: "queries with identical texts might appear as separate entries, if they have different meanings as a result of factors such as different search_path settings."2 The corollary: standardizing search_path across application roles reduces query-entry sprawl.

Architecture and platform affect queryid. Verbatim: "The hashing process is also sensitive to differences in machine architecture and other facets of the platform."23 Two replicas built from the same base backup will hash identically; a cross-architecture replica (extremely rare in practice) may not.

Pre-PG17 caveats normalized only in PG17+:

Statement formPre-PG17 behaviorPG17+ behavior
SAVEPOINT my_sp_001Each savepoint name → different rowNames replaced with placeholders24
PREPARE TRANSACTION 'gid-abc'Each GID → different rowGIDs replaced with placeholders25
CALL my_proc(1, 2)Inline constants tracked separatelyParameters normalized to placeholders26
DEALLOCATE my_prep_stmtEach name → different rowNames normalized27

The cumulative effect on PG≤16 clusters running OLTP frameworks (every web request often issues a SAVEPOINT+RELEASE SAVEPOINT pair) is that thousands of SAVEPOINT s_<uuid> rows fill the buffer and evict the queries that matter. On PG17+ these collapse to two rows total.

compute_query_id and pg_stat_activity.query_id

The queryid hash function was moved from pg_stat_statements into core in PG14. The GUC compute_query_id (PG14+) controls in-core hashing:

ValueBehavior
offNo queryid computation
onAlways compute queryid in core
auto (default)Compute when an extension like pg_stat_statements requests it8
regressSame as auto, but EXPLAIN does not show the id (for regression testing)

When compute_query_id = auto and pg_stat_statements is loaded, the queryid is available in three additional places:

  • pg_stat_activity.query_id — the queryid of the currently-running statement for each backend
  • EXPLAIN VERBOSE output — the queryid is included for the EXPLAINed statement
  • log_line_prefix%Q emits the queryid into log lines

This enables the canonical "what is this slow query doing right now AND historically?" join — see Recipe 10.

[!NOTE] PostgreSQL 14 Verbatim from release notes: "Move pg_stat_statements query hash computation to the core server (Julien Rouhaud)" and the introduction of compute_query_id (default auto). The pg_stat_activity.query_id column is also PG14+. Pre-PG14 clusters can correlate only via SQL text.

[!NOTE] PostgreSQL 16 auto_explain's log_verbose mode now honors compute_query_id (verbatim: "Have auto_explain's log_verbose mode honor the value of compute_query_id"). Previously the queryid was missing from auto_explain output even when compute_query_id was on. Combined with auto_explain enabled in production (cross-reference 56-explain.md Recipe 12), this gives full plan-text capture keyed by the same queryid as pg_stat_statements.

Configuration GUCs

Six tunables; defaults are usually correct except max for high-cardinality workloads:

GUCDefaultContextWhat it controls
pg_stat_statements.max5000postmasterMaximum tracked entries4
pg_stat_statements.tracktopsuperusernone / top / all — nested statement tracking28
pg_stat_statements.track_utilityonsuperuserTrack non-DML statements (DDL, CREATE, DROP, SAVEPOINT, etc.)29
pg_stat_statements.track_planningoffsuperuserTrack planning time (perf overhead)5
pg_stat_statements.saveonpostmasterPersist stats across clean shutdown6

The verbatim docs descriptions:

  • track: "Specify top to track top-level statements (those issued directly by clients), all to also track nested statements (such as statements invoked within functions), or none to disable statement statistics collection. The default value is top."28
  • track_utility: "Utility commands are all those other than SELECT, INSERT, UPDATE, DELETE, and MERGE. The default value is on."29
  • track_planning: "Enabling this parameter may incur a noticeable performance penalty, especially when statements with identical query structure are executed by many concurrent connections which compete to update a small number of pg_stat_statements entries."5
  • save: "If it is off then statistics are not saved at shutdown nor reloaded at server start."6

Reset Functions

Three reset paths, with version-specific signatures:

-- PG≤16 signature
pg_stat_statements_reset(userid Oid, dbid Oid, queryid bigint) RETURNS void

-- PG17+ signature
pg_stat_statements_reset(
    userid Oid,
    dbid Oid,
    queryid bigint,
    minmax_only boolean DEFAULT false
) RETURNS timestamp with time zone

Verbatim PG17 description of minmax_only: "When minmax_only is true only the values of minimum and maximum planning and execution time will be reset (i.e. min_plan_time, max_plan_time, min_exec_time and max_exec_time fields). The default value for minmax_only parameter is false."7

Behavior:

  • pg_stat_statements_reset() (no args) — reset everything; on PG17+ returns the reset timestamp
  • pg_stat_statements_reset(0, 0, 0) — same as no args (zero is the wildcard)
  • pg_stat_statements_reset(0, 0, <queryid>) — reset stats for one query across all users + databases
  • pg_stat_statements_reset(<userid>, 0, 0) — reset stats for one user
  • pg_stat_statements_reset(0, <dbid>, 0) — reset stats for one database
  • pg_stat_statements_reset(0, 0, 0, true) — PG17+ only: reset only minmax columns (preserves total/calls/etc.)

Default permissions: superuser-only. Verbatim: "By default, this function can only be executed by superusers. Access may be granted to others using GRANT."30

Per-Version Timeline

VersionChanges
PG13track_planning GUC + plans / *_plan_time columns. WAL counters: wal_records, wal_fpi, wal_bytes.
PG14Move queryid hashing to core via compute_query_id. New toplevel column12. pg_stat_statements_info view introduced (with dealloc and stats_reset). Row counts for utility commands. Queryid now exposed in pg_stat_activity.query_id, EXPLAIN VERBOSE, log_line_prefix %Q.
PG15JIT counters added (jit_functions, jit_generation_time, jit_inlining_count, jit_inlining_time, jit_optimization_count, jit_optimization_time, jit_emission_count, jit_emission_time)15. Temp-block timing: temp_blk_read_time, temp_blk_write_time14.
PG16Constants in utility commands now normalized (e.g., CLUSTER my_table USING $1). Extended-query-protocol row counts corrected. auto_explain log_verbose honors compute_query_id.
PG17blk_read_timeshared_blk_read_time (rename)13. Local-block timing: local_blk_read_time, local_blk_write_time16. Delta columns: stats_since, minmax_stats_since9. JIT deform: jit_deform_count, jit_deform_time17. Reset function gains minmax_only boolean + timestamptz return7. SAVEPOINT names24, 2PC GIDs25, CALL parameters26, DEALLOCATE names27 all normalized.
PG18Parallel-worker columns: parallel_workers_to_launch, parallel_workers_launched18. WAL buffers: wal_buffers_full19. Track CREATE TABLE AS and DECLARE20. SET statement values parameterized21.

Examples / Recipes

Recipe 1: Install and verify (baseline)

The canonical three-step install with verification:

-- Step 1: edit postgresql.conf or:
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
-- Optional tuning for moderate-to-busy clusters:
ALTER SYSTEM SET pg_stat_statements.max = 10000;
ALTER SYSTEM SET track_io_timing = on;  -- so blk_*_time columns populate

-- Step 2: restart cluster (NOT just pg_reload_conf)
-- pg_ctl -D $PGDATA restart -m fast

-- Step 3: per-database
CREATE EXTENSION pg_stat_statements;

-- Verify
SELECT name, setting FROM pg_settings WHERE name IN (
    'shared_preload_libraries', 'compute_query_id', 'track_io_timing',
    'pg_stat_statements.max', 'pg_stat_statements.track',
    'pg_stat_statements.track_planning', 'pg_stat_statements.save'
);

SELECT count(*) AS entries FROM pg_stat_statements;
SELECT * FROM pg_stat_statements_info;  -- dealloc=0, stats_reset=<install time>

Recipe 2: Top 10 queries by cumulative time

The single most useful pg_stat_statements query:

SELECT
    userid::regrole AS user,
    (total_exec_time / 1000)::int AS total_sec,
    calls,
    (mean_exec_time)::numeric(10, 2) AS mean_ms,
    rows,
    100 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0)
        AS cache_hit_pct,
    substr(query, 1, 100) AS query
FROM pg_stat_statements
WHERE userid::regrole::text NOT IN ('postgres', 'pg_monitor')  -- filter admin
ORDER BY total_exec_time DESC
LIMIT 10;

total_exec_time DESC reveals the queries that cost the cluster the most, not the slowest single execution. A query called 10M times at 1ms each (10000 sec total) outranks a query called once at 5000 sec.

Recipe 3: Top 10 queries by tail latency

For p99-style worst-case investigation:

SELECT
    queryid,
    calls,
    (mean_exec_time)::numeric(10, 2) AS mean_ms,
    (max_exec_time)::numeric(10, 2) AS max_ms,
    (max_exec_time / mean_exec_time)::numeric(10, 1) AS max_over_mean,
    substr(query, 1, 100) AS query
FROM pg_stat_statements
WHERE calls > 100  -- ignore one-offs
  AND max_exec_time > 1000  -- focus on >1s outliers
ORDER BY max_exec_time DESC
LIMIT 10;

max_over_mean greater than 10× often points at parameter-skew issues: the generic plan is OK for 99% of values but disastrous for 1%. Cross-reference 13-cursors-and-prepares.md on plan_cache_mode.

Recipe 4: Queries spilling to temp files

work_mem undersizing manifests here:

SELECT
    queryid,
    calls,
    (mean_exec_time)::numeric(10, 2) AS mean_ms,
    pg_size_pretty(temp_blks_read::bigint * 8192) AS temp_read,
    pg_size_pretty(temp_blks_written::bigint * 8192) AS temp_written,
    substr(query, 1, 100) AS query
FROM pg_stat_statements
WHERE temp_blks_written > 0
ORDER BY temp_blks_written DESC
LIMIT 20;

Any non-zero temp_blks_written indicates a sort, hash, or CTE materialize that exceeded work_mem. Three remediations in priority order: (1) raise work_mem per-role for the offending workload (cross-reference 54-memory-tuning.md Recipe 3), (2) raise hash_mem_multiplier (PG13+) if Hash nodes dominate, (3) add an index that lets the planner avoid the sort/hash entirely.

Recipe 5: Queries emitting most WAL

Targets write-amplification hot spots:

SELECT
    queryid,
    calls,
    pg_size_pretty(wal_bytes::bigint) AS total_wal,
    pg_size_pretty((wal_bytes / nullif(calls, 0))::bigint) AS wal_per_call,
    wal_fpi,
    substr(query, 1, 100) AS query
FROM pg_stat_statements
WHERE wal_bytes > 0
ORDER BY wal_bytes DESC
LIMIT 20;

High wal_fpi per call indicates full-page images dominate — usually means a HOT-update failure on a hot indexed column (cross-reference 30-hot-updates.md) or checkpoints too frequent (cross-reference 34-checkpoints-bgwriter.md). On PG18+, additionally check wal_buffers_full to detect WAL writer pressure.

Recipe 6: Per-query cache-hit profile

Identifies queries with poor working-set residency:

SELECT
    queryid,
    calls,
    shared_blks_hit + shared_blks_read AS total_blks,
    round(100.0 * shared_blks_hit /
          nullif(shared_blks_hit + shared_blks_read, 0), 1) AS hit_pct,
    pg_size_pretty(shared_blks_read::bigint * 8192) AS disk_read,
    substr(query, 1, 100) AS query
FROM pg_stat_statements
WHERE shared_blks_hit + shared_blks_read > 1000  -- meaningful volume
ORDER BY shared_blks_read DESC
LIMIT 20;

hit_pct < 95% on a high-volume query is a working-set-vs-shared_buffers mismatch; raise shared_buffers if RAM is available (cross-reference 32-buffer-manager.md Recipe 1) or add an index that reduces the scan footprint.

Recipe 7: JIT-impacted queries (PG15+)

JIT is a default-on PG12+ feature whose default thresholds (jit_above_cost = 100000) are appropriate for analytics but may waste cycles on short OLTP queries:

SELECT
    queryid,
    calls,
    jit_functions,
    (jit_generation_time + jit_inlining_time +
     jit_optimization_time + jit_emission_time)::numeric(10, 2) AS jit_total_ms,
    (total_exec_time)::numeric(10, 2) AS exec_total_ms,
    round(100.0 * (jit_generation_time + jit_inlining_time +
                   jit_optimization_time + jit_emission_time) /
          nullif(total_exec_time, 0), 1) AS jit_pct,
    substr(query, 1, 100) AS query
FROM pg_stat_statements
WHERE jit_functions > 0
ORDER BY jit_total_ms DESC
LIMIT 20;

jit_pct > 20% means JIT compilation cost is a meaningful fraction of total time — raise jit_above_cost cluster-wide or SET LOCAL jit = off per-session for the affected queries. Cross-reference 61-jit-compilation.md.

Recipe 8: Planning-dominates-execution detection (track_planning=on)

Requires pg_stat_statements.track_planning = on. Identifies prepared-statement misuse and generic-vs-custom plan flips:

-- One-time enable (requires reload; or include in ALTER SYSTEM)
ALTER SYSTEM SET pg_stat_statements.track_planning = on;
SELECT pg_reload_conf();

-- After a sample period
SELECT
    queryid,
    calls,
    (mean_plan_time)::numeric(10, 2) AS mean_plan_ms,
    (mean_exec_time)::numeric(10, 2) AS mean_exec_ms,
    round(mean_plan_time / nullif(mean_exec_time, 0), 1) AS plan_over_exec,
    substr(query, 1, 100) AS query
FROM pg_stat_statements
WHERE plans > 100
  AND mean_plan_time > mean_exec_time  -- planning dominates
ORDER BY total_plan_time DESC
LIMIT 20;

mean_plan_time > mean_exec_time on a frequently-called query is the signature of "the application opens a new connection and re-plans the same SQL every call." The fix is either prepared statements (where the planning is amortized) or a connection pooler that retains plans (cross-reference 13-cursors-and-prepares.md and 81-pgbouncer.md). Always turn track_planning off again after diagnosis — the verbatim performance penalty quote applies.

Recipe 9: Reset only minmax tails (PG17+)

Sometimes you want to clear outlier max-times without losing cumulative totals — e.g., after a one-time slow query that distorts the dashboard:

-- PG17+ only
SELECT pg_stat_statements_reset(0, 0, 0, true) AS reset_time;
-- Returns: timestamptz of the reset

-- For a single query
SELECT pg_stat_statements_reset(0, 0, <queryid>, true);

After the call, min_exec_time, max_exec_time, min_plan_time, max_plan_time are reset; calls, total_exec_time, mean_exec_time, etc. are preserved. Pre-PG17 there is no way to do this — the only option is a full reset.

Recipe 10: Correlate aggregate stats with a running query

The canonical "this query is hung right now AND it's the slow one in the dashboard" join:

SELECT
    a.pid,
    a.state,
    a.wait_event_type,
    a.wait_event,
    (now() - a.query_start)::interval AS running_for,
    s.calls,
    (s.mean_exec_time)::numeric(10, 2) AS historical_mean_ms,
    (s.max_exec_time)::numeric(10, 2) AS historical_max_ms,
    a.query
FROM pg_stat_activity a
LEFT JOIN pg_stat_statements s ON s.queryid = a.query_id
WHERE a.state = 'active'
  AND a.pid != pg_backend_pid()
  AND a.query_start < now() - interval '10 seconds'
ORDER BY a.query_start ASC;

Requires compute_query_id = auto (default) or on, and pg_stat_statements loaded. The LEFT JOIN accommodates queries new enough that they don't yet appear in pg_stat_statements. Cross-reference 58-performance-diagnostics.md for the full pg_stat_activity surface.

Recipe 11: Audit dealloc pressure

Detects buffer-undersizing operationally:

SELECT
    dealloc,
    stats_reset,
    age(now(), stats_reset) AS time_since_reset,
    (dealloc::numeric / nullif(extract(epoch FROM age(now(), stats_reset)), 0) * 3600)
        ::numeric(10, 2) AS dealloc_per_hour,
    (SELECT count(*) FROM pg_stat_statements) AS entries,
    current_setting('pg_stat_statements.max')::int AS max_entries
FROM pg_stat_statements_info;

Interpretation tiers:

  • dealloc_per_hour = 0 — buffer adequately sized
  • dealloc_per_hour < 10 — minor pressure, fine
  • dealloc_per_hour 10–1000 — buffer undersized by ~2×; raise pg_stat_statements.max to 10000 or 20000
  • dealloc_per_hour > 1000 — severe pressure; raise max to 50000 + investigate whether track_utility = on is needed (turning it off may be cleaner than raising max indefinitely)

Recipe 12: Track query stability across deployments (PG17+)

Pre-PG17 the only delta computation is "snapshot, wait, diff" with two separate snapshots. PG17+ adds per-row stats_since and minmax_stats_since so you can compute deltas without locking:

-- PG17+ delta-without-reset pattern
WITH baseline AS (
    SELECT queryid, query, calls, total_exec_time, stats_since
    FROM pg_stat_statements
)
SELECT
    queryid,
    calls - lag(calls) OVER (PARTITION BY queryid ORDER BY stats_since)
        AS calls_delta,
    total_exec_time - lag(total_exec_time) OVER (PARTITION BY queryid ORDER BY stats_since)
        AS exec_time_delta,
    substr(query, 1, 100) AS query
FROM baseline
WHERE stats_since > now() - interval '1 hour'
ORDER BY exec_time_delta DESC NULLS LAST
LIMIT 20;

Pre-PG17 alternative: capture a snapshot before the deploy:

CREATE TABLE pss_baseline AS
SELECT queryid, calls, total_exec_time, total_plan_time,
       shared_blks_hit, shared_blks_read, wal_bytes,
       now() AS captured_at
FROM pg_stat_statements;

-- After the deploy + sample period
SELECT
    s.queryid,
    s.calls - coalesce(b.calls, 0) AS calls_delta,
    s.total_exec_time - coalesce(b.total_exec_time, 0) AS exec_delta,
    substr(s.query, 1, 100) AS query
FROM pg_stat_statements s
LEFT JOIN pss_baseline b ON b.queryid = s.queryid
ORDER BY (s.total_exec_time - coalesce(b.total_exec_time, 0)) DESC
LIMIT 20;

Recipe 13: Find regressed queries after deploy

A specific cut of Recipe 12 — comparing two snapshots to find queries whose mean execution time got worse:

-- Snapshot A (before deploy): saved to pss_before
-- Snapshot B (after deploy + bake time): from pg_stat_statements now

WITH delta AS (
    SELECT
        s.queryid,
        s.calls - b.calls AS new_calls,
        s.total_exec_time - b.total_exec_time AS new_exec_time,
        (s.total_exec_time - b.total_exec_time) /
            nullif(s.calls - b.calls, 0) AS new_mean_ms,
        b.total_exec_time / nullif(b.calls, 0) AS old_mean_ms,
        s.query
    FROM pg_stat_statements s
    JOIN pss_before b ON b.queryid = s.queryid
    WHERE s.calls > b.calls + 100  -- only queries called many times in the window
)
SELECT
    queryid,
    new_calls,
    old_mean_ms::numeric(10, 2) AS before_ms,
    new_mean_ms::numeric(10, 2) AS after_ms,
    round((new_mean_ms / nullif(old_mean_ms, 0)), 2) AS slowdown_ratio,
    substr(query, 1, 100) AS query
FROM delta
WHERE new_mean_ms > old_mean_ms * 1.5  -- 50%+ slower
ORDER BY slowdown_ratio DESC
LIMIT 20;

Couple this with auto_explain capture during the same window to grab the plan of the regressed query — cross-reference 56-explain.md Recipe 12.

Gotchas / Anti-patterns

  1. shared_preload_libraries requires server restart. ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements' then pg_reload_conf() does NOT load the extension. The pending_restart flag will be set in pg_settings. Cross-reference 53-server-configuration.md gotcha #4.

  2. queryid is not stable across major versions. Verbatim docs warning3. Dashboards pinned to queryid break on pg_upgrade. Re-attribute via query text or regenerate dashboards using the new queryid space.

  3. track_planning = off by default; plan-time columns are zero until enabled. A common debugging frustration: querying mean_plan_time and getting all zeros even though the user expected planning data. Enable explicitly, wait for traffic, then query.

  4. pg_stat_statements.max = 5000 is too small for OLTP fleets. A web application with ORM-generated SQL routinely produces tens of thousands of distinct queryids. dealloc in pg_stat_statements_info is the diagnostic; raise to 10000–50000.

  5. queryid is sensitive to search_path. Two roles running the same SQL with different search_path see different queryids2. Standardize search_path via ALTER ROLE app SET search_path = … to consolidate.

  6. queryid is sensitive to schema-qualified names. SELECT * FROM users vs SELECT * FROM public.users are different queryids even with search_path = public. Pick a convention and stick to it.

  7. Non-superuser sees stats but not query text. Verbatim: query text replaced with <insufficient privilege> for queries of other users10. Grant pg_read_all_stats to monitoring roles.

  8. blk_read_time / blk_write_time renamed in PG17. Verbatim release-note13. Any pre-PG17 dashboard or extension querying blk_read_time returns "column does not exist" error on PG17+; use shared_blk_read_time and the new local_blk_read_time instead.

  9. JIT columns are PG15+, not PG14. A frequent planning-note error. Verbatim PG15 release note: "Add JIT counters to pg_stat_statements"15. PG14 docs do not list these columns; queries SELECT'ing them on PG14 error out.

  10. jit_deform_count / jit_deform_time are PG17+, not PG18. Another common version misattribution. Verbatim PG17: "Add JIT deform_counter details to pg_stat_statements"17.

  11. wal_records / wal_fpi / wal_bytes are PG13+, not PG14. Present in PG14 docs but introduced earlier. Be specific about minimum-version requirements in monitoring tooling.

  12. track_utility = on (default) means SAVEPOINT, BEGIN, COMMIT count toward max. On busy OLTP clusters, framework-generated SAVEPOINT s_<uuid> statements (pre-PG17 each name = distinct queryid) can dominate the buffer entirely. Two fixes: (a) upgrade to PG17+ (names normalized), or (b) pg_stat_statements.track_utility = off (cleaner but loses DDL visibility).

  13. Pre-PG17 each SAVEPOINT name is a separate row. PG17+ replaces names with placeholders24. Pre-PG17 clusters running PgBouncer-with-server-side-savepoints accumulate thousands of SAVEPOINT <random_name> entries.

  14. Pre-PG17 each PREPARE TRANSACTION GID is a separate row. PG17+ normalizes25. Distributed-transaction-heavy workloads (XA, 2PC over postgres_fdw) suffer here on PG≤16.

  15. Pre-PG17 each CALL with literal arguments is a separate row. PG17+ normalizes parameters26.

  16. PG≤17 does NOT track CREATE TABLE AS or DECLARE. PG18+ tracks both with queryids20. On pre-PG18 clusters these statements are invisible to pg_stat_statements regardless of track_utility.

  17. PG≤17 does NOT parameterize SET values. Different SET work_mem = '256MB' and SET work_mem = '512MB' calls have distinct queryids on PG≤17. PG18+ normalizes21.

  18. pg_stat_statements_reset() returns void pre-PG17, timestamptz PG17+. Tooling that captures the return value breaks across the upgrade. Verbatim PG17 signature change7.

  19. pg_stat_statements_reset requires superuser by default. Verbatim: "By default, this function can only be executed by superusers."30 Grant explicitly: GRANT EXECUTE ON FUNCTION pg_stat_statements_reset(oid, oid, bigint) TO ops_user; — note that the signature changed in PG17, grant the correct one.

  20. compute_query_id = auto is on by default ONLY when an extension requests it. A cluster with pg_stat_statements not loaded has compute_query_id = auto but pg_stat_activity.query_id is always NULL. Setting compute_query_id = on forces computation regardless of extensions.

  21. save = on (default) persists only across clean shutdown. A crash, SIGKILL, or OOM-killer event loses all stats. The verbatim docs say "saved across server shutdowns" — server-crash is not a shutdown6.

  22. Stats lost on pg_upgrade. Verbatim: queryid not stable across majors3. After upgrade the file is regenerated empty; do not rely on long-term historical comparisons.

  23. Pre-parameterized application queries already use placeholders. A query sent through libpq with bind parameters (PREPARE foo AS SELECT * FROM t WHERE id = $1; EXECUTE foo(5)) is already parameterized; pg_stat_statements stores it as-is. Two clients sending the same EXECUTE foo(5) and EXECUTE foo(47) collapse to one row. Different from the literal-substitution behavior — same end state, different mechanism.

See Also

Sources

Footnotes

  1. PG16 pg_stat_statements docs — "Installation" section. https://www.postgresql.org/docs/16/pgstatstatements.html

  2. PG16 pg_stat_statements docs — "queries with identical texts might appear as separate entries, if they have different meanings as a result of factors such as different search_path settings." https://www.postgresql.org/docs/16/pgstatstatements.html 2 3

  3. PG16 pg_stat_statements docs — "Furthermore, it is not safe to assume that queryid will be stable across major versions of PostgreSQL." https://www.postgresql.org/docs/16/pgstatstatements.html 2 3 4

  4. PG16 pg_stat_statements docs — "pg_stat_statements.max is the maximum number of statements tracked by the module ... If more distinct statements than that are observed, information about the least-executed statements is discarded ... The default value is 5000." https://www.postgresql.org/docs/16/pgstatstatements.html 2

  5. PG16 pg_stat_statements docs — "Enabling this parameter may incur a noticeable performance penalty, especially when statements with identical query structure are executed by many concurrent connections which compete to update a small number of pg_stat_statements entries. The default value is off." https://www.postgresql.org/docs/16/pgstatstatements.html 2 3

  6. PG16 pg_stat_statements docs — "pg_stat_statements.save specifies whether to save statement statistics across server shutdowns ... The default value is on." https://www.postgresql.org/docs/16/pgstatstatements.html 2 3 4

  7. PG17 pg_stat_statements docs — "When minmax_only is true only the values of minimum and maximum planning and execution time will be reset (i.e. min_plan_time, max_plan_time, min_exec_time and max_exec_time fields). The default value for minmax_only parameter is false." https://www.postgresql.org/docs/17/pgstatstatements.html 2 3 4

  8. PG16 runtime-config-statistics — "Valid values are off (always disabled), on (always enabled), auto, which lets modules such as pg_stat_statements automatically enable it ... The default is auto." https://www.postgresql.org/docs/16/runtime-config-statistics.html 2

  9. PG17 release notes — "Add pg_stat_statements columns stats_since and minmax_stats_since to show last-reset times." https://www.postgresql.org/docs/release/17.0/ 2 3

  10. PG16 pg_stat_statements docs — "For security reasons, only superusers and roles with privileges of the pg_read_all_stats role are allowed to see the SQL text and queryid of queries executed by other users." https://www.postgresql.org/docs/16/pgstatstatements.html 2

  11. PG13 release notes — "Allow EXPLAIN, auto_explain, autovacuum, and pg_stat_statements to track WAL usage statistics." https://www.postgresql.org/docs/release/13.0/

  12. PG14 release notes — "Cause pg_stat_statements to track top and nested statements separately (Julien Rouhaud). Use the new toplevel column to distinguish them." https://www.postgresql.org/docs/release/14.0/ 2

  13. PG17 release notes — "Rename pg_stat_statements columns blk_read_time and blk_write_time to shared_blk_read_time and shared_blk_write_time." https://www.postgresql.org/docs/release/17.0/ 2 3 4

  14. PG15 release notes — "Add counters for temporary file block I/O" (the temp_blk_read_time / temp_blk_write_time columns of pg_stat_statements). https://www.postgresql.org/docs/release/15.0/ 2

  15. PG15 release notes — "Add JIT counters to pg_stat_statements (Magnus Hagander)." https://www.postgresql.org/docs/release/15.0/ 2 3

  16. PG17 release notes — "Add pg_stat_statements columns to report local block I/O timings." https://www.postgresql.org/docs/release/17.0/ 2

  17. PG17 release notes — "Add JIT deform_counter details to pg_stat_statements." https://www.postgresql.org/docs/release/17.0/ 2 3

  18. PG18 release notes — "Add pg_stat_statements columns to report parallel activity (Guillaume Lelarge). The new columns are parallel_workers_to_launch and parallel_workers_launched." https://www.postgresql.org/docs/18/release-18.html 2

  19. PG18 release notes — "Add pg_stat_statements.wal_buffers_full to report full WAL buffers (Bertrand Drouvot)." https://www.postgresql.org/docs/18/release-18.html 2

  20. PG18 release notes — "Allow the queries of CREATE TABLE AS and DECLARE to be tracked by pg_stat_statements (Anthonin Bonnefoy). They are also now assigned query ids." https://www.postgresql.org/docs/18/release-18.html 2 3

  21. PG18 release notes — "Allow the parameterization of SET values in pg_stat_statements (Greg Sabino Mullane, Michael Paquier). This reduces the bloat caused by SET statements with differing constants." https://www.postgresql.org/docs/18/release-18.html 2 3

  22. PG16 pg_stat_statements docs — "Total number of times pg_stat_statements entries about the least-executed statements were deallocated because more distinct statements than pg_stat_statements.max were observed." https://www.postgresql.org/docs/16/pgstatstatements.html 2

  23. PG16 pg_stat_statements docs — "The hashing process is also sensitive to differences in machine architecture and other facets of the platform." https://www.postgresql.org/docs/16/pgstatstatements.html

  24. PG17 release notes — "Replace savepoint names stored in pg_stat_statements with placeholders (Greg Sabino Mullane). This greatly reduces the number of entries needed to record SAVEPOINT, RELEASE SAVEPOINT, and ROLLBACK TO SAVEPOINT commands." https://www.postgresql.org/docs/release/17.0/ 2 3

  25. PG17 release notes — "Replace the two-phase commit GIDs stored in pg_stat_statements with placeholders (Michael Paquier). This greatly reduces the number of entries needed to record PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED." https://www.postgresql.org/docs/release/17.0/ 2 3

  26. PG17 release notes — "Replace CALL arguments stored in pg_stat_statements with placeholders." https://www.postgresql.org/docs/release/17.0/ 2 3

  27. PG17 release notes — "Track DEALLOCATE statements in pg_stat_statements with placeholders for the prepared-statement names." https://www.postgresql.org/docs/release/17.0/ 2

  28. PG16 pg_stat_statements docs — "Specify top to track top-level statements ... all to also track nested statements ... or none to disable statement statistics collection. The default value is top." https://www.postgresql.org/docs/16/pgstatstatements.html 2

  29. PG16 pg_stat_statements docs — "Utility commands are all those other than SELECT, INSERT, UPDATE, DELETE, and MERGE. The default value is on." https://www.postgresql.org/docs/16/pgstatstatements.html 2

  30. PG16 pg_stat_statements docs — "By default, this function can only be executed by superusers. Access may be granted to others using GRANT." https://www.postgresql.org/docs/16/pgstatstatements.html 2

references

01-syntax-ddl.md

02-syntax-dql.md

03-syntax-dml.md

04-ctes.md

05-views.md

06-functions.md

07-procedures.md

08-plpgsql.md

09-procedural-languages.md

10-dynamic-sql.md

11-window-functions.md

12-aggregates-grouping.md

13-cursors-and-prepares.md

14-data-types-builtin.md

15-data-types-custom.md

16-arrays.md

17-json-jsonb.md

18-uuid-numeric-money.md

19-timestamp-timezones.md

20-text-search.md

21-hstore.md

22-indexes-overview.md

23-btree-indexes.md

24-gin-gist-indexes.md

25-brin-hash-spgist-bloom-indexes.md

26-index-maintenance.md

27-mvcc-internals.md

28-vacuum-autovacuum.md

29-transaction-id-wraparound.md

30-hot-updates.md

31-toast.md

32-buffer-manager.md

33-wal.md

34-checkpoints-bgwriter.md

35-partitioning.md

36-inheritance.md

37-constraints.md

38-foreign-keys-deep.md

39-triggers.md

40-event-triggers.md

41-transactions.md

42-isolation-levels.md

43-locking.md

44-advisory-locks.md

45-listen-notify.md

46-roles-privileges.md

47-row-level-security.md

48-authentication-pg-hba.md

49-tls-ssl.md

50-encryption-pgcrypto.md

51-pgaudit.md

52-rules-system.md

53-server-configuration.md

54-memory-tuning.md

55-statistics-planner.md

56-explain.md

57-pg-stat-statements.md

58-performance-diagnostics.md

59-planner-tuning.md

60-parallel-query.md

61-jit-compilation.md

62-tablespaces.md

63-internals-architecture.md

64-system-catalogs.md

65-collations-encoding.md

66-bulk-operations-copy.md

67-cli-tools.md

68-pgbench.md

69-extensions.md

70-fdw.md

71-large-objects.md

72-extension-development.md

73-streaming-replication.md

74-logical-replication.md

75-replication-slots.md

76-logical-decoding.md

77-standby-failover.md

78-ha-architectures.md

79-patroni.md

80-connection-pooling.md

81-pgbouncer.md

82-monitoring.md

83-backup-pg-dump.md

84-backup-physical-pitr.md

85-backup-tools.md

86-pg-upgrade.md

87-major-version-upgrade.md

88-corruption-recovery.md

89-pg-rewind.md

90-disaster-recovery.md

91-docker-postgres.md

92-kubernetes-operators.md

93-pg-trgm.md

94-pgvector.md

95-postgis.md

96-timescaledb.md

97-citus.md

98-pg-cron.md

99-pg-partman.md

100-pg-versions-features.md

101-managed-vs-baremetal.md

102-skill-cookbook.md

SKILL.md

tile.json