Comprehensive PostgreSQL reference for developers and DBAs covering versions 14–18. Use whenever the user asks about PostgreSQL syntax, DDL/DML/DQL, joins, LATERAL, CTEs, window functions, GROUPING SETS, DISTINCT ON, RETURNING, ON CONFLICT, PL/pgSQL, functions, procedures, triggers, views, materialized views, indexes (B-tree/GIN/GiST/BRIN/Hash/Bloom), MVCC, VACUUM, autovacuum, WAL, TOAST, partitioning, replication (streaming/logical), backup, PITR, HA (Patroni/repmgr), pgBouncer, EXPLAIN ANALYZE, RLS, roles, extensions (pgvector, PostGIS, TimescaleDB, Citus, pg_trgm, pg_cron), JSON/JSONB, full-text search, UUID, timestamptz, COPY, system catalogs, collations, large objects, cursors, GUC, or any Postgres administration, performance, security, replication, backup, or recovery topic.
94
94%
Does it follow best practices?
Impact
94%
1.36xAverage score across 3 eval scenarios
Passed
No known issues
SELECT, Joins, Set Operations, LIMIT/FETCHSELECT grammarSELECT expressionsFROM clause formsINNER, LEFT, RIGHT, FULL, CROSS, NATURALUSING vs ON vs implicit cross-joinLATERAL subqueries and functionsWHERE and subqueries (EXISTS, IN, ANY, ALL)GROUP BY, HAVING, and grouping-set extensionsDISTINCT and DISTINCT ONUNION, INTERSECT, EXCEPTORDER BY — NULLS FIRST/LAST, USING operatorLIMIT / OFFSET / FETCH FIRST ... WITH TIESFOR UPDATE / FOR SHARE row lockingTABLESAMPLE — BERNOULLI and SYSTEMWITH ORDINALITY and ROWS FROMVALUES as a row sourceTABLE shorthandJSON_TABLE (PG17+)Load this file when the question involves:
SELECT statement: target list, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, OFFSETLATERAL joinUSING and ON and how each affects the output column listDISTINCT ON for "first row per group" queriesDISTINCT vs ALL semanticsFETCH FIRST n ROWS WITH TIES (the SQL-standard cousin of LIMIT)NULLS FIRST / NULLS LAST placement under ORDER BY ASC and DESCFOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, FOR KEY SHARE, plus NOWAIT / SKIP LOCKEDTABLESAMPLE for fast statistical samplesJSON_TABLE (PG17+) for shredding JSON into relational rowsFor deep dives, route to the topical files: window functions (11-window-functions.md), aggregates and grouping sets in detail (12-aggregates-grouping.md), CTEs and recursive queries (04-ctes.md), the locking taxonomy (43-locking.md), planner behavior (56-explain.md).
SELECT grammarThis is the full surface of SELECT as of PostgreSQL 16:1
[ WITH [ RECURSIVE ] with_query [, ...] ]
SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
[ { * | expression [ [ AS ] output_name ] } [, ...] ]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition ]
[ WINDOW window_name AS ( window_definition ) [, ...] ]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression
[ ASC | DESC | USING operator ]
[ NULLS { FIRST | LAST } ] [, ...] ]
[ LIMIT { count | ALL } ]
[ OFFSET start [ ROW | ROWS ] ]
[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS }
{ ONLY | WITH TIES } ]
[ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE }
[ OF table_name [, ...] ]
[ NOWAIT | SKIP LOCKED ] [...] ]Logical evaluation order (independent of how you write the clauses) is:
FROM → WHERE → GROUP BY → HAVING → SELECT (target list)
→ DISTINCT → UNION/INTERSECT/EXCEPT → ORDER BY → LIMIT/OFFSETThat order is why you cannot reference a SELECT-list alias from WHERE (it does not yet exist) but can reference it from ORDER BY (it does).2
SELECT expressions-- Every column from a single source.
SELECT * FROM orders;
-- All columns from one specific FROM-item with table.* notation.
SELECT o.*, c.name AS customer_name FROM orders o JOIN customers c ON c.id = o.customer_id;
-- Expressions, casts, aliases.
SELECT
o.id,
o.amount_cents / 100.0 AS amount_dollars,
o.created_at::date AS order_date,
o.amount_cents > 10000 AS is_big_ticket
FROM orders o;
-- Bare aliases (the AS keyword is optional but recommended for clarity).
SELECT id "Order ID", amount_cents amt FROM orders; -- legal but uglyORDER BY and (since PG, always) to the result-set column names — but not to WHERE, GROUP BY, or HAVING.-- Scalar subquery: must return ≤ 1 row × 1 column.
SELECT
c.id,
c.name,
(SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS order_count
FROM customers c;[!WARNING] Scalar subquery cardinality If the inner query returns more than one row at runtime, you get
ERROR: more than one row returned by a subquery used as an expression. Either add aLIMIT 1(only when you genuinely want any one row) or rewrite as aLEFT JOIN ... GROUP BY— theLEFT JOINis almost always faster as well, because the planner can avoid the per-outer-row evaluation pattern that scalar subqueries imply.
FROM clause forms-- 1. A table (optionally with ONLY to skip inheritance children / partitions).
FROM orders
FROM ONLY orders -- only the parent, not partitions
FROM orders * -- redundant; "*" is the historical "include children" marker
-- 2. A subquery (must be aliased).
FROM (SELECT id, amount_cents FROM orders WHERE created_at > now() - interval '7 days') o
-- 3. A function call.
FROM generate_series(1, 100) AS s(n)
FROM unnest(ARRAY['a','b','c']) AS letter
FROM jsonb_each(my_jsonb_col) AS j(k, v)
-- 4. A WITH-query reference.
WITH recent AS (SELECT * FROM orders WHERE created_at > now() - interval '7 days')
SELECT * FROM recent;
-- 5. A VALUES expression (synthetic rows).
FROM (VALUES (1,'a'), (2,'b'), (3,'c')) AS t(id, letter)
-- 6. ROWS FROM(...) (parallel columns from N functions, "zipped" row-wise).
FROM ROWS FROM (generate_series(1,3), generate_series(10,12)) AS r(a, b)Every FROM-item can carry an alias (with optional column aliases), and the alias replaces the table name for the rest of the query:3
SELECT o.id, o.amount_cents
FROM orders AS o (id, customer_id, amount_cents, created_at) -- column aliases
WHERE o.id = 1;[!NOTE] Subqueries and function calls in
FROMmust have an alias (inpsqlyou'll seeERROR: subquery in FROM must have an alias). A plain table is the onlyFROM-item where the alias is optional.
INNER, LEFT, RIGHT, FULL, CROSS, NATURAL| Form | Behavior |
|---|---|
T1 [INNER] JOIN T2 ON cond | Rows where cond is true. INNER is optional. |
T1 LEFT [OUTER] JOIN T2 ON cond | All T1, NULL-pad T2 when no match. |
T1 RIGHT [OUTER] JOIN T2 ON cond | Mirror of LEFT. Avoid; flip the join order instead. |
T1 FULL [OUTER] JOIN T2 ON cond | All rows from both sides, NULL-pad missing side. |
T1 CROSS JOIN T2 | Cartesian product. Equivalent to T1, T2 and to T1 INNER JOIN T2 ON TRUE. |
T1 NATURAL JOIN T2 | INNER JOIN on every commonly-named column. Avoid (see Gotchas). |
-- INNER (the default)
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id;
-- LEFT — find orders without a customer row (data-quality probe)
SELECT o.id
FROM orders o
LEFT JOIN customers c ON c.id = o.customer_id
WHERE c.id IS NULL;
-- FULL — for set difference / overlap queries
SELECT COALESCE(a.id, b.id) AS id,
a.value AS value_left,
b.value AS value_right
FROM table_a a
FULL JOIN table_b b USING (id);
-- CROSS — date-spine times region-spine
SELECT d.day, r.region
FROM generate_series(date '2026-01-01', date '2026-01-31', interval '1 day') AS d(day)
CROSS JOIN (VALUES ('US'),('EU'),('APAC')) AS r(region);[!NOTE]
RIGHT JOINis the mirror ofLEFT JOIN. There is no execution-plan difference once the planner has its way with it, butLEFTis conventional — keep the "preserved" side on the left and your code will read more consistently.
USING vs ON vs implicit cross-join-- ON: the most general; the join columns appear once for each side.
SELECT o.id, o.customer_id, c.id AS c_id, c.name
FROM orders o JOIN customers c ON c.id = o.customer_id;
-- ↑↑↑↑ ↑↑↑↑↑
-- Both o.customer_id and c.id are in the output.
-- USING: only one column per join-key in the output (the "merged" column).
SELECT id, customer_id, name
FROM orders JOIN customers USING (id, customer_id);
-- ↑ the USING columns collapse to one each;
-- the rest are appended in left-then-right order
-- NATURAL: USING on every commonly-named column. AVOID.
SELECT * FROM orders NATURAL JOIN customers;
-- Implicit cross product (comma-separated) — equivalent to CROSS JOIN.
SELECT *
FROM orders, customers
WHERE customers.id = orders.customer_id;[!WARNING]
NATURAL JOINis a foot-gunNATURAL JOINsilently picks up any new commonly-named column when the schema changes (e.g. someone addscreated_atto both tables andNATURAL JOINstarts joining on it too). Use explicitUSING (...)orON .... Even Postgres committers will tell you not to useNATURAL JOINin production code.
[!WARNING]
WHEREfilters after the join;ONfilters during it ForINNER JOINthe two are interchangeable. ForOUTER JOINthey are not:4-- ON: keeps the unmatched left rows and NULL-pads them. FROM a LEFT JOIN b ON (a.id = b.id AND b.val > 5) -- WHERE: silently turns the LEFT JOIN into an INNER JOIN, because the -- NULL-padded rows fail the b.val > 5 predicate and are filtered out. FROM a LEFT JOIN b ON (a.id = b.id) WHERE b.val > 5If you want the "match-condition-only" behavior on an outer join, the predicate must be in
ON. If you want a post-join filter that also drops unmatched rows, you wanted anINNER JOINto begin with.
LATERAL subqueries and functionsLATERAL lets a FROM-item reference columns from FROM-items that appear earlier in the same FROM clause. Without LATERAL, sibling FROM-items cannot see each other's columns.2
-- Top-3 most recent orders per customer.
SELECT c.id, recent.*
FROM customers c
CROSS JOIN LATERAL (
SELECT id, amount_cents, created_at
FROM orders
WHERE customer_id = c.id -- ← references c, only legal because of LATERAL
ORDER BY created_at DESC
LIMIT 3
) AS recent;
-- LATERAL with LEFT JOIN to keep customers who have zero orders.
SELECT c.id, recent.*
FROM customers c
LEFT JOIN LATERAL (
SELECT id, amount_cents, created_at
FROM orders
WHERE customer_id = c.id
ORDER BY created_at DESC
LIMIT 3
) AS recent ON TRUE;
-- LATERAL is implicit for set-returning functions in FROM.
SELECT p.id, vertex
FROM polygons p, unnest(p.vertices) AS vertex; -- LATERAL impliedWhen to reach for LATERAL:
unnest, jsonb_array_elements, regexp_matches, ...) with arguments from an earlier tableSELECT (compute it once in a LATERAL subquery aliased calc, then reference calc.foo, calc.bar)-- "Re-use a complex expression several times" pattern.
SELECT o.id,
calc.gross,
calc.tax,
calc.gross + calc.tax AS total
FROM orders o
CROSS JOIN LATERAL (
SELECT (o.amount_cents * 1.0) AS gross,
round(o.amount_cents * 0.0875, 0) AS tax
) AS calc;WHERE and subqueries (EXISTS, IN, ANY, ALL)-- Plain WHERE predicates.
WHERE o.amount_cents > 10000
AND o.created_at >= now() - interval '7 days'
AND o.status IN ('paid','refunded')
-- EXISTS: typically the fastest pattern for "row exists in another table".
WHERE EXISTS (SELECT 1 FROM order_items oi WHERE oi.order_id = o.id)
-- NOT EXISTS: NULL-safe anti-join (preferred form).
WHERE NOT EXISTS (SELECT 1 FROM refunds r WHERE r.order_id = o.id)
-- IN with a list, IN with a subquery.
WHERE o.status IN ('paid','refunded')
WHERE o.customer_id IN (SELECT id FROM customers WHERE region = 'EU')
-- ANY / SOME / ALL (equivalent operators).
WHERE o.amount_cents = ANY (ARRAY[100, 200, 300])
WHERE o.amount_cents >= ALL (SELECT min_threshold FROM tiers WHERE active)| Pattern | What it means | Notes |
|---|---|---|
x IN (list) | x = ANY (list) | Fine for small static lists. |
x IN (subquery) | x = ANY (subquery) | Planner usually rewrites to semi-join. |
x NOT IN (subquery) | NOT (x = ANY (subquery)) | NULL-unsafe. If the subquery yields any NULL, the whole expression is UNKNOWN and no rows match. |
EXISTS (subquery) / NOT EXISTS (subquery) | Row-existence test | NULL-safe. Preferred for anti-joins. |
x = ANY (subquery) | "Any one row matches x" | Semi-join. |
x = ALL (subquery) | "Every row matches x" | Often degenerates into a = MAX/MIN check; usually clearer rewritten. |
[!WARNING] Never use
NOT INwith a subquery UseNOT EXISTSinstead. Even if the column you are matching on isNOT NULLtoday, a future schema change can introduce a NULL and silently drop your result set to zero rows.NOT EXISTSis NULL-safe in every case and the planner can choose an anti-join.
GROUP BY, HAVING, and grouping-set extensions-- Basic grouping.
SELECT customer_id, COUNT(*) AS n, SUM(amount_cents) AS total
FROM orders
GROUP BY customer_id
HAVING SUM(amount_cents) > 100000;
-- GROUP BY can reference SELECT-list position or alias (the latter is a PG extension).
SELECT customer_id, COUNT(*) FROM orders GROUP BY 1; -- position
SELECT customer_id AS cid, COUNT(*) FROM orders GROUP BY cid; -- alias (PG-only)
-- GROUPING SETS — explicit multiple groupings in one pass.
SELECT region, product, SUM(qty)
FROM sales
GROUP BY GROUPING SETS ((region, product), (region), (product), ());
-- ROLLUP — n+1 successively shorter prefix groupings.
SELECT region, product, SUM(qty)
FROM sales
GROUP BY ROLLUP (region, product);
-- → groupings: (region,product), (region), ()
-- CUBE — 2^n groupings (the power set).
SELECT region, product, channel, SUM(qty)
FROM sales
GROUP BY CUBE (region, product, channel);[!NOTE] PostgreSQL 14
GROUP BY DISTINCTremoves duplicate grouping sets that are produced when you combine multipleROLLUP/CUBEclauses. The release note: "AllowDISTINCTto be added toGROUP BYto remove duplicateGROUPING SETcombinations."5-- Without DISTINCT, ((a,b),(a),(), (a,c),(a),()) — note (a) and () repeat. -- With DISTINCT, ((a,b),(a,c),(a),()). SELECT a, b, c, COUNT(*) FROM t GROUP BY DISTINCT ROLLUP(a,b), ROLLUP(a,c);This deduplicates grouping sets, not output rows.
SELECT DISTINCTstill applies to the projection.
Deep-dive coverage of GROUPING(...) and per-grouping-set NULL semantics lives in 12-aggregates-grouping.md.
DISTINCT and DISTINCT ON-- DISTINCT: remove duplicate output rows (across the entire SELECT list).
SELECT DISTINCT region FROM sales;
-- DISTINCT ON (cols): for each unique combination of (cols), keep the FIRST row
-- as defined by ORDER BY. The ORDER BY must start with the same columns.
SELECT DISTINCT ON (customer_id) customer_id, id, created_at
FROM orders
ORDER BY customer_id, created_at DESC;
-- → most-recent order per customer.DISTINCT ON is a PostgreSQL extension (not SQL standard) but it is the most efficient "top-1 per group" pattern in PG when you also have a btree index on (group_col, sort_col DESC). The alternative — a window function — almost always plans worse for top-1.
[!WARNING]
DISTINCT ONandORDER BYmust agree The leadingORDER BYkeys must match theDISTINCT ONcolumns, in the same order. Otherwise the planner emits "SELECT DISTINCT ON expressions must match initial ORDER BY expressions." Add any additional tie-breaker columns after the DISTINCT-ON keys.
UNION, INTERSECT, EXCEPTAll three combine the result sets of two SELECTs with identical column counts and compatible column types.
-- UNION: A or B, duplicates removed (DEFAULT). Slow on big sets.
SELECT id FROM customers_a UNION SELECT id FROM customers_b;
-- UNION ALL: A or B, duplicates kept. Fast — use this unless you specifically need dedup.
SELECT id FROM customers_a UNION ALL SELECT id FROM customers_b;
-- INTERSECT: rows in both A and B (default DISTINCT; INTERSECT ALL is the multiset variant).
SELECT id FROM customers_a INTERSECT SELECT id FROM customers_b;
-- EXCEPT: rows in A but not B (default DISTINCT; EXCEPT ALL is multiset).
SELECT id FROM customers_a EXCEPT SELECT id FROM customers_b;Precedence: INTERSECT binds tighter than UNION / EXCEPT. Parenthesize when in doubt.1
-- ORDER BY / LIMIT apply to the *combined* result; place them at the very end.
( SELECT id, name FROM customers_a
UNION ALL
SELECT id, name FROM customers_b )
ORDER BY id
LIMIT 100;[!NOTE] PostgreSQL does not implement the SQL-standard
CORRESPONDINGclause for set operations.1 Column matching is strictly positional. Reorder columns yourself when the two sides have different layouts.
The default is DISTINCT for all three set operators. Explicitly writing UNION DISTINCT is legal as of PG, but conventional code writes plain UNION; explicitly writing UNION ALL is the customary way to opt in to duplicates.
ORDER BY — NULLS FIRST/LAST, USING operatorORDER BY created_at DESC
ORDER BY amount_cents DESC NULLS LAST
ORDER BY customer_id, created_at DESC NULLS FIRST
ORDER BY name COLLATE "en_US"
ORDER BY some_expression USING < -- sort with the < operator of the column's type
ORDER BY 1, 2 DESC -- by SELECT-list position (legal but discouraged)| Direction | Default NULLS placement |
|---|---|
ASC (or implicit) | NULLS LAST |
DESC | NULLS FIRST |
This default makes NULLs appear "at the end of the natural order for an ascending sort and at the start of a descending sort," which is rarely what application code wants — application code almost always wants NULLs at the bottom. Spell out NULLS LAST explicitly when ordering DESC with a column that may be NULL.1
-- Common mistake: silently NULLs-first.
SELECT name FROM users ORDER BY last_login_at DESC;
-- Almost always what you actually meant:
SELECT name FROM users ORDER BY last_login_at DESC NULLS LAST;ORDER BY is the only clause where SELECT-list aliases are visible and where positional references are allowed. Use named expressions in production code — positions silently re-map when you change the target list.
LIMIT / OFFSET / FETCH FIRST ... WITH TIESPostgreSQL accepts two equivalent forms for paging:1
-- PostgreSQL form (always available; SQL-standard FETCH form is preferred for new code).
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100;
-- SQL:2008 form. Equivalent.
SELECT * FROM orders ORDER BY id
OFFSET 100 ROWS
FETCH NEXT 20 ROWS ONLY;[!NOTE] PostgreSQL 13
FETCH FIRST n ROWS WITH TIESreturns the first n rows plus any additional rows that tie with the last returned row on theORDER BYkeys.6SELECT * FROM orders ORDER BY amount_cents DESC FETCH FIRST 5 ROWS WITH TIES; -- Returns at least 5 rows; if rows 5, 6, 7 all have the same amount_cents, -- all three are included.
WITH TIESrequires anORDER BYand is incompatible withSKIP LOCKED.1
Use LIMIT n for the common case and FETCH FIRST n ROWS WITH TIES when you need stable inclusion of tied-on-key rows (leaderboards, top-K rankings).
[!WARNING] Deep-OFFSET pagination is anti-scaling
LIMIT 20 OFFSET 100000makes the database scan and discard 100,000 rows. Use keyset pagination instead:-- Page boundary: pass the last (created_at, id) tuple from the previous page. SELECT * FROM orders WHERE (created_at, id) < (:last_created_at, :last_id) ORDER BY created_at DESC, id DESC LIMIT 20;With an index on
(created_at DESC, id DESC)this is constant-time per page regardless of depth.
FOR UPDATE / FOR SHARE row lockingRow-level locking attached to a SELECT declares the intent of the surrounding transaction to lock the returned rows. The variants are:
| Clause | Lock strength | Conflicts with |
|---|---|---|
FOR UPDATE | Strongest; locks the row for any modification. | FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE. |
FOR NO KEY UPDATE | Weaker UPDATE that promises not to change a key column. Acquired implicitly by plain UPDATE that doesn't change a key. | FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE. |
FOR SHARE | Read-lock; prevents others from UPDATEing the row. | FOR NO KEY UPDATE, FOR UPDATE. |
FOR KEY SHARE | Weakest; acquired by foreign-key checks; blocks key changes but not non-key updates. | FOR UPDATE. |
Modifiers:
NOWAIT — fail immediately with ERROR: could not obtain lock on row in relation "..." if the lock can't be taken at once.SKIP LOCKED — silently skip rows whose row-level lock is held by another transaction. The canonical pattern for "claim work from a queue table."-- Worker claims up to 10 jobs at a time without blocking on jobs claimed by other workers.
BEGIN;
SELECT id, payload
FROM jobs
WHERE status = 'queued'
ORDER BY created_at
LIMIT 10
FOR UPDATE SKIP LOCKED;
-- ... do the work, then ...
UPDATE jobs SET status = 'done' WHERE id = ANY (:claimed_ids);
COMMIT;SKIP LOCKED and NOWAIT are mutually exclusive with WITH TIES. The full table-level + row-level lock interaction matrix lives in 43-locking.md.
TABLESAMPLE — BERNOULLI and SYSTEMTABLESAMPLE returns a random sample of a base table. Two built-in methods are provided:1
| Method | What it samples | Speed | Sample-size variance |
|---|---|---|---|
BERNOULLI(p) | Each row with probability p%. | O(table size). | Low. |
SYSTEM(p) | Each page with probability p% (all rows on the chosen pages). | O(table size × p). Much faster on big tables. | Higher (clustered). |
-- ~1% sample by row (more accurate, slow).
SELECT * FROM big_table TABLESAMPLE BERNOULLI (1);
-- ~1% sample by page (less accurate, fast).
SELECT * FROM big_table TABLESAMPLE SYSTEM (1);
-- Reproducible sample.
SELECT * FROM big_table TABLESAMPLE BERNOULLI (1) REPEATABLE (42);Use SYSTEM for quick "give me roughly 1 % of the table" exploratory queries; use BERNOULLI when distribution accuracy matters (e.g. estimating per-customer aggregates). Both methods only work on base tables, not on the output of joins or subqueries.
[!NOTE] The
tsm_system_rowsandtsm_system_timecontrib extensions addSYSTEM_ROWS(n)andSYSTEM_TIME(ms)sampling methods — useful when you want "approximately n rows" or "approximately t milliseconds of sampling," not a percentage.
WITH ORDINALITY and ROWS FROM-- Per-row index of a set-returning function.
SELECT * FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS t(value, idx);
-- value | idx
-- ------+----
-- a | 1
-- b | 2
-- c | 3
-- ROWS FROM — zip multiple set-returning functions side by side.
SELECT * FROM ROWS FROM (
generate_series(1, 3),
unnest(ARRAY['a','b','c'])
) AS t(num, letter);
-- num | letter
-- ----+-------
-- 1 | a
-- 2 | b
-- 3 | cROWS FROM pads shorter inputs with NULLs to match the longest input. WITH ORDINALITY then numbers the combined rows. Use these for stable correlated unrolling of array columns or paired JSON arrays without an explicit join.
VALUES as a row source-- VALUES as a stand-alone statement.
VALUES (1, 'a'), (2, 'b'), (3, 'c');
-- VALUES as a FROM-item (must be aliased).
SELECT v.id, v.label
FROM (VALUES (1, 'a'), (2, 'b'), (3, 'c')) AS v(id, label);
-- VALUES as a join lookup.
SELECT o.id, o.status, label.text
FROM orders o
LEFT JOIN (VALUES ('p','pending'), ('a','approved'), ('r','rejected')) AS label(code, text)
ON label.code = o.status;VALUES lists are extremely cheap to plan (no statistics, no I/O) and the planner sees them as a sequential scan over an in-memory table. They are the right tool for hard-coded reference data inside a query.
TABLE shorthandTABLE foo is shorthand for SELECT * FROM foo:
TABLE orders; -- equivalent to: SELECT * FROM orders
TABLE ONLY orders; -- exclude inheritance children / partitions
-- Useful in set operations:
TABLE customers_a UNION ALL TABLE customers_b;Rarely used in application code, but it shows up in scripts and you should recognize it.
JSON_TABLE (PG17+)[!NOTE] PostgreSQL 17
JSON_TABLEshreds a JSON document into a relational result set in a single query. It implements the SQL/JSON standard and replaces ad-hoc patterns built fromjsonb_to_recordset,jsonb_array_elements, andLATERALjoins.7
-- Sample document: an array of orders, each with an array of items.
WITH docs(doc) AS (VALUES (
'{
"orders": [
{"id": 1, "items": [{"sku":"A","qty":2}, {"sku":"B","qty":1}]},
{"id": 2, "items": [{"sku":"C","qty":5}]}
]
}'::jsonb))
SELECT j.*
FROM docs,
JSON_TABLE(
doc,
'$.orders[*]' AS root
COLUMNS (
order_id int PATH '$.id',
NESTED PATH '$.items[*]'
COLUMNS (
sku text PATH '$.sku',
qty int PATH '$.qty',
line FOR ORDINALITY
)
)
) AS j;
-- order_id | sku | qty | line
-- ---------+-----+-----+------
-- 1 | A | 2 | 1
-- 1 | B | 1 | 2
-- 2 | C | 5 | 1Use JSON_TABLE when you would otherwise nest two or three jsonb_array_elements + LATERAL joins. It is concise, the planner understands it natively, and (where possible) it pushes path evaluation down into the row-shredding step.
Full coverage of JSON path operators (->, ->>, #>, #>>, @>, <@, jsonb_path_query, jsonb_path_exists, etc.) is in 17-json-jsonb.md.
DISTINCT ON vs LATERAL vs window)-- A. DISTINCT ON — fastest for N = 1, single-pass.
SELECT DISTINCT ON (customer_id)
customer_id, id, amount_cents, created_at
FROM orders
ORDER BY customer_id, created_at DESC;
-- B. LATERAL — most flexible; handles N > 1 cleanly.
SELECT c.id AS customer_id, r.*
FROM customers c
CROSS JOIN LATERAL (
SELECT id, amount_cents, created_at
FROM orders
WHERE customer_id = c.id
ORDER BY created_at DESC
LIMIT 3
) AS r;
-- C. Window function — readable, but materializes the full join then filters.
SELECT customer_id, id, amount_cents, created_at
FROM (
SELECT customer_id, id, amount_cents, created_at,
row_number() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM orders
) s
WHERE rn <= 3;For the index CREATE INDEX ON orders (customer_id, created_at DESC), options A and B both use an index-only walk and short-circuit early; option C reads every row before filtering. Pick (A) for N = 1, (B) for small N.
NOT EXISTS vs LEFT JOIN ... IS NULL vs NOT INSee the comparison table and SQL examples in the WHERE and subqueries section. Use NOT EXISTS — it is NULL-safe and the planner picks an anti-join. Never use NOT IN with a subquery.
-- Rows in staging that didn't make it to production (and vice versa).
( SELECT id FROM stage.orders EXCEPT SELECT id FROM prod.orders )
UNION ALL
( SELECT id FROM prod.orders EXCEPT SELECT id FROM stage.orders );
-- Row-level diff (every column equal).
SELECT row_to_json(s.*) AS only_in_stage
FROM stage.orders s
WHERE NOT EXISTS (
SELECT 1 FROM prod.orders p
WHERE p.id = s.id
AND p.amount_cents IS NOT DISTINCT FROM s.amount_cents
AND p.status IS NOT DISTINCT FROM s.status
);IS NOT DISTINCT FROM is the NULL-safe = operator; use it whenever you compare nullable columns.
COUNT(DISTINCT) estimation-- Estimate distinct customer_id values cheaply on a giant table.
SELECT COUNT(DISTINCT customer_id) * 100 AS est_distinct
FROM events TABLESAMPLE SYSTEM (1);
-- HLL via the hll extension is much more accurate but requires the extension.OFFSET)-- First page.
SELECT id, created_at, amount_cents
FROM orders
ORDER BY created_at DESC, id DESC
LIMIT 50;
-- Subsequent page: pass the (created_at, id) tuple of the LAST row of the previous page.
SELECT id, created_at, amount_cents
FROM orders
WHERE (created_at, id) < (:last_created_at, :last_id)
ORDER BY created_at DESC, id DESC
LIMIT 50;Index: CREATE INDEX ON orders (created_at DESC, id DESC). This is O(log n + 50) per page regardless of depth. Compare with OFFSET 100000 which is O(100050).
-- One-shot random sample (different each call).
SELECT * FROM orders ORDER BY random() LIMIT 100;
-- Reproducible random sample using setseed + random().
BEGIN;
SELECT setseed(0.42);
SELECT * FROM orders ORDER BY random() LIMIT 100;
COMMIT;
-- For sampling much larger tables, prefer TABLESAMPLE BERNOULLI with REPEATABLE.
SELECT * FROM orders TABLESAMPLE BERNOULLI (0.1) REPEATABLE (42) LIMIT 100;LATERAL recipe: parameterized SRF-- For each user, find the 10 most-similar users by trigram similarity of names.
SELECT u.id, sim.peer_id, sim.score
FROM users u
CROSS JOIN LATERAL (
SELECT peer.id AS peer_id, similarity(u.name, peer.name) AS score
FROM users peer
WHERE peer.id <> u.id
ORDER BY u.name <-> peer.name -- requires pg_trgm
LIMIT 10
) AS sim;Without LATERAL, this requires a per-user PL/pgSQL loop or a window-function plan over the full cross-join. With LATERAL plus a GiST index on users(name gist_trgm_ops) the inner query is a top-10 index scan per outer row.
JSON_TABLE recipe — invoice shredding (PG17+)WITH inv(doc) AS (
VALUES (
'{"invoice_id": "INV-7", "items": [
{"sku":"A","qty":2,"price_cents":1500},
{"sku":"B","qty":1,"price_cents":4000}
]}'::jsonb)
)
SELECT t.*
FROM inv,
JSON_TABLE(
doc, '$' AS root
COLUMNS (
invoice_id text PATH '$.invoice_id',
NESTED PATH '$.items[*]'
COLUMNS (
sku text PATH '$.sku',
qty int PATH '$.qty',
price_cents int PATH '$.price_cents',
line_no FOR ORDINALITY
)
)
) AS t;Before PG17 the same job needed jsonb_to_recordset plus LATERAL jsonb_array_elements plus column extraction:
-- Pre-PG17 equivalent.
SELECT
doc ->> 'invoice_id' AS invoice_id,
item ->> 'sku' AS sku,
(item ->> 'qty')::int AS qty,
(item ->> 'price_cents')::int AS price_cents,
line_no
FROM inv,
LATERAL jsonb_array_elements(doc -> 'items') WITH ORDINALITY AS it(item, line_no);FOR UPDATE SKIP LOCKED-- One worker, one transaction. Many workers running this concurrently will not block one another.
BEGIN;
WITH next AS (
SELECT id
FROM jobs
WHERE status = 'queued'
AND run_after <= now()
ORDER BY created_at
FOR UPDATE SKIP LOCKED
LIMIT 1
)
UPDATE jobs
SET status = 'running', started_at = now()
WHERE id = (SELECT id FROM next)
RETURNING id, payload;
-- ... worker does its thing using the returned id+payload ...
COMMIT;FOR UPDATE SKIP LOCKED is the canonical pattern for backend job queues in PostgreSQL. It scales to hundreds of concurrent workers on a single table without coordination.
LATERAL-- Without LATERAL: window function (preferred when you need it across all rows).
SELECT id, customer_id, amount_cents,
SUM(amount_cents) OVER (PARTITION BY customer_id ORDER BY created_at) AS running_total
FROM orders;
-- With LATERAL: useful when you only need running total for a *subset* of rows.
SELECT o.id, o.customer_id, o.amount_cents, r.running_total
FROM orders o
CROSS JOIN LATERAL (
SELECT SUM(amount_cents) AS running_total
FROM orders o2
WHERE o2.customer_id = o.customer_id
AND o2.created_at <= o.created_at
) AS r
WHERE o.id IN (SELECT id FROM flagged_orders);NOT IN with a subquery is NULL-unsafe. Always use NOT EXISTS instead. A single NULL in the subquery yields zero rows. See also the DML-side NULL handling in 03-syntax-dml.md.NATURAL JOIN silently changes meaning when columns are added. Use explicit USING(...) or ON ....WHERE after a LEFT JOIN collapses it into an INNER JOIN. Predicates on the nullable side must go in ON to preserve unmatched rows.NULLS placement is direction-sensitive. ASC → NULLS LAST, DESC → NULLS FIRST. Production code should specify the placement explicitly any time the column is nullable.DISTINCT ON requires its keys at the front of ORDER BY. "SELECT DISTINCT ON expressions must match initial ORDER BY expressions" is the error you get otherwise.SELECT DISTINCT is not free. It implies a sort or hash aggregation. If you can dedup at write time (a unique constraint) or know your join is already 1:1, omit DISTINCT.OFFSET does not skip the work, it just discards rows. Deep-paging via OFFSET is O(offset). Use keyset pagination for any page beyond the first dozen.LEFT JOIN ... GROUP BY when there's any chance the inner query can produce more than one row.UNION deduplicates; UNION ALL does not. The default is UNION (= DISTINCT). On large sets that's a sort or hash you may not have wanted.SELECT if the source-of-truth layouts differ.FOR UPDATE on a LEFT JOIN locks only the rows from the FROM-list table named in OF. Plain FOR UPDATE locks all rows from all base tables in FROM, including the right side of LEFT JOINs; that often surprises developers who only meant to lock the primary table. Use FOR UPDATE OF mytable to scope.SELECT * in views is fragile. When the underlying table gains a column, the view's stored column list does not auto-update — you'll need a CREATE OR REPLACE VIEW ... (or DROP + recreate) to pick it up. See 05-views.md.OFFSET and LIMIT are applied after ORDER BY, which is applied after set operations. That means ORDER BY / LIMIT belong at the very end of a UNION chain; if you want per-SELECT ordering you need parentheses + a wrapping query.FOR UPDATE cannot be combined with aggregates, DISTINCT, GROUP BY, HAVING, WINDOW, UNION/INTERSECT/EXCEPT, or a LIMIT/OFFSET that follows them. PG will reject these combinations: locking applies to specific source rows, and those clauses make the row identity ambiguous.WITH TIES is incompatible with SKIP LOCKED. Pick one.SELECT are visible to ORDER BY but not to WHERE, GROUP BY, or HAVING. Repeat the expression in those clauses or wrap with a subquery.03-syntax-dml.md — INSERT, UPDATE, DELETE, MERGE, RETURNING, ON CONFLICT04-ctes.md — WITH, WITH RECURSIVE, MATERIALIZED / NOT MATERIALIZED, SEARCH / CYCLE05-views.md — views and materialized views built on SELECT11-window-functions.md — OVER, frame clauses, named WINDOW12-aggregates-grouping.md — FILTER, GROUPING SETS / ROLLUP / CUBE, ordered-set and hypothetical-set aggregates17-json-jsonb.md — JSON_TABLE, SQL/JSON path language, JSON operators27-mvcc-internals.md — what FOR UPDATE actually locks at the tuple level20-text-search.md — tsvector, tsquery, full-text search operators used in SELECT35-partitioning.md — FROM ONLY semantics and partition pruning in queries43-locking.md — full conflict matrix for row-level lock modes56-explain.md — reading the plan a SELECT produces69-extensions.md — tsm_system_rows / tsm_system_time sampling method extensionsSQL command reference for SELECT (PG16). Full grammar including WITH, WINDOW, FETCH FIRST ... WITH TIES, FOR ... SKIP LOCKED, TABLESAMPLE, set-operation precedence, and the explicit note that PostgreSQL does not implement the SQL standard's CORRESPONDING clause. https://www.postgresql.org/docs/16/sql-select.html ↩ ↩2 ↩3 ↩4 ↩5 ↩6 ↩7
"LATERAL Subqueries" section in the queries chapter — explains the lateral-reference rules and why LATERAL is implicit for set-returning functions in FROM. https://www.postgresql.org/docs/16/queries-table-expressions.html#QUERIES-LATERAL ↩ ↩2
"FROM Clause" — table expressions, aliases, ONLY, function-call FROM items, WITH ORDINALITY, ROWS FROM, TABLESAMPLE. https://www.postgresql.org/docs/16/queries-table-expressions.html ↩
Same chapter: joined tables, the difference between ON and WHERE for outer joins, NATURAL JOIN, CROSS JOIN. https://www.postgresql.org/docs/16/queries-table-expressions.html ↩
PostgreSQL 14 release notes: "Allow DISTINCT to be added to GROUP BY to remove duplicate GROUPING SET combinations (Vik Fearing). For example, GROUP BY CUBE (a,b), CUBE (b,c) will generate duplicate grouping combinations without DISTINCT." https://www.postgresql.org/docs/release/14.0/ ↩
PostgreSQL 13 release notes: "Allow FETCH FIRST to use WITH TIES to return any additional rows that match the last result row (Surafel Temesgen)." https://www.postgresql.org/docs/release/13.0/ ↩
JSON_TABLE documented in the JSON functions chapter for PG17. Same shape applies in PG18. https://www.postgresql.org/docs/17/functions-json.html ↩
evals
references