Comprehensive PostgreSQL reference for developers and DBAs covering versions 14–18. Use whenever the user asks about PostgreSQL syntax, DDL/DML/DQL, joins, LATERAL, CTEs, window functions, GROUPING SETS, DISTINCT ON, RETURNING, ON CONFLICT, PL/pgSQL, functions, procedures, triggers, views, materialized views, indexes (B-tree/GIN/GiST/BRIN/Hash/Bloom), MVCC, VACUUM, autovacuum, WAL, TOAST, partitioning, replication (streaming/logical), backup, PITR, HA (Patroni/repmgr), pgBouncer, EXPLAIN ANALYZE, RLS, roles, extensions (pgvector, PostGIS, TimescaleDB, Citus, pg_trgm, pg_cron), JSON/JSONB, full-text search, UUID, timestamptz, COPY, system catalogs, collations, large objects, cursors, GUC, or any Postgres administration, performance, security, replication, backup, or recovery topic.
94
94%
Does it follow best practices?
Impact
94%
1.36xAverage score across 3 eval scenarios
Passed
No known issues
Reference for the built-in scalar types every Postgres backend ships with — character, numeric, boolean, binary, network, and bit-string. Composite/enum/domain/range types live in 15-data-types-custom.md; arrays in 16-arrays.md; JSON/JSONB in 17-json-jsonb.md; UUID, money, and the deeper NUMERIC precision/scale story in 18-uuid-numeric-money.md; date/time/interval in 19-timestamp-timezones.md.
text, varchar, charsmallint, integer, bigintnumeric / decimalreal, double precisionserial, bigserial, smallserial) — Aliases, Not Typesbooleanbyteainet, cidr, macaddr, macaddr8bit(n), bit varying(n) / varbitUse this file when picking among text vs varchar(n) vs char(n), between integer and bigint, between numeric and real, between bytea and a large-object handle, between inet and plain text, or whenever someone wrote char(n) and you have to explain why every Postgres veteran will groan at the migration.
This file does not cover indexing those columns (see 22-indexes-overview.md), collations for text comparison (see 65-collations-encoding.md), or storage details like TOAST that affect large values (see 31-toast.md). It does cover the behavioral surprises baked into each type — char(n) trailing-space semantics, NaN > everything in numeric, bytea_output flipping representation, cidr rejecting host-bits-set, and the rest.
Five rules cover most type-selection decisions:
For text, use text. Never char(n). Use varchar(n) only when a length constraint is a business rule (column will end up in a CSV with a fixed-width field), not for performance. The three character types share one varlena storage backend; char(n) is the slowest of the three because of its space-padding obligations.1
For exact numbers, use numeric. For approximate physical measurements, use double precision. real is rarely the right answer (4-byte float, ~6 decimal digits). money is locale-sensitive and almost never what you want; see 18-uuid-numeric-money.md.
For autoincrement IDs, use GENERATED BY DEFAULT AS IDENTITY (SQL standard, in core since PG10), not serial. serial/bigserial/smallserial are macros for "create a sequence, set the column default, OWNED BY the column"; they are not real types and they have ownership transfer quirks.2 See 01-syntax-ddl.md for the migration recipe.
For binary data, use bytea for anything you would manipulate in SQL, and stay below ~10 MB per row. For larger blobs that you stream chunk-by-chunk, the Large Object API exists (see 71-large-objects.md), but it carries transactional and replication caveats.
For IP addresses and MACs, use inet/cidr/macaddr — not text. The dedicated types are fixed-size, validate input, sort correctly, and accelerate containment queries with GiST. The 8-octet variant macaddr8 is the right choice for new schemas because EUI-64 is a superset of EUI-48.
| You need to store | Use | Avoid | Why |
|---|---|---|---|
| Text of unknown/variable length | text | varchar(n), char(n) | All three share the same storage; text is the canonical name; length cap only when business-required1 |
| Text with hard length cap from a spec | varchar(n) | char(n) | varchar(n) rejects overlong input; char(n) pads with spaces and treats trailing spaces as insignificant3 |
| Counter that won't exceed 2.1B | integer | bigint for tiny counters | 4 bytes vs 8; matters at billions of rows |
| Counter that might exceed 2.1B | bigint | integer | Migrating integer → bigint on a hot table is painful |
| Money, scientific exact values | numeric(p, s) | real, double precision, money | numeric is exact; floats round; money is locale-sensitive4 |
| Physics-style measurement, percentages | double precision | numeric for non-exact data | double is 10–100× faster than numeric for arithmetic4 |
| Yes/no | boolean | smallint, 'Y'/'N' text | 1 byte, indexable, accepts 't', 'true', 'yes', 'on', '1'5 |
| Image, encrypted blob, hash, signed data | bytea | text (hex string), Large Object | bytea is transactional, replicates cleanly, fits MVCC; LO API for >10 MB streams6 |
| IPv4/IPv6 host address | inet | text | 7 or 19 bytes; built-in containment operators; correct sorting |
| IPv4/IPv6 network (CIDR block) | cidr | inet if you also store host addresses | cidr validates that bits to the right of mask are zero |
| MAC address | macaddr8 (new) or macaddr (legacy) | text | 6 or 8 bytes; accepts 7+ input formats; deterministic output |
| Bit mask, feature flags | integer/bigint with bitwise ops, OR boolean columns | bit(n), bit varying(n) | Bit-string types are rarely the right choice — see Gotchas |
| Auto-incrementing surrogate key | bigint GENERATED BY DEFAULT AS IDENTITY | serial, bigserial | Identity is the SQL standard; serial is a macro with subtle ownership traps2 |
| Universally-unique surrogate key | uuid | text | See 18-uuid-numeric-money.md |
text, varchar, char-- All three accept up to ~1 GB; varlena (1- or 4-byte header + bytes).
CREATE TABLE t (
a text, -- variable length, no cap
b character varying(100), -- aliased as varchar(100); cap = 100 chars
c character(10), -- aliased as char(10); always padded to 10
d "char" -- C-language single byte; never use in app code
);[!WARNING]
char(n)is space-padded and almost never what you want"Values of type
characterare physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of typecharacter."3 This makes'abc'::char(5)equal to'abc'and'abc 'for comparison purposes — but the column literally stores'abc 'and aRETURNINGorCOPY OUTwill produce the padded form. Mixingchar(n)andvarchar/textin joins introduces ambiguity in trailing-space semantics. Usetextorvarchar(n).
No performance difference among text, varchar(n), varchar, and char(n) for storage and access, per the official docs: "There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. ... In most situations text or character varying should be used instead."1
Maximum: ~1 GB stored value; explicit varchar(n)/char(n) length n may not exceed 10,485,760 characters.1
Storage: 1-byte header for short strings (≤ 126 bytes); 4-byte header for longer. Beyond ~2 KB, values may be compressed and/or moved to TOAST (see 31-toast.md).
[!NOTE] PostgreSQL 15:
"char"(the internal single-byte type) I/O format changed"Change the I/O format of type
"char"for non-ASCII characters ... Bytes with the high bit set are now output as a backslash and three octal digits, to avoid encoding issues."7 This affects the"char"system-catalog type — not the user-facingcharacter/char. Only relevant if you readpg_class.relkindfrom a non-libpq client.
[!NOTE] PostgreSQL 18: nondeterministic-collation operations broadened
"Allow
LIKEwith nondeterministic collations (Peter Eisentraut)" and "Allow text position search functions with nondeterministic collations (Peter Eisentraut)."8 PG12 introduced nondeterministic collations for case-insensitive UNIQUE constraints (see65-collations-encoding.md); PG18 letsLIKE,position(),strpos(), etc., use them too.
[!NOTE] PostgreSQL 18:
casefold()for Unicode case-insensitive matching"Add function
casefold()to allow for more sophisticated case-insensitive matching (Jeff Davis)."8casefold(text)is the Unicode-correct alternative tolower(text)for case-insensitive comparison (Turkishi/İ, Germanß, etc.).
smallint, integer, bigint| Type | Aliases | Bytes | Range |
|---|---|---|---|
smallint | int2 | 2 | −32,768 to +32,767 |
integer | int, int4 | 4 | −2,147,483,648 to +2,147,483,647 |
bigint | int8 | 8 | −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 |
Picking between them is a row-count and key-density bet, not a "future-proofing" debate:
bigint. Migrating integer → bigint on a hot table is expensive: it rewrites every row and every covering index.smallint saves 2 bytes per row — meaningful at hundreds of millions of rows; meaningless at thousands. Often not worth the loss of ergonomics.numeric_value_out_of_range (SQLSTATE 22003). There is no silent wrap.numeric / decimalTwo interchangeable aliases for the same type. Use numeric by convention (decimal is provided for SQL-standard compatibility).
CREATE TABLE invoice (
id bigint GENERATED BY DEFAULT AS IDENTITY,
amount_cents numeric(12, 2) -- 12 total digits, 2 after the decimal
);p = total significant digits.s = digits after the decimal point.Unconstrained numeric (no (p, s)) accepts any value up to the implementation limits; the column does not coerce or round on insert. "Specifying NUMERIC without any precision or scale creates an 'unconstrained numeric' column in which numeric values of any length can be stored, up to the implementation limits."4
[!WARNING]
NaNinnumericdoes not behave like IEEE-754 NaNPostgres treats
NaNinnumericas equal toNaNand greater than every non-NaN value. Per the docs: "In most implementations of the 'not-a-number' concept, NaN is not considered equal to any other numeric value (including NaN). In order to allow numeric values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values."4 This is a deliberate departure from IEEE-754 done to makenumericorderable.
[!NOTE] PostgreSQL 14:
numericsupportsInfinityand-Infinity"Add support for
Infinityand-Infinityvalues in the numeric data type (Tom Lane)."9 Before PG14 onlyreal/double precisionhad infinity values.
[!NOTE] PostgreSQL 15: scale can be negative or greater than precision
"Allow the scale of a numeric value to be negative, or greater than its precision (Dean Rasheed, Tom Lane). This allows rounding of values to the left of the decimal point, e.g.,
'1234'::numeric(4, -2)returns 1200."10 A negative scale means round to that many places to the left of the decimal. Useful for sensor data with rough resolution.
[!NOTE] PostgreSQL 18: faster
numericarithmetic"Improve the speed of numeric multiplication and division (Joel Jacobson, Dean Rasheed)" and "Add math functions
gamma()andlgamma()(Dean Rasheed)."8
real, double precision| Type | Aliases | Bytes | Approximate range | Decimal precision |
|---|---|---|---|---|
real | float4 | 4 | ~1E-37 to 1E+37 | ≥ 6 digits |
double precision | float, float8 | 8 | ~1E-307 to 1E+308 | ≥ 15 digits |
IEEE-754 single and double precision. They support Infinity, -Infinity, and NaN.
[!WARNING] Never use
realordouble precisionfor money or any value where exact equality mattersPer the docs: "If you require exact storage and calculations (such as for monetary amounts), use the numeric type instead."4 Floating-point math will quietly produce results that don't equal each other:
0.1 + 0.2 = 0.30000000000000004indouble precision. This is correct IEEE behavior, and it is a footgun for accounting and audit.
[!NOTE] PostgreSQL 14: float-divide-by-infinity returns 0
"Make non-zero floating-point values divided by infinity return zero (Kyotaro Horiguchi)" and "Make floating-point division of NaN by zero return NaN (Tom Lane)."9 Minor edge-case alignment with IEEE-754.
serial, bigserial, smallserial) — Aliases, Not Types-- "serial" is shorthand. This:
CREATE TABLE legacy (id serial PRIMARY KEY, name text);
-- expands to (approximately):
CREATE SEQUENCE legacy_id_seq AS integer;
CREATE TABLE legacy (
id integer NOT NULL DEFAULT nextval('legacy_id_seq'),
name text,
PRIMARY KEY (id)
);
ALTER SEQUENCE legacy_id_seq OWNED BY legacy.id;| Notational form | Underlying integer | Underlying sequence range |
|---|---|---|
smallserial / serial2 | smallint | 1 to 32,767 |
serial / serial4 | integer | 1 to 2,147,483,647 |
bigserial / serial8 | bigint | 1 to 9,223,372,036,854,775,807 |
[!WARNING] Use
GENERATED BY DEFAULT AS IDENTITYfor new tablesPer the docs: "The data types
smallserial,serialandbigserialare not true types, but merely a notational convenience for creating unique identifier columns."2 Identity columns (SQL standard, in core since PG10) handle ownership, sequence rights, andpg_dumpcorrectly without the special-caseOWNED BYplumbing. See the migration recipe in01-syntax-ddl.md. For the deeper serial-vs-IDENTITY comparison and theALWAYS/BY DEFAULTdecision, see18-uuid-numeric-money.md.
booleanCREATE TABLE feature_flag (
name text PRIMARY KEY,
enabled boolean NOT NULL DEFAULT false
);bool.t or f. Set a CASE expression in queries if you want TRUE/FALSE/YES/NO in output text.Accepted input (case-insensitive, leading/trailing whitespace ignored):5
true (any of) | false (any of) |
|---|---|
true, t | false, f |
yes, y | no, n |
on | off |
1 | 0 |
The '1'::boolean / '0'::boolean casts are why DBA scripts using exported CSVs with 1/0 just work. The 'y'/'n' parse is why ORM defaults that emit 'Y' accidentally work too.
Use IS TRUE / IS FALSE / IS NOT TRUE / IS NOT FALSE / IS NULL to handle the third state NULL (unknown):
SELECT * FROM feature_flag WHERE enabled IS NOT TRUE;
-- matches rows where enabled is false OR nullbytea-- Hex input (recommended for new code)
SELECT '\x48656c6c6f'::bytea; -- 'Hello'
-- Escape input (legacy, avoid)
SELECT 'abc \153\154\155'::bytea; -- 'abc klm'
-- bytea_output controls the *display* format only — storage is identical
SET bytea_output = 'hex'; -- default since PG9
SET bytea_output = 'escape'; -- legacytoast_tuple_target (default ~2 KB) compress and TOAST out-of-line.11\x prefix + 2 hex digits per byte). "The hex format is compatible with a wide range of external applications and protocols, and it tends to be faster to convert than the escape format, so its use is preferred."6Escape format requires escaping for: NUL byte (\000), single quote ('' or \047), backslash (\\ or \134), and any non-printable octets (0–31, 127–255 as \xxx octal).6
[!NOTE] PostgreSQL 18:
bytea↔ integer casts andreverse()"Allow casting between integer types and bytea (Aleksander Alekseev)" and "Add function
reverse()to reverse bytea bytes (Aleksander Alekseev)."8 Casts let you go(2_147_483_647)::bytea(returns\x7fffffff) withoutlpad(to_hex(...), ...)::byteaplumbing.
bytea vs Large Object: see 71-large-objects.md. The short version: bytea is transactional, replicates cleanly, and works with RETURNING. The LO API exists for streaming reads/writes of multi-gigabyte files, but loses the RETURNING ergonomics and has weak replication semantics. Default to bytea.
inet, cidr, macaddr, macaddr8| Type | Storage | What it accepts | Notes |
|---|---|---|---|
inet | 7 or 19 bytes | IPv4 or IPv6 host or network: 192.0.2.10, 192.0.2.0/24, ::1, 2001:db8::/32 | Default netmask /32 for v4, /128 for v6 when omitted |
cidr | 7 or 19 bytes | IPv4 or IPv6 network only: 192.0.2.0/24 | Rejects host-bits-set: 192.0.2.10/24 is an error |
macaddr | 6 bytes | EUI-48 MAC: 08:00:2b:01:02:03 (7 accepted input formats) | Output is always colon-separated |
macaddr8 | 8 bytes | EUI-64 MAC: 08:00:2b:01:02:03:04:05 | 6-byte input is expanded with FF FE in middle |
CREATE TABLE login_audit (
when_at timestamptz NOT NULL DEFAULT now(),
client inet NOT NULL,
actor text NOT NULL
);
-- Containment: 192.0.2.10 falls inside 192.0.2.0/24
SELECT '192.0.2.10'::inet << '192.0.2.0/24'::cidr; -- t
-- Subtraction: difference in addresses
SELECT '192.0.2.10'::inet - '192.0.2.5'::inet; -- 5
-- Functions
SELECT host('192.0.2.10/24'), masklen('192.0.2.10/24'); -- '192.0.2.10', 24Network operators:12
| Operator | Meaning |
|---|---|
<< | strictly contained by subnet |
<<= | contained by or equal |
>> | strictly contains |
>>= | contains or equal |
&& | overlap (either side contains the other) |
~ | bitwise NOT |
& | bitwise AND |
| | bitwise OR |
+ | add offset |
- | subtract address or offset |
Index containment with GiST or SP-GiST using inet_ops / cidr_ops:
CREATE INDEX login_client_gist ON login_audit USING gist (client inet_ops);
-- accelerates "WHERE client << '10.0.0.0/8'" queriesSee 24-gin-gist-indexes.md for operator class selection and 25-brin-hash-spgist-bloom-indexes.md for SP-GiST trade-offs.
bit(n), bit varying(n) / varbitCREATE TABLE flags (
id bigint PRIMARY KEY,
mask bit varying(64)
);
INSERT INTO flags VALUES (1, B'01100110'); -- B'...' is bit-string literal
SELECT mask | B'10000000' FROM flags; -- bitwise ORbit(n) — fixed width, error if input is shorter or longer (cast truncates/zero-pads silently).bit varying(n) — up to n bits; error if longer; alias varbit(n).[!WARNING] Bit-string types are rarely the right choice
For feature-flag bitmasks: use
integer/bigintwith&/|/#(XOR) — smaller storage and faster ops.For "long, sparse bit array": use
roaringbitmapextension or a normalized table.For "raw bytes": use
bytea.
bit/bit varyingexist for SQL-standard completeness and the rare case where individual-bit semantics are user-visible. New schemas almost never need them.
[!NOTE] PostgreSQL 16: non-decimal literals and underscore grouping
Three new forms for integer literals:13
-- Hex / octal / binary integers (any size with NUMERIC parsing) SELECT 0x42f, 0XFFFF, 0o273, 0O755, 0b100101, 0B10011001; -- Underscores for visual grouping (in decimal and non-decimal) SELECT 1_500_000_000, 0xFFFF_FFFF, 1.618_034;Restrictions: underscores cannot appear at start/end, immediately before/after the decimal point, immediately around the exponent marker, or in consecutive pairs.13
text vs varchar(n) vs char(n) — and the one valid reason for varchar(n)-- BAD: char(n). Trailing-space semantics break joins to text/varchar columns.
-- BAD: varchar(255) "for performance". There is none.
-- GOOD: text. Default.
CREATE TABLE user_profile (
id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
email text NOT NULL UNIQUE,
display_name text NOT NULL,
bio text
);
-- ONLY GOOD use of varchar(n): a business-rule length cap
-- (e.g., the column is exported into a fixed-width legacy CSV).
CREATE TABLE billing_export (
customer_short_code varchar(8) NOT NULL -- legacy mainframe wants exactly 8 chars
);Adding a length cap to a column is a ALTER TABLE ... ALTER COLUMN ... TYPE varchar(n) USING substring(col, 1, n) — full table rewrite. Rip caps out, don't add them, unless required.
serial to IDENTITY-- Old schema:
-- CREATE TABLE t (id serial PRIMARY KEY, name text);
-- The sequence is named t_id_seq, OWNED BY t.id.
-- Drop the default that points at the sequence; convert the column to an identity.
BEGIN;
ALTER TABLE t ALTER COLUMN id DROP DEFAULT;
-- Now make it an identity column. Use the existing sequence's current value as the start.
SELECT setval(pg_get_serial_sequence('t', 'id'),
(SELECT max(id) FROM t),
true); -- third arg true => next nextval() returns max + 1
ALTER TABLE t
ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY;
-- The old sequence still exists; drop it now that the column owns its own identity sequence.
-- (Actually, the ADD GENERATED clause creates a new sequence and the old one is detached.)
DROP SEQUENCE t_id_seq;
COMMIT;Why BY DEFAULT instead of ALWAYS: with ALWAYS, manual INSERT (id, name) VALUES (42, 'x') will fail unless you say OVERRIDING SYSTEM VALUE. BY DEFAULT lets you continue inserting explicit IDs for backfill/data-load.
numeric(12, 2) and avoiding floatCREATE TABLE invoice (
id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
currency char(3) NOT NULL, -- ISO 4217 — three chars, *always* three chars, the one place char(n) is right
amount numeric(12, 2) NOT NULL,
CHECK (currency = upper(currency)),
CHECK (amount >= 0)
);
-- DO NOT do this:
-- CREATE TABLE invoice (..., amount double precision); -- silent rounding errors
-- DO NOT use the money type either — it's locale-sensitive and almost always wrong.
-- See 18-uuid-numeric-money.md for the full discussion.For multi-currency: store the integer minor units (cents, pence, satoshi) plus a separate currency_code:
amount_cents bigint NOT NULL,
currency char(3) NOT NULLThis eliminates all rounding at the cost of "the application has to multiply by 100 on display."
inet and indexing for containmentCREATE TABLE block_list (
cidr_block cidr PRIMARY KEY,
added_at timestamptz NOT NULL DEFAULT now(),
reason text
);
-- GiST index for containment queries
CREATE INDEX block_list_cidr_gist ON block_list USING gist (cidr_block inet_ops);
-- Is this client blocked?
SELECT * FROM block_list WHERE cidr_block >>= '192.0.2.10'::inet;For very large tables (10M+ rows) where you only ever check exact match, a btree on cidr_block is fine. For range-of-subnets queries, GiST or SP-GiST.
[!NOTE] PostgreSQL 14: SP-GiST gained
inet_opsPG14 added an SP-GiST operator class for inet/cidr that is often faster than GiST for trie-like containment lookups. "Allow SP-GiST indexes to be used for INET and CIDR types (Emre Hasegeli)."14 Worth benchmarking against GiST on big tables.
-- This is rejected at insert time. cidr enforces "host bits must be zero":
INSERT INTO block_list (cidr_block) VALUES ('192.0.2.10/24');
-- ERROR: invalid cidr value: "192.0.2.10/24"
-- DETAIL: Value has bits set to right of mask.
-- Convert from inet to cidr by zeroing host bits:
SELECT set_masklen('192.0.2.10'::inet, 24)::cidr;
-- set_masklen
-- ----------------
-- 192.0.2.0/24For a text column you're migrating to cidr, find values that won't cast cleanly:
SELECT s FROM legacy_blocks WHERE NOT (s ~ '^[0-9:.a-fA-F]+/\d+$')
OR s::cidr IS NULL; -- last clause raises if invalid; wrap in a SAVEPOINT-per-row routinebigint (better than bit(n))-- Allocate up to 64 flags, one bit each.
-- Constants live as application-side enums; comments document them here.
CREATE TABLE account (
id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
email text NOT NULL,
feature_mask bigint NOT NULL DEFAULT 0
-- bit 0: beta_enabled
-- bit 1: marketing_emails
-- bit 2: two_factor_required
-- ...
);
-- Set bit 2 atomically:
UPDATE account SET feature_mask = feature_mask | (1::bigint << 2) WHERE id = $1;
-- Read bit 2:
SELECT (feature_mask & (1::bigint << 2)) <> 0 AS two_factor_required FROM account WHERE id = $1;
-- Index "rows with bit 2 set" via partial index:
CREATE INDEX account_2fa_required
ON account (id)
WHERE (feature_mask & 4) <> 0;For ad-hoc analytics, prefer separate boolean columns or a normalized (account_id, feature) table — bitmasks are opaque in EXPLAIN and joins.
numeric vs the silent-rounding float trap-- Demonstrate the failure mode:
SELECT 0.1::double precision + 0.2::double precision;
-- ?column?
-- ------------------------
-- 0.30000000000000004
-- The correct way:
SELECT 0.1::numeric + 0.2::numeric;
-- ?column?
-- ----------
-- 0.3
-- For monetary aggregations across many rows, numeric is the only safe choice:
SELECT customer_id,
sum(amount)::numeric(14,2) AS total_due
FROM invoice
GROUP BY customer_id;bytea (not text)-- DO:
CREATE TABLE artifact (
id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
path text NOT NULL,
sha256 bytea NOT NULL, -- 32 raw bytes
CHECK (octet_length(sha256) = 32)
);
INSERT INTO artifact (path, sha256)
VALUES ('/srv/release.tar.gz', sha256('hello'::bytea));
SELECT path, encode(sha256, 'hex') FROM artifact;
-- DON'T:
-- CREATE TABLE artifact (..., sha256_hex text); -- twice the storage; case-sensitivity bugsbytea is half the on-disk size of a hex text column for the same digest. The encode(..., 'hex') and decode(..., 'hex') functions handle the display conversion.
[!NOTE] PostgreSQL 18
-- Locale-aware case folding (Unicode "fc" mapping) — handles Turkish dotless i, ß, etc.
SELECT casefold('STRASSE') = casefold('Straße'); -- t
-- For older majors: use lower() + an ICU collation that does the right thing,
-- or a nondeterministic collation (PG12+); see 65-collations-encoding.md.[!NOTE] PostgreSQL 18
SELECT 2_147_483_647::bytea;
-- \x7fffffff
SELECT '\xdeadbeef'::bytea::integer;
-- ?column?
-- ------------
-- -559038737 -- signed-32 interpretation
-- Building binary wire formats:
SELECT (1::int4)::bytea || (42::int4)::bytea;
-- \x000000010000002achar(n)Run this against any production database before a schema review:
SELECT n.nspname AS schema,
c.relname AS table,
a.attname AS column,
pg_catalog.format_type(a.atttypid, a.atttypmod) AS type
FROM pg_attribute a
JOIN pg_class c ON c.oid = a.attrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE a.atttypid = 'bpchar'::regtype
AND a.attnum > 0
AND NOT a.attisdropped
AND c.relkind IN ('r', 'p') -- ordinary tables and partitioned tables
AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY schema, table, column;Output is the audit list. The exception is char(3) for ISO currency codes — that one is correct because the spec demands exactly three characters and there is no ambiguity. Most other char(n) columns came from "I thought it would be faster," and should be migrated to text or varchar(n) if the cap is genuinely business-required.
inet and cidr for "this column stores both addresses and networks"-- inet stores both a host address AND a netmask. A "host" is just an address with /32 (v4) or /128 (v6).
-- cidr rejects host-bits-set; it stores networks only.
-- For a column that may hold either a single host or a CIDR block: use inet.
-- '192.0.2.10/32' (the host) — fine in inet
-- '192.0.2.0/24' (the network) — fine in inet
-- '192.0.2.10/24' — INVALID in cidr (host bits set), VALID in inet (host within network)
-- For a column that must hold only networks (allow/deny lists, route tables): use cidr.inet semantics for 192.0.2.10/24 say "this is the host 192.0.2.10 in the /24 network containing it." cidr semantics reject this because in CIDR notation the address portion must be the lowest IP in the block. If you store an audit log of "what source IPs hit my API," use inet. If you store "block lists of subnets," use cidr.
char(n) pads with spaces and treats trailing space as insignificant in comparisons. 'abc'::char(5) = 'abc'::text is true. 'abc'::char(5) = 'abc ' is also true. The padded form is what gets stored, sorted, and returned via COPY OUT. Avoid char(n) for everything except fixed-width legacy interfaces.3
varchar(255) "for performance" is a myth. All character types share a single varlena backend.1 The 255 limit comes from MySQL row-format constraints, not Postgres. Use text unless a length cap is a business rule.
numeric NaN sorts after every other value, including Infinity. This makes numeric orderable, but breaks the assumption that "NaN propagates through comparisons." nan_val > 5 is true in numeric, false (NULL) in double precision. Be explicit with IS NOT NULL + IS NOT NULL.4
serial/bigserial confused ownership. pg_dump handles them, but if you DROP COLUMN id from a table, the OWNED-BY sequence is dropped too. With GENERATED BY DEFAULT AS IDENTITY the link is explicit and ALTER TABLE ... ALTER COLUMN ... DROP IDENTITY is the safe rollback.2
real and double precision accumulate error. Summing one billion real values is not accurate to the last decimal place. For accounting-grade arithmetic, the answer is always numeric, even if you have to round at display time.4
'1'::boolean works; 1::boolean does not. cast(1 AS boolean) requires an explicit cast through integer != 0 or you'll get cannot cast type integer to boolean. The text literal form '1', '0', 'true', 'false', 'yes', 'no', 'on', 'off', 't', 'f', 'y', 'n' works.5
bytea_output flips display but not storage. Setting SET bytea_output = 'escape' makes existing \x... outputs render with octal escapes. Drivers see whichever format they ask for. Storage is identical. If your client library is comparing returned bytea strings instead of using its bytea-bind API, you have a bug waiting.6
cidr rejects host-bits-set. '192.0.2.10/24'::cidr is an error; you have to write '192.0.2.0/24'::cidr or convert via set_masklen(inet, 24)::cidr. The error message is clear (Value has bits set to right of mask.) but ETL pipelines that expected text-like behavior will fail loudly.
inet-vs-cidr containment uses the SAME << operator but different semantics on host bits. '192.0.2.10/24'::inet << '192.0.2.0/24'::cidr is true; comparison considers the host address, not the masked network. Document which side is host and which is network in column comments.
GiST on inet/cidr requires the operator class: CREATE INDEX ... USING gist (col inet_ops). Without inet_ops, no operator class exists. SP-GiST also requires inet_ops/cidr_ops (added PG14).14
bit(n) truncates silently on cast. B'10'::bit(3) returns B'100' (right-padded with zero), not an error. bit varying(n) truncates input that exceeds n with an error on direct INSERT but truncates on explicit cast. Read the rules.
text storage is unbounded — but TOAST kicks in around 2 KB. Storing a megabyte of text per row works, but every read from disk pays for de-TOAST and decompression. Consider whether the column should be a separate one-to-one table with the heavy column elsewhere. See 31-toast.md.
numeric arithmetic is 10–100× slower than double precision arithmetic. For non-money workloads with billions of rows, this matters. Benchmark; don't reach for numeric on principle for sensor data or telemetry.
Integer literal underscores (PG16+) cannot be at the boundary. _1_000 is a parse error; 1_000_ is a parse error; 1__000 is a parse error. Underscores must be between digits and not consecutive.13
smallserial exists; tinyint does not. Postgres has no 1-byte integer type. Use boolean for 0/1 and smallint for everything else. (If you really must store −128 to 127 in 1 byte, the C-language "char" type exists, but it's not recommended for application code and pg_dump/clients may render it oddly.)
inet and text cast freely; cidr and text do not in all directions. '192.0.2.0/24'::cidr works; (some_cidr_col)::text works; but be aware that round-trip through text loses no information for cidr but does for inet — host bits in inet are part of the value, and the canonical text form preserves them.
01-syntax-ddl.md — CREATE TABLE syntax including GENERATED BY DEFAULT AS IDENTITY, generated columns15-data-types-custom.md — composite, enum, domain, range, multirange16-arrays.md — arrays of any of these scalar types17-json-jsonb.md — JSON/JSONB tradeoffs18-uuid-numeric-money.md — the deeper numeric precision/scale rules, UUID, money pitfalls19-timestamp-timezones.md — timestamp/timestamptz/date/time/interval22-indexes-overview.md — picking an index type per scalar24-gin-gist-indexes.md — inet_ops for GiST containment25-brin-hash-spgist-bloom-indexes.md — SP-GiST inet_ops (PG14+)31-toast.md — what happens to large text / bytea values65-collations-encoding.md — collations, casefold(), nondeterministic collations71-large-objects.md — when bytea is the wrong tool100-pg-versions-features.md — release-note headline tracker per major versionPostgreSQL 16 documentation, "Character Types" (§8.3): "There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. ... In most situations text or character varying should be used instead." Length cap: "the length n must be greater than zero and cannot exceed 10,485,760." Maximum stored value: "The longest possible character string that can be stored is about 1 GB." Short-string overhead: "The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string ... Longer strings have 4 bytes of overhead instead of 1." https://www.postgresql.org/docs/16/datatype-character.html ↩ ↩2 ↩3 ↩4 ↩5
PostgreSQL 16 documentation, "Numeric Types" §8.1.4 "Serial Types": "The data types smallserial, serial and bigserial are not true types, but merely a notational convenience for creating unique identifier columns." https://www.postgresql.org/docs/16/datatype-numeric.html ↩ ↩2 ↩3 ↩4
PostgreSQL 16 documentation, "Character Types" (§8.3): "Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character. In collations where whitespace is significant, this behavior can produce unexpected results." https://www.postgresql.org/docs/16/datatype-character.html ↩ ↩2 ↩3
PostgreSQL 16 documentation, "Numeric Types" (§8.1): byte sizes table; range limits "up to 131072 digits before the decimal point; up to 16383 digits after the decimal point"; NaN: "In most implementations of the 'not-a-number' concept, NaN is not considered equal to any other numeric value (including NaN). In order to allow numeric values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values."; unconstrained: "Specifying NUMERIC without any precision or scale creates an 'unconstrained numeric' column ..."; money: "If you require exact storage and calculations (such as for monetary amounts), use the numeric type instead." https://www.postgresql.org/docs/16/datatype-numeric.html ↩ ↩2 ↩3 ↩4 ↩5 ↩6 ↩7 ↩8
PostgreSQL 16 documentation, "Boolean Type" (§8.6): accepts true, t, yes, y, on, 1 and false, f, no, n, off, 0 (case-insensitive, unique prefixes accepted, leading/trailing whitespace ignored); storage 1 byte; output t/f. https://www.postgresql.org/docs/16/datatype-boolean.html ↩ ↩2 ↩3
PostgreSQL 16 documentation, "Binary Data Types" (§8.4): "The output format depends on the configuration parameter bytea_output; the default is hex." Hex format: "The hex format is compatible with a wide range of external applications and protocols, and it tends to be faster to convert than the escape format, so its use is preferred." Escape format escape table (zero, single quote, backslash, non-printable octets) per Table 8.7. Storage: "1 or 4 bytes plus the actual binary string." https://www.postgresql.org/docs/16/datatype-binary.html ↩ ↩2 ↩3 ↩4
PostgreSQL 15 release notes: "Change the I/O format of type "char" for non-ASCII characters (Tom Lane). Bytes with the high bit set are now output as a backslash and three octal digits, to avoid encoding issues." https://www.postgresql.org/docs/release/15.0/ ↩
PostgreSQL 18 release notes: "Allow LIKE with nondeterministic collations (Peter Eisentraut)", "Allow text position search functions with nondeterministic collations (Peter Eisentraut)", "Add function casefold() to allow for more sophisticated case-insensitive matching (Jeff Davis)", "Add function reverse() to reverse bytea bytes (Aleksander Alekseev)", "Allow casting between integer types and bytea (Aleksander Alekseev)", "Improve the speed of numeric multiplication and division (Joel Jacobson, Dean Rasheed)", "Add math functions gamma() and lgamma() (Dean Rasheed)." https://www.postgresql.org/docs/release/18.0/ ↩ ↩2 ↩3 ↩4
PostgreSQL 14 release notes: "Add support for Infinity and -Infinity values in the numeric data type (Tom Lane)", "Make non-zero floating-point values divided by infinity return zero (Kyotaro Horiguchi)", "Make floating-point division of NaN by zero return NaN (Tom Lane)." https://www.postgresql.org/docs/release/14.0/ ↩ ↩2
PostgreSQL 15 release notes: "Allow the scale of a numeric value to be negative, or greater than its precision (Dean Rasheed, Tom Lane). This allows rounding of values to the left of the decimal point, e.g., '1234'::numeric(4, -2) returns 1200." https://www.postgresql.org/docs/release/15.0/ ↩
PostgreSQL 16 documentation, "TOAST" (§70.2). Cross-referenced from 31-toast.md. https://www.postgresql.org/docs/16/storage-toast.html ↩
PostgreSQL 16 documentation, "Network Address Functions and Operators" (§9.12): operator list <<, <<=, >>, >>=, &&, ~, &, |, +, - and functions abbrev, broadcast, family, host, hostmask, inet_merge, inet_same_family, masklen, netmask, network, set_masklen, text. https://www.postgresql.org/docs/16/functions-net.html ↩
PostgreSQL 16 release notes: "Allow non-decimal integer literals (Peter Eisentraut, David Rowley)", "Allow NUMERIC to process hexadecimal, octal, and binary integers of any size (Dean Rasheed)", "Allow underscores in integer and numeric constants (Peter Eisentraut, Dean Rasheed)." Lexical-syntax detail from PostgreSQL 16 documentation, "Lexical Structure" (§4.1.2.6): underscore restrictions verbatim. https://www.postgresql.org/docs/release/16.0/ and https://www.postgresql.org/docs/16/sql-syntax-lexical.html ↩ ↩2 ↩3
PostgreSQL 14 release notes: "Allow SP-GiST indexes to be used for INET and CIDR types (Emre Hasegeli)." https://www.postgresql.org/docs/release/14.0/ ↩ ↩2
evals
references