alonso-skills/postgres

Comprehensive PostgreSQL reference for developers and DBAs covering versions 14–18. Use whenever the user asks about PostgreSQL syntax, DDL/DML/DQL, joins, LATERAL, CTEs, window functions, GROUPING SETS, DISTINCT ON, RETURNING, ON CONFLICT, PL/pgSQL, functions, procedures, triggers, views, materialized views, indexes (B-tree/GIN/GiST/BRIN/Hash/Bloom), MVCC, VACUUM, autovacuum, WAL, TOAST, partitioning, replication (streaming/logical), backup, PITR, HA (Patroni/repmgr), pgBouncer, EXPLAIN ANALYZE, RLS, roles, extensions (pgvector, PostGIS, TimescaleDB, Citus, pg_trgm, pg_cron), JSON/JSONB, full-text search, UUID, timestamptz, COPY, system catalogs, collations, large objects, cursors, GUC, or any Postgres administration, performance, security, replication, backup, or recovery topic.

1.36x

Quality

94%

Does it follow best practices?

Impact

94%

1.36x

Average score across 3 eval scenarios

Securityby

Passed

No known issues

Built-in Scalar Data Types

Name: alonso-skills/postgres
Rating: 94 (1 reviews)
Author: alonso-skills

Reference for the built-in scalar types every Postgres backend ships with — character, numeric, boolean, binary, network, and bit-string. Composite/enum/domain/range types live in 15-data-types-custom.md; arrays in 16-arrays.md; JSON/JSONB in 17-json-jsonb.md; UUID, money, and the deeper NUMERIC precision/scale story in 18-uuid-numeric-money.md; date/time/interval in 19-timestamp-timezones.md.

When to Use This Reference
Mental Model
Type Selection Matrix
Syntax / Mechanics
Examples / Recipes
Gotchas / Anti-patterns
See Also
Sources

When to Use This Reference

Use this file when picking among text vs varchar(n) vs char(n), between integer and bigint, between numeric and real, between bytea and a large-object handle, between inet and plain text, or whenever someone wrote char(n) and you have to explain why every Postgres veteran will groan at the migration.

This file does not cover indexing those columns (see 22-indexes-overview.md), collations for text comparison (see 65-collations-encoding.md), or storage details like TOAST that affect large values (see 31-toast.md). It does cover the behavioral surprises baked into each type — char(n) trailing-space semantics, NaN > everything in numeric, bytea_output flipping representation, cidr rejecting host-bits-set, and the rest.

Mental Model

Five rules cover most type-selection decisions:

For text, use text. Never char(n). Use varchar(n) only when a length constraint is a business rule (column will end up in a CSV with a fixed-width field), not for performance. The three character types share one varlena storage backend; char(n) is the slowest of the three because of its space-padding obligations.¹
For exact numbers, use numeric. For approximate physical measurements, use double precision. real is rarely the right answer (4-byte float, ~6 decimal digits). money is locale-sensitive and almost never what you want; see 18-uuid-numeric-money.md.
For autoincrement IDs, use GENERATED BY DEFAULT AS IDENTITY (SQL standard, in core since PG10), not serial. serial/bigserial/smallserial are macros for "create a sequence, set the column default, OWNED BY the column"; they are not real types and they have ownership transfer quirks.² See 01-syntax-ddl.md for the migration recipe.
For binary data, use bytea for anything you would manipulate in SQL, and stay below ~10 MB per row. For larger blobs that you stream chunk-by-chunk, the Large Object API exists (see 71-large-objects.md), but it carries transactional and replication caveats.
For IP addresses and MACs, use inet/cidr/macaddr — not text. The dedicated types are fixed-size, validate input, sort correctly, and accelerate containment queries with GiST. The 8-octet variant macaddr8 is the right choice for new schemas because EUI-64 is a superset of EUI-48.

Type Selection Matrix

You need to store	Use	Avoid	Why
Text of unknown/variable length	`text`	`varchar(n)`, `char(n)`	All three share the same storage; `text` is the canonical name; length cap only when business-required¹
Text with hard length cap from a spec	`varchar(n)`	`char(n)`	`varchar(n)` rejects overlong input; `char(n)` pads with spaces and treats trailing spaces as insignificant³
Counter that won't exceed 2.1B	`integer`	`bigint` for tiny counters	4 bytes vs 8; matters at billions of rows
Counter that might exceed 2.1B	`bigint`	`integer`	Migrating `integer` → `bigint` on a hot table is painful
Money, scientific exact values	`numeric(p, s)`	`real`, `double precision`, `money`	`numeric` is exact; floats round; `money` is locale-sensitive⁴
Physics-style measurement, percentages	`double precision`	`numeric` for non-exact data	`double` is 10–100× faster than `numeric` for arithmetic⁴
Yes/no	`boolean`	`smallint`, `'Y'`/`'N'` text	1 byte, indexable, accepts `'t'`, `'true'`, `'yes'`, `'on'`, `'1'`⁵
Image, encrypted blob, hash, signed data	`bytea`	`text` (hex string), Large Object	`bytea` is transactional, replicates cleanly, fits MVCC; LO API for >10 MB streams⁶
IPv4/IPv6 host address	`inet`	`text`	7 or 19 bytes; built-in containment operators; correct sorting
IPv4/IPv6 network (CIDR block)	`cidr`	`inet` if you also store host addresses	`cidr` validates that bits to the right of mask are zero
MAC address	`macaddr8` (new) or `macaddr` (legacy)	`text`	6 or 8 bytes; accepts 7+ input formats; deterministic output
Bit mask, feature flags	`integer`/`bigint` with bitwise ops, OR `boolean` columns	`bit(n)`, `bit varying(n)`	Bit-string types are rarely the right choice — see Gotchas
Auto-incrementing surrogate key	`bigint GENERATED BY DEFAULT AS IDENTITY`	`serial`, `bigserial`	Identity is the SQL standard; `serial` is a macro with subtle ownership traps²
Universally-unique surrogate key	`uuid`	`text`	See `18-uuid-numeric-money.md`

Syntax / Mechanics

Character Types: `text`, `varchar`, `char`

-- All three accept up to ~1 GB; varlena (1- or 4-byte header + bytes).
CREATE TABLE t (
    a text,                       -- variable length, no cap
    b character varying(100),     -- aliased as varchar(100); cap = 100 chars
    c character(10),              -- aliased as char(10); always padded to 10
    d "char"                      -- C-language single byte; never use in app code
);

[!WARNING] char(n) is space-padded and almost never what you want

"Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character."³ This makes 'abc'::char(5) equal to 'abc' and 'abc ' for comparison purposes — but the column literally stores 'abc ' and a RETURNING or COPY OUT will produce the padded form. Mixing char(n) and varchar/text in joins introduces ambiguity in trailing-space semantics. Use text or varchar(n).

No performance difference among text, varchar(n), varchar, and char(n) for storage and access, per the official docs: "There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. ... In most situations text or character varying should be used instead."¹

Maximum: ~1 GB stored value; explicit varchar(n)/char(n) length n may not exceed 10,485,760 characters.¹

Storage: 1-byte header for short strings (≤ 126 bytes); 4-byte header for longer. Beyond ~2 KB, values may be compressed and/or moved to TOAST (see 31-toast.md).

[!NOTE] PostgreSQL 15: "char" (the internal single-byte type) I/O format changed

"Change the I/O format of type "char" for non-ASCII characters ... Bytes with the high bit set are now output as a backslash and three octal digits, to avoid encoding issues."⁷ This affects the "char" system-catalog type — not the user-facing character/char. Only relevant if you read pg_class.relkind from a non-libpq client.

[!NOTE] PostgreSQL 18: nondeterministic-collation operations broadened

"Allow LIKE with nondeterministic collations (Peter Eisentraut)" and "Allow text position search functions with nondeterministic collations (Peter Eisentraut)."⁸ PG12 introduced nondeterministic collations for case-insensitive UNIQUE constraints (see 65-collations-encoding.md); PG18 lets LIKE, position(), strpos(), etc., use them too.

[!NOTE] PostgreSQL 18: casefold() for Unicode case-insensitive matching

"Add function casefold() to allow for more sophisticated case-insensitive matching (Jeff Davis)."⁸ casefold(text) is the Unicode-correct alternative to lower(text) for case-insensitive comparison (Turkish i / İ, German ß, etc.).

Integer Types: `smallint`, `integer`, `bigint`

Type	Aliases	Bytes	Range
`smallint`	`int2`	2	−32,768 to +32,767
`integer`	`int`, `int4`	4	−2,147,483,648 to +2,147,483,647
`bigint`	`int8`	8	−9,223,372,036,854,775,808 to +9,223,372,036,854,775,807

Picking between them is a row-count and key-density bet, not a "future-proofing" debate:

For surrogate keys on tables that might grow past 2.1 B rows, default to bigint. Migrating integer → bigint on a hot table is expensive: it rewrites every row and every covering index.
For domain-bounded counters (HTTP status, port number, age), smallint saves 2 bytes per row — meaningful at hundreds of millions of rows; meaningless at thousands. Often not worth the loss of ergonomics.
Overflow raises numeric_value_out_of_range (SQLSTATE 22003). There is no silent wrap.

Arbitrary-Precision: `numeric` / `decimal`

Two interchangeable aliases for the same type. Use numeric by convention (decimal is provided for SQL-standard compatibility).

CREATE TABLE invoice (
    id            bigint GENERATED BY DEFAULT AS IDENTITY,
    amount_cents  numeric(12, 2)  -- 12 total digits, 2 after the decimal
);

Precision p = total significant digits.
Scale s = digits after the decimal point.
Range: up to 131,072 digits before the decimal and 16,383 digits after.⁴
Storage: variable, ~2 bytes per 4 digits + ~5-byte header.

Unconstrained numeric (no (p, s)) accepts any value up to the implementation limits; the column does not coerce or round on insert. "Specifying NUMERIC without any precision or scale creates an 'unconstrained numeric' column in which numeric values of any length can be stored, up to the implementation limits."⁴

[!WARNING] NaN in numeric does not behave like IEEE-754 NaN

Postgres treats NaN in numeric as equal to NaN and greater than every non-NaN value. Per the docs: "In most implementations of the 'not-a-number' concept, NaN is not considered equal to any other numeric value (including NaN). In order to allow numeric values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values."⁴ This is a deliberate departure from IEEE-754 done to make numeric orderable.

[!NOTE] PostgreSQL 14: numeric supports Infinity and -Infinity

"Add support for Infinity and -Infinity values in the numeric data type (Tom Lane)."⁹ Before PG14 only real/double precision had infinity values.

[!NOTE] PostgreSQL 15: scale can be negative or greater than precision

"Allow the scale of a numeric value to be negative, or greater than its precision (Dean Rasheed, Tom Lane). This allows rounding of values to the left of the decimal point, e.g., '1234'::numeric(4, -2) returns 1200."¹⁰ A negative scale means round to that many places to the left of the decimal. Useful for sensor data with rough resolution.

[!NOTE] PostgreSQL 18: faster numeric arithmetic

"Improve the speed of numeric multiplication and division (Joel Jacobson, Dean Rasheed)" and "Add math functions gamma() and lgamma() (Dean Rasheed)."⁸

Floating-Point: `real`, `double precision`

Type	Aliases	Bytes	Approximate range	Decimal precision
`real`	`float4`	4	~1E-37 to 1E+37	≥ 6 digits
`double precision`	`float`, `float8`	8	~1E-307 to 1E+308	≥ 15 digits

IEEE-754 single and double precision. They support Infinity, -Infinity, and NaN.

[!WARNING] Never use real or double precision for money or any value where exact equality matters

Per the docs: "If you require exact storage and calculations (such as for monetary amounts), use the numeric type instead."⁴ Floating-point math will quietly produce results that don't equal each other: 0.1 + 0.2 = 0.30000000000000004 in double precision. This is correct IEEE behavior, and it is a footgun for accounting and audit.

[!NOTE] PostgreSQL 14: float-divide-by-infinity returns 0

"Make non-zero floating-point values divided by infinity return zero (Kyotaro Horiguchi)" and "Make floating-point division of NaN by zero return NaN (Tom Lane)."⁹ Minor edge-case alignment with IEEE-754.

Serial Family (`serial`, `bigserial`, `smallserial`) — Aliases, Not Types

-- "serial" is shorthand. This:
CREATE TABLE legacy (id serial PRIMARY KEY, name text);

-- expands to (approximately):
CREATE SEQUENCE legacy_id_seq AS integer;
CREATE TABLE legacy (
    id    integer NOT NULL DEFAULT nextval('legacy_id_seq'),
    name  text,
    PRIMARY KEY (id)
);
ALTER SEQUENCE legacy_id_seq OWNED BY legacy.id;

Notational form	Underlying integer	Underlying sequence range
`smallserial` / `serial2`	`smallint`	1 to 32,767
`serial` / `serial4`	`integer`	1 to 2,147,483,647
`bigserial` / `serial8`	`bigint`	1 to 9,223,372,036,854,775,807

[!WARNING] Use GENERATED BY DEFAULT AS IDENTITY for new tables

Per the docs: "The data types smallserial, serial and bigserial are not true types, but merely a notational convenience for creating unique identifier columns."² Identity columns (SQL standard, in core since PG10) handle ownership, sequence rights, and pg_dump correctly without the special-case OWNED BY plumbing. See the migration recipe in 01-syntax-ddl.md. For the deeper serial-vs-IDENTITY comparison and the ALWAYS/BY DEFAULT decision, see 18-uuid-numeric-money.md.

`boolean`

CREATE TABLE feature_flag (
    name     text PRIMARY KEY,
    enabled  boolean NOT NULL DEFAULT false
);

Storage: 1 byte.
Aliases: bool.
Output: always t or f. Set a CASE expression in queries if you want TRUE/FALSE/YES/NO in output text.

Accepted input (case-insensitive, leading/trailing whitespace ignored):⁵

`true` (any of)	`false` (any of)
`true`, `t`	`false`, `f`
`yes`, `y`	`no`, `n`
`on`	`off`
`1`	`0`

The '1'::boolean / '0'::boolean casts are why DBA scripts using exported CSVs with 1/0 just work. The 'y'/'n' parse is why ORM defaults that emit 'Y' accidentally work too.

Use IS TRUE / IS FALSE / IS NOT TRUE / IS NOT FALSE / IS NULL to handle the third state NULL (unknown):

SELECT * FROM feature_flag WHERE enabled IS NOT TRUE;
-- matches rows where enabled is false OR null

`bytea`

-- Hex input (recommended for new code)
SELECT '\x48656c6c6f'::bytea;          -- 'Hello'

-- Escape input (legacy, avoid)
SELECT 'abc \153\154\155'::bytea;      -- 'abc klm'

-- bytea_output controls the *display* format only — storage is identical
SET bytea_output = 'hex';              -- default since PG9
SET bytea_output = 'escape';           -- legacy

Storage: 1- or 4-byte header + raw bytes. Up to ~1 GB per value; values over toast_tuple_target (default ~2 KB) compress and TOAST out-of-line.¹¹
Default I/O format: hex (\x prefix + 2 hex digits per byte). "The hex format is compatible with a wide range of external applications and protocols, and it tends to be faster to convert than the escape format, so its use is preferred."⁶

Escape format requires escaping for: NUL byte (\000), single quote ('' or \047), backslash (\\ or \134), and any non-printable octets (0–31, 127–255 as \xxx octal).⁶

[!NOTE] PostgreSQL 18: bytea ↔ integer casts and reverse()

"Allow casting between integer types and bytea (Aleksander Alekseev)" and "Add function reverse() to reverse bytea bytes (Aleksander Alekseev)."⁸ Casts let you go (2_147_483_647)::bytea (returns \x7fffffff) without lpad(to_hex(...), ...)::bytea plumbing.

bytea vs Large Object: see 71-large-objects.md. The short version: bytea is transactional, replicates cleanly, and works with RETURNING. The LO API exists for streaming reads/writes of multi-gigabyte files, but loses the RETURNING ergonomics and has weak replication semantics. Default to bytea.

Network Types: `inet`, `cidr`, `macaddr`, `macaddr8`

Type	Storage	What it accepts	Notes
`inet`	7 or 19 bytes	IPv4 or IPv6 host or network: `192.0.2.10`, `192.0.2.0/24`, `::1`, `2001:db8::/32`	Default netmask /32 for v4, /128 for v6 when omitted
`cidr`	7 or 19 bytes	IPv4 or IPv6 network only: `192.0.2.0/24`	Rejects host-bits-set: `192.0.2.10/24` is an error
`macaddr`	6 bytes	EUI-48 MAC: `08:00:2b:01:02:03` (7 accepted input formats)	Output is always colon-separated
`macaddr8`	8 bytes	EUI-64 MAC: `08:00:2b:01:02:03:04:05`	6-byte input is expanded with `FF FE` in middle

CREATE TABLE login_audit (
    when_at  timestamptz NOT NULL DEFAULT now(),
    client   inet NOT NULL,
    actor    text NOT NULL
);

-- Containment: 192.0.2.10 falls inside 192.0.2.0/24
SELECT '192.0.2.10'::inet << '192.0.2.0/24'::cidr;        -- t

-- Subtraction: difference in addresses
SELECT '192.0.2.10'::inet - '192.0.2.5'::inet;             -- 5

-- Functions
SELECT host('192.0.2.10/24'), masklen('192.0.2.10/24');    -- '192.0.2.10', 24

Network operators:¹²

Operator	Meaning
`<<`	strictly contained by subnet
`<<=`	contained by or equal
`>>`	strictly contains
`>>=`	contains or equal
`&&`	overlap (either side contains the other)
`~`	bitwise NOT
`&`	bitwise AND
`\|`	bitwise OR
`+`	add offset
`-`	subtract address or offset

Index containment with GiST or SP-GiST using inet_ops / cidr_ops:

CREATE INDEX login_client_gist ON login_audit USING gist (client inet_ops);
-- accelerates "WHERE client << '10.0.0.0/8'" queries

See 24-gin-gist-indexes.md for operator class selection and 25-brin-hash-spgist-bloom-indexes.md for SP-GiST trade-offs.

Bit Strings: `bit(n)`, `bit varying(n)` / `varbit`

CREATE TABLE flags (
    id  bigint PRIMARY KEY,
    mask  bit varying(64)
);

INSERT INTO flags VALUES (1, B'01100110');     -- B'...' is bit-string literal
SELECT mask | B'10000000' FROM flags;          -- bitwise OR

bit(n) — fixed width, error if input is shorter or longer (cast truncates/zero-pads silently).
bit varying(n) — up to n bits; error if longer; alias varbit(n).
Storage: ~1 byte per 8 bits + 5–8 bytes overhead.

[!WARNING] Bit-string types are rarely the right choice

For feature-flag bitmasks: use integer/bigint with &/|/# (XOR) — smaller storage and faster ops.

For "long, sparse bit array": use roaringbitmap extension or a normalized table.

For "raw bytes": use bytea.

bit/bit varying exist for SQL-standard completeness and the rare case where individual-bit semantics are user-visible. New schemas almost never need them.

Numeric Literal Syntax (PG16+ Hex / Octal / Binary / Underscores)

[!NOTE] PostgreSQL 16: non-decimal literals and underscore grouping

Three new forms for integer literals:¹³
-- Hex / octal / binary integers (any size with NUMERIC parsing)
SELECT 0x42f, 0XFFFF, 0o273, 0O755, 0b100101, 0B10011001;

-- Underscores for visual grouping (in decimal and non-decimal)
SELECT 1_500_000_000, 0xFFFF_FFFF, 1.618_034;
Restrictions: underscores cannot appear at start/end, immediately before/after the decimal point, immediately around the exponent marker, or in consecutive pairs.¹³

Examples / Recipes

Recipe 1: Picking `text` vs `varchar(n)` vs `char(n)` — and the one valid reason for `varchar(n)`

-- BAD: char(n). Trailing-space semantics break joins to text/varchar columns.
-- BAD: varchar(255) "for performance". There is none.
-- GOOD: text. Default.
CREATE TABLE user_profile (
    id            bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    email         text NOT NULL UNIQUE,
    display_name  text NOT NULL,
    bio           text
);

-- ONLY GOOD use of varchar(n): a business-rule length cap
-- (e.g., the column is exported into a fixed-width legacy CSV).
CREATE TABLE billing_export (
    customer_short_code  varchar(8)  NOT NULL    -- legacy mainframe wants exactly 8 chars
);

Adding a length cap to a column is a ALTER TABLE ... ALTER COLUMN ... TYPE varchar(n) USING substring(col, 1, n) — full table rewrite. Rip caps out, don't add them, unless required.

Recipe 2: Migrating `serial` to `IDENTITY`

-- Old schema:
-- CREATE TABLE t (id serial PRIMARY KEY, name text);
-- The sequence is named t_id_seq, OWNED BY t.id.

-- Drop the default that points at the sequence; convert the column to an identity.
BEGIN;
ALTER TABLE t ALTER COLUMN id DROP DEFAULT;

-- Now make it an identity column. Use the existing sequence's current value as the start.
SELECT setval(pg_get_serial_sequence('t', 'id'),
              (SELECT max(id) FROM t),
              true);   -- third arg true => next nextval() returns max + 1

ALTER TABLE t
    ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY;

-- The old sequence still exists; drop it now that the column owns its own identity sequence.
-- (Actually, the ADD GENERATED clause creates a new sequence and the old one is detached.)
DROP SEQUENCE t_id_seq;

COMMIT;

Why BY DEFAULT instead of ALWAYS: with ALWAYS, manual INSERT (id, name) VALUES (42, 'x') will fail unless you say OVERRIDING SYSTEM VALUE. BY DEFAULT lets you continue inserting explicit IDs for backfill/data-load.

Recipe 3: Storing money as `numeric(12, 2)` and avoiding float

CREATE TABLE invoice (
    id         bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    currency   char(3) NOT NULL,    -- ISO 4217 — three chars, *always* three chars, the one place char(n) is right
    amount     numeric(12, 2) NOT NULL,
    CHECK (currency = upper(currency)),
    CHECK (amount >= 0)
);

-- DO NOT do this:
-- CREATE TABLE invoice (..., amount double precision);   -- silent rounding errors

-- DO NOT use the money type either — it's locale-sensitive and almost always wrong.
-- See 18-uuid-numeric-money.md for the full discussion.

For multi-currency: store the integer minor units (cents, pence, satoshi) plus a separate currency_code:

amount_cents  bigint NOT NULL,
currency      char(3) NOT NULL

This eliminates all rounding at the cost of "the application has to multiply by 100 on display."

Recipe 4: Validating IP addresses with `inet` and indexing for containment

CREATE TABLE block_list (
    cidr_block  cidr PRIMARY KEY,
    added_at    timestamptz NOT NULL DEFAULT now(),
    reason      text
);

-- GiST index for containment queries
CREATE INDEX block_list_cidr_gist ON block_list USING gist (cidr_block inet_ops);

-- Is this client blocked?
SELECT * FROM block_list WHERE cidr_block >>= '192.0.2.10'::inet;

For very large tables (10M+ rows) where you only ever check exact match, a btree on cidr_block is fine. For range-of-subnets queries, GiST or SP-GiST.

[!NOTE] PostgreSQL 14: SP-GiST gained inet_ops

PG14 added an SP-GiST operator class for inet/cidr that is often faster than GiST for trie-like containment lookups. "Allow SP-GiST indexes to be used for INET and CIDR types (Emre Hasegeli)."¹⁴ Worth benchmarking against GiST on big tables.

Recipe 5: Auditing IP block lists for invalid CIDR input

-- This is rejected at insert time. cidr enforces "host bits must be zero":
INSERT INTO block_list (cidr_block) VALUES ('192.0.2.10/24');
-- ERROR:  invalid cidr value: "192.0.2.10/24"
-- DETAIL: Value has bits set to right of mask.

-- Convert from inet to cidr by zeroing host bits:
SELECT set_masklen('192.0.2.10'::inet, 24)::cidr;
--   set_masklen
-- ----------------
--  192.0.2.0/24

For a text column you're migrating to cidr, find values that won't cast cleanly:

SELECT s FROM legacy_blocks WHERE NOT (s ~ '^[0-9:.a-fA-F]+/\d+$')
   OR s::cidr IS NULL;   -- last clause raises if invalid; wrap in a SAVEPOINT-per-row routine

Recipe 6: Bitwise feature flags in `bigint` (better than `bit(n)`)

-- Allocate up to 64 flags, one bit each.
-- Constants live as application-side enums; comments document them here.
CREATE TABLE account (
    id           bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    email        text NOT NULL,
    feature_mask bigint NOT NULL DEFAULT 0
    -- bit 0: beta_enabled
    -- bit 1: marketing_emails
    -- bit 2: two_factor_required
    -- ...
);

-- Set bit 2 atomically:
UPDATE account SET feature_mask = feature_mask | (1::bigint << 2) WHERE id = $1;

-- Read bit 2:
SELECT (feature_mask & (1::bigint << 2)) <> 0 AS two_factor_required FROM account WHERE id = $1;

-- Index "rows with bit 2 set" via partial index:
CREATE INDEX account_2fa_required
    ON account (id)
    WHERE (feature_mask & 4) <> 0;

For ad-hoc analytics, prefer separate boolean columns or a normalized (account_id, feature) table — bitmasks are opaque in EXPLAIN and joins.

Recipe 7: Computing exact change with `numeric` vs the silent-rounding float trap

-- Demonstrate the failure mode:
SELECT 0.1::double precision + 0.2::double precision;
--  ?column?
-- ------------------------
--  0.30000000000000004

-- The correct way:
SELECT 0.1::numeric + 0.2::numeric;
--  ?column?
-- ----------
--  0.3

-- For monetary aggregations across many rows, numeric is the only safe choice:
SELECT customer_id,
       sum(amount)::numeric(14,2) AS total_due
  FROM invoice
 GROUP BY customer_id;

Recipe 8: Storing hashes and signatures as `bytea` (not `text`)

-- DO:
CREATE TABLE artifact (
    id      bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    path    text NOT NULL,
    sha256  bytea NOT NULL,    -- 32 raw bytes
    CHECK (octet_length(sha256) = 32)
);

INSERT INTO artifact (path, sha256)
VALUES ('/srv/release.tar.gz', sha256('hello'::bytea));

SELECT path, encode(sha256, 'hex') FROM artifact;

-- DON'T:
-- CREATE TABLE artifact (..., sha256_hex text);  -- twice the storage; case-sensitivity bugs

bytea is half the on-disk size of a hex text column for the same digest. The encode(..., 'hex') and decode(..., 'hex') functions handle the display conversion.

Recipe 9: PG18+ casefold for Unicode-correct case-insensitive comparison

[!NOTE] PostgreSQL 18

-- Locale-aware case folding (Unicode "fc" mapping) — handles Turkish dotless i, ß, etc.
SELECT casefold('STRASSE') = casefold('Straße');  -- t

-- For older majors: use lower() + an ICU collation that does the right thing,
-- or a nondeterministic collation (PG12+); see 65-collations-encoding.md.

Recipe 10: PG18+ integer ↔ bytea casts

[!NOTE] PostgreSQL 18

SELECT 2_147_483_647::bytea;
-- \x7fffffff

SELECT '\xdeadbeef'::bytea::integer;
--  ?column?
-- ------------
--  -559038737   -- signed-32 interpretation

-- Building binary wire formats:
SELECT (1::int4)::bytea || (42::int4)::bytea;
-- \x000000010000002a

Recipe 11: Detecting columns that should be changed away from `char(n)`

Run this against any production database before a schema review:

SELECT n.nspname AS schema,
       c.relname AS table,
       a.attname AS column,
       pg_catalog.format_type(a.atttypid, a.atttypmod) AS type
  FROM pg_attribute a
  JOIN pg_class c ON c.oid = a.attrelid
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE a.atttypid = 'bpchar'::regtype
   AND a.attnum > 0
   AND NOT a.attisdropped
   AND c.relkind IN ('r', 'p')                 -- ordinary tables and partitioned tables
   AND n.nspname NOT IN ('pg_catalog', 'information_schema')
 ORDER BY schema, table, column;

Output is the audit list. The exception is char(3) for ISO currency codes — that one is correct because the spec demands exactly three characters and there is no ambiguity. Most other char(n) columns came from "I thought it would be faster," and should be migrated to text or varchar(n) if the cap is genuinely business-required.

Recipe 12: Picking between `inet` and `cidr` for "this column stores both addresses and networks"

-- inet stores both a host address AND a netmask. A "host" is just an address with /32 (v4) or /128 (v6).
-- cidr rejects host-bits-set; it stores networks only.

-- For a column that may hold either a single host or a CIDR block: use inet.
--     '192.0.2.10/32' (the host) — fine in inet
--     '192.0.2.0/24'  (the network) — fine in inet
--     '192.0.2.10/24' — INVALID in cidr (host bits set), VALID in inet (host within network)

-- For a column that must hold only networks (allow/deny lists, route tables): use cidr.

inet semantics for 192.0.2.10/24 say "this is the host 192.0.2.10 in the /24 network containing it." cidr semantics reject this because in CIDR notation the address portion must be the lowest IP in the block. If you store an audit log of "what source IPs hit my API," use inet. If you store "block lists of subnets," use cidr.

Gotchas / Anti-patterns

char(n) pads with spaces and treats trailing space as insignificant in comparisons. 'abc'::char(5) = 'abc'::text is true. 'abc'::char(5) = 'abc ' is also true. The padded form is what gets stored, sorted, and returned via COPY OUT. Avoid char(n) for everything except fixed-width legacy interfaces.³
varchar(255) "for performance" is a myth. All character types share a single varlena backend.¹ The 255 limit comes from MySQL row-format constraints, not Postgres. Use text unless a length cap is a business rule.
numeric NaN sorts after every other value, including Infinity. This makes numeric orderable, but breaks the assumption that "NaN propagates through comparisons." nan_val > 5 is true in numeric, false (NULL) in double precision. Be explicit with IS NOT NULL + IS NOT NULL.⁴
serial/bigserial confused ownership. pg_dump handles them, but if you DROP COLUMN id from a table, the OWNED-BY sequence is dropped too. With GENERATED BY DEFAULT AS IDENTITY the link is explicit and ALTER TABLE ... ALTER COLUMN ... DROP IDENTITY is the safe rollback.²
real and double precision accumulate error. Summing one billion real values is not accurate to the last decimal place. For accounting-grade arithmetic, the answer is always numeric, even if you have to round at display time.⁴
'1'::boolean works; 1::boolean does not. cast(1 AS boolean) requires an explicit cast through integer != 0 or you'll get cannot cast type integer to boolean. The text literal form '1', '0', 'true', 'false', 'yes', 'no', 'on', 'off', 't', 'f', 'y', 'n' works.⁵
bytea_output flips display but not storage. Setting SET bytea_output = 'escape' makes existing \x... outputs render with octal escapes. Drivers see whichever format they ask for. Storage is identical. If your client library is comparing returned bytea strings instead of using its bytea-bind API, you have a bug waiting.⁶
cidr rejects host-bits-set. '192.0.2.10/24'::cidr is an error; you have to write '192.0.2.0/24'::cidr or convert via set_masklen(inet, 24)::cidr. The error message is clear (Value has bits set to right of mask.) but ETL pipelines that expected text-like behavior will fail loudly.
inet-vs-cidr containment uses the SAME << operator but different semantics on host bits. '192.0.2.10/24'::inet << '192.0.2.0/24'::cidr is true; comparison considers the host address, not the masked network. Document which side is host and which is network in column comments.
GiST on inet/cidr requires the operator class: CREATE INDEX ... USING gist (col inet_ops). Without inet_ops, no operator class exists. SP-GiST also requires inet_ops/cidr_ops (added PG14).¹⁴
bit(n) truncates silently on cast. B'10'::bit(3) returns B'100' (right-padded with zero), not an error. bit varying(n) truncates input that exceeds n with an error on direct INSERT but truncates on explicit cast. Read the rules.
text storage is unbounded — but TOAST kicks in around 2 KB. Storing a megabyte of text per row works, but every read from disk pays for de-TOAST and decompression. Consider whether the column should be a separate one-to-one table with the heavy column elsewhere. See 31-toast.md.
numeric arithmetic is 10–100× slower than double precision arithmetic. For non-money workloads with billions of rows, this matters. Benchmark; don't reach for numeric on principle for sensor data or telemetry.
Integer literal underscores (PG16+) cannot be at the boundary. _1_000 is a parse error; 1_000_ is a parse error; 1__000 is a parse error. Underscores must be between digits and not consecutive.¹³
smallserial exists; tinyint does not. Postgres has no 1-byte integer type. Use boolean for 0/1 and smallint for everything else. (If you really must store −128 to 127 in 1 byte, the C-language "char" type exists, but it's not recommended for application code and pg_dump/clients may render it oddly.)
inet and text cast freely; cidr and text do not in all directions. '192.0.2.0/24'::cidr works; (some_cidr_col)::text works; but be aware that round-trip through text loses no information for cidr but does for inet — host bits in inet are part of the value, and the canonical text form preserves them.

Sources

PostgreSQL 16 documentation, "Character Types" (§8.3): "There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. ... In most situations text or character varying should be used instead." Length cap: "the length n must be greater than zero and cannot exceed 10,485,760." Maximum stored value: "The longest possible character string that can be stored is about 1 GB." Short-string overhead: "The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string ... Longer strings have 4 bytes of overhead instead of 1." https://www.postgresql.org/docs/16/datatype-character.html ↩ ↩² ↩³ ↩⁴ ↩⁵
PostgreSQL 16 documentation, "Numeric Types" §8.1.4 "Serial Types": "The data types smallserial, serial and bigserial are not true types, but merely a notational convenience for creating unique identifier columns." https://www.postgresql.org/docs/16/datatype-numeric.html ↩ ↩² ↩³ ↩⁴
PostgreSQL 16 documentation, "Character Types" (§8.3): "Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character. In collations where whitespace is significant, this behavior can produce unexpected results." https://www.postgresql.org/docs/16/datatype-character.html ↩ ↩² ↩³
PostgreSQL 16 documentation, "Numeric Types" (§8.1): byte sizes table; range limits "up to 131072 digits before the decimal point; up to 16383 digits after the decimal point"; NaN: "In most implementations of the 'not-a-number' concept, NaN is not considered equal to any other numeric value (including NaN). In order to allow numeric values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values."; unconstrained: "Specifying NUMERIC without any precision or scale creates an 'unconstrained numeric' column ..."; money: "If you require exact storage and calculations (such as for monetary amounts), use the numeric type instead." https://www.postgresql.org/docs/16/datatype-numeric.html ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸
PostgreSQL 16 documentation, "Boolean Type" (§8.6): accepts true, t, yes, y, on, 1 and false, f, no, n, off, 0 (case-insensitive, unique prefixes accepted, leading/trailing whitespace ignored); storage 1 byte; output t/f. https://www.postgresql.org/docs/16/datatype-boolean.html ↩ ↩² ↩³
PostgreSQL 16 documentation, "Binary Data Types" (§8.4): "The output format depends on the configuration parameter bytea_output; the default is hex." Hex format: "The hex format is compatible with a wide range of external applications and protocols, and it tends to be faster to convert than the escape format, so its use is preferred." Escape format escape table (zero, single quote, backslash, non-printable octets) per Table 8.7. Storage: "1 or 4 bytes plus the actual binary string." https://www.postgresql.org/docs/16/datatype-binary.html ↩ ↩² ↩³ ↩⁴
PostgreSQL 15 release notes: "Change the I/O format of type "char" for non-ASCII characters (Tom Lane). Bytes with the high bit set are now output as a backslash and three octal digits, to avoid encoding issues." https://www.postgresql.org/docs/release/15.0/ ↩
PostgreSQL 18 release notes: "Allow LIKE with nondeterministic collations (Peter Eisentraut)", "Allow text position search functions with nondeterministic collations (Peter Eisentraut)", "Add function casefold() to allow for more sophisticated case-insensitive matching (Jeff Davis)", "Add function reverse() to reverse bytea bytes (Aleksander Alekseev)", "Allow casting between integer types and bytea (Aleksander Alekseev)", "Improve the speed of numeric multiplication and division (Joel Jacobson, Dean Rasheed)", "Add math functions gamma() and lgamma() (Dean Rasheed)." https://www.postgresql.org/docs/release/18.0/ ↩ ↩² ↩³ ↩⁴
PostgreSQL 14 release notes: "Add support for Infinity and -Infinity values in the numeric data type (Tom Lane)", "Make non-zero floating-point values divided by infinity return zero (Kyotaro Horiguchi)", "Make floating-point division of NaN by zero return NaN (Tom Lane)." https://www.postgresql.org/docs/release/14.0/ ↩ ↩²
PostgreSQL 15 release notes: "Allow the scale of a numeric value to be negative, or greater than its precision (Dean Rasheed, Tom Lane). This allows rounding of values to the left of the decimal point, e.g., '1234'::numeric(4, -2) returns 1200." https://www.postgresql.org/docs/release/15.0/ ↩
PostgreSQL 16 documentation, "TOAST" (§70.2). Cross-referenced from 31-toast.md. https://www.postgresql.org/docs/16/storage-toast.html ↩
PostgreSQL 16 documentation, "Network Address Functions and Operators" (§9.12): operator list <<, <<=, >>, >>=, &&, ~, &, |, +, - and functions abbrev, broadcast, family, host, hostmask, inet_merge, inet_same_family, masklen, netmask, network, set_masklen, text. https://www.postgresql.org/docs/16/functions-net.html ↩
PostgreSQL 16 release notes: "Allow non-decimal integer literals (Peter Eisentraut, David Rowley)", "Allow NUMERIC to process hexadecimal, octal, and binary integers of any size (Dean Rasheed)", "Allow underscores in integer and numeric constants (Peter Eisentraut, Dean Rasheed)." Lexical-syntax detail from PostgreSQL 16 documentation, "Lexical Structure" (§4.1.2.6): underscore restrictions verbatim. https://www.postgresql.org/docs/release/16.0/ and https://www.postgresql.org/docs/16/sql-syntax-lexical.html ↩ ↩² ↩³
PostgreSQL 14 release notes: "Allow SP-GiST indexes to be used for INET and CIDR types (Emre Hasegeli)." https://www.postgresql.org/docs/release/14.0/ ↩ ↩²