
alonso-skills/postgres

Comprehensive PostgreSQL reference for developers and DBAs covering versions 14–18. Use whenever the user asks about PostgreSQL syntax, DDL/DML/DQL, joins, LATERAL, CTEs, window functions, GROUPING SETS, DISTINCT ON, RETURNING, ON CONFLICT, PL/pgSQL, functions, procedures, triggers, views, materialized views, indexes (B-tree/GIN/GiST/BRIN/Hash/Bloom), MVCC, VACUUM, autovacuum, WAL, TOAST, partitioning, replication (streaming/logical), backup, PITR, HA (Patroni/repmgr), pgBouncer, EXPLAIN ANALYZE, RLS, roles, extensions (pgvector, PostGIS, TimescaleDB, Citus, pg_trgm, pg_cron), JSON/JSONB, full-text search, UUID, timestamptz, COPY, system catalogs, collations, large objects, cursors, GUC, or any Postgres administration, performance, security, replication, backup, or recovery topic.


Patroni

[!WARNING] SETTINGS.html does NOT exist Patroni config reference lives in two pages — patroni_configuration.html (architecture, precedence rules) and yaml_configuration.html (per-setting reference). A common stale-citation pattern points at SETTINGS.html which returns 404. Cite the two split pages, never the legacy slug.

When to Use This Reference

For Patroni-specific operational mechanics: patroni.yml grammar, DCS choices, REST API endpoints, watchdog modes, tags, patronictl CLI, failover-vs-switchover, pause mode, bootstrap, standby-cluster setup. For HA-architecture comparison (Patroni vs repmgr vs pg_auto_failover vs Stolon vs K8s operators), see 78-ha-architectures.md. For the standby-side PG mechanics that Patroni orchestrates, see 73-streaming-replication.md and 77-standby-failover.md.


Mental Model

Five rules drive every Patroni decision:

  1. Patroni = Python daemon on every PG node + external DCS for leader election. Without DCS, no safe failover. Daemon owns PG process lifecycle (pg_ctl start/stop/reload), runs the HA loop every loop_wait seconds, holds the leader-lock key in DCS with TTL.
  2. Two-tier configuration. patroni.yml = static + bootstrap config (read at daemon startup). Dynamic config (loop_wait, ttl, synchronous_mode, postgresql.parameters) lives in DCS, applies to every node, edited via patronictl edit-config or PATCH /config. Verbatim TTL rule: "when changing values of loop_wait, retry_timeout, or ttl you have to follow the rule: loop_wait + 2 * retry_timeout <= ttl".
  3. REST API at port 8008 is the operational surface. Health endpoints (/primary, /replica, /sync, /async, /health, /liveness, /readiness) for HAProxy / K8s probes. Control endpoints (POST /failover, POST /switchover, POST /restart, POST /reinitialize, PATCH /config) for manual operations. Tags-aware lag filters via query params.
  4. Tags constrain promotion behavior. nofailover=true removes node from leader race. clonefrom=true marks node as preferred bootstrap source. noloadbalance=true makes /replica return 503. nosync=true excludes from synchronous quorum. nostream=true disables streaming. replicatefrom: <member> enables cascading replication. failover_priority: <int> prefers higher-priority node.
  5. Watchdog mode required + Linux softdog prevents zombie primaries. Verbatim: "If watchdog activation fails and watchdog mode is required then the node will refuse to become leader." Default safety margin = 5 seconds before TTL expires. Without watchdog, a hung daemon can leave the local PostgreSQL accepting writes as if it were still leader after the lock has expired in DCS and another node has been promoted → split-brain.

Decision Matrix

| Need | Use | Avoid | Why |
|---|---|---|---|
| Self-managed HA on bare metal / VMs | Patroni + etcd 3-node | DIY shell scripts | Patroni handles fencing + DCS + watchdog; shell scripts produce split-brain |
| K8s deployment | K8s operator (CNPG / Zalando) | Standalone Patroni in K8s | Operator handles PVC + Service + StatefulSet lifecycle; Patroni-direct works but more YAML |
| 2-node cluster | Patroni + 3-node DCS quorum on separate hosts | DCS on same hosts as PG | Quorum requires majority; co-locating DCS with PG breaks the fencing model |
| RTO target ~10-30s | Patroni + watchdog + sync replication | Manual promotion | Patroni's HA loop default loop_wait=10s + DCS TTL 30s ≈ 10-30s detect + promote |
| RPO target = 0 | synchronous_mode: on + synchronous_node_count >= 1 | Async-only | Sync replication is the only zero-data-loss mode; trade-off is write availability |
| Read-only routing | HAProxy + /replica health check + noloadbalance tag | TCP-only health check | /replica returns 503 on primary; TCP health check thinks primary is replica |
| Cascading replication | tags: replicatefrom: <member> on the leaf | Manual primary_conninfo edit | Manual edit fights Patroni; tag persists across promotions |
| Maintenance window without failover | patronictl pause | Stop Patroni daemon | Pause leaves PG running, blocks automatic actions; stopping daemon triggers leader expiry → failover |
| Manual switchover (planned) | patronictl switchover | pg_ctl promote on candidate | Switchover demotes current leader cleanly + promotes candidate; manual promote diverges timelines |
| Manual failover (leader unhealthy) | patronictl failover --candidate <node> | patronictl switchover (when leader unhealthy) | Failover does not need a healthy leader; switchover requires healthy leader |
| Cross-region DR | Standby cluster (standby_cluster: config block) | Streaming across regions in one cluster | Standby cluster cascades from primary cluster's leader; latency-tolerant; per-region promotion |
| Citus distributed PG | Patroni 3.0+ citus: config block | Independent Patroni per node | Citus-aware Patroni handles worker registration + group_id assignment |

Three smell signals that a Patroni deployment is misdesigned:

  • Cluster spans regions with synchronous_mode: on — cross-region sync replication blocks every COMMIT on WAN latency. Use standby clusters per region.
  • 3-node DCS on same VMs as PG primary + replicas — DCS quorum + PG quorum must fail independently. Co-location defeats the purpose.
  • watchdog.mode: off in production — hung Patroni daemon → DCS TTL expires → another node promotes → original PG still accepts writes locally → split-brain. Watchdog forces the OS to kill the node.

Architecture

                    ┌─────────────────────────┐
                    │   DCS (etcd / Consul    │
                    │   / ZooKeeper / K8s)    │
                    │   leader key + TTL      │
                    │   members + config      │
                    └────┬────────────────┬───┘
                         │                │
              ┌──────────┴──┐         ┌───┴─────────┐
              │  Patroni    │         │  Patroni    │
              │  daemon     │         │  daemon     │
              │  REST :8008 │         │  REST :8008 │
              │      │      │         │      │      │
              │  postgres   │ ──────► │  postgres   │
              │  (primary)  │   WAL   │  (replica)  │
              └──────┬──────┘         └──────┬──────┘
                     │                       │
                  watchdog                watchdog
              /dev/watchdog            /dev/watchdog

Five responsibilities of each Patroni daemon:

  1. Hold or chase the leader lock. Leader writes its lock to DCS every loop_wait seconds with TTL. If DCS rejects (network split, lock expired), leader demotes immediately. Replicas race to acquire on next loop.
  2. Manage PG process. pg_ctl start/stop/reload/restart driven by Patroni state changes. postgresql.conf rendered from patroni.yml + dynamic config at startup.
  3. Configure replication. Sets primary_conninfo (replicas), synchronous_standby_names (primary), creates standby.signal and recovery.signal files (PG12+).
  4. Reset watchdog. Kernel watchdog must be petted every loop. Failure → kernel reboots the host.
  5. Serve REST API. Health probes for load balancers + control endpoints for patronictl.

Configuration Precedence

Per patroni_configuration.html, Patroni reads configuration from three sources. Precedence, highest first:

  1. Environment variables: PATRONI_* settings override the matching local values (common in containerized deployments).
  2. Local config: patroni.yml on each node (DCS connection, REST API address, PostgreSQL data_dir, authentication, tags). Takes precedence over dynamic config.
  3. Dynamic config: stored in DCS, edited via patronictl edit-config or PATCH /config (cluster-wide: loop_wait, ttl, retry_timeout, synchronous_mode, postgresql.parameters, postgresql.use_pg_rewind).

Operational rule: Configuration that must be identical cluster-wide (PG parameters affecting replication, sync mode, timeouts) belongs in dynamic config. Configuration that's node-local (DCS endpoint, data_dir path, listen address, tags) belongs in patroni.yml.

Static patroni.yml

Minimal three-node cluster configuration (skeleton):

scope: postgres-prod
name: pg-node-1
namespace: /service/

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.11:8008
  authentication:
    username: patroni
    password: <REDACTED>

etcd3:
  hosts: etcd1:2379,etcd2:2379,etcd3:2379
  protocol: https
  cacert: /etc/patroni/etcd-ca.crt

bootstrap:
  dcs:
    loop_wait: 10
    retry_timeout: 10
    ttl: 30
    maximum_lag_on_failover: 1048576
    synchronous_mode: true
    synchronous_node_count: 1
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: 'on'
        wal_log_hints: 'on'
        max_wal_senders: 10
        max_replication_slots: 10
        wal_keep_size: 1024MB
        shared_buffers: 8GB
        synchronous_commit: 'on'
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - hostssl replication replicator 10.0.0.0/24 scram-sha-256
    - hostssl all all 10.0.0.0/24 scram-sha-256
  users:
    admin:
      password: <REDACTED>
      options: [createrole, createdb]

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.11:5432
  data_dir: /var/lib/postgresql/16/main
  bin_dir: /usr/lib/postgresql/16/bin
  pgpass: /var/lib/postgresql/.pgpass
  authentication:
    replication:
      username: replicator
      password: <REDACTED>
    superuser:
      username: postgres
      password: <REDACTED>
    rewind:
      username: rewind_user
      password: <REDACTED>
  parameters:
    unix_socket_directories: /var/run/postgresql

watchdog:
  mode: required
  device: /dev/watchdog
  safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false

log:
  level: INFO
  dir: /var/log/patroni
  file_size: 25000000
  file_num: 4

Eight key blocks:

| Block | Purpose | Where it lives |
|---|---|---|
| scope + name + namespace | Cluster identity in DCS | Static |
| restapi | REST API listen + auth | Static |
| <dcs> (etcd3 / consul / zookeeper / kubernetes) | DCS connection | Static |
| bootstrap.dcs | Initial dynamic config on first cluster create | Static (written to DCS on bootstrap) |
| bootstrap.initdb + bootstrap.pg_hba + bootstrap.users | First-time cluster init | Static |
| postgresql | PG process management | Static |
| watchdog | Hardware/software watchdog | Static |
| tags | Per-node behavior modifiers | Static |

[!WARNING] bootstrap.dcs is written to DCS only on initial cluster bootstrap After bootstrap, editing bootstrap.dcs in patroni.yml has no effect. Subsequent changes go through patronictl edit-config or PATCH /config. This is the single most common Patroni config trap.

Dynamic Configuration (DCS-stored)

Edited via patronictl edit-config (opens $EDITOR with current DCS YAML) or PATCH /config. Changes apply on every node's next HA loop tick (no restart required for most parameters).
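As an illustration of the PATCH /config path, here is a minimal Python sketch that builds (but does not send) such a request using only the standard library. The host, credentials, and the value being changed are hypothetical; it assumes the REST API accepts a JSON body and HTTP Basic authentication taken from restapi.authentication.

```python
import base64
import json
from urllib import request


def build_config_patch(base_url, changes, username, password):
    """Build a PATCH /config request carrying a partial dynamic-config update."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return request.Request(
        url=f"{base_url}/config",
        data=json.dumps(changes).encode(),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical node address and credentials, for illustration only.
req = build_config_patch(
    "http://10.0.0.11:8008",
    {"maximum_lag_on_failover": 2097152},
    "patroni",
    "example-password",
)
# Sending is left to the caller, e.g. request.urlopen(req, timeout=5).
```

Keeping the request construction separate from the network call makes the payload easy to review before it touches a live cluster.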

Catalog of twelve dynamic settings:

| Setting | Default | Purpose |
|---|---|---|
| loop_wait | 10s | HA loop interval: how often Patroni updates leader lock and checks state |
| ttl | 30s | Leader lock TTL in DCS; after this, replicas race to acquire |
| retry_timeout | 10s | DCS retry budget per loop iteration |
| maximum_lag_on_failover | 1MB | Replica too far behind cannot be promoted automatically |
| primary_start_timeout (master_start_timeout before Patroni 3.0) | 300s | Leader has this long to come back before failover triggers |
| synchronous_mode | off | Enable synchronous replication (on, quorum, off) |
| synchronous_node_count | 1 | How many sync standbys (with synchronous_mode: on) |
| synchronous_mode_strict | false | Block writes if no sync standby available |
| failsafe_mode | false | Prevents demoting leader on DCS outage if all healthy replicas vote yes |
| check_timeline | true | Don't promote a node on a divergent timeline |
| postgresql.use_pg_rewind | false | Use pg_rewind for re-attaching diverged former primary |
| postgresql.parameters.* | | Cluster-wide PG GUCs synced to every node's postgresql.conf |

Verbatim TTL formula from dynamic_configuration.html: "when changing values of loop_wait, retry_timeout, or ttl you have to follow the rule: loop_wait + 2 * retry_timeout <= ttl". Default values (10 + 2×10 = 30 ≤ 30) sit exactly at the boundary: raising loop_wait or retry_timeout, or lowering ttl, breaks the rule unless the other values are adjusted in the same edit. Lowering loop_wait or retry_timeout on its own always keeps the rule satisfied.
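The rule is easy to check before running patronictl edit-config. A minimal sketch in Python; the function names are illustrative and not part of Patroni:

```python
def ttl_rule_ok(loop_wait, retry_timeout, ttl):
    """True when loop_wait + 2 * retry_timeout <= ttl holds."""
    return loop_wait + 2 * retry_timeout <= ttl


def min_ttl(loop_wait, retry_timeout):
    """Smallest ttl that satisfies the rule for the given timings."""
    return loop_wait + 2 * retry_timeout


print(ttl_rule_ok(10, 10, 30))  # defaults: exactly at the boundary
print(ttl_rule_ok(10, 15, 30))  # retry_timeout raised without raising ttl
print(min_ttl(10, 15))          # ttl needed for that change
```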

DCS Choices

| DCS | Best for | Notes |
|---|---|---|
| etcd (v3 API) | Default for self-managed Patroni | Lightweight, Raft consensus, easy to deploy 3 or 5 nodes. Use etcd3: block, NOT legacy etcd: |
| Consul | Existing Consul service-mesh deployments | Service discovery integration; pay attention to HashiCorp's BUSL license change (Aug 2023) |
| ZooKeeper | Existing ZooKeeper deployments (Hadoop, Kafka shops) | Mature, Java-based, heavier than etcd |
| Kubernetes | K8s deployments (Endpoints or ConfigMaps mode) | No external DCS needed; uses K8s API server as DCS |
| Exhibitor | Legacy ZooKeeper-via-Exhibitor deployments | Rare; ZooKeeper management layer |

Sizing rules:

  • Always 3 or 5 DCS nodes (odd numbers for quorum). 1 = no HA. 2 = quorum needs both nodes, so no failure is tolerated.
  • DCS nodes must fail independently from PG nodes. Co-locating DCS with PG defeats fencing.
  • DCS network partition tolerance: with 3 nodes, survives 1 failure. With 5, survives 2.
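The sizing rules above follow from majority-quorum arithmetic. A small Python sketch (helper names are illustrative) showing the numbers for common cluster sizes:

```python
def quorum_size(nodes):
    """Smallest majority of a cluster with the given number of voting nodes."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes can fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)


for n in (1, 2, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Note that 4 nodes tolerate no more failures than 3, which is why odd sizes are preferred.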

[!NOTE] PostgreSQL 14 etcd v2 deprecation Patroni still supports legacy etcd: block (etcd v2 API), but etcd project deprecated v2 in etcd 3.4 (Aug 2019). For new clusters, always use etcd3: block (etcd v3 gRPC API).

REST API

Patroni listens on restapi.listen (default :8008). Endpoints are categorized as health checks (for load balancers) and control (for patronictl and ops).

Health check endpoints (return HTTP 200 if condition met, 503 otherwise):

| Endpoint | 200 means |
|---|---|
| GET /primary | Node is the cluster primary with leader lock |
| GET /leader | Node holds leader lock (primary OR standby-cluster leader) |
| GET /standby-leader | Node is the leader of a standby cluster |
| GET /replica | Node is a running replica (NOT primary), passes optional lag/tag filters |
| GET /read-only | Node is primary OR running replica (anything readable) |
| GET /sync | Node is a synchronous standby |
| GET /async | Node is an asynchronous standby |
| GET /health | PostgreSQL is up and running on this node |
| GET /liveness | Patroni HA loop is functioning (Patroni daemon alive) |
| GET /readiness | Node is ready to serve traffic (Patroni + PG both up, lag within bounds) |

Optional query parameters refine health checks:

  • ?lag=10MB: /replica returns 503 if replication lag exceeds the threshold
  • ?tag_<name>=<value> — filters by node tags (e.g., ?tag_nosync=false)
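A load balancer only needs the status code, and a script can use the same signal. The following Python sketch classifies a node from /primary and /replica. The function names are illustrative, and the HTTP fetch is injectable so the logic can be exercised without a live cluster.

```python
from urllib import error, request


def http_status(url, timeout=2.0):
    """Return the HTTP status of a GET, or None if the node is unreachable."""
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except error.HTTPError as exc:  # non-2xx answers such as 503
        return exc.code
    except (error.URLError, OSError):
        return None


def classify_node(base_url, max_lag=None, get_status=http_status):
    """Map health-check results to 'primary', 'replica', or 'unavailable'."""
    if get_status(f"{base_url}/primary") == 200:
        return "primary"
    replica_url = f"{base_url}/replica"
    if max_lag is not None:
        replica_url += f"?lag={max_lag}"  # e.g. max_lag="10MB"
    if get_status(replica_url) == 200:
        return "replica"
    return "unavailable"
```

Usage against a real node would look like classify_node("http://10.0.0.11:8008", max_lag="10MB").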

Control and informational endpoints (POST/PATCH/PUT/DELETE require REST API auth):

| Endpoint | Method | Purpose |
|---|---|---|
| /cluster | GET | JSON dump of cluster topology + member state + DCS leader |
| /history | GET | History of switchovers/failovers |
| /config | GET | Current dynamic config (YAML) |
| /config | PATCH | Modify dynamic config |
| /config | PUT | Replace dynamic config entirely |
| /failover | POST | Trigger failover to a candidate (use when leader is unhealthy) |
| /switchover | POST | Trigger switchover (use when leader is healthy and you want planned move) |
| /switchover | DELETE | Cancel scheduled switchover |
| /restart | POST | Restart PostgreSQL on this node |
| /restart | DELETE | Cancel scheduled restart |
| /reinitialize | POST | Reinitialize PG data dir on a replica (rebuild from leader) |
| /metrics | GET | Prometheus-format metrics |

Authentication: Health endpoints are unauthenticated by default. Control endpoints require restapi.authentication.{username,password} from patroni.yml. TLS via restapi.{certfile,keyfile,cafile,verify_client}.

Tags

Tags constrain per-node behavior. Set in patroni.yml under tags: block. Each tag verbatim from yaml_configuration.html:

| Tag | Verbatim definition | Default |
|---|---|---|
| nofailover | "Controls whether this node is allowed to participate in the leader race and become a leader." | false |
| clonefrom | "If set to true other nodes might prefer to use this node for bootstrap." | false |
| noloadbalance | "If set to true the node will return HTTP Status Code 503 for the GET /replica REST API health-check and therefore will be excluded from the load-balancing." | false |
| nosync | "If set to true the node will never be selected as a synchronous replica." | false |
| nostream | "If set to true the node will not use replication protocol to stream WAL." | false |
| replicatefrom | "The name of another replica to replicate from. Used to support cascading replication." | (unset) |
| failover_priority | "Controls the priority this node should have during failover. Nodes with higher priority will be preferred." | (integer; higher wins) |

Operational patterns:

  • Reporting replica that should never become primary: nofailover: true, noloadbalance: false (still served by /replica health check for read traffic).
  • DR replica in a different region: nofailover: true, noloadbalance: true (excluded from both leader race and /replica LB pool).
  • Cascading replication for geographically distributed reads: replicatefrom: <hub_member_name> on each leaf.
  • Async-only replica during cluster upgrade: nosync: true (won't satisfy synchronous_standby_names).
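The patterns above can be read as a mapping from tags to what a node is allowed to do. A simplified Python sketch of that reading (it ignores failover_priority, nostream, and replicatefrom, and the function name is illustrative):

```python
def node_roles(tags):
    """Derive what a node may do from its boolean Patroni tags.

    Missing tags are treated as false, matching the defaults listed above.
    """
    return {
        "leader_candidate": not tags.get("nofailover", False),
        "read_pool": not tags.get("noloadbalance", False),
        "sync_candidate": not tags.get("nosync", False),
    }


reporting_replica = node_roles({"nofailover": True, "noloadbalance": False})
dr_replica = node_roles({"nofailover": True, "noloadbalance": True})
print(reporting_replica)
print(dr_replica)
```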

[!WARNING] nostream: true is dangerous A node with nostream: true won't stream WAL — it relies entirely on restore_command (WAL archive). Useful for archive-only DR replicas. If the archive isn't healthy, the node falls arbitrarily behind. Never use nostream on a node that participates in failover.

Watchdog

Verbatim from watchdog.html: "Default Patroni configuration will try to use /dev/watchdog on Linux if it is accessible to Patroni."

Three watchdog modes:

| Mode | Behavior |
|---|---|
| off | Patroni does not use a watchdog. Never use in production: a hung daemon can produce split-brain. |
| automatic | Patroni uses watchdog if /dev/watchdog is accessible; falls back to no-watchdog if not. This is Patroni's own default. |
| required | Patroni refuses to start (or refuses to become leader) if watchdog activation fails. Recommended for production. |

Linux softdog module activation (verbatim): "To enable software watchdog issue the following commands as root before starting Patroni: modprobe softdog". For hardware watchdogs (iTCO, sp5100_tco, etc.), use the appropriate kernel module instead.

Safety margin (verbatim): "By default Patroni will set up the watchdog to expire 5 seconds before TTL expires."

Timeline interaction (verbatim): "With the default setup of loop_wait=10 and ttl=30 this gives HA loop at least 15 seconds (ttl - safety_margin - loop_wait) to complete."

The HA loop budget formula (with default values 10/30/5):

HA loop window = ttl - safety_margin - loop_wait
               = 30 - 5 - 10
               = 15 seconds

If the HA loop takes longer than this window (e.g., slow DCS, blocked Patroni daemon), the watchdog fires before the leader lock expires → kernel reboots host → other node promotes safely without split-brain. Without watchdog, the leader lock expires while a hung Patroni daemon could still have PG running locally.
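To see how the window moves when timings are tuned, the formula can be wrapped in a small Python helper. This is a sketch; the function name and the guard against a non-positive window are illustrative additions, not Patroni behavior.

```python
def ha_loop_window(ttl=30, loop_wait=10, safety_margin=5):
    """Seconds the HA loop has to complete: ttl - safety_margin - loop_wait."""
    window = ttl - safety_margin - loop_wait
    if window <= 0:
        raise ValueError("timings leave no time for the HA loop to complete")
    return window


print(ha_loop_window())                       # documented defaults
print(ha_loop_window(ttl=60, loop_wait=10))   # longer ttl widens the window
```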

Permissions: Patroni daemon must have write access to /dev/watchdog. Typically:

chown postgres:postgres /dev/watchdog

Or via udev rule in /etc/udev/rules.d/:

KERNEL=="watchdog", OWNER="postgres", GROUP="postgres", MODE="0600"

patronictl CLI

CLI wrapping the REST API. Reads patroni.yml for DCS endpoint, then operates on the cluster.

Twelve common commands:

| Command | Purpose |
|---|---|
| patronictl list | Show cluster topology (members, state, role, lag, leader lock) |
| patronictl topology | Tree view (cascading replication visible) |
| patronictl history | Past switchovers/failovers |
| patronictl show-config | Current dynamic config |
| patronictl edit-config | Open dynamic config in $EDITOR, write back to DCS on save |
| patronictl switchover | Planned move of leader to a candidate (requires healthy leader) |
| patronictl failover | Forced move (when leader is unhealthy) |
| patronictl restart <cluster> <member> | Restart PG on a specific member |
| patronictl reinit <cluster> <member> | Rebuild a replica's data dir from the leader |
| patronictl pause | Stop automatic failover; manual operations still allowed |
| patronictl resume | Re-enable automatic failover |
| patronictl reload | Reload patroni.yml (re-reads static config) |

Config file lookup: patronictl -c /etc/patroni/patroni.yml ... or set PATRONICTL_CONFIG_FILE=/etc/patroni/patroni.yml.

Failover vs Switchover

| Operation | When | Source state | Behavior |
|---|---|---|---|
| switchover | Planned (rolling upgrade, hardware swap) | Leader must be healthy | Demotes current leader cleanly → promotes candidate → demoted leader becomes replica. Zero data loss. |
| failover | Unplanned (leader unhealthy) | Leader may be unhealthy/unreachable | Forced promotion of candidate. May produce divergence; old leader needs pg_rewind to re-attach. |

Scheduled switchover example:

patronictl switchover \
  --leader pg-node-1 \
  --candidate pg-node-2 \
  --scheduled '2026-05-13T22:00:00' \
  --force

--force skips interactive confirmation. Useful in CI/CD pipelines.

Manual failover example (leader DCS-unreachable):

patronictl failover --candidate pg-node-2 --force

Patroni verifies candidate is healthy + on the current timeline + within maximum_lag_on_failover, then promotes.
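The three checks named above can be expressed as a short pre-flight function. This Python sketch is a simplification for reasoning about candidates, not Patroni's actual implementation. The Candidate type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    healthy: bool
    timeline: int
    lag_bytes: int


def rejection_reasons(candidate, leader_timeline, maximum_lag_on_failover=1048576):
    """Return the reasons a candidate fails the checks; empty means promotable."""
    reasons = []
    if not candidate.healthy:
        reasons.append("candidate is not healthy")
    if candidate.timeline != leader_timeline:
        reasons.append("candidate is not on the current timeline")
    if candidate.lag_bytes > maximum_lag_on_failover:
        reasons.append("candidate lag exceeds maximum_lag_on_failover")
    return reasons


node2 = Candidate("pg-node-2", healthy=True, timeline=7, lag_bytes=4096)
node3 = Candidate("pg-node-3", healthy=True, timeline=6, lag_bytes=5_000_000)
print(rejection_reasons(node2, leader_timeline=7))
print(rejection_reasons(node3, leader_timeline=7))
```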

Pause Mode

Verbatim from pause.html: "the member key in DCS is updated with the current information about the cluster" but "it does not change the state of PostgreSQL". Additionally: "Manual unscheduled restart, manual unscheduled failover/switchover and reinitialize are allowed" and "No scheduled action is allowed".

Effect summary:

| Action | In normal mode | In paused mode |
|---|---|---|
| Automatic failover on leader expiry | YES | Blocked |
| Automatic synchronous standby promotion | YES | Blocked |
| Manual patronictl switchover | YES | YES |
| Manual patronictl failover | YES | YES |
| Manual patronictl restart | YES | YES |
| Manual patronictl reinit | YES | YES |
| Scheduled switchover/restart | YES | Blocked |
| Demoting a primary without leader lock | YES (auto-corrects) | Blocked |
| Updating member key in DCS | YES | YES (cluster state still visible) |

Use case: Take cluster out of automatic-failover behavior during maintenance window. Stop Patroni daemon → leader lock expires → automatic failover triggers. Pause mode → daemon keeps running → DCS state stays current → no automatic action.

[!WARNING] Resuming pause forgets nothing patronictl resume re-enables automatic failover immediately. If a leader has died during the pause, Patroni will failover on the very next HA loop. Confirm leader health before resuming.

Bootstrap + Replica Creation

Two replica-creation methods:

  1. basebackup (default): Patroni runs pg_basebackup against the leader with WAL streamed during the base copy (-X stream), so the replica can start as soon as the copy finishes. Simplest option; copy time grows with database size and available bandwidth to the leader.
  2. wal-e / wal-g / pgBackRest: restore a base backup from object storage, then catch up via WAL archive. Preferred for large clusters (roughly 1 TB and up) or when network bandwidth to the leader is constrained.

Configuration block:

postgresql:
  create_replica_methods:
    - pgbackrest
    - basebackup
  basebackup:
    - max-rate: '100M'
    - checkpoint: fast
  pgbackrest:
    command: /usr/bin/pgbackrest --stanza=prod --delta restore
    keep_data: true
    no_params: true

Patroni tries methods in listed order. keep_data: true preserves existing data dir contents (useful for delta restore).

Bootstrap from existing PG cluster (not first-time install): see existing_data.html. Point postgresql.data_dir at the existing data directory and start Patroni; it finds an initialized data dir, skips initdb, and adopts the instance.

Standby Cluster

A standby cluster is a cluster whose leader streams from another cluster's leader. Used for cross-region DR.

Configuration block (on the standby cluster):

bootstrap:
  dcs:
    standby_cluster:
      host: pg-primary-region-a.example.com
      port: 5432
      primary_slot_name: standby_region_b
      create_replica_methods:
        - basebackup

The standby cluster has its own DCS, its own leader election among its members. Its leader (the "standby leader") streams from the primary cluster's leader and cascades WAL to its replicas.

Promotion to independent cluster: Remove the standby_cluster: block via patronictl edit-config. The standby leader breaks the upstream connection and becomes an independent primary.

Kubernetes Integration

Patroni-on-K8s replaces external DCS with the Kubernetes API server. Two modes:

| Mode | Storage | Notes |
|---|---|---|
| kubernetes.use_endpoints: true | K8s Endpoints object | Integrates with K8s Service for read/write traffic routing |
| kubernetes.use_endpoints: false | K8s ConfigMaps | Patroni's behavior when use_endpoints is unset; doesn't integrate with Service |

For most K8s deployments, prefer a dedicated operator (CloudNativePG, Zalando postgres-operator, Crunchy PGO) — see 78-ha-architectures.md and 92-kubernetes-operators.md. Standalone Patroni-in-K8s is harder to maintain.

Citus Integration

[!NOTE] PostgreSQL 14 Patroni 3.0+ Patroni 3.0 (Feb 2023) added native Citus support. Configure via citus: block in patroni.yml. Patroni manages worker registration with the coordinator + group_id assignment + failover of individual worker shards. Without native support, Citus clusters required separate Patroni-per-node + manual coordinator updates.

Citus configuration block (on each Citus node):

citus:
  group: 0      # 0 = coordinator group; 1, 2, ... = worker groups
  database: citus

Worker group N is replicated within itself; the coordinator group registers workers via citus_add_node. Patroni handles re-registration on failover.

Per-Version Timeline

Patroni release cadence is independent of PG releases. Verified at planning time: latest is v4.1.3 (2026-05-05).

| Patroni version | Released | Key items |
|---|---|---|
| 2.0 | 2020-09 | Standby clusters, REST API improvements |
| 2.1 | 2021-09 | failover_priority, scheduled switchovers |
| 3.0 | 2023-02 | Citus support, REST API auth required by default for control endpoints |
| 3.1 | 2023-08 | Kubernetes Endpoints v1 (replaces v1beta1) |
| 3.2 | 2024-01 | failsafe_mode for DCS outage handling |
| 3.3 | 2024-08 | Quorum-based synchronous replication (synchronous_mode: quorum) |
| 4.0 | 2025-05 | Drops Python 3.6/3.7 support; PG 17 support |
| 4.1 | 2025-12 | PG 18 support; etcd v2 deprecated; improved monotonic-clock handling |
| 4.1.3 | 2026-05 | Latest at planning time |

Patroni PG-version-support rule: Patroni N typically supports the last 5 PG majors. For PG 18, use Patroni 4.1+. For PG 14, Patroni 3.x and 4.x both work.

Recipes

1. Three-node Patroni cluster on etcd (minimum production deployment)

DCS first (separate hosts), then Patroni on each PG host. Skeleton above is the canonical baseline. Cross-reference 78-ha-architectures.md Recipe 2 for the etcd setup itself.

2. Add a fourth node as reporting replica (never becomes leader)

In its patroni.yml:

tags:
  nofailover: true
  noloadbalance: false
  clonefrom: false
  nosync: true

nofailover removes it from leader race. nosync prevents it from being selected as synchronous standby (so its lag doesn't block writes). noloadbalance: false keeps it in the /replica health-check pool for read traffic.

3. Cascading replication via replicatefrom

Hub-and-spoke pattern for geographic distribution:

# In pg-node-eu-west-1.patroni.yml
tags:
  replicatefrom: pg-node-us-east-1

pg-node-eu-west-1 streams from pg-node-us-east-1 instead of from the primary directly. Tag survives across primary promotions — Patroni resolves the chain dynamically.

4. Controlled switchover for planned maintenance

patronictl -c /etc/patroni/patroni.yml list
# Verify pg-node-1 is leader, all replicas in sync, no lag

patronictl switchover --leader pg-node-1 --candidate pg-node-2 --force

# Verify new state
patronictl list

# Now pg-node-1 is a replica; perform maintenance on it
systemctl stop patroni
# ... maintenance ...
systemctl start patroni
# pg-node-1 catches up as replica

5. Disable failover during maintenance via pause mode

patronictl pause --wait
# ... maintenance work, manual operations allowed ...
patronictl resume

--wait blocks until pause is fully applied across the cluster.

6. HAProxy read/write split using Patroni REST API health checks

/etc/haproxy/haproxy.cfg:

listen postgres-rw
    bind *:5000
    mode tcp
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg-node-1 10.0.0.11:5432 check port 8008
    server pg-node-2 10.0.0.12:5432 check port 8008
    server pg-node-3 10.0.0.13:5432 check port 8008

listen postgres-ro
    bind *:5001
    mode tcp
    option httpchk GET /replica
    http-check expect status 200
    balance roundrobin
    default-server inter 3s fall 3 rise 2
    server pg-node-1 10.0.0.11:5432 check port 8008
    server pg-node-2 10.0.0.12:5432 check port 8008
    server pg-node-3 10.0.0.13:5432 check port 8008

Writes go to :5000 → HAProxy routes only to the node returning 200 on /primary. Reads go to :5001 → balanced across nodes returning 200 on /replica.

[!WARNING] HAProxy TCP-only checks are wrong option tcp-check cannot tell primary from replica. Both accept TCP. Always use option httpchk GET /primary or GET /replica against port 8008.

7. Enable synchronous replication with quorum

patronictl edit-config

Edit:

synchronous_mode: quorum
synchronous_node_count: 1
postgresql:
  parameters:
    synchronous_commit: 'remote_apply'

Save. Patroni rewrites synchronous_standby_names on the primary to ANY 1 (<list of replicas>). Any 1 of N replicas must ack before COMMIT returns. Cross-reference 73-streaming-replication.md for the underlying synchronous_standby_names mechanics.
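For intuition about the resulting setting, this Python sketch builds an ANY-style synchronous_standby_names value from a member list. It only illustrates the PostgreSQL syntax; Patroni generates the real value itself, and the function name is hypothetical.

```python
def quorum_standby_names(members, count=1):
    """Format a quorum-style synchronous_standby_names value.

    Member names are double-quoted because names such as pg-node-2 contain
    characters that are not valid in an unquoted identifier.
    """
    if not 1 <= count <= len(members):
        raise ValueError("count must be between 1 and the number of members")
    quoted = ", ".join(f'"{name}"' for name in members)
    return f"ANY {count} ({quoted})"


print(quorum_standby_names(["pg-node-2", "pg-node-3"]))
```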

8. Rebuild a replica that has fallen behind beyond wal_keep_size

patronictl reinit postgres-prod pg-node-3 --force --wait

Patroni stops PG on pg-node-3, runs pg_basebackup from the current leader, restarts PG. --wait blocks until reinit completes.

9. Detect a Patroni/PG state mismatch

Symptom: patronictl list shows pg-node-1 as Leader but psql -h pg-node-1 -c 'SELECT pg_is_in_recovery()' returns t.

Cause: Patroni daemon crashed mid-promotion. DCS shows the new state; PG never finished promotion.

Fix:

systemctl restart patroni
# Patroni reconciles state on next HA loop

If that fails, manually demote in DCS:

patronictl failover --candidate pg-node-2 --force
# pg-node-2 becomes leader; pg-node-1 will pg_rewind to re-attach

10. Detect split-brain (two nodes both think they're leader)

This should be impossible with watchdog enabled. If observed:

patronictl list
# Both pg-node-1 and pg-node-2 marked Leader → split-brain

Recovery (manual, do not delay):

  1. Identify which node has the leader lock in DCS:

    etcdctl get /service/postgres-prod/leader
  2. Stop the OTHER node immediately:

    ssh pg-node-2 systemctl stop patroni postgresql
  3. On the stopped node, use pg_rewind to re-attach to the real leader (cross-reference 89-pg-rewind.md).

  4. Start Patroni on the stopped node — it will rejoin as replica.

  5. Post-mortem: Why didn't watchdog fire? Check dmesg, kernel logs, journalctl -u patroni. Set watchdog.mode: required everywhere.
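Independently of what DCS reports, split-brain can be confirmed from PostgreSQL itself: any node where SELECT pg_is_in_recovery() returns false is accepting writes. A minimal Python sketch, assuming the per-node results have already been collected (for example with psql) into a dict. The function names are illustrative.

```python
def writable_nodes(in_recovery_by_node):
    """Names of nodes that report pg_is_in_recovery() = false."""
    return sorted(
        name for name, in_recovery in in_recovery_by_node.items() if not in_recovery
    )


def is_split_brain(in_recovery_by_node):
    """True when more than one node is out of recovery (accepting writes)."""
    return len(writable_nodes(in_recovery_by_node)) > 1


# Hypothetical observations: True means the node is a standby in recovery.
observed = {"pg-node-1": False, "pg-node-2": False, "pg-node-3": True}
print(writable_nodes(observed))
print(is_split_brain(observed))
```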

11. Audit Patroni configuration drift across nodes

for node in pg-node-1 pg-node-2 pg-node-3; do
  echo "=== $node ==="
  ssh $node 'patronictl show-config | sha256sum'
done

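All nodes should produce the same hash, because patronictl show-config reads the cluster-wide dynamic config from DCS. A differing hash means that node is pointed at a different DCS endpoint, namespace, or scope. Local patroni.yml overrides do not show up in this output; compare the patroni.yml files themselves to catch those.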

12. Migrate an existing standalone PG cluster into Patroni

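In patroni.yml, set postgresql.data_dir (and bin_dir) to the existing instance, and make sure the users named in postgresql.authentication exist in the database.

Start Patroni. It detects the existing data dir, skips initdb, and adopts the instance as the leader when the scope has no cluster registered in DCS yet. Add replicas via basebackup or pgbackrest. Documentation: existing_data.html.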

13. Read Prometheus metrics from Patroni

curl -s http://10.0.0.11:8008/metrics | grep '^patroni_'

Returns:

patroni_master 0
patroni_primary 0
patroni_xlog_received_location 1.234567e+10
patroni_xlog_replayed_location 1.234567e+10
patroni_xlog_replayed_timestamp ...
patroni_postgres_running 1
patroni_postmaster_start_time ...
patroni_cluster_unlocked 0
patroni_failsafe_mode_is_active 0
patroni_sync_standby 1

Scrape /metrics with Prometheus and alert when patroni_cluster_unlocked == 1 persists for more than N seconds (no leader), or when a replica's patroni_xlog_replayed_location lags the primary by more than your threshold.
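For a quick check without a full Prometheus setup, a single metric can be pulled out of the text output. This is a sketch only: `metric_value` is a hypothetical helper, and it assumes one sample per metric name, with or without a `{...}` label set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one metric from Prometheus
# text-format input on stdin. Prints nothing if the metric is absent.
metric_value() {
  awk -v name="$1" '
    {
      m = $1
      sub(/[{].*/, "", m)              # drop a label set such as {scope="..."}
      if (m == name) { print $NF; exit }
    }'
}

# Example wiring (address taken from the curl command above):
#   unlocked=$(curl -s http://10.0.0.11:8008/metrics | metric_value patroni_cluster_unlocked)
#   [ "$unlocked" = "1" ] && echo "cluster has no leader" >&2
```

Comment lines (`# HELP`, `# TYPE`) never match, because their first field is `#`.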

Gotchas

  1. bootstrap.dcs is written to DCS only on first bootstrap. Editing it later has no effect. Use patronictl edit-config instead.
  2. SETTINGS.html does not exist. Settings reference lives in yaml_configuration.html.
  3. watchdog.mode: off in production is unsafe. A hung Patroni daemon → DCS TTL expiry → another node promotes → original PG still accepts writes locally. Watchdog forces kernel reboot. Always required or automatic.
  4. loop_wait + 2 * retry_timeout > ttl is rejected. Verbatim rule. The defaults (loop_wait 10, retry_timeout 10, ttl 30) give 10 + 2×10 = 30, exactly at the boundary.
  5. DCS quorum must live on hosts separate from PG. Co-locating DCS with PG defeats the fencing model — a partition that splits PG also splits DCS the same way.
  6. 2-node Patroni clusters cannot achieve quorum. Always 3 nodes minimum (or 3-node DCS quorum + 2 PG nodes — but then DCS quorum survives 1 failure, PG cluster doesn't).
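  7. synchronous_mode: on with 1 standby stalls commits when that standby goes down, until Patroni degrades the cluster to async. That degradation happens only with synchronous_mode_strict: false (the default); keep synchronous_node_count: 1 with strict mode off if availability matters more than durability. Or use quorum with N>=2.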
  8. maximum_lag_on_failover: 1MB (default) means lagging replicas can't be promoted. A reporting replica with nosync: true that falls behind can't be a failover candidate — by design. Tune per workload.
  9. Patroni manages postgresql.conf — your manual edits are overwritten. All PG GUCs go through postgresql.parameters in dynamic config.
  10. replicatefrom is a hint, not a hard constraint. If the named upstream is unavailable, Patroni falls back to the primary. Tag survives across promotions.
  11. patronictl pause does not stop the daemon. Daemon keeps DCS state current; only blocks automatic actions. Stopping the daemon would expire the leader lock and trigger failover.
  12. nostream: true makes the node depend entirely on restore_command. Without a healthy WAL archive, the node falls arbitrarily behind. Never use on failover candidates.
  13. promote_trigger_file was removed in PG16. Patroni uses pg_ctl promote / pg_promote() directly. Cross-reference 77-standby-failover.md gotcha #1.
  14. Patroni REST API auth defaults differ across versions. Patroni 3.0+ requires auth on control endpoints by default; older versions did not. Always set restapi.authentication.{username,password}.
  15. etcd v2 API is deprecated. Use etcd3: block, not legacy etcd: block. etcd project itself deprecated v2 in etcd 3.4.
  16. K8s Endpoints mode requires PG service to use clusterIP (not LoadBalancer). Endpoints-based DCS works by editing the Endpoints object of an existing Service; LoadBalancer services don't allow this manipulation.
  17. patronictl reinit blocks the node for the duration of pg_basebackup. On large clusters this can take hours. Configure a pgbackrest entry in create_replica_methods instead.
  18. synchronous_mode_strict: true will refuse writes if no sync standby is available. Trade availability for durability — pick deliberately.
  19. HAProxy option tcp-check is wrong for Patroni. It can't distinguish primary from replica. Always option httpchk GET /primary or GET /replica.
  20. failover and switchover are not aliases. switchover requires healthy leader; failover does not. Picking the wrong one in a runbook can fail when you need it most.
  21. Patroni won't start two leaders even without watchdog — but it can't prevent it after a crash. The protection is: at start, refuse to be leader without grabbing the lock. After a SIGKILL or kernel hang, that protection is gone. Watchdog is the backstop.
  22. patronictl edit-config opens YAML in your $EDITOR. Saving invalid YAML produces an error message and discards the edit. Always verify with patronictl show-config afterwards.
  23. Patroni 4.0 dropped Python 3.6 and 3.7 support. RHEL 7 / Ubuntu 18.04 default Python is too old. Plan upgrade path before upgrading Patroni.
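The timing rule in gotcha 4 is easy to check before applying a change with patronictl edit-config. A minimal sketch, where `timing_ok` is a hypothetical helper and all three values are integer seconds:

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if loop_wait + 2 * retry_timeout <= ttl.
#   $1 = loop_wait, $2 = retry_timeout, $3 = ttl
timing_ok() {
  local loop_wait="$1" retry_timeout="$2" ttl="$3"
  [ $(( loop_wait + 2 * retry_timeout )) -le "$ttl" ]
}

timing_ok 10 10 30 && echo "defaults ok"          # 10 + 20 = 30, at the boundary
timing_ok 10 15 30 || echo "would be rejected"    # 10 + 30 = 40 > 30
```

Raising retry_timeout without raising ttl is the usual way to trip this rule.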

See Also

Sources
