
provisioning-cluster-for-production

Guides initial CockroachDB cluster provisioning and production deployment. Self-Hosted covers cockroach start/init, Kubernetes deployment (Operator, Helm), hardware sizing, and production configuration. Advanced/BYOC covers Cloud Console, API, and Terraform provisioning with production settings. Standard covers cluster creation and provisioned compute selection. Basic covers cluster creation and spending limits. Use when creating a new cluster, preparing for production go-live, or validating deployment configuration.

Provisioning Cluster for Production

Guides CockroachDB cluster creation and production deployment configuration. Before providing procedures, this skill gathers context to deliver tier-appropriate provisioning steps and production hardening guidance.

When to Use This Skill

  • Creating a new CockroachDB cluster
  • Preparing a development/staging cluster for production go-live
  • Validating hardware and configuration for production readiness
  • Choosing the right deployment tier and sizing

Related skills for adjacent tasks:

  • Post-deployment health checks: use reviewing-cluster-health
  • Ongoing settings management: use managing-cluster-settings
  • Capacity changes after deployment: use managing-cluster-capacity


Step 1: Gather Context

Required Context

| Question | Options | Why It Matters |
|---|---|---|
| Deployment tier? | Self-Hosted, Advanced, BYOC, Standard, Basic | Completely different provisioning procedures |
| Environment? | Production, Staging, Development | Determines hardware sizing and configuration rigor |

Additional Context (by tier)

If Self-Hosted:

| Question | Options | Why It Matters |
|---|---|---|
| Platform? | Bare metal, VMs (AWS/GCP/Azure), Kubernetes | Changes installation and start commands |
| If Kubernetes? | Operator (recommended), Helm, Manual StatefulSet | Determines deployment method |
| Node count? | 3 (minimum), 5, 9+ | Affects topology and replication |
| Multi-region? | Yes (how many regions), No | Requires locality flags and topology planning |
| Expected workload? | OLTP, mixed OLTP/analytics, write-heavy | Affects hardware sizing |
| Security requirements? | TLS required, encryption at rest, CMEK | Determines certificate and encryption setup |

If Advanced or BYOC:

| Question | Options | Why It Matters |
|---|---|---|
| Provisioning method? | Cloud Console, Cloud API, Terraform | Determines procedure |
| Cloud provider? | AWS, GCP, Azure | Affects region selection and networking |
| Node count and size? | e.g., 3 nodes x 8 vCPUs | Determines initial capacity |

If Standard: Gather expected workload size (vCPUs) and storage estimate.

If Basic: Gather expected usage pattern and monthly budget.
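
The context questions above can be tracked as a simple checklist. The following is a minimal sketch, not part of the skill itself: the key names (`platform`, `monthly_budget`, and so on) are hypothetical labels chosen for illustration, and only the tier names come from the tables above.

```python
# Hypothetical answer keys; only the tier names are taken from this guide.
REQUIRED = ("tier", "environment")
BY_TIER = {
    "Self-Hosted": ("platform", "node_count", "multi_region", "workload", "security"),
    "Advanced": ("provisioning_method", "cloud_provider", "node_count_and_size"),
    "BYOC": ("provisioning_method", "cloud_provider", "node_count_and_size"),
    "Standard": ("workload_vcpus", "storage_estimate"),
    "Basic": ("usage_pattern", "monthly_budget"),
}


def missing_context(answers: dict) -> list:
    """Return the questions that still need an answer before provisioning."""
    missing = [key for key in REQUIRED if key not in answers]
    tier = answers.get("tier")
    if tier in BY_TIER:
        # Use "not in" so that a legitimate answer such as False is not treated as missing.
        missing += [key for key in BY_TIER[tier] if key not in answers]
    return missing


print(missing_context({"tier": "Basic", "environment": "Production"}))
```

An empty result means enough context has been gathered to move on to the tier's provisioning section.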

Context-Driven Routing

| Tier | Go To |
|---|---|
| Self-Hosted | Self-Hosted Provisioning |
| Advanced | Advanced Provisioning |
| BYOC | BYOC Provisioning |
| Standard | Standard Provisioning |
| Basic | Basic Provisioning |

Self-Hosted Provisioning

Applies when: Tier = Self-Hosted

Hardware Sizing

| Component | Minimum | Production Recommended |
|---|---|---|
| Nodes | 3 | 3+ (odd number per failure domain) |
| CPU | 4 vCPUs (non-burstable) | 8+ vCPUs |
| RAM | 16 GB | 32+ GB |
| Storage | 150 GB SSD | 500+ GB NVMe SSD |
| Network | 1 Gbps | 10 Gbps |

Memory formula: --cache + --max-sql-memory <= 75% of total RAM

Recommended: --cache=.25 --max-sql-memory=.25
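
As a quick sanity check of the memory formula, the sketch below converts the fractional flag values into absolute sizes for a given node and tests the 75% limit. The function name and the 32 GiB example are illustrative assumptions; only the fractions and the limit come from the guidance above.

```python
def memory_budget(total_ram_gib: float, cache_frac: float = 0.25, sql_frac: float = 0.25) -> dict:
    """Translate fractional --cache / --max-sql-memory values into GiB and check the 75% rule."""
    if not (0 < cache_frac < 1 and 0 < sql_frac < 1):
        raise ValueError("fractions must be between 0 and 1")
    combined = cache_frac + sql_frac
    return {
        "cache_gib": total_ram_gib * cache_frac,
        "max_sql_memory_gib": total_ram_gib * sql_frac,
        "combined_fraction": combined,
        "within_limit": combined <= 0.75,
    }


# Example: a node with 32 GiB of RAM and the recommended .25 / .25 split.
print(memory_budget(32))
```

With the recommended settings, half of the node's RAM goes to the cache and SQL memory pools, leaving the rest for the operating system and other CockroachDB overhead.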

Never use: burstable instances, HDDs, network-attached HDD, shared CPU.

See hardware-and-infrastructure reference for cloud instance recommendations.

Deploy on VMs / Bare Metal

Step 1: Install CockroachDB on each node

curl https://binaries.cockroachdb.com/cockroach-v<version>.linux-amd64.tgz | tar -xz
cp cockroach-v<version>.linux-amd64/cockroach /usr/local/bin/

Step 2: Generate certificates

mkdir -p certs my-safe-directory
cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key
cockroach cert create-node <node-hostname> <node-ip> localhost 127.0.0.1 \
  --certs-dir=certs --ca-key=my-safe-directory/ca.key
cockroach cert create-client root --certs-dir=certs --ca-key=my-safe-directory/ca.key

Step 3: Start nodes (repeat on each node)

cockroach start \
  --certs-dir=certs \
  --store=path=<store-path> \
  --listen-addr=<node-address>:26257 \
  --http-addr=<node-address>:8080 \
  --join=<node1-address>,<node2-address>,<node3-address> \
  --locality=region=<region>,zone=<zone> \
  --cache=.25 \
  --max-sql-memory=.25 \
  --background
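
Because the start command is repeated on every node with only a few values changing, it can help to generate it from a small inventory. This is a minimal sketch under stated assumptions: the helper name, the IP addresses, the store path, and the region and zone values are hypothetical, while the flags are the ones used in the command above.

```python
import shlex


def start_command(node_addr, join_addrs, region, zone, store_path, certs_dir="certs"):
    """Build the `cockroach start` command line for a single node."""
    args = [
        "cockroach", "start",
        f"--certs-dir={certs_dir}",
        f"--store=path={store_path}",
        f"--listen-addr={node_addr}:26257",
        f"--http-addr={node_addr}:8080",
        "--join=" + ",".join(join_addrs),  # same join list on every node
        f"--locality=region={region},zone={zone}",
        "--cache=.25",
        "--max-sql-memory=.25",
        "--background",
    ]
    return shlex.join(args)  # quotes any argument that needs it


# Hypothetical three-node inventory: (address, zone).
nodes = [("10.0.0.1", "a"), ("10.0.0.2", "b"), ("10.0.0.3", "c")]
join = [addr for addr, _ in nodes]
for addr, zone in nodes:
    print(start_command(addr, join, "us-east-1", f"us-east-1{zone}", "/mnt/cockroach"))
```

Only the listen address, HTTP address, and locality differ between nodes; the join list stays identical everywhere.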

Step 4: Initialize cluster (once, from any node)

cockroach init --certs-dir=certs --host=<any-node-address>

Step 5: Verify

SELECT node_id, address, locality, build_tag, is_live
FROM crdb_internal.gossip_nodes ORDER BY node_id;
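
The rows returned by the verification query can be checked mechanically. The sketch below assumes the rows have already been fetched into dictionaries keyed by the selected column names; the function name and the sample rows are hypothetical, and fetching the rows (for example with a PostgreSQL driver) is left out.

```python
def node_problems(rows, expected_nodes):
    """Return human-readable problems found in the gossip_nodes query output."""
    problems = []
    if len(rows) != expected_nodes:
        problems.append(f"expected {expected_nodes} nodes, found {len(rows)}")
    for row in rows:
        if not row["is_live"]:
            problems.append(f"node {row['node_id']} is not live")
        if not row["locality"]:
            problems.append(f"node {row['node_id']} has no locality set")
    # All nodes in a freshly provisioned cluster should report the same build.
    tags = {row["build_tag"] for row in rows}
    if len(tags) > 1:
        problems.append("mixed versions: " + ", ".join(sorted(tags)))
    return problems


# Hypothetical sample output for a three-node cluster.
sample = [
    {"node_id": 1, "address": "10.0.0.1:26257", "locality": "region=us-east-1,zone=a", "build_tag": "v1", "is_live": True},
    {"node_id": 2, "address": "10.0.0.2:26257", "locality": "region=us-east-1,zone=b", "build_tag": "v1", "is_live": True},
    {"node_id": 3, "address": "10.0.0.3:26257", "locality": "", "build_tag": "v1", "is_live": False},
]
for problem in node_problems(sample, expected_nodes=3):
    print(problem)
```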

Deploy on Kubernetes

Operator (recommended):

kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/install/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/install/operator.yaml
# Apply CrdbCluster manifest with node count, resources, and storage

Helm:

helm repo add cockroachdb https://charts.cockroachdb.com/
helm install cockroachdb cockroachdb/cockroachdb \
  --set statefulset.replicas=3 \
  --set storage.persistentVolume.size=100Gi

Production Configuration (Self-Hosted)

After cluster is running, apply production settings:

-- Enable critical features
SET CLUSTER SETTING kv.rangefeed.enabled = true;
SET CLUSTER SETTING sql.stats.automatic_collection.enabled = true;
SET CLUSTER SETTING admission.kv.enabled = true;

-- Set timeouts
SET CLUSTER SETTING sql.defaults.idle_in_transaction_session_timeout = '300s';
SET CLUSTER SETTING sql.defaults.statement_timeout = '30s';

-- Install enterprise license (if applicable)
SET CLUSTER SETTING cluster.organization = '<org-name>';
SET CLUSTER SETTING enterprise.license = '<license-key>';

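Create ballast files on each node so that disk space can be reclaimed quickly if a store fills up. CockroachDB v21.1 and later creates an emergency ballast at this path automatically, so check whether it already exists before creating one manually: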

cockroach debug ballast <store-path>/auxiliary/EMERGENCY_BALLAST --size=1GiB

Configure load balancer: Point to all nodes with health check on /health?ready=1.
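
To see what the load balancer health check will observe, the same endpoint can be probed by hand. This is a minimal sketch using only the Python standard library. It assumes the HTTP port 8080 from the start command above and a secure cluster serving HTTPS, and the `ca_file` argument (for example `certs/ca.crt`) and function names are illustrative. Any non-200 response or connection error is treated as "not ready".

```python
import ssl
import urllib.error
import urllib.request


def ready_url(host, http_port=8080, scheme="https"):
    """URL of the readiness endpoint used for load balancer health checks."""
    return f"{scheme}://{host}:{http_port}/health?ready=1"


def is_ready(host, ca_file=None, timeout=3.0, http_port=8080):
    """Return True only if the node answers the readiness check with HTTP 200."""
    # Trust the cluster CA when given; otherwise fall back to the system trust store.
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with urllib.request.urlopen(ready_url(host, http_port), timeout=timeout, context=context) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, TLS failures, timeouts, and refused connections.
        return False


print(ready_url("10.0.0.1"))  # hypothetical node address
```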

See production-deployment-checklist reference for the full go-live checklist.


Advanced Provisioning

Applies when: Tier = Advanced

Via Cloud Console

  1. cockroachlabs.cloud → Create Cluster
  2. Select Advanced plan
  3. Choose cloud provider (AWS, GCP, Azure)
  4. Select region(s)
  5. Configure node count (minimum 3) and machine size (vCPUs per node)
  6. Configure storage
  7. Review and create

Via Cloud API

curl -X POST -H "Authorization: Bearer $COCKROACH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "<cluster-name>",
    "provider": "AWS",
    "spec": {
      "dedicated": {
        "region_nodes": {"us-east-1": 3},
        "hardware": {
          "machine_spec": {"machine_type": "m6i.xlarge"},
          "storage_gib": 150
        }
      }
    }
  }' \
  "https://cockroachlabs.cloud/api/v1/clusters"

Via Terraform

resource "cockroach_cluster" "production" {
  name           = "production"
  cloud_provider = "AWS"

  # dedicated is an object attribute, so it is assigned with "="
  dedicated = {
    num_virtual_cpus = 8
    storage_gib      = 150
  }

  # Node count is set per region
  regions = [{
    name       = "us-east-1"
    node_count = 3
  }]
}

Post-Provisioning

  • Configure IP allowlists or VPC Peering/PrivateLink
  • Create SQL users and databases
  • Set maintenance window (see performing-cluster-maintenance)
  • Configure metrics export to Datadog/Prometheus if needed

BYOC Provisioning

Applies when: Tier = BYOC

Follow Advanced Provisioning steps — BYOC uses the same Cloud Console, API, and Terraform interfaces.

Additional BYOC steps:

  • Ensure your cloud account meets the Cockroach Labs (CRL) prerequisites (service account, VPC, IAM roles)
  • Configure PrivateLink/PSC for private connectivity
  • Verify the permissions granted to the Cockroach Labs service account

Standard Provisioning

Applies when: Tier = Standard

  1. cockroachlabs.cloud → Create Cluster
  2. Select Standard plan
  3. Choose cloud provider and region
  4. Set provisioned compute (vCPUs) based on expected workload
  5. Create

Post-provisioning:

  • Create SQL users and databases
  • Configure IP allowlists
  • Set session-level defaults:
    ALTER ROLE ALL SET statement_timeout = '30s';
    ALTER ROLE ALL SET idle_in_transaction_session_timeout = '300s';

Basic Provisioning

Applies when: Tier = Basic

  1. cockroachlabs.cloud → Create Cluster
  2. Select Basic plan
  3. Choose cloud provider and region
  4. Create (auto-scales, no sizing needed)

Post-provisioning:

  • Set spending limits (Cloud Console → Cluster → Settings)
  • Create SQL users and databases
  • Configure IP allowlists

Safety Considerations

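| Operation | Tier | Risk |
|---|---|---|
| cockroach init | SH | Safe: initializes the cluster once; running it again returns an "already initialized" error and changes nothing |
| Certificate generation | SH | Store the CA key securely: without it, no new certificates can be issued |
| Cloud cluster creation | ADV/BYOC/STD/BAS | Safe: the cluster can be deleted if misconfigured |
| Production settings changes | SH | See managing-cluster-settings |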

Critical (Self-Hosted):

  • Never use --insecure in production — always use TLS
  • Never use burstable instances for production workloads
  • Always set --locality flags for multi-node clusters
  • Always configure --cache and --max-sql-memory (defaults are too low)
  • Always create ballast files before going to production

Troubleshooting

| Issue | Tier | Fix |
|---|---|---|
| cockroach init fails | SH | Check all nodes are started and reachable on port 26257 |
| Node won't join cluster | SH | Verify --join addresses; check firewall rules for ports 26257, 8080 |
| "clock offset" error | SH | Sync clocks with NTP; check --max-offset setting |
| TLS handshake failure | SH | Verify certs match; check CA is the same across all nodes |
| Cloud cluster stuck in "Creating" | ADV/BYOC | Wait 15 min; contact support if no progress |
| Cannot connect after creation | ALL | Check IP allowlist; verify connection string; try with root user |

Repository
cockroachlabs/cockroachdb-skills