Guides initial CockroachDB cluster provisioning and production deployment. Self-Hosted covers cockroach start/init, Kubernetes deployment (Operator, Helm), hardware sizing, and production configuration. Advanced/BYOC covers Cloud Console, API, and Terraform provisioning with production settings. Standard covers cluster creation and provisioned compute selection. Basic covers cluster creation and spending limits. Use when creating a new cluster, preparing for production go-live, or validating deployment configuration.
Guides CockroachDB cluster creation and production deployment configuration. Before providing procedures, this skill gathers context to deliver tier-appropriate provisioning steps and production hardening guidance.
For post-deployment health checks, use reviewing-cluster-health. For ongoing settings management, use managing-cluster-settings. For capacity changes after deployment, use managing-cluster-capacity.
| Question | Options | Why It Matters |
|---|---|---|
| Deployment tier? | Self-Hosted, Advanced, BYOC, Standard, Basic | Completely different provisioning procedures |
| Environment? | Production, Staging, Development | Determines hardware sizing and configuration rigor |
If Self-Hosted:
| Question | Options | Why It Matters |
|---|---|---|
| Platform? | Bare metal, VMs (AWS/GCP/Azure), Kubernetes | Changes installation and start commands |
| If Kubernetes? | Operator (recommended), Helm, Manual StatefulSet | Determines deployment method |
| Node count? | 3 (minimum), 5, 9+ | Affects topology and replication |
| Multi-region? | Yes (how many regions), No | Requires locality flags and topology planning |
| Expected workload? | OLTP, mixed OLTP/analytics, write-heavy | Affects hardware sizing |
| Security requirements? | TLS required, encryption at rest, CMEK | Determines certificate and encryption setup |
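The node-count question matters because a range stays available only while a majority of its replicas is reachable. The sketch below works through that arithmetic; the function name is hypothetical, and it assumes majority-quorum replication with the replication factor as the only input. A cluster needs at least as many nodes as replicas, so node count bounds the replication factor you can choose.

```python
def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be lost while a strict majority is still reachable."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    # A majority of n replicas is n // 2 + 1, so n - (n // 2 + 1) may fail.
    return (replication_factor - 1) // 2


if __name__ == "__main__":
    for rf in (3, 5, 9):
        print(f"replication factor {rf}: survives {tolerated_failures(rf)} failure(s)")
```

An even replication factor tolerates no more failures than the odd number below it, which is one reason odd counts are usually preferred.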
If Advanced or BYOC:
| Question | Options | Why It Matters |
|---|---|---|
| Provisioning method? | Cloud Console, Cloud API, Terraform | Determines procedure |
| Cloud provider? | AWS, GCP, Azure | Affects region selection and networking |
| Node count and size? | e.g., 3 nodes x 8 vCPUs | Determines initial capacity |
If Standard: Gather expected workload size (vCPUs) and storage estimate.
If Basic: Gather expected usage pattern and monthly budget.
| Tier | Go To |
|---|---|
| Self-Hosted | Self-Hosted Provisioning |
| Advanced | Advanced Provisioning |
| BYOC | BYOC Provisioning |
| Standard | Standard Provisioning |
| Basic | Basic Provisioning |
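The routing table above can be sketched as a simple lookup. This is only an illustration: the `route` helper and its input normalization are assumptions, while the section names are copied from the table.

```python
# Section names copied from the routing table; keys are normalized tier names.
SECTIONS = {
    "self-hosted": "Self-Hosted Provisioning",
    "advanced": "Advanced Provisioning",
    "byoc": "BYOC Provisioning",
    "standard": "Standard Provisioning",
    "basic": "Basic Provisioning",
}


def route(tier: str) -> str:
    """Return the provisioning section for a tier, ignoring case and spacing."""
    key = tier.strip().lower().replace(" ", "-").replace("_", "-")
    if key not in SECTIONS:
        raise ValueError(f"unknown deployment tier: {tier!r}")
    return SECTIONS[key]
```

Rejecting unknown tiers rather than falling back to a default keeps a mistyped tier from silently selecting the wrong procedure.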
Applies when: Tier = Self-Hosted
| Component | Minimum | Production Recommended |
|---|---|---|
| Nodes | 3 | 3+ (odd number per failure domain) |
| CPU | 4 vCPUs (non-burstable) | 8+ vCPUs |
| RAM | 16 GB | 32+ GB |
| Storage | 150 GB SSD | 500+ GB NVMe SSD |
| Network | 1 Gbps | 10 Gbps |
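As a rough illustration, the numeric thresholds in the table can be checked programmatically. The `NodeSpec` type and the `shortfalls` and `classify` helpers are hypothetical names; the numbers come from the table. Qualitative requirements (non-burstable CPU, SSD or NVMe storage) are not modeled and still need a manual check.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class NodeSpec:
    nodes: int
    vcpus: int
    ram_gb: int
    storage_gb: int
    network_gbps: float


# Thresholds taken from the sizing table above.
MINIMUM = NodeSpec(nodes=3, vcpus=4, ram_gb=16, storage_gb=150, network_gbps=1)
RECOMMENDED = NodeSpec(nodes=3, vcpus=8, ram_gb=32, storage_gb=500, network_gbps=10)


def shortfalls(spec: NodeSpec, baseline: NodeSpec) -> list:
    """Names of the fields where spec falls below the baseline."""
    return [
        f.name
        for f in fields(NodeSpec)
        if getattr(spec, f.name) < getattr(baseline, f.name)
    ]


def classify(spec: NodeSpec) -> str:
    """Label a spec relative to the two columns of the table."""
    if shortfalls(spec, MINIMUM):
        return "below minimum"
    if shortfalls(spec, RECOMMENDED):
        return "minimum"
    return "production recommended"
```

For example, `shortfalls(spec, RECOMMENDED)` lists exactly which components to upgrade before go-live.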
Memory formula: --cache + --max-sql-memory <= 75% of total RAM
Recommended: --cache=.25 --max-sql-memory=.25
Never use: burstable instances, HDDs, network-attached HDD, shared CPU.
See hardware-and-infrastructure reference for cloud instance recommendations.
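To make the memory formula concrete, the sketch below turns the two fractions into per-node budgets and rejects combinations above the 75% ceiling. The function name and return shape are hypothetical; only the 0.25 defaults and the 75% limit come from the text above.

```python
MAX_COMBINED_FRACTION = 0.75  # ceiling from the memory formula above


def memory_budget(total_ram_gib: float,
                  cache_fraction: float = 0.25,
                  sql_fraction: float = 0.25) -> dict:
    """Split a node's RAM into cache, SQL memory, and remaining headroom."""
    for label, value in (("--cache", cache_fraction),
                         ("--max-sql-memory", sql_fraction)):
        if not 0 < value < 1:
            raise ValueError(f"{label} must be a fraction between 0 and 1")
    combined = cache_fraction + sql_fraction
    # Small tolerance so that sums landing on the limit are not rejected
    # because of floating-point rounding.
    if combined - MAX_COMBINED_FRACTION > 1e-9:
        raise ValueError("--cache + --max-sql-memory exceeds 75% of total RAM")
    return {
        "cache_gib": total_ram_gib * cache_fraction,
        "sql_gib": total_ram_gib * sql_fraction,
        "headroom_gib": total_ram_gib * (1 - combined),
    }


if __name__ == "__main__":
    print(memory_budget(32))
```

On a 32 GB node with the recommended fractions this gives 8 GiB of cache, 8 GiB of SQL memory, and 16 GiB left for the operating system and other overhead.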
Step 1: Install CockroachDB on each node
curl https://binaries.cockroachdb.com/cockroach-v<version>.linux-amd64.tgz | tar -xz
cp cockroach-v<version>.linux-amd64/cockroach /usr/local/bin/
Step 2: Generate certificates
cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key
cockroach cert create-node <node-hostname> <node-ip> localhost 127.0.0.1 \
--certs-dir=certs --ca-key=my-safe-directory/ca.key
cockroach cert create-client root --certs-dir=certs --ca-key=my-safe-directory/ca.key
Step 3: Start nodes (repeat on each node)
cockroach start \
--certs-dir=certs \
--store=path=<store-path> \
--listen-addr=<node-address>:26257 \
--http-addr=<node-address>:8080 \
--join=<node1-address>,<node2-address>,<node3-address> \
--locality=region=<region>,zone=<zone> \
--cache=.25 \
--max-sql-memory=.25 \
--background
Step 4: Initialize cluster (once, from any node)
cockroach init --certs-dir=certs --host=<any-node-address>
Step 5: Verify
SELECT node_id, address, locality, build_tag, is_live
FROM crdb_internal.gossip_nodes ORDER BY node_id;
Operator (recommended):
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/install/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/install/operator.yaml
# Apply CrdbCluster manifest with node count, resources, and storage
Helm:
helm repo add cockroachdb https://charts.cockroachdb.com/
helm install cockroachdb cockroachdb/cockroachdb \
--set statefulset.replicas=3 \
--set storage.persistentVolume.size=100Gi
After cluster is running, apply production settings:
-- Enable critical features
SET CLUSTER SETTING kv.rangefeed.enabled = true;
SET CLUSTER SETTING sql.stats.automatic_collection.enabled = true;
SET CLUSTER SETTING admission.kv.enabled = true;
-- Set timeouts
SET CLUSTER SETTING sql.defaults.idle_in_transaction_session_timeout = '300s';
SET CLUSTER SETTING sql.defaults.statement_timeout = '30s';
-- Install enterprise license (if applicable)
SET CLUSTER SETTING cluster.organization = '<org-name>';
SET CLUSTER SETTING enterprise.license = '<license-key>';
Create ballast files on each node:
cockroach debug ballast <store-path>/auxiliary/EMERGENCY_BALLAST --size=1GiB
Configure load balancer: Point to all nodes with health check on /health?ready=1.
See production-deployment-checklist reference for the full go-live checklist.
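The load balancer health check on /health?ready=1 can be exercised by hand before go-live. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, port 8080 is taken from the --http-addr example, and it assumes the endpoint answers 200 when a node is ready and a non-200 status otherwise. On a TLS cluster, pass an `ssl.SSLContext` that trusts the cluster CA.

```python
from urllib.error import URLError
from urllib.request import urlopen


def readiness_url(host: str, http_port: int = 8080, scheme: str = "https") -> str:
    """Build the readiness URL for one node's HTTP address."""
    return f"{scheme}://{host}:{http_port}/health?ready=1"


def is_ready(status_code: int) -> bool:
    """Treat only HTTP 200 as ready (assumption stated in the lead-in)."""
    return status_code == 200


def probe(host: str, http_port: int = 8080, scheme: str = "https",
          timeout: float = 2.0, context=None) -> bool:
    """Return True if the node answers the readiness check with 200."""
    url = readiness_url(host, http_port, scheme)
    try:
        with urlopen(url, timeout=timeout, context=context) as resp:
            return is_ready(resp.status)
    except (URLError, OSError):
        # HTTPError (non-2xx) is a URLError subclass; refused connections,
        # timeouts, and TLS failures also count as "not ready".
        return False
```

Running `probe` against every node address gives the same view the load balancer will have once its health check is configured.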
Applies when: Tier = Advanced
curl -X POST -H "Authorization: Bearer $COCKROACH_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "<cluster-name>",
"provider": "AWS",
"spec": {
"dedicated": {
"region_nodes": {"us-east-1": 3},
"machine_type": "m6i.xlarge",
"storage_gib": 150
}
}
}' \
"https://cockroachlabs.cloud/api/v1/clusters"
resource "cockroach_cluster" "production" {
name = "production"
cloud_provider = "AWS"
dedicated {
num_virtual_cpus = 8
storage_gib = 150
num_nodes = 3
}
regions = [{
name = "us-east-1"
}]
}
Applies when: Tier = BYOC
Follow Advanced Provisioning steps — BYOC uses the same Cloud Console, API, and Terraform interfaces.
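Since the API path is shared with Advanced, the request body from the Advanced curl example can be templated instead of hand-edited. The sketch below only rebuilds that JSON body. The function name is hypothetical, the field names are copied from the curl example, and any BYOC-specific fields are not shown in this document, so none are guessed here.

```python
import json


def cluster_request_body(name: str, region: str, node_count: int,
                         machine_type: str, storage_gib: int,
                         provider: str = "AWS") -> dict:
    """Build the JSON body used in the Advanced curl example."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if storage_gib <= 0:
        raise ValueError("storage_gib must be positive")
    return {
        "name": name,
        "provider": provider,
        "spec": {
            "dedicated": {
                "region_nodes": {region: node_count},
                "machine_type": machine_type,
                "storage_gib": storage_gib,
            }
        },
    }


if __name__ == "__main__":
    body = cluster_request_body("<cluster-name>", "us-east-1", 3, "m6i.xlarge", 150)
    print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with `-d @file.json` in place of the inline body.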
Additional BYOC steps:
Applies when: Tier = Standard
Post-provisioning:
ALTER ROLE ALL SET statement_timeout = '30s';
ALTER ROLE ALL SET idle_in_transaction_session_timeout = '300s';
Applies when: Tier = Basic
Post-provisioning:
| Operation | Tier | Risk |
|---|---|---|
| cockroach init | SH | Safe: initializes the cluster once; later calls fail with an "already initialized" error and change nothing |
| Certificate generation | SH | Store CA key securely — loss means no new certs |
| Cloud cluster creation | ADV/BYOC/STD/BAS | Safe — can be deleted if misconfigured |
| Production settings changes | SH | See managing-cluster-settings |
Critical (Self-Hosted):
- Do not use --insecure in production; always use TLS.
- Set --locality flags for multi-node clusters.
- Set --cache and --max-sql-memory explicitly (defaults are too low).

| Issue | Tier | Fix |
|---|---|---|
| cockroach init fails | SH | Check all nodes are started and reachable on port 26257 |
| Node won't join cluster | SH | Verify --join addresses; check firewall rules for ports 26257, 8080 |
| "clock offset" error | SH | Sync clocks with NTP; check --max-offset setting |
| TLS handshake failure | SH | Verify certs match; check CA is the same across all nodes |
| Cloud cluster stuck in "Creating" | ADV/BYOC | Wait 15 min; contact support if no progress |
| Cannot connect after creation | ALL | Check IP allowlist; verify connection string; try with root user |
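Several fixes in the table come down to whether a port is reachable. A minimal sketch of that check, using plain TCP connects from the Python standard library, is shown below. The helper names are hypothetical, and ports 26257 and 8080 are the ones named in the table. A successful connect only shows the port is open; it does not validate TLS or cluster membership.

```python
import socket


def can_connect(host: str, port: int = 26257, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def unreachable(nodes, ports=(26257, 8080), timeout: float = 2.0) -> list:
    """List every (host, port) pair that could not be reached."""
    return [
        (host, port)
        for host in nodes
        for port in ports
        if not can_connect(host, port, timeout)
    ]
```

Running `unreachable` from each node against the full --join list helps separate firewall problems from nodes that simply have not started.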
Skill references:
Related skills:
Official CockroachDB Documentation: