Diagnoses live CockroachDB cluster performance issues by identifying long-running queries, busy sessions, and active transactions using SQL-only interfaces. Use when users report cluster slowness, high CPU, or need to find runaway queries and their source applications without DB Console access.
90
88%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Passed
No known issues
Diagnoses live cluster performance issues by identifying currently active long-running queries, busy sessions, and active transactions. Uses SQL-only interfaces (SHOW statements and crdb_internal views) to provide immediate triage without requiring DB Console, HTTP endpoints, or Prometheus access.
For historical performance analysis: Use profiling-statement-fingerprints to analyze query patterns over time, identify slow fingerprints, and investigate trends without needing live queries. For transaction-level analysis: Use profiling-transaction-fingerprints to analyze historical transaction retry patterns, commit latency trends, and statement composition. For background job monitoring: Use monitoring-background-jobs to monitor schema changes, backups, and automatic jobs that don't appear in SHOW CLUSTER STATEMENTS.
Required SQL access:
VIEWACTIVITY or VIEWACTIVITYREDACTED privilege
VIEWACTIVITYREDACTED: Redacts constants in other users' queries (recommended for privacy)VIEWACTIVITY: Shows full query text for all usersCANCELQUERY / CANCELSESSION privileges for cancellation operationsCheck your privileges:
SHOW GRANTS ON ROLE <username>;See permissions reference for detailed RBAC setup.
CockroachDB provides SQL-only interfaces for live activity triage:
| Interface | Purpose | Cluster-wide? |
|---|---|---|
SHOW CLUSTER STATEMENTS | Currently executing queries | Yes (with VIEWACTIVITY) |
SHOW CLUSTER SESSIONS | Active client sessions | Yes (with VIEWACTIVITY) |
crdb_internal.cluster_transactions | In-progress transactions | Yes (with VIEWACTIVITY) |
Triage workflow:
Safety: All diagnostic queries are read-only. Cancellation is opt-in with explicit warnings.
Identify queries running longer than a specified threshold:
-- Queries running longer than 5 minutes
WITH q AS (SHOW CLUSTER STATEMENTS)
SELECT
query_id,
node_id,
session_id,
user_name,
client_address,
application_name,
start,
now() - start AS running_for,
substring(query, 1, 200) AS query_preview,
distributed,
phase
FROM q
WHERE start < now() - INTERVAL '5 minutes'
ORDER BY start
LIMIT 50;Key columns:
running_for: How long the query has been executingquery_preview: First 200 characters (protects against massive queries)phase: execution phase (preparing, executing, etc.)distributed: whether query spans multiple nodesCustomizable thresholds:
INTERVAL '5 minutes' to '10 minutes', '30 seconds', etc.LIMIT based on cluster size and expected loadFind sessions with long-running active queries:
-- Sessions with active queries running > 5 minutes
WITH s AS (SHOW CLUSTER SESSIONS)
SELECT
node_id,
session_id,
user_name,
client_address,
application_name,
status,
active_query_start,
now() - active_query_start AS active_query_for,
substring(active_queries, 1, 200) AS active_queries_preview,
substring(last_active_query, 1, 200) AS last_query_preview
FROM s
WHERE active_query_start IS NOT NULL
AND active_query_start < now() - INTERVAL '5 minutes'
ORDER BY active_query_start
LIMIT 50;Key columns:
active_query_for: Duration of current active queryapplication_name: Source application for drill-downclient_address: Client IP/hostname for troubleshootingstatus: Session state (Idle, Active, etc.)Identify long-running transactions (potential blockers):
-- Transactions running > 5 minutes
SELECT
id AS txn_id,
node_id,
session_id,
application_name,
start,
now() - start AS running_for,
num_stmts,
num_retries,
num_auto_retries,
substring(txn_string, 1, 200) AS txn_string_preview
FROM crdb_internal.cluster_transactions
WHERE start < now() - INTERVAL '5 minutes'
ORDER BY start
LIMIT 50;Key columns:
num_retries / num_auto_retries: High retry counts indicate contentionnum_stmts: Number of statements in transaction (large = potentially problematic)txn_string: Transaction fingerprintProduction safety note: crdb_internal.cluster_transactions is production-approved and safe for triage.
Once you identify suspicious activity, drill down by filtering:
-- All activity from specific application
WITH q AS (SHOW CLUSTER STATEMENTS)
SELECT query_id, user_name, start, now() - start AS running_for,
substring(query, 1, 200) AS query_preview
FROM q
WHERE application_name = 'payments-api'
ORDER BY start;-- All activity from specific user
WITH s AS (SHOW CLUSTER SESSIONS)
SELECT session_id, application_name, client_address,
active_query_start, substring(active_queries, 1, 200) AS active_queries_preview
FROM s
WHERE user_name = 'app_user'
AND active_query_start IS NOT NULL
ORDER BY active_query_start;-- All sessions from specific client IP
WITH s AS (SHOW CLUSTER SESSIONS)
SELECT session_id, user_name, application_name,
status, substring(active_queries, 1, 200) AS active_queries_preview
FROM s
WHERE client_address LIKE '10.0.1.%'
ORDER BY active_query_start;-- Long queries from specific app and user
WITH q AS (SHOW CLUSTER STATEMENTS)
SELECT query_id, node_id, start, now() - start AS running_for,
substring(query, 1, 200) AS query_preview
FROM q
WHERE application_name = 'payments-api'
AND user_name = 'app_user'
AND start < now() - INTERVAL '10 minutes'
ORDER BY start;Read-only operations:
All diagnostic queries (SHOW statements, crdb_internal.cluster_transactions) are read-only and safe to run in production.
Cancellation operations (opt-in):
CAUTION: Canceling queries/sessions terminates user work
Only proceed if:
CANCELQUERY or CANCELSESSION privileges-- 1. Identify the query_id from triage queries above
-- 2. Cancel it
CANCEL QUERY '<query_id>';Example:
CANCEL QUERY '15f9e0e91f072f0f0000000000000001';-- 1. Identify the session_id from triage queries above
-- 2. Cancel all queries in that session
CANCEL SESSION '<session_id>';Example:
CANCEL SESSION '15f9e0e91f072f0f';Verification: After canceling, re-run the triage queries to confirm the query/session is gone.
Required privileges:
CANCELQUERY system privilege to cancel queriesCANCELSESSION system privilege to cancel sessionsSee permissions reference for granting these privileges.
Scenario: Users report general slowness.
Check for long-running queries:
-- Run the "Long-Running Queries" diagnostic
-- Look for queries running > 5-10 minutesIdentify source applications:
-- Group by application to find culprits
WITH q AS (SHOW CLUSTER STATEMENTS)
SELECT application_name, COUNT(*) AS num_queries,
AVG(now() - start) AS avg_duration
FROM q
WHERE start < now() - INTERVAL '5 minutes'
GROUP BY application_name
ORDER BY num_queries DESC;Drill down into specific app:
-- Filter by top application from step 2
-- Use "Filter by Application" queryDecide on action:
Scenario: Suspect contention issues.
Check for high retry counts:
SELECT application_name, AVG(num_retries) AS avg_retries,
MAX(num_retries) AS max_retries, COUNT(*) AS num_txns
FROM crdb_internal.cluster_transactions
WHERE start < now() - INTERVAL '5 minutes'
GROUP BY application_name
HAVING AVG(num_retries) > 5
ORDER BY avg_retries DESC;Investigate specific transactions:
-- Find transactions with >10 retries
SELECT id, application_name, num_retries, num_stmts,
substring(txn_string, 1, 200) AS txn_preview
FROM crdb_internal.cluster_transactions
WHERE num_retries > 10
ORDER BY num_retries DESC;Next steps:
Scenario: Need to attribute load to specific users.
Count active queries per user:
WITH q AS (SHOW CLUSTER STATEMENTS)
SELECT user_name, COUNT(*) AS num_active_queries,
AVG(now() - start) AS avg_duration
FROM q
GROUP BY user_name
ORDER BY num_active_queries DESC;Drill down to specific user's activity:
-- Use "Filter by User" queryTake action:
| Issue | Cause | Fix |
|---|---|---|
SHOW CLUSTER STATEMENTS returns empty | No active queries, or insufficient privileges | Grant VIEWACTIVITY or VIEWACTIVITYREDACTED; verify cluster has active load |
Query text shows <hidden> | Using VIEWACTIVITYREDACTED privilege | This is expected for privacy; use VIEWACTIVITY if full text needed |
| Can't cancel query: "permission denied" | Missing CANCELQUERY privilege | Grant CANCELQUERY system privilege to your user |
crdb_internal.cluster_transactions slow | High transaction volume on cluster | Add filters (application_name, time threshold) to reduce result set |
| "relation does not exist" error | Typo in table name or old CockroachDB version | Verify you're using production-approved tables; check CockroachDB version compatibility |
| Triage queries themselves are slow | Cluster under extreme load | Use more aggressive filters (shorter time window, specific apps); consider canceling obvious runaway work first |
VIEWACTIVITYREDACTED instead of VIEWACTIVITY to protect sensitive query constants in multi-tenant environmentsLIMIT to prevent overwhelming output on large clustersINTERVAL based on your workload (5 minutes is a reasonable default, but fast OLTP may need 30 seconds)SHOW CLUSTER STATEMENTS, SHOW CLUSTER SESSIONS, and crdb_internal.cluster_transactions for production triageSkill references:
Related skills:
Official CockroachDB Documentation:
84bc1e4
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.