Analyzes transaction fingerprints using aggregated statistics from crdb_internal.transaction_statistics to identify high-retry transactions, contention patterns, and commit latency issues. Provides historical transaction-level analysis to understand which statement combinations are causing retries, contention, or performance degradation. Use when investigating transaction retry storms, analyzing commit latency trends, or understanding statement composition of problematic transactions without DB Console access.
Analyzes historical transaction performance patterns using aggregated SQL statistics to identify high-retry transactions, contention patterns, and commit latency issues. Uses crdb_internal.transaction_statistics for time-windowed analysis of retry behavior, commit latency, and statement composition - entirely via SQL without requiring DB Console access.
Complement to profiling-statement-fingerprints: This skill analyzes transaction-level patterns (groups of statements with retry behavior); for statement-level optimization, see profiling-statement-fingerprints.
Complement to triaging-live-sql-activity: This skill analyzes historical transaction patterns; for immediate triage of currently active transactions, see triaging-live-sql-activity.
For immediate incident response: Use triaging-live-sql-activity to triage currently active transactions and cancel runaway work. For statement-level optimization: Use profiling-statement-fingerprints to analyze individual query patterns.
VIEWACTIVITY or VIEWACTIVITYREDACTED cluster privilege for cluster-wide visibility
Required cluster setting: sql.stats.automatic_collection.enabled = true

Check transaction stats collection:

```sql
SHOW CLUSTER SETTING sql.stats.automatic_collection.enabled;
-- Should return: true
```

See triaging-live-sql-activity permissions reference for RBAC setup (same privileges).
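Granting the privilege can be sketched as follows; diag_role is a hypothetical role name, and GRANT SYSTEM is the system-privilege syntax in recent CockroachDB versions:

```sql
-- diag_role is a placeholder; grant one privilege or the other
GRANT SYSTEM VIEWACTIVITY TO diag_role;
-- Redacted variant (hides query constants in multi-tenant environments):
GRANT SYSTEM VIEWACTIVITYREDACTED TO diag_role;
```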
Transaction fingerprint: Normalized transaction pattern grouping statements with parameterized constants.
Key differences:
aggregated_ts: Hourly UTC buckets (e.g., 2026-02-21 14:00:00 = 14:00-14:59 executions)
Data retention: Default ~7 days (check sql.stats.persisted_rows.max)
Best practice: Always filter by time window: WHERE aggregated_ts > now() - INTERVAL '24 hours'
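Before deeper analysis, the time-window guidance above can be used to confirm how much data each hourly bucket actually holds (a minimal sketch using only the columns documented here):

```sql
-- Sketch: row counts per hourly bucket over the last 24 hours,
-- confirming data coverage before heavier JSONB analysis
SELECT
  aggregated_ts,
  count(*) AS fingerprint_rows
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
GROUP BY aggregated_ts
ORDER BY aggregated_ts;
```

Empty or sparse buckets usually mean stats collection was disabled or retention has expired.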
| Metric Category | JSON Path | Scope | Use Case |
|---|---|---|---|
| Aggregated | statistics.statistics.* | All executions | Retries, commit latency, execution counts |
| Sampled | statistics.execution_statistics.* | Probabilistic sample (~10%) | Contention, network, memory/disk |
Critical: Sampled metrics have a cnt field showing sample size. Always check:

```sql
WHERE (statistics->'execution_statistics'->>'cnt') IS NOT NULL
```

CockroachDB stores transaction metadata and statistics as JSONB. Use these operators:
Operators:

- ->: Extract JSON object/value (returns JSON)
- ->>: Extract as text (returns text)
- ::TYPE: Cast to a specific type
- encode(fingerprint_id, 'hex'): Convert binary fingerprint to hex string

Transaction-specific examples:

```sql
encode(fingerprint_id, 'hex') AS txn_fingerprint_id           -- Hex encoding
(statistics->'statistics'->>'maxRetries')::INT                -- Max retry count
(statistics->'statistics'->'retryLat'->>'mean')::FLOAT8       -- Retry latency (seconds)
(statistics->'statistics'->'commitLat'->>'mean')::FLOAT8      -- Commit latency (seconds)
(statistics->'statistics'->'svcLat'->>'mean')::FLOAT8         -- Service latency (seconds)
metadata->'stmtFingerprintIDs' AS stmt_fingerprint_ids_json   -- Statement composition
```

Units: latency fields (retryLat, commitLat, svcLat) are in seconds; execution statistics such as contentionTime are in nanoseconds (divide by 1e9); byte counters such as networkBytes are in bytes.
See JSON field reference for complete schema.
metadata.stmtFingerprintIDs: JSONB array mapping transaction to constituent statements
Use case: Understand which statements compose high-retry transactions
Cross-reference workflow: Join transaction_statistics with statement_statistics on fingerprint IDs
Example pattern:

```sql
-- Extract statement fingerprint IDs from transaction
metadata -> 'stmtFingerprintIDs' AS stmt_ids

-- Use with jsonb_array_elements_text to expand and join
jsonb_array_elements_text(metadata->'stmtFingerprintIDs') AS stmt_fingerprint_id
```

```sql
-- Identify transactions with high retry counts and contention
SELECT
  encode(fingerprint_id, 'hex') AS txn_fingerprint_id,
  metadata->>'db' AS database,
  metadata->>'app' AS application,
  (statistics->'statistics'->>'cnt')::INT AS execution_count,
  (statistics->'statistics'->>'maxRetries')::INT AS max_retries,
  (statistics->'statistics'->'retryLat'->>'mean')::FLOAT8 AS mean_retry_lat_seconds,
  (statistics->'execution_statistics'->'contentionTime'->>'mean')::FLOAT8 / 1e9 AS mean_contention_seconds,
  aggregated_ts
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'statistics'->>'maxRetries')::INT > 0
ORDER BY (statistics->'statistics'->>'maxRetries')::INT DESC
LIMIT 20;
```

Key columns: max_retries shows the maximum retry count; mean_retry_lat_seconds shows time spent in retries; mean_contention_seconds shows lock wait time.
Interpretation: High max_retries (>10) indicates transaction conflicts; correlate with contention to identify lock hotspots.
```sql
-- Extract statement fingerprints for high-retry transactions
SELECT
  encode(t.fingerprint_id, 'hex') AS txn_fingerprint_id,
  t.metadata->>'app' AS application,
  (t.statistics->'statistics'->>'maxRetries')::INT AS max_retries,
  jsonb_array_length(t.metadata->'stmtFingerprintIDs') AS num_statements,
  t.metadata->'stmtFingerprintIDs' AS stmt_fingerprint_ids,
  t.aggregated_ts
FROM crdb_internal.transaction_statistics t
WHERE t.aggregated_ts > now() - INTERVAL '24 hours'
  AND (t.statistics->'statistics'->>'maxRetries')::INT > 10
  AND t.metadata->'stmtFingerprintIDs' IS NOT NULL
ORDER BY max_retries DESC
LIMIT 20;
```

Key columns: num_statements shows transaction complexity; stmt_fingerprint_ids contains statement IDs for cross-reference with statement_statistics.
Use case: Understand which statement combinations cause retries; use Query 7 to drill down to specific statements.
```sql
-- Find transactions with slow commit latency
SELECT
  encode(fingerprint_id, 'hex') AS txn_fingerprint_id,
  metadata->>'db' AS database,
  metadata->>'app' AS application,
  (statistics->'statistics'->>'cnt')::INT AS execution_count,
  (statistics->'statistics'->'commitLat'->>'mean')::FLOAT8 AS mean_commit_lat_seconds,
  (statistics->'statistics'->'svcLat'->>'mean')::FLOAT8 AS mean_service_lat_seconds,
  ROUND(
    ((statistics->'statistics'->'commitLat'->>'mean')::FLOAT8 /
     NULLIF((statistics->'statistics'->'svcLat'->>'mean')::FLOAT8, 0)) * 100, 2
  ) AS commit_pct_of_service_lat,
  aggregated_ts
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'statistics'->'commitLat'->>'mean')::FLOAT8 > 0.1 -- > 100ms commit latency
ORDER BY (statistics->'statistics'->'commitLat'->>'mean')::FLOAT8 DESC
LIMIT 20;
```

Key columns: mean_commit_lat_seconds shows 2PC commit time; commit_pct_of_service_lat shows what percentage of total latency is commit overhead.
Interpretation: High commit percentage (>20%) suggests distributed transaction overhead, replication delays, or cross-region writes.
```sql
-- Analyze retry patterns by application
SELECT
  metadata->>'app' AS application,
  metadata->>'db' AS database,
  COUNT(*) AS transaction_fingerprint_count,
  SUM((statistics->'statistics'->>'cnt')::INT) AS total_executions,
  AVG((statistics->'statistics'->>'maxRetries')::INT) AS avg_max_retries,
  MAX((statistics->'statistics'->>'maxRetries')::INT) AS overall_max_retries,
  AVG((statistics->'statistics'->'retryLat'->>'mean')::FLOAT8) AS avg_retry_lat_seconds
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'statistics'->>'maxRetries')::INT > 0
GROUP BY metadata->>'app', metadata->>'db'
ORDER BY avg_max_retries DESC
LIMIT 20;
```

Use case: Application-level health scorecard; identify which applications have the most problematic transaction patterns.
Customization: Adjust time window to 7 days for trends; filter by specific database.
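The 7-day customization might look like this sketch, bucketing by day with date_trunc (daily granularity is an assumption; adjust as needed):

```sql
-- Sketch: daily retry trend per application over 7 days
SELECT
  date_trunc('day', aggregated_ts) AS day,
  metadata->>'app' AS application,
  MAX((statistics->'statistics'->>'maxRetries')::INT) AS daily_max_retries,
  SUM((statistics->'statistics'->>'cnt')::INT) AS daily_executions
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '7 days'
  AND (statistics->'statistics'->>'maxRetries')::INT > 0
GROUP BY 1, 2
ORDER BY 1, 3 DESC;
```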
```sql
-- Find transactions with high resource usage
SELECT
  encode(fingerprint_id, 'hex') AS txn_fingerprint_id,
  metadata->>'app' AS application,
  (statistics->'statistics'->>'cnt')::INT AS execution_count,
  (statistics->'execution_statistics'->'networkBytes'->>'mean')::FLOAT8 / 1048576 AS mean_network_mb,
  (statistics->'execution_statistics'->'maxMemUsage'->>'mean')::FLOAT8 / 1048576 AS mean_mem_mb,
  (statistics->'execution_statistics'->'maxDiskUsage'->>'mean')::FLOAT8 / 1048576 AS mean_disk_mb,
  ROUND(
    ((statistics->'execution_statistics'->'networkBytes'->>'mean')::FLOAT8 / 1048576) *
    (statistics->'statistics'->>'cnt')::INT, 2
  ) AS estimated_total_network_mb,
  aggregated_ts
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
  AND (statistics->'execution_statistics'->'networkBytes'->>'mean')::FLOAT8 > 0
ORDER BY estimated_total_network_mb DESC
LIMIT 20;
```

Key columns: mean_network_mb shows distributed transaction overhead; mean_disk_mb > 0 indicates memory spill.
Interpretation: High network bytes suggest cross-region transactions or inefficient distribution; disk usage indicates memory pressure.
```sql
-- Understand retry latency as percentage of service latency
SELECT
  encode(fingerprint_id, 'hex') AS txn_fingerprint_id,
  metadata->>'app' AS application,
  (statistics->'statistics'->>'cnt')::INT AS execution_count,
  (statistics->'statistics'->>'maxRetries')::INT AS max_retries,
  (statistics->'statistics'->'retryLat'->>'mean')::FLOAT8 AS mean_retry_lat_seconds,
  (statistics->'statistics'->'svcLat'->>'mean')::FLOAT8 AS mean_service_lat_seconds,
  ROUND(
    ((statistics->'statistics'->'retryLat'->>'mean')::FLOAT8 /
     NULLIF((statistics->'statistics'->'svcLat'->>'mean')::FLOAT8, 0)) * 100, 2
  ) AS retry_pct_of_service_lat,
  aggregated_ts
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'statistics'->>'maxRetries')::INT > 0
  AND (statistics->'statistics'->'retryLat'->>'mean')::FLOAT8 > 0
ORDER BY retry_pct_of_service_lat DESC
LIMIT 20;
```

Interpretation: High retry percentage (>30%) means most latency is spent retrying due to contention; optimize transaction boundaries or schema.
```sql
-- Join transaction statistics with statement statistics to see constituent statements
SELECT
  encode(t.fingerprint_id, 'hex') AS txn_fingerprint_id,
  t.metadata->>'app' AS txn_application,
  (t.statistics->'statistics'->>'maxRetries')::INT AS txn_max_retries,
  stmt_fp_id AS stmt_fingerprint_id,
  s.metadata->>'query' AS statement_query,
  (s.statistics->'statistics'->'runLat'->>'mean')::FLOAT8 AS stmt_mean_run_lat_seconds,
  t.aggregated_ts
FROM crdb_internal.transaction_statistics t
CROSS JOIN LATERAL jsonb_array_elements_text(t.metadata->'stmtFingerprintIDs') AS stmt_fp_id
LEFT JOIN crdb_internal.statement_statistics s
  ON s.fingerprint_id = decode(stmt_fp_id, 'hex')
  AND s.aggregated_ts = t.aggregated_ts
WHERE t.aggregated_ts > now() - INTERVAL '24 hours'
  AND (t.statistics->'statistics'->>'maxRetries')::INT > 10
  AND t.metadata->'stmtFingerprintIDs' IS NOT NULL
ORDER BY txn_max_retries DESC, stmt_mean_run_lat_seconds DESC
LIMIT 50;
```

Use case: Drill down from high-retry transactions to specific problematic statements; identify which statement in a transaction is causing retries.
Note: Uses decode(stmt_fp_id, 'hex') to convert hex string back to binary for join with statement_statistics.
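The encode/decode round-trip can be verified in isolation (arbitrary hex literal chosen for illustration):

```sql
-- decode() turns hex text back into BYTES; encode() reverses it
SELECT encode(decode('4a5b6c', 'hex'), 'hex');  -- returns '4a5b6c'
```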
Quick triage heuristics:

- Escalate transactions with max_retries > 20
- Investigate when commit_pct_of_service_lat > 20%
- Group by aggregated_ts to identify peak periods
- Extract stmtFingerprintIDs for target transactions

Read-only operations: All queries are SELECT statements against crdb_internal.transaction_statistics, which is production-approved and safe for diagnostic use.
Performance impact:
| Consideration | Impact | Mitigation |
|---|---|---|
| Large table | High transaction diversity = many rows | Always use WHERE aggregated_ts > now() - INTERVAL '24 hours' and LIMIT |
| JSON parsing | CPU overhead for JSONB extraction | Avoid tight loops; use specific time windows |
| Broad windows | 7-day queries = more rows | Default to 24h; expand only when needed |
| Sampled metrics | NULL handling overhead | Use defensive WHERE (statistics->'execution_statistics'->>'cnt') IS NOT NULL |
Privacy: Use VIEWACTIVITYREDACTED to redact query constants in multi-tenant environments (same as statement profiling).
Default time window: 24 hours balances recent data with manageable result sets.
| Issue | Cause | Fix |
|---|---|---|
| Empty results | No data in window, or stats collection disabled | Check sql.stats.automatic_collection.enabled = true |
| column does not exist | JSON field typo or version mismatch | Verify field names; check CockroachDB version |
| NULL in sampled metrics | Metric not sampled in bucket | Filter: WHERE (statistics->'execution_statistics'->>'cnt') IS NOT NULL |
| fingerprint_id not hex | Default binary format | Use encode(fingerprint_id, 'hex') for readability |
| Statement join fails | Mismatched aggregated_ts or fingerprint format | Ensure same time bucket and proper type casting with decode() |
| Very slow query | Large table, no time filter | Always add time window and LIMIT |
| Empty stmtFingerprintIDs | Single-statement transactions or old version | Normal for simple transactions |
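Several fixes in the table reference cluster settings; a quick sketch to inspect them:

```sql
-- Inspect the settings cited in the troubleshooting table
SHOW CLUSTER SETTING sql.stats.automatic_collection.enabled;
SHOW CLUSTER SETTING sql.stats.persisted_rows.max;
```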
Key reminders:

- Sampled execution statistics carry a sample count (cnt)
- Retention is bounded by sql.stats.persisted_rows.max
- maxRetries is the maximum across all executions in the bucket, not an average
- Use encode(fingerprint_id, 'hex') for human-readable IDs

Skill references:
Official CockroachDB Documentation:
Related skills: