CtrlK
BlogDocsLog inGet started
Tessl Logo

profiling-transaction-fingerprints

Analyzes transaction fingerprints using aggregated statistics from crdb_internal.transaction_statistics to identify high-retry transactions, contention patterns, and commit latency issues. Provides historical transaction-level analysis to understand which statement combinations are causing retries, contention, or performance degradation. Use when investigating transaction retry storms, analyzing commit latency trends, or understanding statement composition of problematic transactions without DB Console access.

89

Quality

86%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

SecuritybySnyk

Passed

No known issues

SKILL.md
Quality
Evals
Security

Profiling Transaction Fingerprints

Analyzes historical transaction performance patterns using aggregated SQL statistics to identify high-retry transactions, contention patterns, and commit latency issues. Uses crdb_internal.transaction_statistics for time-windowed analysis of retry behavior, commit latency, and statement composition - entirely via SQL without requiring DB Console access.

Complement to profiling-statement-fingerprints: This skill analyzes transaction-level patterns (groups of statements with retry behavior); for statement-level optimization, see profiling-statement-fingerprints.

Complement to triaging-live-sql-activity: This skill analyzes historical transaction patterns; for immediate triage of currently active transactions, see triaging-live-sql-activity.

When to Use This Skill

  • Identify transactions with high retry counts
  • Analyze commit latency trends for transaction fingerprints
  • Find transactions with high contention at transaction boundary
  • Understand statement composition of problematic transactions
  • Investigate transaction retry storms or abort patterns
  • SQL-only historical transaction analysis without DB Console access

For immediate incident response: Use triaging-live-sql-activity to triage currently active transactions and cancel runaway work. For statement-level optimization: Use profiling-statement-fingerprints to analyze individual query patterns.

Prerequisites

  • SQL connection to CockroachDB cluster
  • VIEWACTIVITY or VIEWACTIVITYREDACTED cluster privilege for cluster-wide visibility
    • Same privilege requirements as profiling-statement-fingerprints
  • Understanding of transaction performance concepts
  • Transaction statistics collection enabled (default): sql.stats.automatic_collection.enabled = true

Check transaction stats collection:

SHOW CLUSTER SETTING sql.stats.automatic_collection.enabled;
-- Should return: true

See triaging-live-sql-activity permissions reference for RBAC setup (same privileges).

Core Concepts

Transaction Fingerprints vs Live Transactions

Transaction fingerprint: Normalized transaction pattern grouping statements with parameterized constants.

Key differences:

  • Time scope: Historical hourly buckets vs real-time current state
  • Granularity: Aggregated retry/commit stats vs individual transaction instances
  • Relationship: Transaction = collection of statement fingerprints

Time-Series Bucketing

aggregated_ts: Hourly UTC buckets (e.g., 2026-02-21 14:00:00 = 14:00-14:59 executions) Data retention: Default ~7 days (check sql.stats.persisted_rows.max) Best practice: Always filter by time window: WHERE aggregated_ts > now() - INTERVAL '24 hours'

Aggregated vs Sampled Metrics

Metric CategoryJSON PathScopeUse Case
Aggregatedstatistics.statistics.*All executionsRetries, commit latency, execution counts
Sampledstatistics.execution_statistics.*Probabilistic sample (~10%)Contention, network, memory/disk

Critical: Sampled metrics have cnt field showing sample size. Always check:

WHERE (statistics->'execution_statistics'->>'cnt') IS NOT NULL

JSON Field Extraction

CockroachDB stores transaction metadata and statistics as JSONB. Use these operators:

Operators:

  • ->: Extract JSON object/value (returns JSON)
  • ->>: Extract as text (returns text)
  • ::TYPE: Cast to specific type
  • encode(fingerprint_id, 'hex'): Convert binary fingerprint to hex string

Transaction-specific examples:

encode(fingerprint_id, 'hex') AS txn_fingerprint_id                     -- Hex encoding
(statistics->'statistics'->>'maxRetries')::INT                           -- Max retry count
(statistics->'statistics'->'retryLat'->>'mean')::FLOAT8                  -- Retry latency (seconds)
(statistics->'statistics'->'commitLat'->>'mean')::FLOAT8                 -- Commit latency (seconds)
(statistics->'statistics'->'svcLat'->>'mean')::FLOAT8                    -- Service latency (seconds)
metadata->'stmtFingerprintIDs' AS stmt_fingerprint_ids_json             -- Statement composition

Units:

  • Latency fields: seconds (FLOAT8)
  • CPU/contention: nanoseconds (divide by 1e9 for seconds)
  • Memory/disk: bytes (consider / 1048576 for MB)

See JSON field reference for complete schema.

Statement Composition

metadata.stmtFingerprintIDs: JSONB array mapping transaction to constituent statements

Use case: Understand which statements compose high-retry transactions

Cross-reference workflow: Join transaction_statistics with statement_statistics on fingerprint IDs

Example pattern:

-- Extract statement fingerprint IDs from transaction
metadata -> 'stmtFingerprintIDs' AS stmt_ids

-- Use with jsonb_array_elements_text to expand and join
jsonb_array_elements_text(metadata->'stmtFingerprintIDs') AS stmt_fingerprint_id

Core Diagnostic Queries

Query 1: Top Transactions by Retries and Contention

-- Identify transactions with high retry counts and contention
SELECT
  encode(fingerprint_id, 'hex') AS txn_fingerprint_id,
  metadata->>'db' AS database,
  metadata->>'app' AS application,
  (statistics->'statistics'->>'cnt')::INT AS execution_count,
  (statistics->'statistics'->>'maxRetries')::INT AS max_retries,
  (statistics->'statistics'->'retryLat'->>'mean')::FLOAT8 AS mean_retry_lat_seconds,
  (statistics->'execution_statistics'->'contentionTime'->>'mean')::FLOAT8 / 1e9 AS mean_contention_seconds,
  aggregated_ts
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'statistics'->>'maxRetries')::INT > 0
ORDER BY (statistics->'statistics'->>'maxRetries')::INT DESC
LIMIT 20;

Key columns: max_retries shows maximum retry count; mean_retry_lat_seconds shows time spent in retries; mean_contention_seconds shows lock wait time.

Interpretation: High max_retries (>10) indicates transaction conflicts; correlate with contention to identify lock hotspots.

Query 2: Statement Composition Analysis

-- Extract statement fingerprints for high-retry transactions
SELECT
  encode(t.fingerprint_id, 'hex') AS txn_fingerprint_id,
  t.metadata->>'app' AS application,
  (t.statistics->'statistics'->>'maxRetries')::INT AS max_retries,
  jsonb_array_length(t.metadata->'stmtFingerprintIDs') AS num_statements,
  t.metadata->'stmtFingerprintIDs' AS stmt_fingerprint_ids,
  t.aggregated_ts
FROM crdb_internal.transaction_statistics t
WHERE t.aggregated_ts > now() - INTERVAL '24 hours'
  AND (t.statistics->'statistics'->>'maxRetries')::INT > 10
  AND t.metadata->'stmtFingerprintIDs' IS NOT NULL
ORDER BY max_retries DESC
LIMIT 20;

Key columns: num_statements shows transaction complexity; stmt_fingerprint_ids contains statement IDs for cross-reference with statement_statistics.

Use case: Understand which statement combinations cause retries; use Query 7 to drill down to specific statements.

Query 3: High Commit Latency Transactions

-- Find transactions with slow commit latency
SELECT
  encode(fingerprint_id, 'hex') AS txn_fingerprint_id,
  metadata->>'db' AS database,
  metadata->>'app' AS application,
  (statistics->'statistics'->>'cnt')::INT AS execution_count,
  (statistics->'statistics'->'commitLat'->>'mean')::FLOAT8 AS mean_commit_lat_seconds,
  (statistics->'statistics'->'svcLat'->>'mean')::FLOAT8 AS mean_service_lat_seconds,
  ROUND(
    ((statistics->'statistics'->'commitLat'->>'mean')::FLOAT8 /
    NULLIF((statistics->'statistics'->'svcLat'->>'mean')::FLOAT8, 0)) * 100, 2
  ) AS commit_pct_of_service_lat,
  aggregated_ts
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'statistics'->'commitLat'->>'mean')::FLOAT8 > 0.1  -- > 100ms commit latency
ORDER BY (statistics->'statistics'->'commitLat'->>'mean')::FLOAT8 DESC
LIMIT 20;

Key columns: mean_commit_lat_seconds shows 2PC commit time; commit_pct_of_service_lat shows what percentage of total latency is commit overhead.

Interpretation: High commit percentage (>20%) suggests distributed transaction overhead, replication delays, or cross-region writes.

Query 4: Retry Rate by Application

-- Analyze retry patterns by application
SELECT
  metadata->>'app' AS application,
  metadata->>'db' AS database,
  COUNT(*) AS transaction_fingerprint_count,
  SUM((statistics->'statistics'->>'cnt')::INT) AS total_executions,
  AVG((statistics->'statistics'->>'maxRetries')::INT) AS avg_max_retries,
  MAX((statistics->'statistics'->>'maxRetries')::INT) AS overall_max_retries,
  AVG((statistics->'statistics'->'retryLat'->>'mean')::FLOAT8) AS avg_retry_lat_seconds
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'statistics'->>'maxRetries')::INT > 0
GROUP BY metadata->>'app', metadata->>'db'
ORDER BY avg_max_retries DESC
LIMIT 20;

Use case: Application-level health scorecard; identify which applications have the most problematic transaction patterns.

Customization: Adjust time window to 7 days for trends; filter by specific database.

Query 5: Transaction Resource Consumption

-- Find transactions with high resource usage
SELECT
  encode(fingerprint_id, 'hex') AS txn_fingerprint_id,
  metadata->>'app' AS application,
  (statistics->'statistics'->>'cnt')::INT AS execution_count,
  (statistics->'execution_statistics'->'networkBytes'->>'mean')::FLOAT8 / 1048576 AS mean_network_mb,
  (statistics->'execution_statistics'->'maxMemUsage'->>'mean')::FLOAT8 / 1048576 AS mean_mem_mb,
  (statistics->'execution_statistics'->'maxDiskUsage'->>'mean')::FLOAT8 / 1048576 AS mean_disk_mb,
  ROUND(
    ((statistics->'execution_statistics'->'networkBytes'->>'mean')::FLOAT8 / 1048576) *
    (statistics->'statistics'->>'cnt')::INT, 2
  ) AS estimated_total_network_mb,
  aggregated_ts
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
  AND (statistics->'execution_statistics'->'networkBytes'->>'mean')::FLOAT8 > 0
ORDER BY estimated_total_network_mb DESC
LIMIT 20;

Key columns: mean_network_mb shows distributed transaction overhead; mean_disk_mb > 0 indicates memory spill.

Interpretation: High network bytes suggest cross-region transactions or inefficient distribution; disk usage indicates memory pressure.

Query 6: Retry Latency Decomposition

-- Understand retry latency as percentage of service latency
SELECT
  encode(fingerprint_id, 'hex') AS txn_fingerprint_id,
  metadata->>'app' AS application,
  (statistics->'statistics'->>'cnt')::INT AS execution_count,
  (statistics->'statistics'->>'maxRetries')::INT AS max_retries,
  (statistics->'statistics'->'retryLat'->>'mean')::FLOAT8 AS mean_retry_lat_seconds,
  (statistics->'statistics'->'svcLat'->>'mean')::FLOAT8 AS mean_service_lat_seconds,
  ROUND(
    ((statistics->'statistics'->'retryLat'->>'mean')::FLOAT8 /
    NULLIF((statistics->'statistics'->'svcLat'->>'mean')::FLOAT8, 0)) * 100, 2
  ) AS retry_pct_of_service_lat,
  aggregated_ts
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'statistics'->>'maxRetries')::INT > 0
  AND (statistics->'statistics'->'retryLat'->>'mean')::FLOAT8 > 0
ORDER BY retry_pct_of_service_lat DESC
LIMIT 20;

Interpretation: High retry percentage (>30%) means most latency is spent retrying due to contention; optimize transaction boundaries or schema.

Query 7: Cross-Reference Transaction to Statements

-- Join transaction statistics with statement statistics to see constituent statements
SELECT
  encode(t.fingerprint_id, 'hex') AS txn_fingerprint_id,
  t.metadata->>'app' AS txn_application,
  (t.statistics->'statistics'->>'maxRetries')::INT AS txn_max_retries,
  stmt_fp_id AS stmt_fingerprint_id,
  s.metadata->>'query' AS statement_query,
  (s.statistics->'statistics'->'runLat'->>'mean')::FLOAT8 AS stmt_mean_run_lat_seconds,
  t.aggregated_ts
FROM crdb_internal.transaction_statistics t
CROSS JOIN LATERAL jsonb_array_elements_text(t.metadata->'stmtFingerprintIDs') AS stmt_fp_id
LEFT JOIN crdb_internal.statement_statistics s
  ON s.fingerprint_id = decode(stmt_fp_id, 'hex')
  AND s.aggregated_ts = t.aggregated_ts
WHERE t.aggregated_ts > now() - INTERVAL '24 hours'
  AND (t.statistics->'statistics'->>'maxRetries')::INT > 10
  AND t.metadata->'stmtFingerprintIDs' IS NOT NULL
ORDER BY txn_max_retries DESC, stmt_mean_run_lat_seconds DESC
LIMIT 50;

Use case: Drill down from high-retry transactions to specific problematic statements; identify which statement in a transaction is causing retries.

Note: Uses decode(stmt_fp_id, 'hex') to convert hex string back to binary for join with statement_statistics.

Common Workflows

Workflow 1: Retry Storm Investigation

  1. Identify high-retry transactions: Run Query 1, focus on max_retries > 20
  2. Analyze retry patterns by application: Run Query 4 to identify problematic apps
  3. Examine statement composition: Run Query 7 to see which statements are in high-retry transactions
  4. Cross-reference live activity: If ongoing, use triaging-live-sql-activity to check current transaction state
  5. Remediate: Adjust transaction boundaries, batch operations, optimize statements identified in step 3

Workflow 2: Commit Latency Analysis

  1. Find slow commit transactions: Run Query 3, focus on commit_pct_of_service_lat > 20%
  2. Check for contention correlation: Run Query 1 for same transaction fingerprints to see if contention is related
  3. Analyze time patterns: Group Query 3 by aggregated_ts to identify peak periods
  4. Resource investigation: Run Query 5 to check if network overhead correlates with commit latency
  5. Remediate: Consider batching operations, partitioning tables, or investigating replication configuration

Workflow 3: Statement Composition Drill-Down

  1. Identify problematic transactions: Run Query 1 or Query 3 to find high-retry or slow-commit transactions
  2. Extract statement IDs: Run Query 2 to see stmtFingerprintIDs for target transactions
  3. Join with statement_statistics: Run Query 7 to see full statement details
  4. Optimize bottleneck statements: Use profiling-statement-fingerprints skill to analyze and optimize identified statements
  5. Validate retry reduction: Re-run Query 1 after optimizations to confirm improved retry counts

Workflow 4: Application Health Scorecard

  1. Generate retry metrics by app: Run Query 4 to get application-level retry statistics
  2. Correlate with commit latency: Modify Query 3 to group by application
  3. Resource attribution: Run Query 5 grouped by application to see resource impact
  4. Trend analysis: Run queries with 7-day window and compare hourly buckets
  5. Contact application teams: Provide specific transaction fingerprints with high retries or latency for investigation

Safety Considerations

Read-only operations: All queries are SELECT statements against crdb_internal.transaction_statistics, which is production-approved and safe for diagnostic use.

Performance impact:

ConsiderationImpactMitigation
Large tableHigh transaction diversity = many rowsAlways use WHERE aggregated_ts > now() - INTERVAL '24 hours' and LIMIT
JSON parsingCPU overhead for JSONB extractionAvoid tight loops; use specific time windows
Broad windows7-day queries = more rowsDefault to 24h; expand only when needed
Sampled metricsNULL handling overheadUse defensive WHERE (statistics->'execution_statistics'->>'cnt') IS NOT NULL

Privacy: Use VIEWACTIVITYREDACTED to redact query constants in multi-tenant environments (same as statement profiling).

Default time window: 24 hours balances recent data with manageable result sets.

Troubleshooting

IssueCauseFix
Empty resultsNo data in window, or stats collection disabledCheck sql.stats.automatic_collection.enabled = true
column does not existJSON field typo or version mismatchVerify field names; check CockroachDB version
NULL in sampled metricsMetric not sampled in bucketFilter: WHERE (statistics->'execution_statistics'->>'cnt') IS NOT NULL
fingerprint_id not hexDefault binary formatUse encode(fingerprint_id, 'hex') for readability
Statement join failsMismatched aggregated_ts or fingerprint formatEnsure same time bucket and proper type casting with decode()
Very slow queryLarge table, no time filterAlways add time window and LIMIT
Empty stmtFingerprintIDsSingle-statement transactions or old versionNormal for simple transactions

Key Considerations

  • Time windows: Default to 24h; expand to 7d for trends
  • Sampled metrics: Not all executions captured; check sample size (cnt)
  • JSON field safety: Use defensive NULL checks; handle type casting errors
  • Privacy: Use VIEWACTIVITYREDACTED in production
  • Performance: Always include time filters and LIMIT clauses
  • Complement to statement profiling: Use together for complete coverage (transaction + statement)
  • Complement to live triage: Historical patterns vs real-time (use both)
  • Data retention: Default ~7 days; verify with sql.stats.persisted_rows.max
  • Retry semantics: maxRetries is maximum across all executions in bucket, not average
  • Fingerprint encoding: Use encode(fingerprint_id, 'hex') for human-readable IDs

References

Skill references:

Official CockroachDB Documentation:

  • crdb_internal
  • Transactions Page (DB Console)
  • Monitor and Analyze Transaction Contention
  • VIEWACTIVITY privilege

Related skills:

Repository
cockroachlabs/cockroachdb-skills
Last updated
Created

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.