databricks-debug-bundle

Collect Databricks debug evidence for support tickets and troubleshooting. Use when encountering persistent issues, preparing support tickets, or collecting diagnostic information for Databricks problems. Trigger with phrases like "databricks debug", "databricks support bundle", "collect databricks logs", "databricks diagnostic".

Quality

77%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Securityby

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/saas-packs/databricks-pack/skills/databricks-debug-bundle/SKILL.md

Databricks Debug Bundle

Current State

!databricks --version 2>/dev/null || echo 'CLI not installed' !python3 -c "import databricks.sdk; print(f'SDK {databricks.sdk.__version__}')" 2>/dev/null || echo 'SDK not installed'

Overview

Collect all diagnostic information needed for Databricks support tickets: environment info, cluster state, cluster events, job run details, Spark driver logs, and Delta table history. Produces a redacted tar.gz bundle safe to share with support.

Prerequisites

Databricks CLI installed and configured
Access to cluster logs (admin or cluster owner)
Permission to access job run details

Instructions

Step 1: Create Debug Collection Script

#!/bin/bash
set -euo pipefail
# databricks-debug-bundle.sh [cluster_id] [run_id] [table_name]

BUNDLE_DIR="databricks-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE_DIR"

CLUSTER_ID="${1:-}"
RUN_ID="${2:-}"
TABLE_NAME="${3:-}"

echo "=== Databricks Debug Bundle ===" | tee "$BUNDLE_DIR/summary.txt"
echo "Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$BUNDLE_DIR/summary.txt"
echo "Workspace: ${DATABRICKS_HOST:-unset}" >> "$BUNDLE_DIR/summary.txt"

Step 2: Collect Environment Info

{
    echo ""
    echo "--- Environment ---"
    echo "CLI: $(databricks --version 2>&1)"
    echo "SDK: $(pip show databricks-sdk 2>/dev/null | grep Version || echo 'not installed')"
    echo "Python: $(python3 --version 2>&1)"
    echo "OS: $(uname -srm)"
    echo ""
    echo "--- Current User ---"
    databricks current-user me --output json 2>&1 | jq '{userName, active}' || echo "Auth failed"
} >> "$BUNDLE_DIR/summary.txt"

Step 3: Collect Cluster Information

if [ -n "$CLUSTER_ID" ]; then
    echo "" >> "$BUNDLE_DIR/summary.txt"
    echo "--- Cluster: $CLUSTER_ID ---" >> "$BUNDLE_DIR/summary.txt"

    # Full cluster config
    databricks clusters get --cluster-id "$CLUSTER_ID" --output json \
        > "$BUNDLE_DIR/cluster_config.json" 2>&1

    # Key fields summary
    jq '{state, spark_version, node_type_id, num_workers,
         autotermination_minutes, termination_reason}' \
        "$BUNDLE_DIR/cluster_config.json" >> "$BUNDLE_DIR/summary.txt"

    # Recent cluster events (state changes, errors, resizing)
    databricks clusters events --cluster-id "$CLUSTER_ID" --limit 30 --output json \
        > "$BUNDLE_DIR/cluster_events.json" 2>&1

    # Extract event timeline
    jq -r '.events[]? | "\(.timestamp): \(.type) — \(.details // "no details")"' \
        "$BUNDLE_DIR/cluster_events.json" >> "$BUNDLE_DIR/summary.txt" 2>/dev/null
fi

Step 4: Collect Job Run Information

if [ -n "$RUN_ID" ]; then
    echo "" >> "$BUNDLE_DIR/summary.txt"
    echo "--- Run: $RUN_ID ---" >> "$BUNDLE_DIR/summary.txt"

    # Full run details
    databricks runs get --run-id "$RUN_ID" --output json \
        > "$BUNDLE_DIR/run_details.json" 2>&1

    # Run state summary
    jq '{state: .state, start_time, end_time, run_duration}' \
        "$BUNDLE_DIR/run_details.json" >> "$BUNDLE_DIR/summary.txt"

    # Task-level breakdown
    jq -r '.tasks[]? | "  Task \(.task_key): \(.state.result_state // "RUNNING") — \(.state.state_message // "ok")"' \
        "$BUNDLE_DIR/run_details.json" >> "$BUNDLE_DIR/summary.txt"

    # Run output (error messages, stdout)
    databricks runs get-output --run-id "$RUN_ID" --output json \
        > "$BUNDLE_DIR/run_output.json" 2>&1

    jq '{error, error_trace: (.error_trace // "" | .[0:2000])}' \
        "$BUNDLE_DIR/run_output.json" >> "$BUNDLE_DIR/summary.txt" 2>/dev/null
fi

Step 5: Collect Spark Driver Logs

if [ -n "$CLUSTER_ID" ]; then
    echo "" >> "$BUNDLE_DIR/summary.txt"
    echo "--- Spark Driver Logs (last 500 lines) ---" >> "$BUNDLE_DIR/summary.txt"

    python3 << 'PYEOF' > "$BUNDLE_DIR/driver_logs.txt" 2>&1
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
try:
    content = w.dbfs.read("/cluster-logs/${CLUSTER_ID}/driver/log4j-active.log")
    # Take last 500 lines
    lines = content.data.decode().splitlines()[-500:]
    print("\n".join(lines))
except Exception as e:
    print(f"Could not fetch driver logs: {e}")
    print("Tip: Enable cluster log delivery in cluster config for persistent logs")
PYEOF
fi

Step 6: Collect Delta Table Diagnostics

if [ -n "$TABLE_NAME" ]; then
    echo "" >> "$BUNDLE_DIR/summary.txt"
    echo "--- Delta Table: $TABLE_NAME ---" >> "$BUNDLE_DIR/summary.txt"

    python3 << PYEOF > "$BUNDLE_DIR/delta_diagnostics.txt" 2>&1
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()

print("=== Table Details ===")
spark.sql("DESCRIBE DETAIL ${TABLE_NAME}").show(truncate=False)

print("\n=== Recent History (last 20 operations) ===")
spark.sql("DESCRIBE HISTORY ${TABLE_NAME} LIMIT 20").show(truncate=False)

print("\n=== Schema ===")
spark.sql("DESCRIBE ${TABLE_NAME}").show(truncate=False)

print("\n=== File Stats ===")
detail = spark.sql("DESCRIBE DETAIL ${TABLE_NAME}").first()
print(f"Files: {detail.numFiles}, Size: {detail.sizeInBytes / 1024 / 1024:.1f} MB")
PYEOF
fi

Step 7: Package Bundle (Redacted)

# Redact sensitive data from config snapshot
echo "" >> "$BUNDLE_DIR/summary.txt"
echo "--- Config (redacted) ---" >> "$BUNDLE_DIR/summary.txt"
if [ -f ~/.databrickscfg ]; then
    sed 's/token = .*/token = ***REDACTED***/' \
        ~/.databrickscfg > "$BUNDLE_DIR/config-redacted.txt"
    sed -i 's/client_secret = .*/client_secret = ***REDACTED***/' \
        "$BUNDLE_DIR/config-redacted.txt"
fi

# Network connectivity test
echo "--- Network ---" >> "$BUNDLE_DIR/summary.txt"
echo -n "API reachable: " >> "$BUNDLE_DIR/summary.txt"
curl -s -o /dev/null -w "%{http_code}" \
    "${DATABRICKS_HOST}/api/2.0/clusters/list" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" >> "$BUNDLE_DIR/summary.txt"
echo "" >> "$BUNDLE_DIR/summary.txt"

# Create archive
tar -czf "$BUNDLE_DIR.tar.gz" "$BUNDLE_DIR"
rm -rf "$BUNDLE_DIR"

echo ""
echo "Bundle created: $BUNDLE_DIR.tar.gz"
echo "Contents: summary.txt, cluster_config.json, cluster_events.json,"
echo "  run_details.json, run_output.json, driver_logs.txt,"
echo "  delta_diagnostics.txt, config-redacted.txt"

Output

databricks-debug-YYYYMMDD-HHMMSS.tar.gz containing:
- summary.txt — Human-readable diagnostic summary
- cluster_config.json — Full cluster configuration
- cluster_events.json — State changes, errors, resizing events
- run_details.json — Job run with task-level breakdown
- run_output.json — Stdout/stderr and error traces
- driver_logs.txt — Last 500 lines of Spark driver log
- delta_diagnostics.txt — Table details, history, schema
- config-redacted.txt — CLI config with secrets removed

Error Handling

Item	Included	Notes
Tokens/secrets	NEVER	Redacted with `*REDACTED*`
PII in logs	Review before sharing	Scan driver_logs.txt manually
Cluster IDs	Yes	Safe to share with support
Error traces	Yes	Check for embedded connection strings

Examples

Usage

# Environment only
bash databricks-debug-bundle.sh

# With cluster diagnostics
bash databricks-debug-bundle.sh 0123-456789-abcde

# With cluster + job run
bash databricks-debug-bundle.sh 0123-456789-abcde 12345

# Full diagnostics including Delta table
bash databricks-debug-bundle.sh 0123-456789-abcde 12345 catalog.schema.table

Submit to Support

Generate bundle: bash databricks-debug-bundle.sh [args]
Review summary.txt for sensitive data
Open ticket at help.databricks.com
Attach the .tar.gz bundle
Include workspace ID (found in workspace URL: adb-<workspace-id>)

Resources

Next Steps

For rate limit issues, see databricks-rate-limits.

Repository: jeremylongshore/claude-code-plugins-plus-skills
Commit: 3a2d27d

Last updated: about 11 hours ago
Created: about 11 hours ago

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.