
databricks-ci-integration

tessl install github:jeremylongshore/claude-code-plugins-plus-skills --skill databricks-ci-integration
github.com/jeremylongshore/claude-code-plugins-plus-skills

Configure Databricks CI/CD integration with GitHub Actions and Asset Bundles. Use when setting up automated testing, configuring CI pipelines, or integrating Databricks deployments into your build process. Trigger with phrases like "databricks CI", "databricks GitHub Actions", "databricks automated tests", "CI databricks", "databricks pipeline".

Review Score: 83%
Validation Score: 11/16
Implementation Score: 77%
Activation Score: 90%

Databricks CI Integration

Overview

Set up CI/CD pipelines for Databricks using GitHub Actions and Asset Bundles.

Prerequisites

  • GitHub repository with Actions enabled
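  • Databricks workspace with a service principal
  • Asset Bundles project (databricks.yml) defining dev, staging, and prod targets
  • Unified Databricks CLI; the legacy databricks-cli pip package does not support bundle commands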
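The workflows below assume a bundle with dev, staging, and prod targets. A minimal databricks.yml along those lines might look like this sketch; the bundle name and the resources/*.yml include pattern are hypothetical. Workspace hosts are deliberately omitted so that each CI job's DATABRICKS_HOST decides which workspace the bundle deploys to.

```yaml
# databricks.yml (sketch; bundle name and include pattern are hypothetical)
bundle:
  name: my-databricks-project

# Job definitions live in separate resource files
include:
  - resources/*.yml

targets:
  dev:
    # Prefixes resource names with the deploying user and pauses job schedules
    mode: development
    default: true
  staging:
    workspace:
      # One shared copy per target instead of a per-user path
      root_path: /Workspace/Shared/.bundle/${bundle.name}/${bundle.target}
  prod:
    # Applies the CLI's production-mode validation checks
    mode: production
    workspace:
      root_path: /Workspace/Shared/.bundle/${bundle.name}/${bundle.target}
```

Keeping the host out of the file means the same bundle validates against any workspace the CI credentials point at.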

Instructions

Step 1: Configure Service Principal

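# Create a service principal in the workspace
databricks service-principals create --json '{
  "display_name": "GitHub Actions CI",
  "active": true
}'

# Note the applicationId (used as the OAuth client ID) and the numeric id returned

# Create an OAuth secret. Service principal secrets are managed at the account
# level, so this needs account-level authentication (or use the admin UI)
databricks account service-principal-secrets create <service_principal_id>

# Grant the service principal CAN_MANAGE on the folder bundles deploy into
# (find the folder's object_id with: databricks workspace get-status <folder_path>)
databricks permissions update directories <directory_id> --json '{
  "access_control_list": [{
    "service_principal_name": "<application_id>",
    "permission_level": "CAN_MANAGE"
  }]
}'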
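Access can also be declared in the bundle itself through its permissions and run_as mappings. The fragment below is a sketch that grants the CI service principal CAN_MANAGE on the resources the prod target deploys and runs its jobs as that principal; <application_id> is the service principal's application ID, and scoping this to prod only is an assumption.

```yaml
# databricks.yml fragment (sketch): access managed by the bundle for prod
targets:
  prod:
    # Jobs deployed by this target run as the CI service principal
    run_as:
      service_principal_name: <application_id>
    # Applied to the jobs and other resources this target deploys
    permissions:
      - service_principal_name: <application_id>
        level: CAN_MANAGE
```

Declaring permissions in the bundle keeps them versioned alongside the jobs they protect, rather than in one-off CLI calls.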

Step 2: Configure GitHub Secrets

# Set GitHub secrets
gh secret set DATABRICKS_HOST --body "https://adb-1234567890.1.azuredatabricks.net"
gh secret set DATABRICKS_CLIENT_ID --body "your-client-id"
gh secret set DATABRICKS_CLIENT_SECRET --body "your-client-secret"

# For staging/prod environments
gh secret set DATABRICKS_HOST_STAGING --body "https://staging.azuredatabricks.net"
gh secret set DATABRICKS_HOST_PROD --body "https://prod.azuredatabricks.net"

Step 3: Create GitHub Actions Workflow

# .github/workflows/databricks-ci.yml
name: Databricks CI/CD

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
  DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
  DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'

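      - name: Install Databricks CLI
        # Bundle commands need the unified CLI, not the legacy databricks-cli pip package
        uses: databricks/setup-cli@main

      - name: Install Python dependencies
        run: pip install databricks-sdk pytest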

      - name: Validate Asset Bundle
        run: databricks bundle validate

      - name: Run unit tests
        run: pytest tests/unit/ -v --tb=short

  deploy-staging:
    needs: validate
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    environment: staging
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_STAGING }}
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install Databricks CLI
        uses: databricks/setup-cli@main

      - name: Deploy to Staging
        run: |
          databricks bundle deploy -t staging

      - name: Run Integration Tests
        # bundle run waits for the job to finish and exits non-zero if the run
        # fails, so a failed integration test fails this step
        run: databricks bundle run -t staging integration-tests

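  deploy-production:
    # Depends only on validate: deploy-staging is skipped on main, and a job
    # whose dependency was skipped is skipped as well
    needs: validate
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment:
      name: production
      # The secrets context is not available in environment.url; this assumes a
      # hypothetical repository variable holding the production host
      url: ${{ vars.DATABRICKS_HOST_PROD }}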
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_PROD }}
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install Databricks CLI
        uses: databricks/setup-cli@main

      - name: Deploy to Production
        run: |
          databricks bundle deploy -t prod

      - name: Verify Deployment
        run: |
          databricks bundle summary -t prod
          # Trigger smoke test
          databricks bundle run -t prod smoke-test
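The deploy jobs run the bundle resources integration-tests and smoke-test, so the bundle must define jobs under those keys. A sketch of such a resource file follows; the file name, script paths, Spark version, and Azure node type are assumptions to adapt to your project.

```yaml
# resources/test_jobs.yml (hypothetical file; paths are relative to this file)
resources:
  jobs:
    integration-tests:
      name: integration-tests-${bundle.target}
      job_clusters:
        - job_cluster_key: test_cluster
          new_cluster:
            spark_version: 14.3.x-scala2.12
            node_type_id: Standard_DS3_v2  # Azure node type; pick one for your cloud
            num_workers: 1
      tasks:
        - task_key: run_integration_tests
          job_cluster_key: test_cluster
          spark_python_task:
            python_file: ../tests/integration/run_tests.py  # hypothetical script

    smoke-test:
      name: smoke-test-${bundle.target}
      job_clusters:
        - job_cluster_key: smoke_cluster
          new_cluster:
            spark_version: 14.3.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 1
      tasks:
        - task_key: run_smoke_test
          job_cluster_key: smoke_cluster
          spark_python_task:
            python_file: ../tests/smoke/smoke_test.py  # hypothetical script
```

Appending ${bundle.target} to each job name keeps the staging and prod copies distinguishable in the Jobs UI.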

Step 4: PR Validation Workflow

# .github/workflows/pr-validation.yml
name: PR Validation

on:
  pull_request:
    branches: [main, develop]

jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install ruff mypy pytest pytest-cov databricks-sdk

      - name: Lint with ruff
        run: ruff check src/

      - name: Type check with mypy
        run: mypy src/ --ignore-missing-imports

      - name: Run tests with coverage
        run: pytest tests/unit/ --cov=src --cov-report=xml

      - name: Upload coverage
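        uses: codecov/codecov-action@v4
        with:
          files: coverage.xml
          # v4 of the action expects an upload token for most repositories
          token: ${{ secrets.CODECOV_TOKEN }}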

  bundle-validation:
    runs-on: ubuntu-latest
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
      DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}
    steps:
      - uses: actions/checkout@v4

      - name: Install Databricks CLI
        uses: databricks/setup-cli@main

      - name: Validate bundle for all targets
        run: |
          databricks bundle validate -t dev
          databricks bundle validate -t staging
          databricks bundle validate -t prod

      - name: Check for breaking changes
        run: |
          # Compare job configurations
          databricks bundle summary -t prod --output json > current.json
          # Add logic to detect breaking changes
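The breaking-change step above is a placeholder. One possible heuristic, sketched below as a replacement for that step, fails the PR when a job key present on the base branch disappears from the prod configuration. It assumes databricks bundle validate --output json emits the merged configuration with jobs under .resources.jobs, and relies on jq, which GitHub-hosted Ubuntu runners ship with. A removed job is only one signal of a breaking change, not a complete definition.

```yaml
      - name: Check for removed jobs (sketch)
        run: |
          # Resolved prod configuration for this PR
          databricks bundle validate -t prod --output json > current.json

          # Same configuration from the PR's base branch, checked out side by side
          git fetch --depth=1 origin "${{ github.base_ref }}"
          git worktree add ../base FETCH_HEAD
          (cd ../base && databricks bundle validate -t prod --output json) > base.json

          # Job keys that exist on the base branch but not in this PR
          jq -r '(.resources.jobs // {}) | keys[]' base.json | sort > base_jobs.txt
          jq -r '(.resources.jobs // {}) | keys[]' current.json | sort > current_jobs.txt
          REMOVED=$(comm -23 base_jobs.txt current_jobs.txt)
          if [ -n "$REMOVED" ]; then
            echo "Jobs removed from prod: $REMOVED"
            exit 1
          fi
```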

Step 5: Nightly Test Workflow

# .github/workflows/nightly-tests.yml
name: Nightly Tests

on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM UTC daily
  workflow_dispatch:

jobs:
  integration-tests:
    runs-on: ubuntu-latest
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_STAGING }}
      DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
      DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}
    steps:
      - uses: actions/checkout@v4

      - name: Install Databricks CLI
        uses: databricks/setup-cli@main

      - name: Run full integration test suite
        run: |
          databricks bundle deploy -t staging
          # bundle run waits for the job and fails this step if the run fails
          databricks bundle run -t staging full-integration-tests

      - name: Generate test report
        if: always()
        run: |
          # Download test results
          databricks fs cp dbfs:/test-results/latest/ ./test-results/ --recursive

      - name: Upload test artifacts
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: test-results
          path: test-results/

      - name: Notify on failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          channel-id: 'data-engineering-alerts'
          slack-message: 'Nightly tests failed! Check ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}'
        env:
          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
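The nightly workflow copies results from dbfs:/test-results/latest/, so the full-integration-tests job has to write them there. A sketch under assumptions: a hypothetical runner script that forwards its arguments to pytest, and a cluster that can write to DBFS through the /dbfs mount. The file name and cluster settings are also assumptions.

```yaml
# resources/nightly_tests.yml (hypothetical file; paths are relative to this file)
resources:
  jobs:
    full-integration-tests:
      name: full-integration-tests-${bundle.target}
      job_clusters:
        - job_cluster_key: nightly_cluster
          new_cluster:
            spark_version: 14.3.x-scala2.12
            node_type_id: Standard_DS3_v2  # Azure node type; pick one for your cloud
            num_workers: 2
      tasks:
        - task_key: pytest_suite
          job_cluster_key: nightly_cluster
          spark_python_task:
            # Hypothetical script assumed to pass these arguments to pytest
            python_file: ../tests/integration/run_all.py
            parameters:
              - --junitxml=/dbfs/test-results/latest/junit.xml
```

Writing JUnit XML gives the uploaded artifact a standard format that most test-report viewers can read.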

Output

  • Automated test pipeline
  • PR checks configured
  • Staging deployment on merge to develop
  • Production deployment on merge to main

Error Handling

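| Issue | Cause | Solution |
|---|---|---|
| Auth failed | Invalid or expired OAuth secret, or service principal not assigned to the workspace | Regenerate the service principal secret, update DATABRICKS_CLIENT_SECRET, and confirm workspace assignment |
| Bundle validation failed | Invalid YAML or bundle configuration | Run databricks bundle validate locally |
| Deployment timeout | Slow cluster startup | Use warm instance pools or increase the timeout |
| Tests failed | Code regression | Fix the code and re-run |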
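For slow cluster startup, warm capacity usually comes from an instance pool. This sketch points a bundle job cluster at a pool resolved by name through a bundle variable lookup; the variable name, pool name, and job key are hypothetical, and the pool must already exist in each target workspace.

```yaml
# databricks.yml fragment (sketch): test clusters drawn from a warm instance pool
variables:
  test_pool_id:
    description: Warm instance pool for CI test clusters
    lookup:
      instance_pool: ci-warm-pool  # hypothetical pool name

resources:
  jobs:
    integration-tests:
      job_clusters:
        - job_cluster_key: test_cluster
          new_cluster:
            spark_version: 14.3.x-scala2.12
            # The node type comes from the pool, so node_type_id is omitted
            instance_pool_id: ${var.test_pool_id}
            num_workers: 1
```

Looking the pool up by name avoids hard-coding a different pool ID for each workspace.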

Examples

Matrix Testing (Multiple DBR Versions)

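jobs:
  test-matrix:
    strategy:
      matrix:
        dbr_version: ['13.3', '14.3', '15.1']
    runs-on: ubuntu-latest
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
      DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - name: Test on DBR ${{ matrix.dbr_version }}
        run: |
          databricks bundle deploy -t test-${{ matrix.dbr_version }}
          databricks bundle run -t test-${{ matrix.dbr_version }} tests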
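The matrix deploys to targets named test-13.3, test-14.3, and test-15.1, so the bundle needs one target per runtime. This sketch follows that naming and sets each target's runtime through a bundle variable; the variable name, the tests job definition, and its script path are assumptions.

```yaml
# databricks.yml fragment (sketch): one target per Databricks Runtime version
variables:
  spark_version:
    description: Databricks Runtime for the test cluster
    default: 14.3.x-scala2.12

targets:
  test-13.3:
    variables:
      spark_version: 13.3.x-scala2.12
  test-14.3:
    variables:
      spark_version: 14.3.x-scala2.12
  test-15.1:
    variables:
      spark_version: 15.1.x-scala2.12

resources:
  jobs:
    tests:
      name: tests-${bundle.target}
      job_clusters:
        - job_cluster_key: test_cluster
          new_cluster:
            spark_version: ${var.spark_version}
            node_type_id: Standard_DS3_v2  # Azure node type; pick one for your cloud
            num_workers: 1
      tasks:
        - task_key: run_tests
          job_cluster_key: test_cluster
          spark_python_task:
            python_file: ./tests/run_tests.py  # hypothetical script
```

Because the default root path includes the target name, the three matrix legs deploy to separate folders and do not overwrite each other.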

Branch Protection Rules

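# Set via the GitHub UI or the REST API
# (PUT /repos/{owner}/{repo}/branches/main/protection); field names follow that API
required_status_checks:
  strict: false
  contexts:
    - "lint-and-test"
    - "bundle-validation"
required_pull_request_reviews:
  required_approving_review_count: 1
  dismiss_stale_reviews: true
enforce_admins: false
restrictions: null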

Resources

  • Databricks Asset Bundles
  • GitHub Actions
  • Service Principal Auth

Next Steps

For deployment patterns, see databricks-deploy-integration.