Lakebase Autoscaling

Patterns and best practices for using Lakebase Autoscaling, the next-generation managed PostgreSQL on Databricks with autoscaling compute, branching, scale-to-zero, and instant restore.

When to Use

Use this skill when:

  • Building applications that need a PostgreSQL database with autoscaling compute
  • Working with database branching for dev/test/staging workflows
  • Adding persistent state to applications with scale-to-zero cost savings
  • Implementing reverse ETL from Delta Lake to an operational database via synced tables
  • Managing Lakebase Autoscaling projects, branches, computes, or credentials

Overview

Lakebase Autoscaling is Databricks' next-generation managed PostgreSQL service for OLTP workloads. It provides autoscaling compute, Git-like branching, scale-to-zero, and instant point-in-time restore.

| Feature | Description |
| --- | --- |
| Autoscaling Compute | 0.5-112 CU with 2 GB RAM per CU; scales dynamically based on load |
| Scale-to-Zero | Compute suspends after a configurable inactivity timeout |
| Branching | Create isolated database environments (like Git branches) for dev/test |
| Instant Restore | Point-in-time restore from any moment within the configured window (up to 35 days) |
| OAuth Authentication | Token-based auth via Databricks SDK (1-hour expiry) |
| Reverse ETL | Sync data from Delta tables to PostgreSQL via synced tables |

Available Regions (AWS): us-east-1, us-east-2, eu-central-1, eu-west-1, eu-west-2, ap-south-1, ap-southeast-1, ap-southeast-2

Available Regions (Azure Beta): eastus2, westeurope, westus

Project Hierarchy

Understanding the hierarchy is essential for working with Lakebase Autoscaling:

```
Project (top-level container)
  └── Branch(es) (isolated database environments)
        ├── Compute (primary R/W endpoint)
        ├── Read Replica(s) (optional, read-only)
        ├── Role(s) (Postgres roles)
        └── Database(s) (Postgres databases)
              └── Schema(s)
```
| Object | Description |
| --- | --- |
| Project | Top-level container. Created via `w.postgres.create_project()`. |
| Branch | Isolated database environment with copy-on-write storage. The default branch is `production`. |
| Compute | Postgres server powering a branch. Configurable CU sizing and autoscaling. |
| Database | Standard Postgres database within a branch. Default is `databricks_postgres`. |
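
Resource names follow this hierarchy directly. A pair of small helpers (hypothetical, not part of the SDK; the `ep-primary` default mirrors the endpoint id used in the examples below) makes the naming scheme concrete:

```python
# Hypothetical helpers (not part of the Databricks SDK) that build the
# hierarchical resource names used throughout this skill.

def branch_name(project: str, branch: str = "production") -> str:
    """Return the fully qualified branch resource name."""
    return f"projects/{project}/branches/{branch}"

def endpoint_name(project: str, branch: str = "production",
                  endpoint: str = "ep-primary") -> str:
    """Return the fully qualified endpoint resource name."""
    return f"{branch_name(project, branch)}/endpoints/{endpoint}"

print(endpoint_name("my-app"))
# projects/my-app/branches/production/endpoints/ep-primary
```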

Quick Start

Create a project and connect:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.postgres import Project, ProjectSpec

w = WorkspaceClient()

# Create a project (long-running operation)
operation = w.postgres.create_project(
    project=Project(
        spec=ProjectSpec(
            display_name="My Application",
            pg_version="17"
        )
    ),
    project_id="my-app"
)
result = operation.wait()
print(f"Created project: {result.name}")
```

Common Patterns

Generate OAuth Token

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Generate database credential for connecting (optionally scoped to an endpoint)
cred = w.postgres.generate_database_credential(
    endpoint="projects/my-app/branches/production/endpoints/ep-primary"
)
token = cred.token  # Use as password in connection string
# Token expires after 1 hour
```
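
Because tokens expire after an hour, any long-lived process needs a refresh strategy. The sketch below is illustrative (the `TokenCache` class and its parameters are not part of the SDK); in practice `generate` would wrap the `generate_database_credential` call shown above:

```python
import time

class TokenCache:
    """Cache an OAuth token and regenerate it before the 1-hour expiry.

    `generate` is any zero-argument callable returning a token string;
    in practice it would wrap w.postgres.generate_database_credential.
    `skew_seconds` refreshes early to avoid using a nearly expired token.
    """
    def __init__(self, generate, ttl_seconds=3600, skew_seconds=300,
                 clock=time.monotonic):
        self._generate = generate
        self._ttl = ttl_seconds - skew_seconds
        self._clock = clock
        self._token = None
        self._issued_at = None

    def token(self):
        now = self._clock()
        if self._token is None or now - self._issued_at >= self._ttl:
            self._token = self._generate()
            self._issued_at = now
        return self._token
```

A connection factory can then call `cache.token()` each time it opens a connection, instead of reusing a password captured at startup.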

Connect from Notebook

```python
import psycopg
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Get endpoint details
endpoint = w.postgres.get_endpoint(
    name="projects/my-app/branches/production/endpoints/ep-primary"
)
host = endpoint.status.hosts.host

# Generate token (scoped to endpoint)
cred = w.postgres.generate_database_credential(
    endpoint="projects/my-app/branches/production/endpoints/ep-primary"
)

# Connect using psycopg3
conn_string = (
    f"host={host} "
    f"dbname=databricks_postgres "
    f"user={w.current_user.me().user_name} "
    f"password={cred.token} "
    f"sslmode=require"
)
with psycopg.connect(conn_string) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT version()")
        print(cur.fetchone())
```

Create a Branch for Development

```python
from databricks.sdk.service.postgres import Branch, BranchSpec, Duration

# Create a dev branch with 7-day expiration
branch = w.postgres.create_branch(
    parent="projects/my-app",
    branch=Branch(
        spec=BranchSpec(
            source_branch="projects/my-app/branches/production",
            ttl=Duration(seconds=604800)  # 7 days
        )
    ),
    branch_id="development"
).wait()
print(f"Branch created: {branch.name}")
```
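
The `ttl` value is plain seconds. A tiny convenience helper (illustrative, not part of the SDK) avoids hand-computing it:

```python
from datetime import timedelta

def ttl_seconds(days: int = 0, hours: int = 0) -> int:
    """Convert a human-friendly duration to the integer seconds
    expected by Duration(seconds=...)."""
    return int(timedelta(days=days, hours=hours).total_seconds())

print(ttl_seconds(days=7))  # 604800, the 7-day TTL used above
```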

Resize Compute (Autoscaling)

```python
from databricks.sdk.service.postgres import Endpoint, EndpointSpec, FieldMask

# Update compute to autoscale between 2-8 CU
w.postgres.update_endpoint(
    name="projects/my-app/branches/production/endpoints/ep-primary",
    endpoint=Endpoint(
        name="projects/my-app/branches/production/endpoints/ep-primary",
        spec=EndpointSpec(
            autoscaling_limit_min_cu=2.0,
            autoscaling_limit_max_cu=8.0
        )
    ),
    update_mask=FieldMask(field_mask=[
        "spec.autoscaling_limit_min_cu",
        "spec.autoscaling_limit_max_cu"
    ])
).wait()
```

MCP Tools

The following MCP tools are available for managing Lakebase infrastructure. Use type="autoscale" for Lakebase Autoscaling.

Database (Project) Management

| Tool | Description |
| --- | --- |
| `create_or_update_lakebase_database` | Create or update a database. Finds by name, creates if new, updates if existing. Use `type="autoscale"`, `display_name`, `pg_version` params. A new project auto-creates a `production` branch, default compute, and `databricks_postgres` database. |
| `get_lakebase_database` | Get database details (including branches and endpoints) or list all. Pass `name` to get one, omit to list all. Use `type="autoscale"` to filter. |
| `delete_lakebase_database` | Delete a project and all its branches, computes, and data. Use `type="autoscale"`. |

Branch Management

| Tool | Description |
| --- | --- |
| `create_or_update_lakebase_branch` | Create or update a branch with its compute endpoint. Params: `project_name`, `branch_id`, `source_branch`, `ttl_seconds`, `is_protected`, plus compute params (`autoscaling_limit_min_cu`, `autoscaling_limit_max_cu`, `scale_to_zero_seconds`). |
| `delete_lakebase_branch` | Delete a branch and its compute endpoints. |

Credentials

| Tool | Description |
| --- | --- |
| `generate_lakebase_credential` | Generate an OAuth token for PostgreSQL connections (1-hour expiry). Pass the endpoint resource name for autoscale. |

Reference Files

  • projects.md - Project management patterns and settings
  • branches.md - Branching workflows, protection, and expiration
  • computes.md - Compute sizing, autoscaling, and scale-to-zero
  • connection-patterns.md - Connection patterns for different use cases
  • reverse-etl.md - Synced tables from Delta Lake to Lakebase

CLI Quick Reference

```bash
# Create a project
databricks postgres create-project \
    --project-id my-app \
    --json '{"spec": {"display_name": "My App", "pg_version": "17"}}'

# List projects
databricks postgres list-projects

# Get project details
databricks postgres get-project projects/my-app

# Create a branch
databricks postgres create-branch projects/my-app development \
    --json '{"spec": {"source_branch": "projects/my-app/branches/production", "no_expiry": true}}'

# List branches
databricks postgres list-branches projects/my-app

# Get endpoint details
databricks postgres get-endpoint projects/my-app/branches/production/endpoints/ep-primary

# Delete a project
databricks postgres delete-project projects/my-app
```

Key Differences from Lakebase Provisioned

| Aspect | Provisioned | Autoscaling |
| --- | --- | --- |
| SDK module | `w.database` | `w.postgres` |
| Top-level resource | Instance | Project |
| Capacity | CU_1, CU_2, CU_4, CU_8 (16 GB/CU) | 0.5-112 CU (2 GB/CU) |
| Branching | Not supported | Full branching support |
| Scale-to-zero | Not supported | Configurable timeout |
| Operations | Synchronous | Long-running operations (LRO) |
| Read replicas | Readable secondaries | Dedicated read-only endpoints |

Common Issues

| Issue | Solution |
| --- | --- |
| Token expired during long query | Implement a token refresh loop; tokens expire after 1 hour |
| Connection refused after scale-to-zero | Compute wakes automatically on connection; reactivation takes a few hundred ms; implement retry logic |
| DNS resolution fails on macOS | Use `dig` to resolve the hostname and pass `hostaddr` to psycopg |
| Branch deletion blocked | Delete child branches first; a branch with children cannot be deleted |
| Autoscaling range too wide | Max minus min cannot exceed 8 CU (e.g., 8-16 CU is valid, 0.5-32 CU is not) |
| SSL required error | Always use `sslmode=require` in the connection string |
| Update mask required | All update operations require an `update_mask` specifying the fields to modify |
| Connection closed after 24h idle | Connections have a 24-hour idle timeout and a 3-day max lifetime; implement reconnect logic |
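
Two of the issues above (the scale-to-zero wake-up and idle disconnects) are commonly handled with a retry wrapper. A minimal sketch, where `connect` is any zero-argument callable, e.g. `lambda: psycopg.connect(conn_string)`:

```python
import time

def connect_with_retry(connect, attempts=5, base_delay=0.5):
    """Call `connect`, retrying with exponential backoff.

    The first connection after scale-to-zero may be refused while the
    compute wakes; retrying for a second or two is usually enough.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return connect()
        except Exception as exc:  # psycopg.OperationalError in practice
            last_exc = exc
            time.sleep(base_delay * (2 ** attempt))
    raise last_exc
```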

Current Limitations

These features are NOT yet supported in Lakebase Autoscaling:

  • High availability with readable secondaries (use read replicas instead)
  • Databricks Apps UI integration (Apps can connect manually via credentials)
  • Feature Store integration
  • Stateful AI agents (LangChain memory)
  • Postgres-to-Delta sync (only Delta-to-Postgres reverse ETL)
  • Custom billing tags and serverless budget policies
  • Direct migration from Lakebase Provisioned (use pg_dump/pg_restore or reverse ETL)

SDK Version Requirements

  • Databricks SDK for Python: >= 0.81.0 (for w.postgres module)
  • psycopg: 3.x (supports hostaddr parameter for DNS workaround)
  • SQLAlchemy: 2.x with postgresql+psycopg driver

```shell
%pip install -U "databricks-sdk>=0.81.0" "psycopg[binary]>=3.0" sqlalchemy
```

Notes

  • Compute Units in Autoscaling provide ~2 GB RAM each (vs 16 GB in Provisioned).
  • Resource naming follows hierarchical paths: projects/{id}/branches/{id}/endpoints/{id}.
  • All create/update/delete operations are long-running -- use .wait() in the SDK.
  • Tokens are short-lived (1 hour) -- production apps MUST implement token refresh.
  • Postgres versions 16 and 17 are supported.

Repository: databricks-solutions/ai-dev-kit