
databricks-model-serving

Deploy and query Databricks Model Serving endpoints. Use when (1) deploying MLflow models or AI agents to endpoints, (2) creating ChatAgent/ResponsesAgent agents, (3) integrating UC Functions or Vector Search tools, (4) querying deployed endpoints, (5) checking endpoint status. Covers classical ML models, custom pyfunc, and GenAI agents.

Databricks Model Serving

Deploy MLflow models and AI agents to scalable REST API endpoints.

Quick Decision: What Are You Deploying?

| Model Type | Pattern | Reference |
| --- | --- | --- |
| Traditional ML (sklearn, xgboost) | `mlflow.sklearn.autolog()` | 1-classical-ml.md |
| Custom Python model | `mlflow.pyfunc.PythonModel` | 2-custom-pyfunc.md |
| GenAI Agent (LangGraph, tool-calling) | `ResponsesAgent` | 3-genai-agents.md |

Prerequisites

  • DBR 16.1+ recommended (pre-installed GenAI packages)
  • Unity Catalog enabled workspace
  • Model Serving enabled

Foundation Model API Endpoints

ALWAYS use exact endpoint names from this table. NEVER guess or abbreviate.

Chat / Instruct Models

| Endpoint Name | Provider | Notes |
| --- | --- | --- |
| databricks-gpt-5-2 | OpenAI | Latest GPT, 400K context |
| databricks-gpt-5-1 | OpenAI | Instant + Thinking modes |
| databricks-gpt-5-1-codex-max | OpenAI | Code-specialized (high perf) |
| databricks-gpt-5-1-codex-mini | OpenAI | Code-specialized (cost-opt) |
| databricks-gpt-5 | OpenAI | 400K context, reasoning |
| databricks-gpt-5-mini | OpenAI | Cost-optimized reasoning |
| databricks-gpt-5-nano | OpenAI | High-throughput, lightweight |
| databricks-gpt-oss-120b | OpenAI | Open-weight, 128K context |
| databricks-gpt-oss-20b | OpenAI | Lightweight open-weight |
| databricks-claude-opus-4-6 | Anthropic | Most capable, 1M context |
| databricks-claude-sonnet-4-6 | Anthropic | Hybrid reasoning |
| databricks-claude-sonnet-4-5 | Anthropic | Hybrid reasoning |
| databricks-claude-opus-4-5 | Anthropic | Deep analysis, 200K context |
| databricks-claude-sonnet-4 | Anthropic | Hybrid reasoning |
| databricks-claude-opus-4-1 | Anthropic | 200K context, 32K output |
| databricks-claude-haiku-4-5 | Anthropic | Fastest, cost-effective |
| databricks-claude-3-7-sonnet | Anthropic | Retiring April 2026 |
| databricks-meta-llama-3-3-70b-instruct | Meta | 128K context, multilingual |
| databricks-meta-llama-3-1-405b-instruct | Meta | Retiring May 2026 (PT) |
| databricks-meta-llama-3-1-8b-instruct | Meta | Lightweight, 128K context |
| databricks-llama-4-maverick | Meta | MoE architecture |
| databricks-gemini-3-1-pro | Google | 1M context, hybrid reasoning |
| databricks-gemini-3-pro | Google | 1M context, hybrid reasoning |
| databricks-gemini-3-flash | Google | Fast, cost-efficient |
| databricks-gemini-2-5-pro | Google | 1M context, Deep Think |
| databricks-gemini-2-5-flash | Google | 1M context, hybrid reasoning |
| databricks-gemma-3-12b | Google | 128K context, multilingual |
| databricks-qwen3-next-80b-a3b-instruct | Alibaba | Efficient MoE |

Embedding Models

| Endpoint Name | Dimensions | Max Tokens | Notes |
| --- | --- | --- | --- |
| databricks-gte-large-en | 1024 | 8192 | English, not normalized |
| databricks-bge-large-en | 1024 | 512 | English, normalized |
| databricks-qwen3-embedding-0-6b | up to 1024 | ~32K | 100+ languages, instruction-aware |
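
These embedding endpoints accept an OpenAI-style request body with an `input` field. A minimal, dependency-free sketch of building that body (the payload shape is assumed from the OpenAI-compatible API these endpoints expose; verify against 8-querying-endpoints.md):

```python
import json

def build_embedding_request(texts):
    """Build the JSON body for an embeddings endpoint (OpenAI-style `input` field)."""
    if isinstance(texts, str):
        texts = [texts]  # the API expects a list of strings
    return json.dumps({"input": texts})

body = build_embedding_request(["What is Databricks?", "Model Serving overview"])
```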

Common Defaults

  • Agent LLM: databricks-meta-llama-3-3-70b-instruct (good balance of quality/cost)
  • Embedding: databricks-gte-large-en
  • Code tasks: databricks-gpt-5-1-codex-mini or databricks-gpt-5-1-codex-max

These are pay-per-token endpoints available in every workspace. For production, consider provisioned throughput mode. See the Databricks documentation for the full list of supported models.
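
When querying these endpoints over raw REST rather than MCP or the SDK, chat endpoints take an OpenAI-style `messages` payload posted to the endpoint's invocations URL. A minimal sketch of assembling the request (URL pattern as commonly documented for Databricks Model Serving; the host is a placeholder, and authentication via a Bearer token is omitted):

```python
import json

def build_chat_invocation(host, endpoint, messages, max_tokens=500):
    """Build the URL and JSON body for POST https://<host>/serving-endpoints/<name>/invocations."""
    url = f"https://{host}/serving-endpoints/{endpoint}/invocations"
    body = json.dumps({"messages": messages, "max_tokens": max_tokens})
    return url, body

url, body = build_chat_invocation(
    "example.cloud.databricks.com",  # placeholder workspace host
    "databricks-meta-llama-3-3-70b-instruct",
    [{"role": "user", "content": "What is Databricks?"}],
)
```

The same URL pattern applies to endpoints you deploy yourself; only the endpoint name changes.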

Reference Files

| Topic | File | When to Read |
| --- | --- | --- |
| Classical ML | 1-classical-ml.md | sklearn, xgboost, autolog |
| Custom PyFunc | 2-custom-pyfunc.md | Custom preprocessing, signatures |
| GenAI Agents | 3-genai-agents.md | ResponsesAgent, LangGraph |
| Tools Integration | 4-tools-integration.md | UC Functions, Vector Search |
| Development & Testing | 5-development-testing.md | MCP workflow, iteration |
| Logging & Registration | 6-logging-registration.md | mlflow.pyfunc.log_model |
| Deployment | 7-deployment.md | Job-based async deployment |
| Querying Endpoints | 8-querying-endpoints.md | SDK, REST, MCP tools |
| Package Requirements | 9-package-requirements.md | DBR versions, pip |

Quick Start: Deploy a GenAI Agent

Step 1: Install Packages (in notebook or via MCP)

```python
%pip install -U mlflow==3.6.0 databricks-langchain langgraph==0.3.4 databricks-agents pydantic
dbutils.library.restartPython()
```

Or via MCP:

```python
execute_code(code="%pip install -U mlflow==3.6.0 databricks-langchain langgraph==0.3.4 databricks-agents pydantic")
```

Step 2: Create Agent File

Create agent.py locally with ResponsesAgent pattern (see 3-genai-agents.md).

Step 3: Upload to Workspace

```python
upload_to_workspace(
    local_path="./my_agent",
    workspace_path="/Workspace/Users/you@company.com/my_agent"
)
```

Step 4: Test Agent

```python
execute_code(
    file_path="./my_agent/test_agent.py",
    cluster_id="<cluster_id>"
)
```

Step 5: Log Model

```python
execute_code(
    file_path="./my_agent/log_model.py",
    cluster_id="<cluster_id>"
)
```

Step 6: Deploy (Async via Job)

See 7-deployment.md for job-based deployment that doesn't timeout.

Step 7: Query Endpoint

```python
query_serving_endpoint(
    name="my-agent-endpoint",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

Quick Start: Deploy a Classical ML Model

```python
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

# Enable autolog with auto-registration
mlflow.sklearn.autolog(
    log_input_examples=True,
    registered_model_name="main.models.my_classifier"
)

# Train - model is logged and registered automatically
model = LogisticRegression()
model.fit(X_train, y_train)
```

Then deploy via UI or SDK. See 1-classical-ml.md.
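
As a sketch of what SDK deployment amounts to under the hood, the serving-endpoints create API takes a name plus a config listing served entities. This builds that request body with stdlib only (field names follow the Databricks REST API for serving endpoints; treat the exact schema as an assumption and confirm in 1-classical-ml.md):

```python
import json

def build_create_endpoint_request(endpoint_name, uc_model, version,
                                  workload_size="Small", scale_to_zero=True):
    """Build the JSON body for creating a serving endpoint from a UC-registered model."""
    return {
        "name": endpoint_name,
        "config": {
            "served_entities": [{
                "entity_name": uc_model,          # Unity Catalog path: catalog.schema.model
                "entity_version": str(version),
                "workload_size": workload_size,
                "scale_to_zero_enabled": scale_to_zero,
            }]
        },
    }

req = build_create_endpoint_request("sklearn-classifier", "main.models.my_classifier", 1)
body = json.dumps(req)
```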


MCP Tools

If MCP tools are not available, use the SDK/CLI examples in the reference files listed above.

Development & Testing

| Tool | Purpose |
| --- | --- |
| upload_to_workspace | Upload agent files to workspace |
| execute_code | Install packages, test agent, log model |

Deployment

| Tool | Purpose |
| --- | --- |
| manage_jobs (action="create") | Create deployment job (one-time) |
| manage_job_runs (action="run_now") | Kick off deployment (async) |
| manage_job_runs (action="get") | Check deployment job status |

Querying

| Tool | Purpose |
| --- | --- |
| get_serving_endpoint_status | Check if endpoint is READY |
| query_serving_endpoint | Send requests to endpoint |
| list_serving_endpoints | List all endpoints |

Common Workflows

Check Endpoint Status After Deployment

```python
get_serving_endpoint_status(name="my-agent-endpoint")
```

Returns:

```
{
    "name": "my-agent-endpoint",
    "state": "READY",
    "served_entities": [...]
}
```
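
Since deployment takes around 15 minutes, the usual pattern is to poll this status until the state is READY. A dependency-free sketch of that loop (`get_status` is a stand-in for whatever calls get_serving_endpoint_status; the timeout and interval are illustrative defaults):

```python
import time

def wait_until_ready(get_status, name, timeout_s=1800, poll_s=30):
    """Poll an endpoint-status function until it reports READY or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status(name)
        if status.get("state") == "READY":
            return status
        time.sleep(poll_s)  # avoid hammering the status API
    raise TimeoutError(f"endpoint {name} not READY after {timeout_s}s")
```

In practice you would pass a thin wrapper around the MCP tool or SDK call as `get_status`.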

Query a Chat/Agent Endpoint

```python
query_serving_endpoint(
    name="my-agent-endpoint",
    messages=[
        {"role": "user", "content": "What is Databricks?"}
    ],
    max_tokens=500
)
```

Query a Traditional ML Endpoint

```python
query_serving_endpoint(
    name="sklearn-classifier",
    dataframe_records=[
        {"age": 25, "income": 50000, "credit_score": 720}
    ]
)
```
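
MLflow serving endpoints also accept the same rows in `dataframe_split` form (separate `columns` and `data` fields), which is more compact when sending many rows. A small helper sketch converting records to that shape (field names follow the MLflow serving input schema):

```python
def records_to_split(records):
    """Convert dataframe_records rows to the equivalent dataframe_split payload."""
    columns = list(records[0])  # assumes all rows share the first row's keys
    return {
        "dataframe_split": {
            "columns": columns,
            "data": [[row[col] for col in columns] for row in records],
        }
    }

payload = records_to_split([{"age": 25, "income": 50000, "credit_score": 720}])
```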

Common Issues

| Issue | Solution |
| --- | --- |
| Invalid output format | Use `self.create_text_output_item(text, id)` - NOT raw dicts! |
| Endpoint NOT_READY | Deployment takes ~15 min. Use `get_serving_endpoint_status` to poll. |
| Package not found | Specify exact versions in `pip_requirements` when logging the model |
| Tool timeout | Use job-based deployment, not synchronous calls |
| Auth error on endpoint | Ensure `resources` are specified in `log_model` for auto passthrough |
| Model not found | Check the Unity Catalog path: `catalog.schema.model_name` |

Critical: ResponsesAgent Output Format

WRONG - raw dicts don't work:

```python
return ResponsesAgentResponse(output=[{"role": "assistant", "content": "..."}])
```

CORRECT - use helper methods:

```python
return ResponsesAgentResponse(
    output=[self.create_text_output_item(text="...", id="msg_1")]
)
```

Available helper methods:

  • `self.create_text_output_item(text, id)` - text responses
  • `self.create_function_call_item(id, call_id, name, arguments)` - tool calls
  • `self.create_function_call_output_item(call_id, output)` - tool results

Resources

Repository: databricks-solutions/ai-dev-kit
