Use the domino-data Python SDK (dominodatalab-data) for programmatic data access in Domino. Covers DataSourceClient for SQL queries and object storage, DatasetClient for dataset files, TrainingSets for ML data versioning, Feature Store, and VectorDB (Pinecone) integration. Use when querying data sources, downloading datasets, managing training sets, or working with vector databases in Domino.
80
—
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
This skill provides comprehensive knowledge for working with the domino-data Python SDK (dominodatalab-data) - the official library for Domino's Access Data features.
# Via pip
pip install -U dominodatalab-data
# Via Poetry
poetry add dominodatalab-data
# In Domino environment (requirements.txt)
dominodatalab-data>=6.0.0| Module | Purpose |
|---|---|
DataSourceClient | Query SQL databases and access object stores |
DatasetClient | Read files from Domino Datasets |
TrainingSets | Version and manage ML training data |
Feature Store | Manage ML features with Git integration |
VectorDB | Pinecone vector database integration |
from domino_data.data_sources import DataSourceClient
# Initialize client (auto-configured in Domino)
client = DataSourceClient()
# Get a data source by name
ds = client.get_datasource("my-redshift-db")
# Execute SQL query
result = ds.query("SELECT * FROM customers WHERE region = 'US'")
# Convert to pandas DataFrame
df = result.to_pandas()
# Or save to parquet
result.to_parquet("output.parquet")from domino_data.data_sources import DataSourceClient
client = DataSourceClient()
ds = client.get_datasource("my-s3-bucket")
# List objects
objects = ds.list_objects(prefix="data/", page_size=100)
# Download a file
ds.download_file("data/input.csv", "local_input.csv")
# Upload a file
ds.put("data/output.csv", open("results.csv", "rb").read())
# Get signed URL
url = ds.get_key_url("data/file.csv", is_read_write=False)from domino_data.datasets import DatasetClient
client = DatasetClient()
# Get dataset by name
dataset = client.get_dataset("training-data")
# List files
files = dataset.list_files(prefix="images/")
# Download file
dataset.download("model.pkl", "local_model.pkl", max_workers=4)
# Get file content as bytes
content = dataset.get("config.json")from domino_data.training_sets import (
create_training_set_version,
get_training_set,
list_training_sets
)
import pandas as pd
# Create training set version from DataFrame
df = pd.DataFrame({
"id": [1, 2, 3],
"feature_a": [0.1, 0.2, 0.3],
"label": [1, 0, 1]
})
version = create_training_set_version(
training_set_name="customer-churn",
df=df,
key_columns=["id"],
description="Initial training data"
)
# Get training set
ts = get_training_set("customer-churn")
# List all training sets
all_sets = list_training_sets()from domino_data.vectordb import (
domino_pinecone3x_init_params,
domino_pinecone3x_index_params
)
from pinecone import Pinecone
# Initialize Pinecone client with Domino credentials
init_params = domino_pinecone3x_init_params("my-pinecone-ds")
pc = Pinecone(**init_params)
# Get index parameters
index_params = domino_pinecone3x_index_params("my-pinecone-ds", "embeddings")
index = pc.Index(**index_params)
# Query vectors
results = index.query(
vector=[0.1, 0.2, 0.3, ...],
top_k=10,
include_metadata=True
)The library auto-configures authentication inside Domino workspaces and jobs using injected environment variables:
# Environment variables used automatically inside Domino:
# DOMINO_TOKEN_FILE - Token file location (preferred, short-lived)
# DOMINO_API_PROXY - API proxy URL
# DOMINO_DATA_API_GATEWAY - Data API gateway (default: http://127.0.0.1:8766)
#
# DOMINO_USER_API_KEY - Legacy API key (deprecated, will be removed)For external use (e.g., CI/CD outside a Domino execution):
Note:
DOMINO_USER_API_KEYis deprecated and will be removed in a future Domino release. Prefer running data-SDK code from inside a Domino workspace or job where token-based auth is injected automatically.
import os
os.environ["DOMINO_USER_API_KEY"] = "your-api-key" # deprecated
os.environ["DOMINO_API_HOST"] = "https://your-domino.com"
from domino_data.data_sources import DataSourceClient
client = DataSourceClient()from domino_data.data_sources import DominoError, UnauthenticatedError
try:
result = ds.query("SELECT * FROM table")
except UnauthenticatedError:
print("Authentication failed - check API key")
except DominoError as e:
print(f"Domino error: {e}")max_workers for large filespage_size when listing many objectsdominodatalab-data47c6e0a
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.