Access external LLM providers through Domino AI Gateway - a secure proxy with centralized API key management, usage monitoring, and compliance. Supports OpenAI, AWS Bedrock, Azure OpenAI, Anthropic, and more. Use when calling LLMs from Domino, configuring AI Gateway endpoints, or monitoring LLM usage and costs.
This skill helps users work with Domino AI Gateway - a secure proxy for accessing external Large Language Model (LLM) providers with centralized management, monitoring, and compliance.
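Activate this skill when users want to call LLMs from Domino, configure AI Gateway endpoints, or monitor LLM usage and costs.

Domino AI Gateway provides centralized API key management, usage monitoring, and compliance. Supported providers and models:
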
| Provider | Models |
|---|---|
| OpenAI | GPT-4, GPT-4 Turbo, GPT-3.5 |
| AWS Bedrock | Claude, Titan, Llama 2 |
| Azure OpenAI | GPT-4, GPT-3.5 |
| Anthropic | Claude 3, Claude 2 |
| Google Vertex AI | PaLM, Gemini |
| Cohere | Command, Embed |

    # Create an endpoint (here named openai-gpt4) via the Domino API
    import os
    import requests

    # Fetch an access token from the local token endpoint
    TOKEN = requests.get("http://localhost:8899/access-token").text.strip()
    BASE = os.environ["DOMINO_API_HOST"]

    response = requests.post(
        f"{BASE}/api/aigateway/v1/endpoints",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "name": "openai-gpt4",
            "provider": "openai",
            "model": "gpt-4",
            "providerApiKey": "sk-..."
        }
    )

AI Gateway provides an OpenAI-compatible interface:

    from openai import OpenAI

    # Configure client to use AI Gateway
    client = OpenAI(
        api_key="not-needed",  # Handled by AI Gateway
        base_url="https://your-domino.com/api/aigateway/v1/openai"
    )

    # Use like standard OpenAI
    response = client.chat.completions.create(
        model="openai-gpt4",  # Your endpoint name
        messages=[
            {"role": "user", "content": "Hello, how are you?"}
        ]
    )
    print(response.choices[0].message.content)

The same endpoint can be used through LangChain:

    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(
        model="openai-gpt4",  # Endpoint name
        openai_api_key="not-needed",
        openai_api_base="https://your-domino.com/api/aigateway/v1/openai"
    )
    response = llm.invoke("What is machine learning?")
    print(response.content)

Or call the REST API directly:

    import os
    import requests

    TOKEN = requests.get("http://localhost:8899/access-token").text.strip()
    BASE = os.environ["DOMINO_API_HOST"]

    response = requests.post(
        f"{BASE}/api/aigateway/v1/chat/completions",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {TOKEN}",
        },
        json={
            "model": "openai-gpt4",
            "messages": [{"role": "user", "content": "Hello!"}]
        }
    )
    result = response.json()
    print(result["choices"][0]["message"]["content"])

Configure who can use each endpoint:

    # Via UI: Endpoints > Gateway LLMs > Download logs
    # Logs include:
    # - Timestamp
    # - User
    # - Model
    # - Input/Output tokens
    # - Response time
    # - Status

Example log entry:

    {
      "timestamp": "2024-01-15T10:30:00Z",
      "user": "user@company.com",
      "endpoint": "openai-gpt4",
      "model": "gpt-4",
      "inputTokens": 150,
      "outputTokens": 200,
      "durationMs": 1500,
      "status": "success"
    }

AI Gateway tracks token usage per:
Admins can configure:
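
The fields in the sample log entry above are enough for a simple usage report. The sketch below assumes the downloaded logs have already been parsed into a list of dictionaries with the same keys as that entry (`user`, `endpoint`, `inputTokens`, `outputTokens`). The helper name `summarize_usage` and the second sample record are hypothetical and not part of AI Gateway.

```python
from collections import defaultdict


def summarize_usage(records, key="user"):
    """Sum input/output tokens and count requests per value of `key`."""
    totals = defaultdict(lambda: {"inputTokens": 0, "outputTokens": 0, "requests": 0})
    for record in records:
        bucket = totals[record[key]]
        bucket["inputTokens"] += record.get("inputTokens", 0)
        bucket["outputTokens"] += record.get("outputTokens", 0)
        bucket["requests"] += 1
    return dict(totals)


# Records in the documented log format; the second one is made up for illustration.
sample_logs = [
    {"user": "user@company.com", "endpoint": "openai-gpt4",
     "inputTokens": 150, "outputTokens": 200, "status": "success"},
    {"user": "user@company.com", "endpoint": "openai-gpt4",
     "inputTokens": 50, "outputTokens": 100, "status": "success"},
]

usage_by_user = summarize_usage(sample_logs, key="user")
print(usage_by_user)
```

Passing `key="endpoint"` groups the same records by endpoint name instead of by user.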

    # Define endpoint once
    LLM_ENDPOINT = "production-gpt4"

    # Use throughout code
    response = client.chat.completions.create(
        model=LLM_ENDPOINT,
        messages=[...]
    )

Retry on rate limits with exponential backoff:

    import time
    from openai import RateLimitError

    def call_llm_with_retry(messages, max_retries=3):
        for attempt in range(max_retries):
            try:
                return client.chat.completions.create(
                    model="openai-gpt4",
                    messages=messages
                )
            except RateLimitError:
                if attempt < max_retries - 1:
                    # Wait 1s, 2s, 4s, ... before the next attempt
                    time.sleep(2 ** attempt)
                else:
                    raise

Log prompt size and token usage:

    import logging

    logger = logging.getLogger(__name__)

    def query_llm(prompt):
        logger.info("Querying LLM with prompt length: %d", len(prompt))
        response = client.chat.completions.create(
            model="openai-gpt4",
            messages=[{"role": "user", "content": prompt}]
        )
        logger.info("Response tokens: %d", response.usage.total_tokens)
        return response.choices[0].message.content

Stream long responses:

    # Streaming response
    stream = client.chat.completions.create(
        model="openai-gpt4",
        messages=[{"role": "user", "content": "Write a long story"}],
        stream=True
    )
    for chunk in stream:
        # Some chunks carry no choices or no text; skip them
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

Common errors:

    Error: 401 Unauthorized
    Error: 429 Too Many Requests
    Error: Model 'model-name' not found

Before writing or verifying any API call, use the cluster swagger to confirm current endpoint paths and field names. Use public docs for workflow context and field explanations.
Get the cluster base URL: $DOMINO_API_HOST (injected by Domino into every workspace, job, and app).
Fetch the swagger spec:

    # No authentication required for the public API spec
    curl "$DOMINO_API_HOST/assets/public-api.json"
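
The same check can be scripted. The sketch below is one possible approach, assuming the spec is standard OpenAPI/Swagger JSON with a top-level `paths` object. The helper names `fetch_spec` and `find_paths` and the inline sample spec are illustrative only and are not part of Domino's API.

```python
import json
import urllib.request


def fetch_spec(base_url):
    """Download the public API spec from the cluster (no authentication needed)."""
    url = f"{base_url.rstrip('/')}/assets/public-api.json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def find_paths(spec, fragment):
    """Return {path: [HTTP methods]} for every path containing `fragment`."""
    http_methods = {"get", "put", "post", "delete", "patch", "head", "options"}
    return {
        path: sorted(m.upper() for m in operations if m in http_methods)
        for path, operations in spec.get("paths", {}).items()
        if fragment.lower() in path.lower()
    }


# Tiny hand-written spec so the filter can be tried without a cluster.
# The second path is hypothetical.
sample_spec = {
    "paths": {
        "/api/aigateway/v1/endpoints": {"get": {}, "post": {}},
        "/api/projects/v1/projects": {"get": {}},
    }
}

matches = find_paths(sample_spec, "aigateway")
print(matches)
```

Inside a workspace, calling `find_paths(fetch_spec(os.environ["DOMINO_API_HOST"]), "aigateway")` (after `import os`) would list whatever AI Gateway paths the cluster actually exposes, which is the thing to confirm before relying on the paths used in the examples.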

Browser UI: `$DOMINO_API_HOST/assets/lib/swagger-ui/index.html?url=/assets/public-api.json#/`

Public docs (workflow context and field explanations):