Building applications with LLMs through composability
LangChain provides a unified interface for initializing and using chat models from 20+ providers. Instead of importing provider-specific classes, you can use string identifiers like "openai:gpt-4o" or "anthropic:claude-3-5-sonnet-20241022" to initialize models. This approach simplifies switching between providers and makes code more portable.
Chat models are language models optimized for conversational interactions. They generate text responses based on message inputs and support features like tool calling, structured output, and streaming.
Initialize chat models using the init_chat_model() factory function with string identifiers:
def init_chat_model(
model: str | None = None,
*,
model_provider: str | None = None,
configurable_fields: Literal["any"] | list[str] | tuple[str, ...] | None = None,
config_prefix: str | None = None,
**kwargs: Any
) -> BaseChatModel: ...
model (str | None): Model identifier in format "provider:model-name". Examples: "openai:gpt-4o", "anthropic:claude-3-5-sonnet-20241022". Optional if provider can be inferred.
model_provider (str | None): Override provider detection. Useful when the provider cannot be automatically detected from the model string. Optional.
configurable_fields (Literal["any"] | list[str] | tuple[str, ...] | None): Which parameters can be set at runtime via config["configurable"]. Use "any" to allow all fields. Optional.
config_prefix (str | None): Prefix for configurable parameter names. Optional.
**kwargs: Provider-specific parameters (see Common Parameters below). All extra keyword arguments are passed to the provider's model class.
Returns: BaseChatModel instance or configurable model wrapper
These parameters are passed as **kwargs to init_chat_model():
temperature (float): Controls randomness (0.0 = deterministic, 2.0 = maximum randomness). Default varies by provider.
max_tokens (int): Maximum number of tokens to generate. Default varies by provider.
timeout (float): Request timeout in seconds.
max_retries (int): Maximum number of automatic retry attempts on failure.
base_url (str): Custom API endpoint URL. Useful for proxies or self-hosted models.
rate_limiter (BaseRateLimiter): Rate limiter instance to control request rate.
api_key (str): API key for the provider. Provider-specific names are also accepted (e.g. api_key, openai_api_key, anthropic_api_key).
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage
# Initialize OpenAI model
model = init_chat_model("openai:gpt-4o")
# Generate response
response = model.invoke([
HumanMessage(content="What is the capital of France?")
])
print(response.content) # "The capital of France is Paris."
from langchain.chat_models import init_chat_model
# Initialize with custom parameters
model = init_chat_model(
"openai:gpt-4o",
temperature=0.7,
max_tokens=1000,
timeout=30.0,
max_retries=3
)
Make parameters configurable at runtime:
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage
# Make temperature configurable at runtime
model = init_chat_model(
"openai:gpt-4o",
configurable_fields=["temperature"],
temperature=0.5 # default value
)
# Override at runtime
response = model.invoke(
[HumanMessage(content="Hello")],
config={"configurable": {"temperature": 0.9}}
)
Control request rate with rate limiters:
from langchain.chat_models import init_chat_model
from langchain.rate_limiters import InMemoryRateLimiter
# Create rate limiter (10 requests per minute)
rate_limiter = InMemoryRateLimiter(
requests_per_second=10/60
)
model = init_chat_model(
"openai:gpt-4o",
rate_limiter=rate_limiter
)
LangChain supports 20+ chat model providers. The provider is automatically detected from the model string format "provider:model-name".
Major Providers:
OpenAI (openai): GPT-4, GPT-3.5, O1, O3 models
"openai:gpt-4o", "openai:gpt-4-turbo", "openai:gpt-3.5-turbo", "openai:o1-preview"
Anthropic (anthropic): Claude models
"anthropic:claude-3-5-sonnet-20241022", "anthropic:claude-3-opus-20240229"
Google Vertex AI (google_vertexai): Gemini models via Google Cloud
"google_vertexai:gemini-1.5-pro", "google_vertexai:gemini-1.5-flash"
Google Generative AI (google_genai): Gemini models via Google AI Studio
"google_genai:gemini-1.5-pro", "google_genai:gemini-1.5-flash"
AWS Bedrock (bedrock, bedrock_converse): Models on AWS Bedrock
"bedrock:anthropic.claude-3-sonnet-20240229-v1:0", "bedrock:meta.llama3-70b-instruct-v1:0"
Azure OpenAI (azure_openai): OpenAI models hosted on Azure
"azure_openai:gpt-4o", "azure_openai:gpt-35-turbo"
Additional Providers:
Cohere (cohere): Command models
Mistral AI (mistralai): Mistral and Mixtral models
Groq (groq): Fast inference API
Ollama (ollama): Local model serving
HuggingFace (huggingface): HuggingFace models
Together AI (together): Together API
Fireworks (fireworks): Fireworks API
DeepSeek (deepseek): DeepSeek models
xAI (xai): Grok models
Perplexity (perplexity): Perplexity API
Upstage (upstage): Upstage models
IBM (ibm): IBM Watson models
NVIDIA (nvidia): NVIDIA AI endpoints
Azure AI (azure_ai): Azure AI services
Anthropic on Vertex AI (google_anthropic_vertex): Anthropic models via Vertex AI
See Provider Reference for the complete list of supported providers.
Each provider has its own model naming convention. The general format is "provider:model-name", but the exact model name varies:
# OpenAI
model = init_chat_model("openai:gpt-4o")
model = init_chat_model("openai:gpt-4-turbo")
model = init_chat_model("openai:o1-preview")
# Anthropic
model = init_chat_model("anthropic:claude-3-5-sonnet-20241022")
model = init_chat_model("anthropic:claude-3-opus-20240229")
# Google
model = init_chat_model("google_vertexai:gemini-1.5-pro")
model = init_chat_model("google_genai:gemini-1.5-flash")
# AWS Bedrock (uses provider's full model ID)
model = init_chat_model("bedrock:anthropic.claude-3-sonnet-20240229-v1:0")
model = init_chat_model("bedrock:meta.llama3-70b-instruct-v1:0")
# Local models
model = init_chat_model("ollama:llama2")
model = init_chat_model("ollama:mistral")
The BaseChatModel class is the base interface for all chat models. All models returned by init_chat_model() implement this interface.
class BaseChatModel:
    """
    Base class for chat models.

    All chat models support synchronous and asynchronous execution,
    streaming, and batch processing.
    """

    def invoke(
        self,
        messages: list[AnyMessage],
        **kwargs: Any
    ) -> AIMessage: ...

    async def ainvoke(
        self,
        messages: list[AnyMessage],
        **kwargs: Any
    ) -> AIMessage: ...

    def stream(
        self,
        messages: list[AnyMessage],
        **kwargs: Any
    ) -> Iterator[AIMessageChunk]: ...

    async def astream(
        self,
        messages: list[AnyMessage],
        **kwargs: Any
    ) -> AsyncIterator[AIMessageChunk]: ...

    def batch(
        self,
        messages: list[list[AnyMessage]],
        **kwargs: Any
    ) -> list[AIMessage]: ...

    async def abatch(
        self,
        messages: list[list[AnyMessage]],
        **kwargs: Any
    ) -> list[AIMessage]: ...
Methods:
invoke(messages, **kwargs) - Execute model synchronously and return complete response
ainvoke(messages, **kwargs) - Execute model asynchronously and return complete response
stream(messages, **kwargs) - Stream model response synchronously as chunks
astream(messages, **kwargs) - Stream model response asynchronously as chunks
batch(messages, **kwargs) - Execute multiple requests synchronously in batch
abatch(messages, **kwargs) - Execute multiple requests asynchronously in batch
Generate a complete response synchronously:
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage, SystemMessage
model = init_chat_model("openai:gpt-4o")
response = model.invoke([
SystemMessage(content="You are a helpful assistant."),
HumanMessage(content="What is 2 + 2?")
])
print(response.content) # "2 + 2 equals 4."
print(response.usage_metadata) # Token usage information
Generate a complete response asynchronously:
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage
model = init_chat_model("openai:gpt-4o")
response = await model.ainvoke([
HumanMessage(content="Hello!")
])
print(response.content)
Stream response as it's generated:
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage
model = init_chat_model("openai:gpt-4o")
# Synchronous streaming
for chunk in model.stream([HumanMessage(content="Write a poem")]):
    print(chunk.content, end="", flush=True)
# Async streaming
async for chunk in model.astream([HumanMessage(content="Write a poem")]):
    print(chunk.content, end="", flush=True)
Execute multiple requests in a single batch:
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage
model = init_chat_model("openai:gpt-4o")
# Synchronous batch
responses = model.batch([
[HumanMessage(content="What is 2+2?")],
[HumanMessage(content="What is 3+3?")],
[HumanMessage(content="What is 4+4?")]
])
for response in responses:
    print(response.content)
# Async batch
responses = await model.abatch([
[HumanMessage(content="What is 2+2?")],
[HumanMessage(content="What is 3+3?")]
])
Many models support tool calling (function calling):
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage
from langchain.tools import tool
@tool
def get_weather(location: str) -> str:
    """Get weather for a location."""
    return f"Sunny, 72°F in {location}"
model = init_chat_model("openai:gpt-4o")
# Bind tools to model
model_with_tools = model.bind_tools([get_weather])
# Model will return tool calls
response = model_with_tools.invoke([
HumanMessage(content="What's the weather in Paris?")
])
# Check for tool calls
if response.tool_calls:
    tool_call = response.tool_calls[0]
    print(f"Tool: {tool_call['name']}")
    print(f"Args: {tool_call['args']}")
Request structured output from models that support it:
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage
from pydantic import BaseModel
class Person(BaseModel):
    name: str
    age: int
    email: str
model = init_chat_model("openai:gpt-4o")
structured_model = model.with_structured_output(Person)
response = structured_model.invoke([
HumanMessage(content="Extract: John Doe, 30 years old, john@example.com")
])
print(response) # Person(name="John Doe", age=30, email="john@example.com")
Pass runtime configuration to model calls:
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage
model = init_chat_model("openai:gpt-4o")
# Pass configuration
response = model.invoke(
[HumanMessage(content="Hello")],
config={
"run_name": "my_run",
"tags": ["production"],
"metadata": {"user_id": "123"}
}
)
Different providers require different authentication methods:
# OpenAI (uses OPENAI_API_KEY environment variable or parameter)
model = init_chat_model("openai:gpt-4o", openai_api_key="sk-...")
# Anthropic (uses ANTHROPIC_API_KEY environment variable or parameter)
model = init_chat_model("anthropic:claude-3-5-sonnet-20241022", anthropic_api_key="sk-ant-...")
# AWS Bedrock (uses AWS credentials from environment/IAM)
model = init_chat_model("bedrock:anthropic.claude-3-sonnet-20240229-v1:0")
# Azure OpenAI (requires deployment name and endpoint)
model = init_chat_model(
"azure_openai:gpt-4o",
azure_deployment="my-gpt4-deployment",
azure_endpoint="https://my-resource.openai.azure.com/",
api_key="..."
)
# Ollama (local, no authentication required)
model = init_chat_model("ollama:llama2")
Control randomness and creativity:
# Deterministic (good for factual tasks)
model = init_chat_model("openai:gpt-4o", temperature=0)
# Balanced
model = init_chat_model("openai:gpt-4o", temperature=0.7)
# Creative (good for creative writing)
model = init_chat_model("openai:gpt-4o", temperature=1.5)
Use custom endpoints for proxies or self-hosted models:
model = init_chat_model(
"openai:gpt-4o",
base_url="https://my-proxy.example.com/v1"
)
The string-based initialization makes it easy to switch providers:
from langchain.chat_models import init_chat_model
import os
# Get provider from environment or default to OpenAI
provider = os.getenv("LLM_PROVIDER", "openai")
model_name = os.getenv("MODEL_NAME", "gpt-4o")
model = init_chat_model(f"{provider}:{model_name}")
from typing import Any, Iterator, AsyncIterator, Literal
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import AnyMessage, AIMessage, AIMessageChunk
from langchain_core.rate_limiters import BaseRateLimiter
Install with Tessl CLI
npx tessl i tessl/pypi-langchain