Building applications with LLMs through composability
The middleware system provides powerful customization of agent behavior through lifecycle hooks and execution wrappers. Middleware allows you to intercept and modify agent execution at key points: before/after agent execution, before/after model calls, and with full control over model and tool call execution.
Middleware is composable: you can combine multiple middleware plugins to build sophisticated agent behaviors such as retry logic, fallback models, and human-in-the-loop workflows.
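Conceptually, composed middleware wrap agent execution like layers of an onion: the first middleware in the list runs outermost. The following is an illustrative pure-Python sketch of that idea, not the actual LangChain implementation (all names here are made up):

```python
def make_wrapped(middleware, inner):
    """Bind one middleware function around an inner handler."""
    def wrapped(request):
        return middleware(inner, request)
    return wrapped

def compose(middlewares, handler):
    """Wrap a handler so the first middleware in the list runs outermost."""
    for middleware in reversed(middlewares):
        handler = make_wrapped(middleware, handler)
    return handler

def tag_request(inner, request):
    # Example middleware: annotate the request before delegating
    return inner(request + " [tagged]")

def passthrough(inner, request):
    # Example middleware: delegate unchanged (a real one might retry or cache)
    return inner(request)

pipeline = compose([tag_request, passthrough], lambda r: f"model({r})")
print(pipeline("hi"))  # -> model(hi [tagged])
```

Each layer decides whether and how to call the next; the execution wrappers described later give you the same handler-callback contract.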
Lifecycle hooks allow you to run code at specific points in the agent execution lifecycle. Hooks receive the current state or request/response objects and can modify them before returning.
Run code once at the start of agent execution, before any model calls:
```python
def before_agent(func: Callable[[AgentState], AgentState]) -> AgentMiddleware: ...
```

Decorator Usage:

```python
from langchain.agents.middleware import before_agent, AgentState

@before_agent
def log_start(state: AgentState) -> AgentState:
    print(f"Starting agent with {len(state['messages'])} messages")
    return state
```

Async Support:
The before_agent decorator automatically detects and supports async functions. Simply define your function as async:
```python
from langchain.agents.middleware import before_agent, AgentState

@before_agent
async def async_log_start(state: AgentState) -> AgentState:
    print("Starting agent execution")
    return state
```

Run code before each model invocation. Useful for modifying prompts, logging, or controlling flow:
```python
def before_model(func: Callable[[ModelRequest], ModelRequest]) -> AgentMiddleware: ...
```

Decorator Usage:

```python
from langchain.agents.middleware import before_model, ModelRequest

@before_model
def log_model_call(request: ModelRequest) -> ModelRequest:
    print(f"Calling model with {len(request.state['messages'])} messages")
    return request
```

With Flow Control:
```python
from langchain.agents.middleware import before_model, hook_config, ModelRequest

@before_model
@hook_config(can_jump_to=["tools", "model", "end"])
def conditional_skip(request: ModelRequest) -> ModelRequest:
    # Skip model call if too many messages
    if len(request.state['messages']) > 100:
        request.state['jump_to'] = "end"
    return request
```

Async Support:
The before_model decorator automatically detects and supports async functions. Simply define your function as async:
```python
from langchain.agents.middleware import before_model, ModelRequest

@before_model
async def async_before(request: ModelRequest) -> ModelRequest:
    return request
```

Run code after each model invocation. Useful for logging responses, modifying output, or controlling flow:
```python
def after_model(func: Callable[[ModelResponse], ModelResponse]) -> AgentMiddleware: ...
```

Decorator Usage:

```python
from langchain.agents.middleware import after_model, ModelResponse

@after_model
def log_model_response(response: ModelResponse) -> ModelResponse:
    print(f"Model returned: {response}")
    return response
```

With Flow Control:
```python
from langchain.agents.middleware import after_model, hook_config, ModelResponse

@after_model
@hook_config(can_jump_to=["tools", "model", "end"])
def force_retry(response: ModelResponse) -> ModelResponse:
    # Retry model call if response is empty
    if not response.get("content"):
        response['state']['jump_to'] = "model"
    return response
```

Async Support:
The after_model decorator automatically detects and supports async functions. Simply define your function as async:
```python
from langchain.agents.middleware import after_model, ModelResponse

@after_model
async def async_after(response: ModelResponse) -> ModelResponse:
    return response
```

Run code once at the end of agent execution, after all processing is complete:
```python
def after_agent(func: Callable[[AgentState], AgentState]) -> AgentMiddleware: ...
```

Decorator Usage:

```python
from langchain.agents.middleware import after_agent, AgentState

@after_agent
def log_completion(state: AgentState) -> AgentState:
    print(f"Agent completed with {len(state['messages'])} messages")
    return state
```

Async Support:
The after_agent decorator automatically detects and supports async functions. Simply define your function as async:
```python
from langchain.agents.middleware import after_agent, AgentState

@after_agent
async def async_log_completion(state: AgentState) -> AgentState:
    return state
```

Execution wrappers provide complete control over model and tool execution. Unlike hooks, wrappers receive a handler callback that performs the actual execution, allowing you to implement retry logic, fallbacks, caching, and more.
Wrap model execution with custom logic:
```python
def wrap_model_call(func: Callable[[Callable, ModelRequest], ModelResponse]) -> AgentMiddleware: ...
```

Decorator Usage:

```python
from typing import Callable

from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse

@wrap_model_call
def retry_model(handler: Callable, request: ModelRequest) -> ModelResponse:
    """Retry model call up to 3 times on failure."""
    for attempt in range(3):
        try:
            return handler(request)
        except Exception as e:
            if attempt == 2:
                raise
            print(f"Retry {attempt + 1} after error: {e}")
```

Async Support:
The wrap_model_call decorator automatically detects and supports async functions. Simply define your function as async:
```python
from typing import Callable

from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse

@wrap_model_call
async def async_retry_model(handler: Callable, request: ModelRequest) -> ModelResponse:
    try:
        return await handler(request)
    except Exception:
        return await handler(request)  # Retry once
```

Use Cases: retry logic, fallback models, and response caching.
Wrap tool execution with custom logic:
```python
def wrap_tool_call(func: Callable[[Callable, ToolCallRequest], Any]) -> AgentMiddleware: ...
```

Decorator Usage:

```python
from typing import Any, Callable

from langchain.agents.middleware import wrap_tool_call, ToolCallRequest

cache: dict[str, Any] = {}

@wrap_tool_call
def cache_tool_calls(handler: Callable, request: ToolCallRequest) -> Any:
    """Cache tool call results."""
    cache_key = f"{request['tool_name']}:{request['tool_args']}"
    if cache_key in cache:
        return cache[cache_key]
    result = handler(request)
    cache[cache_key] = result
    return result
```

Async Support:
The wrap_tool_call decorator automatically detects and supports async functions. Simply define your function as async:
```python
from typing import Any, Callable

from langchain.agents.middleware import wrap_tool_call, ToolCallRequest

@wrap_tool_call
async def async_wrap_tool(handler: Callable, request: ToolCallRequest) -> Any:
    return await handler(request)
```

Use Cases: caching tool results, retrying failed tool calls, and logging tool usage.
Generate system prompts dynamically based on the request context:
```python
def dynamic_prompt(func: Callable[[ModelRequest], str]) -> AgentMiddleware: ...
```

Decorator Usage:

```python
from datetime import datetime

from langchain.agents.middleware import dynamic_prompt, ModelRequest

@dynamic_prompt
def time_aware_prompt(request: ModelRequest) -> str:
    """Add current time to system prompt."""
    current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return f"You are a helpful assistant. Current time: {current_time}"
```

Async Support:
The dynamic_prompt decorator automatically detects and supports async functions. Simply define your function as async:
```python
from langchain.agents.middleware import dynamic_prompt, ModelRequest

@dynamic_prompt
async def async_dynamic_prompt(request: ModelRequest) -> str:
    return "Dynamic system prompt"
```

The hook_config decorator marks valid jump destinations for flow control:
```python
def hook_config(can_jump_to: list[str]) -> Callable: ...
```

Usage:

```python
from langchain.agents.middleware import before_model, hook_config, ModelRequest

@before_model
@hook_config(can_jump_to=["tools", "model", "end"])
def conditional_jump(request: ModelRequest) -> ModelRequest:
    if some_condition:  # replace with your own check
        request.state['jump_to'] = "end"
    return request
```

Valid Jump Targets:

- `"tools"`: Jump to tool execution
- `"model"`: Jump to model call (useful for retries)
- `"end"`: Jump to end of execution

All middleware inherits from the AgentMiddleware base class:
```python
class AgentMiddleware:
    """
    Base class for middleware plugins.

    Middleware can be created by subclassing this class or by using
    the decorator functions (before_model, after_model, etc.).
    """
    pass
```

Custom Middleware Class:

```python
from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse

class CustomMiddleware(AgentMiddleware):
    def __init__(self, config: dict):
        self.config = config

    def before_model(self, request: ModelRequest) -> ModelRequest:
        # Custom logic
        return request

    def after_model(self, response: ModelResponse) -> ModelResponse:
        # Custom logic
        return response
```

LangChain provides several pre-built middleware classes for common use cases:
Automatically retry model calls on failure:
```python
class ModelRetryMiddleware(AgentMiddleware):
    """
    Retry model calls on failure with configurable attempts and backoff.

    Parameters:
        max_retries: Maximum number of retry attempts
        backoff_factor: Exponential backoff multiplier
        retry_on: Exception types to retry on
    """
    def __init__(
        self,
        max_retries: int = 3,
        backoff_factor: float = 2.0,
        retry_on: tuple[type[Exception], ...] = (Exception,)
    ): ...
```

Usage:
```python
from langchain.agents import create_agent
from langchain.agents.middleware import ModelRetryMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    middleware=[ModelRetryMiddleware(max_retries=3)]
)
```

Switch to a fallback model on error:
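To make the fallback semantics concrete, here is a hedged pure-Python sketch of trying models in order; `call_with_fallbacks` and `flaky_caller` are illustrative placeholders, not part of the LangChain API:

```python
def call_with_fallbacks(call_model, primary, fallbacks):
    """Try the primary model, then each fallback in order (illustrative sketch)."""
    last_error = None
    for model in [primary, *fallbacks]:
        try:
            return call_model(model)
        except Exception as e:
            last_error = e  # remember the failure and move on to the next model
    raise last_error

def flaky_caller(model):
    # Stand-in for a real model call: the primary always fails here
    if model == "primary":
        raise RuntimeError("primary unavailable")
    return f"answer from {model}"

result = call_with_fallbacks(flaky_caller, "primary", ["backup-a", "backup-b"])
print(result)  # -> answer from backup-a
```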
```python
class ModelFallbackMiddleware(AgentMiddleware):
    """
    Use a fallback model if the primary model fails.

    Parameters:
        fallback_models: List of fallback model identifiers to try in order
    """
    def __init__(self, fallback_models: list[str]): ...
```

Usage:
```python
from langchain.agents import create_agent
from langchain.agents.middleware import ModelFallbackMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    middleware=[ModelFallbackMiddleware(
        fallback_models=["anthropic:claude-3-5-sonnet-20241022", "openai:gpt-3.5-turbo"]
    )]
)
```

Limit the number of tool calls per execution:
```python
class ToolCallLimitMiddleware(AgentMiddleware):
    """
    Limit the number of tool calls per agent execution.

    Parameters:
        max_tool_calls: Maximum number of tool calls allowed
    """
    def __init__(self, max_tool_calls: int): ...
```

Usage:
```python
from langchain.agents import create_agent
from langchain.agents.middleware import ToolCallLimitMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],
    middleware=[ToolCallLimitMiddleware(max_tool_calls=10)]
)
```

Retry tool calls on failure:
```python
class ToolRetryMiddleware(AgentMiddleware):
    """
    Retry tool calls on failure.

    Parameters:
        max_retries: Maximum number of retry attempts per tool call
    """
    def __init__(self, max_retries: int = 3): ...
```

Usage:
```python
from langchain.agents import create_agent
from langchain.agents.middleware import ToolRetryMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],
    middleware=[ToolRetryMiddleware(max_retries=2)]
)
```

Pause execution for human confirmation or input:
```python
class HumanInTheLoopMiddleware(AgentMiddleware):
    """
    Pause agent execution for human review and approval.

    Parameters:
        interrupt_on: Configuration for when to interrupt
    """
    def __init__(self, interrupt_on: InterruptOnConfig): ...

class InterruptOnConfig:
    """Configuration for human-in-the-loop interruptions."""
    pass
```

Usage:
```python
from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    middleware=[HumanInTheLoopMiddleware(interrupt_on=...)]
)
```

Emulate tool calls using an LLM when tools are not available:
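As an illustration of the emulation idea (not the middleware's actual implementation), a tool call can be rerouted to a model prompt; `emulate_tool_call` and `fake_llm` below are hypothetical names:

```python
def emulate_tool_call(llm, tool_name, tool_args):
    """Ask the model to fabricate a plausible tool result instead of executing it."""
    prompt = f"Simulate the output of tool '{tool_name}' called with {tool_args}."
    return llm(prompt)

# Stand-in LLM that just echoes the prompt it was given
fake_llm = lambda prompt: f"(emulated) {prompt}"
result = emulate_tool_call(fake_llm, "get_weather", {"city": "Paris"})
```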
```python
class LLMToolEmulator(AgentMiddleware):
    """
    Emulate tool execution using LLM calls instead of actual tool execution.
    Useful for simulation or when tools are unavailable.
    """
    def __init__(self): ...
```

Use an LLM to intelligently select which tools to use:
```python
class LLMToolSelectorMiddleware(AgentMiddleware):
    """
    Use an LLM to select relevant tools before execution.
    Useful when an agent has many tools available.
    """
    def __init__(self): ...
```

Search the filesystem for files:
```python
class FilesystemFileSearchMiddleware(AgentMiddleware):
    """
    Provide file search capabilities to the agent.

    Parameters:
        search_paths: Directories to search
        file_patterns: File patterns to match
    """
    def __init__(
        self,
        search_paths: list[str],
        file_patterns: list[str] = ["*"]
    ): ...
```

Execute shell commands with security policies:
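Output redaction, one of the controls listed below, can be sketched with plain regex substitution; `redact_output` is a hypothetical helper, not part of the API:

```python
import re

def redact_output(output: str, patterns: list[str]) -> str:
    """Mask any substring matching the given regex patterns (illustrative sketch)."""
    for pattern in patterns:
        output = re.sub(pattern, "[REDACTED]", output)
    return output

print(redact_output("export API_KEY=abc123", [r"API_KEY=\S+"]))  # -> export [REDACTED]
```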
```python
class ShellToolMiddleware(AgentMiddleware):
    """
    Allow the agent to execute shell commands with execution policy controls.

    Parameters:
        execution_policy: Policy controlling what commands can be executed
        redaction_rules: Rules for redacting sensitive output
    """
    def __init__(
        self,
        execution_policy: HostExecutionPolicy | DockerExecutionPolicy | CodexSandboxExecutionPolicy,
        redaction_rules: list[RedactionRule] = []
    ): ...

class HostExecutionPolicy:
    """Execute commands on the host system."""
    pass

class DockerExecutionPolicy:
    """Execute commands in a Docker container."""
    pass

class CodexSandboxExecutionPolicy:
    """Execute commands in a Codex sandbox."""
    pass

class RedactionRule:
    """Rule for redacting sensitive output."""
    pass
```

Summarize long conversations to manage context length:
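The max_tokens threshold can be understood with a rough sketch. The real middleware's token counting is not specified here, so this illustration assumes a crude four-characters-per-token estimate; `needs_summarization` is a hypothetical helper:

```python
def needs_summarization(messages: list[str], max_tokens: int = 4000) -> bool:
    """Trigger check: estimate tokens as ~4 characters each (illustrative sketch)."""
    estimated_tokens = sum(len(m) for m in messages) // 4
    return estimated_tokens > max_tokens
```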
```python
class SummarizationMiddleware(AgentMiddleware):
    """
    Automatically summarize conversation history when it becomes too long.

    Parameters:
        max_tokens: Maximum tokens before summarization
        summary_prompt: Prompt template for summarization
    """
    def __init__(
        self,
        max_tokens: int = 4000,
        summary_prompt: str | None = None
    ): ...
```

Detect and redact personally identifiable information:
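As an illustration of redaction for one PII type (email), not the middleware's actual detection logic; the regex and helper name are assumptions:

```python
import re

# Hypothetical email pattern for illustration only
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_emails(text: str) -> str:
    """Replace email addresses with a redaction marker."""
    return EMAIL_RE.sub("[EMAIL]", text)

print(redact_emails("contact alice@example.com today"))  # -> contact [EMAIL] today
```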
```python
class PIIMiddleware(AgentMiddleware):
    """
    Detect and redact PII from messages.

    Parameters:
        pii_types: Types of PII to detect (email, phone, ssn, etc.)
        redact: Whether to redact or raise an error
    """
    def __init__(
        self,
        pii_types: list[str],
        redact: bool = True
    ): ...

class PIIDetectionError(Exception):
    """Raised when PII is detected and redact=False."""
    pass
```

Manage todo lists within agent execution:
```python
class TodoListMiddleware(AgentMiddleware):
    """
    Track and manage todo items during agent execution.
    """
    def __init__(self): ...
```

Limit the total number of model calls:
```python
class ModelCallLimitMiddleware(AgentMiddleware):
    """
    Limit the total number of model calls in an agent execution.

    Parameters:
        max_calls: Maximum number of model calls allowed
    """
    def __init__(self, max_calls: int): ...
```

Edit message context during execution:
```python
class ContextEditingMiddleware(AgentMiddleware):
    """
    Edit and manipulate message context during execution.

    Parameters:
        edits: List of edit operations to apply
    """
    def __init__(self, edits: list): ...

class ClearToolUsesEdit:
    """Edit operation to clear tool usage from context."""
    pass
```

The middleware system uses the following type definitions:

```python
from dataclasses import dataclass
from typing import TypedDict, Callable, Any

from langchain_core.messages import AnyMessage, BaseMessage

@dataclass
class ModelRequest:
    """
    Request object passed to before_model and wrap_model_call.

    Attributes:
        state: Current agent state
        runtime: Execution runtime context
        model_settings: Model configuration settings
    """
    state: "AgentState"
    runtime: Any
    model_settings: Any

@dataclass
class ModelResponse:
    """
    Response object from a model call, passed to after_model.

    Attributes:
        result: List of messages returned from the model
        structured_response: Structured output data (if using response_format)
    """
    result: list[BaseMessage]
    structured_response: Any = None

class ToolCallRequest(TypedDict):
    """
    Request object passed to wrap_tool_call.

    Attributes:
        tool_name: Name of the tool being called
        tool_args: Arguments for the tool call
        tool_call_id: Unique identifier for the tool call
    """
    tool_name: str
    tool_args: dict
    tool_call_id: str

class AgentState(TypedDict):
    """
    Base state schema for agent execution.

    Attributes:
        messages: Conversation history
        structured_response: Structured output (if using response_format)
        jump_to: Control flow target (ephemeral)
    """
    messages: list[AnyMessage]
    structured_response: Any
    jump_to: str
```

Middleware is composable; pass a list to create_agent():
```python
from langchain.agents import create_agent
from langchain.agents.middleware import (
    ModelRetryMiddleware,
    ToolCallLimitMiddleware,
    SummarizationMiddleware
)

agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],
    middleware=[
        ModelRetryMiddleware(max_retries=3),
        ToolCallLimitMiddleware(max_tool_calls=10),
        SummarizationMiddleware(max_tokens=4000)
    ]
)
```

Modify agent state from a before_model hook:

```python
from langchain.agents.middleware import before_model, ModelRequest
from langchain_core.messages import SystemMessage

@before_model
def add_context(request: ModelRequest) -> ModelRequest:
    state = request.state
    # Access custom state fields
    user_name = state.get('user_name', 'User')
    # Modify messages
    state['messages'].insert(0, SystemMessage(
        content=f"The user's name is {user_name}"
    ))
    return request
```

Control flow from an after_model hook:

```python
from langchain.agents.middleware import after_model, hook_config, ModelResponse

@after_model
@hook_config(can_jump_to=["tools", "model", "end"])
def quality_check(response: ModelResponse) -> ModelResponse:
    content = response.get('content', '')
    # Force a retry if the response is too short
    if len(content) < 10:
        response['state']['jump_to'] = "model"
    # Otherwise, skip tools and end if no tool calls are needed
    elif not response.get('tool_calls'):
        response['state']['jump_to'] = "end"
    return response
```

Install with the Tessl CLI:
```shell
npx tessl i tessl/pypi-langchain
```