Implement LangChain rate limiting, retry strategies, and backoff. Use when handling API rate limits, controlling request throughput, or implementing concurrency-safe batch processing. Trigger: "langchain rate limit", "langchain throttling", "langchain backoff", "langchain retry", "API quota", "429 error".
Handle API rate limits gracefully with built-in retries, exponential backoff, concurrency control, provider fallbacks, and custom rate limiters.
| Provider | Model | RPM | TPM |
|---|---|---|---|
| OpenAI | gpt-4o | 10,000 | 800,000 |
| OpenAI | gpt-4o-mini | 10,000 | 4,000,000 |
| Anthropic | claude-sonnet | 4,000 | 400,000 |
| Anthropic | claude-haiku | 4,000 | 400,000 |
| Google | gemini-1.5-pro | 360 | 4,000,000 |
RPM = requests/minute, TPM = tokens/minute. Actual limits depend on your tier.
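Both limits bind at once, so it helps to know which one you will hit first. A quick sketch (using the illustrative tier numbers above; the function name is ours, not a library API) of the average token budget per request before TPM, rather than RPM, becomes the binding constraint:

```typescript
// If you send requests at the full RPM, the per-request token budget
// is TPM / RPM; heavier requests make TPM the limit you hit first.
function maxAvgTokensPerRequest(rpm: number, tpm: number): number {
  return Math.floor(tpm / rpm);
}

// gpt-4o-mini from the table above: 4,000,000 TPM / 10,000 RPM
console.log(maxAvgTokensPerRequest(10_000, 4_000_000)); // 400 tokens/request
```

If your prompts average more than that budget, reduce concurrency or request rate even though you are nominally under the RPM cap.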
import { ChatOpenAI } from "@langchain/openai";

// Built-in exponential backoff on 429/500/503
const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 5, // retries with exponential backoff
  timeout: 30000, // 30s timeout per request
});

// This automatically retries on rate limit errors
const response = await model.invoke("Hello");

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const chain = ChatPromptTemplate.fromTemplate("Summarize: {text}")
  .pipe(new ChatOpenAI({ model: "gpt-4o-mini", maxRetries: 3 }))
  .pipe(new StringOutputParser());

const inputs = articles.map((text) => ({ text }));

// batch() with maxConcurrency prevents flooding the API
const results = await chain.batch(inputs, {
  maxConcurrency: 5, // max 5 parallel requests
});

import { ChatOpenAI } from "@langchain/openai";
import { ChatAnthropic } from "@langchain/anthropic";

const primary = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 2,
  timeout: 10000,
});

const fallback = new ChatAnthropic({
  model: "claude-sonnet-4-20250514",
  maxRetries: 2,
});

// Automatically switches to Anthropic if OpenAI rate-limits
const resilientModel = primary.withFallbacks({
  fallbacks: [fallback],
});

const chain = prompt.pipe(resilientModel).pipe(new StringOutputParser());

class TokenBucketLimiter {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxTokens: number, // bucket size
    private refillRate: number, // tokens per second
  ) {
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
  }

  async acquire(): Promise<void> {
    this.refill();
    while (this.tokens < 1) {
      const waitMs = (1 / this.refillRate) * 1000;
      await new Promise((r) => setTimeout(r, waitMs));
      this.refill();
    }
    this.tokens -= 1;
  }

  private refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }
}

// Usage: 100 requests per minute
const limiter = new TokenBucketLimiter(100, 100 / 60);

async function rateLimitedInvoke(chain: any, input: any) {
  await limiter.acquire();
  return chain.invoke(input);
}

async function batchWithSemaphore<T>(
  chain: { invoke: (input: any) => Promise<T> },
  inputs: any[],
  maxConcurrent = 5,
): Promise<T[]> {
  let active = 0;
  const results: T[] = [];
  const queue = [...inputs.entries()];
  return new Promise((resolve, reject) => {
    if (queue.length === 0) return resolve(results); // avoid hanging on empty input
    function next() {
      while (active < maxConcurrent && queue.length > 0) {
        const [index, input] = queue.shift()!;
        active++;
        chain.invoke(input)
          .then((result) => {
            results[index] = result;
            active--;
            if (queue.length === 0 && active === 0) resolve(results);
            else next();
          })
          .catch(reject);
      }
    }
    next();
  });
}

// Process 100 items, 5 at a time
const results = await batchWithSemaphore(chain, inputs, 5);

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.runnables import RunnableConfig
# Built-in retry
llm = ChatOpenAI(model="gpt-4o-mini", max_retries=5, request_timeout=30)
# Fallback
primary = ChatOpenAI(model="gpt-4o-mini", max_retries=2)
fallback = ChatAnthropic(model="claude-sonnet-4-20250514")
robust = primary.with_fallbacks([fallback])
# Batch with concurrency control
results = chain.batch(
    [{"text": t} for t in texts],
    config=RunnableConfig(max_concurrency=10),
)

| Error | Cause | Fix |
|---|---|---|
| 429 Too Many Requests | Rate limit hit | Increase `maxRetries`, reduce `maxConcurrency` |
| Timeout | Response too slow | Increase `timeout`, check network |
| QuotaExceeded | Monthly limit hit | Upgrade tier or switch provider |
| Batch partially fails | Some items rate limited | Use `.batch()` with `returnExceptions: true` |
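The 429 fix in the first row can also be implemented by hand when you need retry behavior a client does not expose. A minimal sketch of exponential backoff with jitter; `withBackoff` is our own helper, not a LangChain API, and real code would inspect the error to retry only on rate-limit/transient failures:

```typescript
// Retry a call with exponential backoff plus random jitter.
// Delays: base, 2*base, 4*base, ... capped at maxDelayMs.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 500,
  maxDelayMs = 30_000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // out of retries: rethrow
      const delay = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      const jitter = Math.random() * delay * 0.2; // up to 20% extra
      await new Promise((r) => setTimeout(r, delay + jitter));
    }
  }
}
```

The jitter spreads out retries from concurrent callers so they do not all hit the API again at the same instant after a shared rate-limit window.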
Proceed to langchain-security-basics for security best practices.