tessl install tessl/npm-langsmith@0.4.3

TypeScript client SDK for the LangSmith LLM tracing, evaluation, and monitoring platform.
What NOT to do when using LangSmith - a comprehensive guide to avoiding common pitfalls.
This guide documents anti-patterns, common mistakes, and their corrections. Following these guidelines prevents performance issues, data loss, security vulnerabilities, and incorrect behavior.
Problem: Re-creating the wrapper on every call adds overhead and loses trace context.
Don't:
async function makeOpenAICall(prompt: string) {
// BAD: Creates new wrapper on every call
const openai = wrapOpenAI(new OpenAI());
return await openai.chat.completions.create({
model: "gpt-4",
messages: [{ role: "user", content: prompt }]
});
}

Do:
// GOOD: Create wrapper once at module level
const openai = wrapOpenAI(new OpenAI(), {
projectName: "my-project"
});
async function makeOpenAICall(prompt: string) {
return await openai.chat.completions.create({
model: "gpt-4",
messages: [{ role: "user", content: prompt }]
});
}

Problem: Multiple wrappers conflict and cause duplicate traces.
Don't:
const openai = new OpenAI();
const wrapped1 = wrapOpenAI(openai);
const wrapped2 = wrapOpenAI(wrapped1); // BAD: Double wrapping

Do:
const openai = new OpenAI();
const wrappedOpenAI = wrapOpenAI(openai); // GOOD: Wrap once

Problem: In serverless functions, traces may not upload before the handler returns.
Don't:
export const handler = async (event) => {
await tracedFunction(event);
return { statusCode: 200 }; // BAD: Traces may be lost
};

Do:
import { Client } from "langsmith";
const client = new Client();
export const handler = async (event) => {
try {
const result = await tracedFunction(event);
// GOOD: Ensure traces upload before return
await client.awaitPendingTraceBatches();
return { statusCode: 200, body: result };
} catch (error) {
await client.awaitPendingTraceBatches(); // Also flush on error
throw error;
}
};

Problem: Subpath exports are required - importing from main export fails.
Don't:
// BAD: Won't work - traceable not exported from main
import { traceable } from "langsmith";
// BAD: Won't work - evaluate not in main export
import { evaluate } from "langsmith";

Do:
// GOOD: Use subpath exports
import { traceable } from "langsmith/traceable";
import { evaluate } from "langsmith/evaluation";
import { wrapOpenAI } from "langsmith/wrappers/openai";

Problem: Blocking mode adds latency to every traced call.
Don't:
const client = new Client({
blockOnRootRunFinalization: true // BAD: Blocks on every root run
});

Do:
const client = new Client({
blockOnRootRunFinalization: false, // GOOD: Non-blocking async upload
autoBatchTracing: true
});
// Flush only when needed (shutdown, critical operations)
await client.awaitPendingTraceBatches();

Problem: Vague or generic run names make traces hard to understand and debug.
Don't:
const fn1 = traceable(async (x) => x, { name: "fn1" }); // BAD: Generic name
const process = traceable(async (x) => x, { name: "process" }); // BAD: Too vague
const func = traceable(async (x) => x); // BAD: No name (uses function name)

Do:
const extractEntities = traceable(async (text) => {...}, {
name: "extract-entities",
run_type: "tool"
});
const generateSummary = traceable(async (doc) => {...}, {
name: "generate-summary",
run_type: "llm"
});

Problem: Hardcoded API keys are a security risk and are not portable across environments.
Don't:
const client = new Client({
apiKey: "lsv2_pt_abc123..." // BAD: Hardcoded secret
});

Do:
// GOOD: Use environment variables
const client = new Client({
apiKey: process.env.LANGCHAIN_API_KEY
});
// BEST: Use default env-based client
const client = new Client(); // Reads from LANGCHAIN_API_KEY automatically

Problem: Tracing every request in production adds unnecessary cost and potential performance impact.
Don't:
const client = new Client({
tracingSamplingRate: 1.0 // BAD: Traces every single request
});

Do:
const client = new Client({
tracingSamplingRate: process.env.NODE_ENV === "production" ? 0.1 : 1.0,
// GOOD: Sample 10% in prod, 100% in dev
});

Problem: Missing configuration causes silent failures or unclear error messages.
Don't:
const client = new Client();
// BAD: May fail later with unclear error
await client.createRun({...});

Do:
if (!process.env.LANGCHAIN_API_KEY) {
console.warn("LANGCHAIN_API_KEY not set - tracing disabled");
}
const client = new Client();
// Or: Verify at startup
try {
const config = Client.getDefaultClientConfig();
if (!config.apiKey) {
throw new Error("LANGCHAIN_API_KEY environment variable is required");
}
} catch (error) {
console.error("LangSmith configuration error:", error);
process.exit(1);
}

Problem: Disabling batching increases network overhead and reduces performance.
Don't:
const client = new Client({
autoBatchTracing: false // BAD: Unless debugging, keep enabled
});

Do:
// GOOD: Use batching in production
const client = new Client({
autoBatchTracing: true,
batchSizeBytesLimit: 20_000_000
});
// ACCEPTABLE: Disable only for debugging specific issues
const debugClient = new Client({
autoBatchTracing: false, // OK for debugging
debug: true
});

Problem: Examples must belong to a dataset.
Don't:
await client.createExample({
inputs: { question: "What is AI?" },
outputs: { answer: "AI is..." }
// BAD: Missing dataset_id or datasetName
});

Do:
await client.createExample({
dataset_id: datasetId, // GOOD: Always specify dataset
inputs: { question: "What is AI?" },
outputs: { answer: "AI is..." }
});

Problem: Creating a dataset without checking whether it already exists creates duplicates or fails with unclear errors.
Don't:
// BAD: May create duplicate
await client.createDataset({
datasetName: "my-dataset"
});

Do:
// GOOD: Check existence first
const exists = await client.hasDataset({ datasetName: "my-dataset" });
if (!exists) {
await client.createDataset({
datasetName: "my-dataset",
description: "My test dataset"
});
}
// BETTER: Use upsert
await client.createDataset({
datasetName: "my-dataset",
description: "My test dataset",
upsert: true // Creates or uses existing
});

Problem: The inputs and outputs arrays must be parallel (same length).
Don't:
await client.createExamples({
datasetName: "qa",
inputs: [{ q: "A" }, { q: "B" }, { q: "C" }], // 3 items
outputs: [{ a: "1" }, { a: "2" }] // BAD: Only 2 items
});

Do:
// GOOD: Ensure parallel arrays have same length
await client.createExamples({
datasetName: "qa",
inputs: [{ q: "A" }, { q: "B" }, { q: "C" }],
outputs: [{ a: "1" }, { a: "2" }, { a: "3" }] // GOOD: 3 items
});
// BETTER: Use examples array to avoid this issue
await client.createExamples({
datasetName: "qa",
examples: [
{ inputs: { q: "A" }, outputs: { a: "1" } },
{ inputs: { q: "B" }, outputs: { a: "2" } },
{ inputs: { q: "C" }, outputs: { a: "3" } }
]
});

Problem: API keys, passwords, and tokens passed as traced inputs are visible in the LangSmith UI.
Don't:
const traced = traceable(
async (apiKey: string, data: any) => {
// BAD: API key will be in trace inputs
return await callExternalAPI(apiKey, data);
},
{ name: "api-call" }
);

Do:
const traced = traceable(
async (apiKey: string, data: any) => {
return await callExternalAPI(apiKey, data);
},
{
name: "api-call",
processInputs: (inputs) => ({
data: inputs.data
// GOOD: API key not included in logged inputs
})
}
);

Problem: User emails, SSNs, and phone numbers end up in traces.
Don't:
const processUser = traceable(
async (user: { email: string; ssn: string }) => {
// BAD: PII will be logged
return await analyze(user);
}
);

Do:
import { createAnonymizer } from "langsmith/anonymizer";
const anonymizer = createAnonymizer([
{ pattern: /\b[\w\.-]+@[\w\.-]+\.\w+\b/g, replace: "[EMAIL]" },
{ pattern: /\b\d{3}-\d{2}-\d{4}\b/g, replace: "[SSN]" }
]);
const processUser = traceable(
async (user: { email: string; ssn: string }) => {
return await analyze(user);
},
{
processInputs: anonymizer,
processOutputs: anonymizer
}
);

Problem: Metadata is visible and searchable.
Don't:
const traced = traceable(
async (data) => {...},
{
metadata: {
userId: "user-123",
apiKey: "sk-...", // BAD: Secret in metadata
creditCard: "4111..." // BAD: PII in metadata
}
}
);

Do:
const traced = traceable(
async (data) => {...},
{
metadata: {
userId: "user-123", // OK: Non-sensitive ID
hasApiKey: true, // GOOD: Boolean instead of value
paymentMethod: "credit" // GOOD: Category, not actual number
}
}
);

Problem: One failing evaluator stops the entire evaluation.
Don't:
const riskyEvaluator = async ({ run }) => {
const score = run.outputs.value / run.inputs.divisor; // BAD: Throws if outputs or inputs are missing
return { key: "score", score };
};
await evaluate(target, {
data: "dataset",
evaluators: [riskyEvaluator] // BAD: Will crash on error
});

Do:
const safeEvaluator = async ({ run }) => {
try {
const score = run.outputs.value / run.inputs.divisor;
return { key: "score", score };
} catch (error) {
// GOOD: Handle errors gracefully
return {
key: "score",
score: null,
comment: `Evaluation failed: ${error.message}`
};
}
};
await evaluate(target, {
data: "dataset",
evaluators: [safeEvaluator]
});

Problem: Synchronous heavy computation in an evaluator blocks the event loop during evaluation.
Don't:
const slowEvaluator = ({ run, example }) => {
// BAD: Synchronous heavy computation
for (let i = 0; i < 1000000000; i++) {
// blocking work
}
return { key: "quality", score: 1 };
};

Do:
const asyncEvaluator = async ({ run, example }) => {
// GOOD: Async operation
const score = await computeQualityAsync(run.outputs);
return { key: "quality", score };
};

Problem: Evaluation fails to find dataset.
Don't:
await evaluate(target, {
data: "my dataset" // BAD: Spaces in name may cause issues
});

Do:
await evaluate(target, {
data: "my-dataset" // GOOD: Use kebab-case or underscores
});
// BETTER: Use dataset ID for certainty
await evaluate(target, {
data: datasetId // BEST: Unambiguous
});
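If only the dataset name is known, the ID can be resolved once up front; a minimal sketch (assuming the dataset already exists and target is defined as above):
const dataset = await client.readDataset({ datasetName: "my-dataset" });
await evaluate(target, {
  data: dataset.id // Resolved once - unambiguous
});
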
Problem: Loading all runs into memory causes OOM errors.
Don't:
// BAD: Tries to load all runs into array
const allRuns = [];
for await (const run of client.listRuns({ projectName: "big-project" })) {
allRuns.push(run); // BAD: May run out of memory
}

Do:
// GOOD: Process runs one at a time
let processedCount = 0;
for await (const run of client.listRuns({ projectName: "big-project" })) {
await processRun(run); // GOOD: Stream processing
processedCount++;
// GOOD: Safety limit
if (processedCount >= 10000) {
console.warn("Reached safety limit");
break;
}
}

Problem: Queries without a limit try to fetch unlimited results.
Don't:
// BAD: No limit - may timeout or OOM
for await (const run of client.listRuns({ projectName: "huge-project" })) {
console.log(run.name);
}

Do:
// GOOD: Set reasonable limit
for await (const run of client.listRuns({
projectName: "huge-project",
limit: 1000 // GOOD: Explicit limit
})) {
console.log(run.name);
}

Problem: Unhandled rejections crash the application.
Don't:
// BAD: No error handling
const run = await client.readRun(runId);
console.log(run.name);

Do:
// GOOD: Handle expected errors
try {
const run = await client.readRun(runId);
console.log(run.name);
} catch (error) {
if (error.status === 404) {
console.error("Run not found");
} else if (error.status === 401) {
console.error("Authentication failed - check LANGCHAIN_API_KEY");
} else {
console.error("API error:", error.message);
}
}

Problem: Runs that are never posted never reach LangSmith.
Don't:
for (let i = 0; i < 10000; i++) {
const run = new RunTree({ name: `run-${i}`, run_type: "tool" });
await run.end({ result: i });
// BAD: Never posted - run data never reaches LangSmith
}

Do:
for (let i = 0; i < 10000; i++) {
const run = new RunTree({ name: `run-${i}`, run_type: "tool" });
await run.end({ result: i });
await run.postRun(); // GOOD: Post immediately
}

Problem: Large metadata increases trace size and cost.
Don't:
const traced = traceable(
async (input) => {...},
{
metadata: {
fullConfig: entireConfigObject, // BAD: Large object
history: allPreviousRequests, // BAD: Huge array
debug: complexDebugInfo // BAD: Verbose data
}
}
);

Do:
const traced = traceable(
async (input) => {...},
{
metadata: {
configVersion: "v1.2.3", // GOOD: Minimal reference
requestCount: 42, // GOOD: Summary metric
debugEnabled: true // GOOD: Boolean flag
}
}
);

Problem: Background timers and resources leak.
Don't:
async function main() {
const client = new Client({ cache: true });
await doWork();
// BAD: Cache timers still running
process.exit(0);
}

Do:
async function main() {
const client = new Client({ cache: true });
try {
await doWork();
} finally {
await client.awaitPendingTraceBatches();
client.cleanup(); // GOOD: Stop timers, cleanup resources
}
}

Problem: Inconsistent feedback keys are hard to query and analyze.
Don't:
await client.createFeedback(run_id, "rating1", { score: 1 });
await client.createFeedback(run_id, "user-rating", { score: 1 });
await client.createFeedback(run_id, "UserRating", { score: 1 });
// BAD: Three different keys for same concept

Do:
// GOOD: Consistent naming convention
await client.createFeedback(run_id, "user_rating", { score: 1 });
await client.createFeedback(run_id, "user_rating", { score: 1 });
await client.createFeedback(run_id, "user_rating", { score: 1 });
// GOOD: Use constants
const FeedbackKeys = {
USER_RATING: "user_rating",
CORRECTNESS: "correctness",
HELPFULNESS: "helpfulness"
} as const;
await client.createFeedback(run_id, FeedbackKeys.USER_RATING, { score: 1 });

Problem: Inconsistent score ranges make analysis difficult.
Don't:
// BAD: Mixed score ranges
await client.createFeedback(run_id, "quality", { score: 4 }); // Out of 5
await client.createFeedback(run_id, "accuracy", { score: 0.8 }); // Out of 1.0
await client.createFeedback(run_id, "speed", { score: 100 }); // Out of 100Do:
// GOOD: Normalize to 0.0-1.0
await client.createFeedback(run_id, "quality", {
  score: 4 / 5, // Normalized: 0.8
  value: 4, // Store original in value
  comment: "4/5 stars"
});
await client.createFeedback(run_id, "accuracy", { score: 0.8 }); // Already normalized
await client.createFeedback(run_id, "speed", {
  score: 100 / 100, // Normalized: 1.0
  value: 100
});

Problem: Long-lived presigned feedback tokens are a security risk if they leak.
Don't:
const token = await client.createPresignedFeedbackToken(runId, "rating", {
expiration: new Date(Date.now() + 365 * 24 * 60 * 60 * 1000) // BAD: 1 year
});

Do:
// GOOD: Short expiration for public tokens
const token = await client.createPresignedFeedbackToken(runId, "rating", {
expiration: new Date(Date.now() + 24 * 60 * 60 * 1000) // GOOD: 24 hours
});
// ACCEPTABLE: Longer for email links
const emailToken = await client.createPresignedFeedbackToken(runId, "review", {
expiration: new Date(Date.now() + 7 * 24 * 60 * 60 * 1000) // OK: 7 days
});

Problem: Overwriting a prompt loses version history and prevents rollback.
Don't:
// BAD: Overwrites without version history
await client.createPrompt("system-prompt");
// ... later, no versioning:
await client.updatePrompt("system-prompt", {
content: "New prompt" // BAD: Old version lost
});

Do:
// GOOD: Use pushPrompt for versioning
await client.createPrompt("system-prompt");
// Push v1
await client.pushPrompt("system-prompt", {
object: { type: "chat", messages: [...] },
description: "Initial version"
});
// Push v2 (v1 preserved in history)
await client.pushPrompt("system-prompt", {
object: { type: "chat", messages: [...] },
description: "Improved tone"
});
// Can pull any version
const v1 = await client.pullPrompt({ promptName: "system-prompt:v1" });

Problem: Pulling the prompt on every call makes repeated API calls for the same prompt.
Don't:
async function generateResponse(input: string) {
// BAD: Fetches prompt on every call
const prompt = await client.pullPrompt({ promptName: "system-prompt" });
return await llm.generate({ prompt, input });
}

Do:
// GOOD: Cache prompt pulls with a simple module-level TTL cache
const PROMPT_TTL_MS = 60 * 60 * 1000; // Re-fetch at most once per hour
let cachedPrompt: { value: any; fetchedAt: number } | null = null;
async function getSystemPrompt() {
  if (!cachedPrompt || Date.now() - cachedPrompt.fetchedAt > PROMPT_TTL_MS) {
    cachedPrompt = {
      value: await client.pullPromptCommit("system-prompt"),
      fetchedAt: Date.now()
    };
  }
  return cachedPrompt.value;
}
async function generateResponse(input: string) {
  const prompt = await getSystemPrompt();
  return await llm.generate({ prompt, input });
}

Problem: The generic wrapSDK wrapper loses SDK-specific optimizations.
Don't:
import { wrapSDK } from "langsmith/wrappers";
import OpenAI from "openai";
// BAD: Use specialized wrapper instead
const openai = wrapSDK(new OpenAI());

Do:
import { wrapOpenAI } from "langsmith/wrappers/openai";
import OpenAI from "openai";
// GOOD: Use specialized wrapper
const openai = wrapOpenAI(new OpenAI());
// Captures token usage, streaming, function calls properly

Problem: wrapAISDK requires wrapLanguageModel to function.
Don't:
import { wrapAISDK } from "langsmith/experimental/vercel";
import { generateText } from "ai";
// BAD: Missing wrapLanguageModel
const wrapped = wrapAISDK({ generateText });

Do:
import { wrapAISDK } from "langsmith/experimental/vercel";
import { wrapLanguageModel, generateText } from "ai";
// GOOD: Include wrapLanguageModel
const wrapped = wrapAISDK({ wrapLanguageModel, generateText });

Problem: LangChain calls are not traced as children.
Don't:
const myFunction = traceable(async (input) => {
const model = new ChatOpenAI();
// BAD: Missing callbacks - not traced as child
const result = await model.invoke(input);
return result;
});

Do:
import { getLangchainCallbacks } from "langsmith/langchain";
const myFunction = traceable(async (input) => {
const callbacks = await getLangchainCallbacks(); // GOOD: Get callbacks
const model = new ChatOpenAI();
const result = await model.invoke(input, { callbacks }); // GOOD: Pass callbacks
return result;
});

Problem: Without the LangSmith reporter, Vitest tests don't create experiments.
Don't:
// vitest.config.ts
export default defineConfig({
test: {
// BAD: Missing LangSmith reporter
}
});

Do:
// vitest.config.ts
import { defineConfig } from "vitest/config";
export default defineConfig({
test: {
reporters: ["default", "langsmith/vitest/reporter"] // GOOD: Include reporter
}
});

Problem: Mixing unit tests and evaluations in one file conflates test types and blurs their separation.
Don't:
// BAD: Mixed in same file
test("unit test: parser works", () => {
expect(parse("data")).toBe("parsed");
});
test("eval test: chatbot quality", async () => {
const result = await chatbot("query");
expect(result).toContain("answer");
}, wrapEvaluator({ datasetName: "qa" }));

Do:
// tests/unit/parser.test.ts - GOOD: Separate unit tests
test("parser works", () => {
expect(parse("data")).toBe("parsed");
});
// tests/eval/chatbot.test.ts - GOOD: Separate evaluations
import * as ls from "langsmith/vitest";
ls.describe("chatbot quality", () => {
  ls.test("answers the question", { inputs: { query: "query" } }, async ({ inputs }) => {
    const result = await chatbot(inputs.query);
    expect(result).toContain("answer");
  });
});

Problem: Runs without a run_type are harder to filter and analyze.
Don't:
await client.createRun({
name: "MyOperation",
// BAD: Missing run_type
inputs: {...}
});

Do:
await client.createRun({
name: "MyOperation",
run_type: "chain", // GOOD: Explicit type
inputs: {...}
});
// Use appropriate types:
// - "llm" for LLM API calls
// - "chain" for workflows
// - "tool" for individual tools
// - "retriever" for document retrieval
// - "embedding" for embeddingsProblem: Deprecated field, use project_name instead.
Problem: session_id and session_name are deprecated; use project_name instead.
Don't:
await client.createRun({
name: "MyRun",
run_type: "chain",
session_id: "session-123", // BAD: Deprecated
session_name: "MySession" // BAD: Deprecated
});

Do:
await client.createRun({
name: "MyRun",
run_type: "chain",
project_name: "my-project" // GOOD: Use project_name
});

Problem: Incorrect start times break time-based queries and analytics.
Don't:
await client.createRun({
name: "MyRun",
run_type: "chain",
start_time: Date.now() + 1000000 // BAD: Future timestamp
});

Do:
await client.createRun({
name: "MyRun",
run_type: "chain",
start_time: Date.now() // GOOD: Current time
// Or omit - auto-set to current time
});

// Application shutdown
process.on('SIGTERM', async () => {
await client.awaitPendingTraceBatches();
client.cleanup();
process.exit(0);
});
// Lambda/serverless
export const handler = async (event) => {
const result = await processEvent(event);
await client.awaitPendingTraceBatches();
return result;
};

// GOOD: Environment-based
const client = new Client();
// GOOD: With fallbacks
const client = new Client({
apiKey: process.env.LANGCHAIN_API_KEY,
projectName: process.env.LANGCHAIN_PROJECT || "default"
});

async function robustAPICall() {
const maxRetries = 3;
for (let i = 0; i < maxRetries; i++) {
try {
return await client.someMethod();
} catch (error) {
if (error.status === 429 || error.status >= 500) {
if (i < maxRetries - 1) {
await new Promise(r => setTimeout(r, Math.pow(2, i) * 1000));
continue;
}
}
throw error;
}
}
}

// GOOD: Use tags for filtering
const traced = traceable(
async (input) => {...},
{
tags: ["production", "customer-facing", "v2"],
metadata: { version: "2.1.0", team: "ml-team" }
}
);
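Tags like these can then be used to narrow queries with the run filter syntax; a minimal sketch (the project name and filter value are illustrative):
for await (const run of client.listRuns({
  projectName: "my-project",
  filter: 'has(tags, "customer-facing")',
  limit: 100
})) {
  console.log(run.name, run.tags);
}
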
const client = new Client({
  // GOOD: Environment-based sampling
tracingSamplingRate: process.env.NODE_ENV === "production" ? 0.1 : 1.0
});

Before deploying code using LangSmith, check: