tessl install tessl/npm-langsmith@0.4.3

TypeScript client SDK for the LangSmith LLM tracing, evaluation, and monitoring platform.
Quick decision guides for choosing between similar APIs, patterns, and approaches.
This guide helps coding agents make optimal choices when multiple LangSmith APIs or patterns can solve the same problem. Each decision tree provides a systematic way to select the right approach based on specific requirements.
Need to trace code execution?
│
├─ Tracing a function?
│ ├─ Simple automatic tracing? → Use traceable()
│ ├─ Need dynamic metadata? → Use traceable() + getCurrentRunTree()
│ ├─ Need runtime config changes? → Use traceable().withConfig()
│ └─ Already have traceable function? → Check with isTraceableFunction()
│
├─ Tracing non-function code?
│ ├─ Manual control needed? → Use RunTree class
│ ├─ Custom events? → Use RunTree + addEvent()
│ └─ Complex hierarchies? → Use RunTree + createChild()
│
├─ Using an AI SDK?
│ ├─ OpenAI SDK? → Use wrapOpenAI()
│ ├─ Anthropic SDK? → Use wrapAnthropic()
│ ├─ Vercel AI SDK? → Use wrapAISDK()
│ ├─ LangChain? → Use getLangchainCallbacks()
│ └─ Custom/other SDK? → Use wrapSDK()
│
└─ Distributed across services?
├─ HTTP-based services? → Use RunTree.toHeaders() / fromHeaders()
├─ Need W3C context? → Use OpenTelemetry integration
    └─ Multiple projects? → Use RunTree replicas

Quick Reference:
- Automatic function tracing: traceable(fn, config)
- Manual run management: new RunTree(config)
- SDK wrappers: wrapOpenAI(), wrapAnthropic(), etc.
- Distributed tracing: toHeaders() / fromHeaders()
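
For example, a minimal sketch combining traceable() with the OpenAI wrapper (the model and prompt are placeholders; assumes LANGCHAIN_API_KEY and LANGCHAIN_TRACING_V2 are set in the environment):

```typescript
import { traceable } from "langsmith/traceable";
import { wrapOpenAI } from "langsmith/wrappers";
import OpenAI from "openai";

// Wrap the SDK client so each completion call becomes a traced child run.
const openai = wrapOpenAI(new OpenAI());

// Wrap your own function so it becomes the parent run in the trace tree.
const answerQuestion = traceable(
  async (question: string) => {
    const completion = await openai.chat.completions.create({
      model: "gpt-4o-mini", // placeholder model
      messages: [{ role: "user", content: question }],
    });
    return completion.choices[0].message.content;
  },
  { name: "answerQuestion", run_type: "chain" }
);

await answerQuestion("What is LangSmith?");
```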

Need to create examples in dataset?
│
├─ Creating single example?
│ ├─ Key-value format? → createExample({ dataset_id, inputs, outputs })
│ ├─ LLM text completion? → createLLMExample(input, generation, options)
│ └─ Chat message format? → createChatExample(messages, response, options)
│
├─ Creating multiple examples?
│ ├─ Uniform structure, simple data?
│ │ └─ Use createExamples({ inputs: [], outputs: [] }) [Separate Arrays]
│ │
│ ├─ Per-example metadata/config?
│ │ └─ Use createExamples({ examples: [{...}] }) [Examples Array]
│ │
│ ├─ Large files or binary data?
│ │ └─ Use uploadExamplesMultipart({ examples: [...] })
│ │
│ └─ From CSV file?
│ └─ Use uploadCsv({ csvFile, inputKeys, outputKeys })
│
└─ From production runs?
    └─ Use createExample({ source_run_id, useSourceRunIO: true })

Decision Factors:
| Method | Best For | Structure | Attachments |
|---|---|---|---|
| createExample() | Single example | Any format | Via dataset_id |
| createExamples() (arrays) | Bulk, uniform | Parallel arrays | No |
| createExamples() (objects) | Bulk, varied | Object array | Yes (via objects) |
| createLLMExample() | Text completion | String in/out | Via options |
| createChatExample() | Chat conversations | Message arrays | Via options |
| uploadExamplesMultipart() | Large/binary | Any with files | Yes |
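
For instance, a sketch of bulk creation with the examples-array form, following the shapes in the table above (the dataset name, fields, and metadata are placeholders; your SDK version may also accept a dataset name directly):

```typescript
import { Client } from "langsmith";

const client = new Client();

// Look up a dataset created earlier (e.g. via createDataset).
const dataset = await client.readDataset({ datasetName: "qa-dataset" });

// Examples-array form: each entry carries its own metadata.
await client.createExamples({
  datasetId: dataset.id,
  examples: [
    {
      inputs: { question: "What is LangSmith?" },
      outputs: { answer: "An LLM tracing and evaluation platform." },
      metadata: { source: "docs" },
    },
    {
      inputs: { question: "Which SDK is this guide for?" },
      outputs: { answer: "The TypeScript SDK." },
      metadata: { source: "manual" },
    },
  ],
});
```
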
Need to evaluate LLM application?
│
├─ Single model on dataset?
│ ├─ Custom evaluators? → evaluate(target, { data, evaluators })
│ ├─ Test framework integration?
│ │ ├─ Using Jest? → import { test } from "langsmith/jest"
│ │ └─ Using Vitest? → import { test } from "langsmith/vitest"
│ └─ Quick script? → evaluate() with inline evaluators
│
├─ Compare multiple models/configs?
│ ├─ Side-by-side comparison? → evaluateComparative(experiments, options)
│ ├─ A/B test with humans? → createComparativeExperiment() + annotation queue
│ └─ Sequential experiments? → Run evaluate() multiple times
│
├─ Production monitoring?
│ ├─ Automated scoring? → Use Feedback API + createFeedback()
│ ├─ Human review? → Annotation queues
│ └─ LLM-as-judge? → Custom evaluator calling LLM
│
└─ Regression testing?
├─ In test suite? → Jest/Vitest integration
    └─ CI/CD pipeline? → evaluate() in test scripts

Quick Reference:
- Run an experiment: evaluate(target, { data: "dataset-name", evaluators: [...] })
- Compare experiments: evaluateComparative([exp1, exp2], { comparativeEvaluators })
- Test framework: import { test } from "langsmith/jest"
- Log feedback: createFeedback(run_id, key, { score })
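
A minimal evaluate() sketch with one inline evaluator (the dataset name, target logic, and scoring rule are placeholders):

```typescript
import { evaluate } from "langsmith/evaluation";

// Target: receives an example's inputs and returns your app's outputs.
const target = async (inputs: { question: string }) => {
  return { answer: `Echo: ${inputs.question}` };
};

// Custom evaluator comparing outputs against the dataset's reference outputs.
const exactMatch = ({
  outputs,
  referenceOutputs,
}: {
  outputs?: Record<string, unknown>;
  referenceOutputs?: Record<string, unknown>;
}) => ({
  key: "exact_match",
  score: outputs?.answer === referenceOutputs?.answer ? 1 : 0,
});

await evaluate(target, {
  data: "qa-dataset",
  evaluators: [exactMatch],
  experimentPrefix: "baseline",
});
```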

Need to protect sensitive data?
│
├─ Hide everything?
│ └─ Use hideInputs: true, hideOutputs: true
│
├─ Selective field hiding?
│ └─ Use functions: hideInputs: (inputs) => { const {secret, ...safe} = inputs; return safe; }
│
├─ Pattern-based PII removal?
│ ├─ Emails, SSNs, phones? → createAnonymizer([{ pattern: /email regex/, replace: "[EMAIL]" }])
│ ├─ API keys, tokens? → createAnonymizer([{ pattern: /sk-.*/, replace: "[KEY]" }])
│ └─ Custom patterns? → createAnonymizer([{ pattern: /.../, replace: "..." }])
│
├─ Path-based selective anonymization?
│ └─ createAnonymizer(rules, { paths: ["inputs.user.email"] })
│
├─ Structural anonymization?
│ └─ Use processor-based: createAnonymizer((node, path) => {...})
│
└─ Public feedback collection?
    └─ Use createPresignedFeedbackToken() - no API key needed

Decision Matrix:
| Requirement | Approach | Method |
|---|---|---|
| Hide all inputs | Boolean flag | hideInputs: true |
| Hide specific fields | Function filter | hideInputs: (i) => filter(i) |
| Remove PII patterns | Regex rules | createAnonymizer([rules]) |
| Path-specific | Anonymizer with paths | createAnonymizer(rules, {paths}) |
| Complex logic | Processor function | createAnonymizer(processor) |
| Public feedback | Presigned tokens | createPresignedFeedbackToken() |
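
As a sketch, a client combining regex-based anonymization with a function-based input filter (the patterns and the ssn field are illustrative):

```typescript
import { Client } from "langsmith";
import { createAnonymizer } from "langsmith/anonymizer";

// Replace email addresses and OpenAI-style keys anywhere in traced payloads.
const anonymizer = createAnonymizer([
  { pattern: /[\w.+-]+@[\w-]+\.[\w.]+/g, replace: "[EMAIL]" },
  { pattern: /sk-[A-Za-z0-9]+/g, replace: "[KEY]" },
]);

const client = new Client({
  anonymizer,
  // Drop a specific field entirely before inputs leave the process.
  hideInputs: (inputs) => {
    const { ssn, ...safe } = inputs as Record<string, unknown>;
    return safe;
  },
});
```
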
Optimizing for production?
│
├─ High-volume tracing (>1000 traces/min)?
│ ├─ Enable sampling → tracingSamplingRate: 0.1 (10%)
│ ├─ Enable batching → autoBatchTracing: true
│ ├─ Increase concurrency → traceBatchConcurrency: 10
│ └─ Increase batch size → batchSizeBytesLimit: 50_000_000
│
├─ Serverless/short-lived?
│ ├─ Always flush → await client.awaitPendingTraceBatches()
│ ├─ Consider blocking mode → blockOnRootRunFinalization: true
│ └─ Manual flush mode → manualFlushMode: true + flush()
│
├─ Memory-constrained?
│ ├─ Reduce batch size → batchSizeBytesLimit: 10_000_000
│ ├─ Limit operations per batch → batchSizeLimit: 50
│ └─ Lower memory limit → maxIngestMemoryBytes: 500_000_000
│
├─ Low-latency requirements?
│ ├─ Disable batching → autoBatchTracing: false
│ └─ Reduce timeout → timeout_ms: 5000
│
└─ Development/debugging?
├─ No batching → autoBatchTracing: false
├─ Blocking mode → blockOnRootRunFinalization: true
    └─ Debug logging → debug: true

Configuration Presets:
// High-volume production
const productionClient = new Client({
tracingSamplingRate: 0.1,
autoBatchTracing: true,
batchSizeBytesLimit: 50_000_000,
traceBatchConcurrency: 10,
hideInputs: (i) => redactPII(i)
});
// Serverless (Lambda, Cloud Functions)
const serverlessClient = new Client({
autoBatchTracing: true,
blockOnRootRunFinalization: false
});
// Always: await client.awaitPendingTraceBatches() before return
// Development/Testing
const devClient = new Client({
autoBatchTracing: false,
blockOnRootRunFinalization: true,
debug: true
});
// Memory-constrained
const lightweightClient = new Client({
batchSizeBytesLimit: 10_000_000,
batchSizeLimit: 50,
maxIngestMemoryBytes: 500_000_000
});

Working with datasets?
│
├─ Creating dataset?
│ └─ createDataset({ datasetName, dataType })
│
├─ Adding examples?
│ └─ See "Choosing Example Creation Method" above
│
├─ Finding similar examples?
│ ├─ First time? → indexDataset() then similarExamples()
│ └─ Already indexed? → similarExamples(inputs, datasetId)
│
├─ Versioning dataset?
│ ├─ Create version → Add examples (auto-versioned)
│ ├─ Tag version → updateDatasetTag({ tag, asOf })
│ ├─ Read version → readDatasetVersion({ asOf })
│ └─ Compare versions → diffDatasetVersions({ fromVersion, toVersion })
│
├─ Organizing examples?
│ ├─ Create splits → updateDatasetSplits({ splitName, exampleIds })
│ ├─ List splits → listDatasetSplits()
│ └─ Remove from split → updateDatasetSplits({ remove: true })
│
├─ Sharing dataset?
│ ├─ Share publicly → shareDataset(datasetId)
│ ├─ Clone public → clonePublicDataset(shareToken)
│ ├─ Unshare → unshareDataset(datasetId)
│ └─ Read shared → readSharedDataset(shareToken)
│
└─ Exporting dataset?
├─ For OpenAI fine-tuning → readDatasetOpenaiFinetuning()
    └─ As CSV → List examples and format manually
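
For the similarity-search path above, a rough sketch (the dataset id is a placeholder, and the similarExamples parameter order here is an assumption — check your SDK version):

```typescript
import { Client } from "langsmith";

const client = new Client();
const datasetId = "00000000-0000-0000-0000-000000000000"; // placeholder id

// One-time (or after large updates): build the search index for the dataset.
await client.indexDataset({ datasetId });

// Retrieve the examples most similar to a new input (assumed parameter order).
const similar = await client.similarExamples(
  { question: "How do I trace a function?" },
  datasetId,
  3 // number of examples to return
);
for (const example of similar) {
  console.log(example.inputs);
}
```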

Collecting feedback on runs?
│
├─ Direct API access available?
│ ├─ Simple score/comment → createFeedback(run_id, key, { score })
│ ├─ With correction → createFeedback(run_id, key, { score, correction })
│ └─ From evaluator → logEvaluationFeedback(params)
│
├─ Public/external collection?
│ ├─ Create token → createPresignedFeedbackToken(runId, key)
│ ├─ Share URL → token.url (users POST without auth)
│ └─ List tokens → listPresignedFeedbackTokens()
│
├─ Human review workflow?
│ ├─ Create queue → createAnnotationQueue()
│ ├─ Add runs → addRunsToAnnotationQueue()
│ ├─ Review → getRunFromAnnotationQueue()
│ └─ Submit → createFeedback() with queue context
│
└─ Automated/model feedback?
├─ From LLM judge → createFeedback(run_id, key, { feedbackSourceType: "model" })
├─ From API check → createFeedback(run_id, key, { feedbackSourceType: "api" })
    └─ From evaluation → Automatically logged by evaluate()
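
A short sketch of direct feedback plus a presigned token for collecting feedback without an API key (the run id and feedback keys are placeholders):

```typescript
import { Client } from "langsmith";

const client = new Client();
const runId = "00000000-0000-0000-0000-000000000000"; // placeholder run id

// Direct feedback from server-side code that holds an API key.
await client.createFeedback(runId, "user_rating", {
  score: 1,
  comment: "Helpful answer",
});

// Presigned token: hand token.url to a browser or webhook so end users can
// POST feedback for this run without LangSmith credentials.
const token = await client.createPresignedFeedbackToken(runId, "thumbs_up");
console.log(token.url);
```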

Need to query runs?
│
├─ Single run by ID?
│ ├─ Basic info → readRun(runId)
│ └─ With children → readRun(runId, { loadChildRuns: true })
│
├─ Multiple runs?
│ ├─ All in project → listRuns({ projectName })
│ ├─ Root runs only → listRuns({ isRoot: true })
│ ├─ By trace → listRuns({ traceId })
│ ├─ By parent → listRuns({ parentRunId })
│ ├─ Failed only → listRuns({ error: true })
│ ├─ Time range → listRuns({ startTime, endTime })
│ └─ Complex filter → listRuns({ filter: 'and(...)' })
│
├─ Grouped analytics?
│ ├─ By conversation → listGroupRuns({ groupBy: "metadata.conversation_id" })
│ ├─ By user → listGroupRuns({ groupBy: "metadata.user_id" })
│ └─ Custom grouping → listGroupRuns({ groupBy: "metadata.custom_field" })
│
├─ Just statistics?
│ └─ getRunStats({ projectName, filter })
│
└─ Public shared runs?
    └─ listSharedRuns({ shareToken })
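
A sketch of querying recent failed root runs in a project (the project name and time window are placeholders):

```typescript
import { Client } from "langsmith";

const client = new Client();

// listRuns returns an async iterable, so results stream page by page.
const oneDayAgo = new Date(Date.now() - 24 * 60 * 60 * 1000);
for await (const run of client.listRuns({
  projectName: "my-project",
  isRoot: true,
  error: true,
  startTime: oneDayAgo,
})) {
  console.log(run.id, run.name, run.error);
}
```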

Need to filter runs?
│
├─ Simple filters?
│ ├─ By project → projectName: "my-project"
│ ├─ By type → runType: "llm"
│ ├─ By error → error: true
│ ├─ By time → startTime/endTime: Date
│ └─ Root only → isRoot: true
│
├─ Complex conditions?
│ ├─ Single condition → filter: 'eq(status, "success")'
│ ├─ Comparison → filter: 'gte(latency, 1000)'
│ ├─ Multiple AND → filter: 'and(eq(error, null), gte(latency, 1000))'
│ ├─ Multiple OR → filter: 'or(eq(run_type, "llm"), eq(run_type, "chain"))'
│ ├─ Array contains → filter: 'has(tags, "production")'
│ └─ Text search → filter: 'search(name, "customer")'
│
├─ Trace-level filtering?
│ ├─ Filter root run → traceFilter: 'eq(name, "pipeline")'
│ ├─ Filter children → treeFilter: 'eq(run_type, "llm")'
│ └─ Both → Use traceFilter + treeFilter together
│
└─ Field selection?
    └─ select: ["id", "name", "start_time"]

Filter Comparators:
- eq(field, value) - Equals
- neq(field, value) - Not equals
- gt(field, value) - Greater than
- gte(field, value) - Greater than or equal
- lt(field, value) - Less than
- lte(field, value) - Less than or equal
- has(array_field, value) - Array contains
- search(field, text) - Text search
- and(condition1, condition2, ...) - Logical AND
- or(condition1, condition2, ...) - Logical OR
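
Combining these comparators, a sketch that finds slow production LLM runs and fetches only a few fields (the project name, latency threshold, and tag are placeholders):

```typescript
import { Client } from "langsmith";

const client = new Client();

// Filter strings use the comparator grammar listed above.
for await (const run of client.listRuns({
  projectName: "my-project",
  filter: 'and(eq(run_type, "llm"), gte(latency, 1000), has(tags, "production"))',
  select: ["id", "name", "start_time"],
})) {
  console.log(run.id, run.name);
}
```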

Configuring LangSmith client?
│
├─ Environment-based (recommended)?
│ └─ Use new Client() with LANGCHAIN_API_KEY, LANGCHAIN_PROJECT env vars
│
├─ Explicit configuration?
│ ├─ Basic → new Client({ apiKey, apiUrl })
│ ├─ With privacy → new Client({ hideInputs, hideOutputs })
│ ├─ With anonymization → new Client({ anonymizer })
│ └─ Full custom → new Client({ ...all options })
│
├─ Different configs per environment?
│ ├─ Dev → autoBatchTracing: false, debug: true
│ ├─ Staging → tracingSamplingRate: 0.5
│ └─ Production → tracingSamplingRate: 0.1, hideInputs: true
│
└─ Using proxy/custom networking?
├─ Global → overrideFetchImplementation(customFetch)
    └─ Per-client → new Client({ fetchImplementation: customFetch })
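
A sketch of the two configuration styles side by side (the key variable and URL are placeholders; prefer environment variables in most deployments):

```typescript
import { Client } from "langsmith";

// Environment-based: picks up LANGCHAIN_API_KEY, LANGCHAIN_ENDPOINT,
// and LANGCHAIN_PROJECT automatically.
const envClient = new Client();

// Explicit: useful when configuration comes from a secrets manager
// or differs per tenant.
const explicitClient = new Client({
  apiKey: process.env.MY_LANGSMITH_KEY, // placeholder variable name
  apiUrl: "https://api.smith.langchain.com",
  hideInputs: true,
});
```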

Managing prompts?
│
├─ Creating new prompt?
│ └─ createPrompt("prompt-name", { description, tags })
│
├─ Versioning prompt?
│ ├─ New version → pushPrompt("name", { object, description })
│ ├─ Tag version → pushPrompt("name:tag", { object })
│ └─ View history → listCommits({ promptName })
│
├─ Using prompt in code?
│ ├─ Latest version → pullPrompt({ promptName })
│ ├─ Specific version → pullPrompt({ promptName, commit: "hash" })
│ ├─ Tagged version → pullPrompt({ promptName: "name:tag" })
│ └─ With caching → Use Cache with fetchFunc
│
└─ Sharing prompts?
├─ Make public → updatePrompt({ isPublic: true })
├─ Like prompt → likePrompt(promptName)
    └─ Check exists → promptExists(promptName)
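
A rough outline of the push/check flow using the call shapes from the tree above (the prompt object here is a plain placeholder — in practice it is usually a serializable prompt/template object — and exact signatures vary by SDK version):

```typescript
import { Client } from "langsmith";

const client = new Client();

// Push a new version; the object is whatever prompt representation you store.
await client.pushPrompt("qa-prompt", {
  object: { template: "Answer the question: {question}" }, // placeholder
  description: "Baseline QA prompt",
});

// Guard before pulling the prompt elsewhere in the codebase.
if (await client.promptExists("qa-prompt")) {
  // pull the latest or a tagged version, e.g. "qa-prompt:prod"
}
```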

Need test-driven evaluation?
│
├─ Already using Jest?
│ └─ import { test, expect } from "langsmith/jest"
│
├─ Already using Vitest?
│ └─ import { test, expect } from "langsmith/vitest"
│ (requires reporter in vitest.config.ts)
│
├─ No test framework?
│ ├─ Want test framework features → Choose Jest or Vitest
│ └─ Just evaluate → Use evaluate() directly
│
└─ Custom test harness?
    └─ Use Client API directly with evaluate()

Framework Comparison:
| Feature | Jest | Vitest | Direct evaluate() |
|---|---|---|---|
| Test per example | ✓ | ✓ | Manual loop |
| Custom matchers | ✓ | ✓ | N/A |
| Parallel execution | ✓ | ✓ (faster) | Custom control |
| Watch mode | ✓ | ✓ | N/A |
| Setup required | Minimal | Config file | None |
| Best for | React, Node | Vite, modern | Scripts, custom |
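
A minimal Jest-style sketch (the answer() function and assertions are placeholders; this assumes describe is exported alongside test and expect, and the Vitest variant only changes the import to "langsmith/vitest" plus the reporter noted above):

```typescript
import { describe, test, expect } from "langsmith/jest";

// Hypothetical function under test.
async function answer(question: string): Promise<string> {
  return `Echo: ${question}`;
}

describe("qa suite", () => {
  test(
    "answers a basic question",
    {
      inputs: { question: "What is LangSmith?" },
      referenceOutputs: { answer: "An LLM tracing and evaluation platform." },
    },
    async ({ inputs, referenceOutputs }) => {
      const result = await answer(inputs.question);
      // Regular assertions; inputs and outputs are also logged to LangSmith.
      expect(result).toContain("Echo");
      expect(typeof referenceOutputs.answer).toBe("string");
    }
  );
});
```
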
Need to share dataset?
│
├─ Within organization?
│ └─ Normal sharing: shareDataset(datasetId)
│
├─ Public sharing?
│ ├─ Share → shareDataset(datasetId, customShareId)
│ ├─ Get share URL → Response contains share_token
│ └─ Others clone → clonePublicDataset(shareToken)
│
├─ Reading shared dataset?
│ ├─ Dataset metadata → readSharedDataset(shareToken)
│ └─ Examples → listSharedExamples(shareToken)
│
└─ Collaboration?
├─ Share with custom ID → shareDataset(datasetId, "team-qa-set")
    └─ Version control → Use dataset versioning + sharing
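
A sketch of publishing a dataset and cloning it from its public link (the dataset id and clone name are placeholders; the url field on the share response is an assumption — inspect the returned object in your SDK version):

```typescript
import { Client } from "langsmith";

const client = new Client();
const datasetId = "00000000-0000-0000-0000-000000000000"; // placeholder id

// Publish: the response carries the public share token / link.
const shared = await client.shareDataset(datasetId);
console.log(shared.url); // assumed field name

// Elsewhere (e.g. another workspace): clone the public dataset.
await client.clonePublicDataset(shared.url, { datasetName: "cloned-qa-set" });
```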

Need human review?
│
├─ Quality assurance?
│ ├─ Random sampling → createAnnotationQueue() + random selection
│ └─ Edge cases → Filter runs then addRunsToAnnotationQueue()
│
├─ Model comparison?
│ ├─ Side-by-side → createComparativeExperiment() + queue
│ └─ Sequential → Add runs from different experiments
│
├─ Training data collection?
│ └─ Annotation queue + feedback with corrections
│
└─ Active learning?
├─ Low confidence → Filter by metadata.confidence then add to queue
    └─ High error rate → Filter by error then add to queue
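
A rough sketch of routing errored runs into a review queue (the queue name, project, and sample size are placeholders; annotation-queue signatures may differ slightly across SDK versions):

```typescript
import { Client } from "langsmith";

const client = new Client();

// Create a queue for human reviewers.
const queue = await client.createAnnotationQueue({
  name: "failed-runs-review",
  description: "Errored production runs awaiting review",
});

// Collect a bounded sample of failed root runs, then enqueue them.
const runIds: string[] = [];
for await (const run of client.listRuns({
  projectName: "my-project",
  isRoot: true,
  error: true,
})) {
  runIds.push(run.id);
  if (runIds.length >= 25) break;
}
await client.addRunsToAnnotationQueue(queue.id, runIds);
```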

Creating/updating runs?
├─ Single run, manual → createRun() then updateRun()
├─ Many runs → batchIngestRuns({ post: [...], patch: [...] })
└─ Very large batch → multipartIngestRuns()

Collecting feedback?
├─ Direct API access → createFeedback()
├─ No API key → createPresignedFeedbackToken()
├─ From evaluator → logEvaluationFeedback()
└─ Automatic from eval → Use evaluate() with evaluators

Adding tracing?
├─ Own functions → traceable()
├─ Third-party SDK → wrappers (wrapOpenAI, etc.)
├─ Non-function code → RunTree
└─ Framework (LangChain) → getLangchainCallbacks()

Need multiple clients?
│
├─ Different projects?
│ └─ One client per project: new Client({ projectName })
│
├─ Different privacy settings?
│ ├─ Public client → new Client({ hideInputs: false })
│ └─ Private client → new Client({ hideInputs: true })
│
├─ Different sampling rates?
│ ├─ Dev (100%) → new Client({ tracingSamplingRate: 1.0 })
│ └─ Prod (10%) → new Client({ tracingSamplingRate: 0.1 })
│
└─ Different workspaces?
    └─ new Client({ workspaceId: "workspace-123" })
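
For example, two clients with different privacy and sampling settings, using the options from the tree above:

```typescript
import { Client } from "langsmith";

// Full-fidelity client for an internal, low-volume project.
const internalClient = new Client({
  tracingSamplingRate: 1.0,
  hideInputs: false,
});

// Locked-down, sampled client for customer-facing production traffic.
const productionClient = new Client({
  tracingSamplingRate: 0.1,
  hideInputs: true,
  hideOutputs: true,
});
```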

When NOT to use multiple clients:

Managing run tree context?
│
├─ Within traceable function?
│ ├─ Access context → getCurrentRunTree()
│ ├─ Optional access → getCurrentRunTree(true)
│ └─ From function → traceableFn.getCurrentRunTree()
│
├─ Need to set context?
│ └─ withRunTree(runTree, () => {...})
│
├─ Check if traceable?
│ └─ isTraceableFunction(fn)
│
└─ Need ROOT marker?
└─ import { ROOT } from "langsmith/traceable"
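
Finally, a sketch of reading the active run tree inside a traceable function to attach runtime metadata and tags (the user_id field and tag are placeholders):

```typescript
import { traceable, getCurrentRunTree } from "langsmith/traceable";

const handleRequest = traceable(
  async (userId: string, question: string) => {
    // Grab the run created for this invocation and enrich it at runtime.
    const runTree = getCurrentRunTree();
    runTree.extra.metadata = {
      ...runTree.extra.metadata,
      user_id: userId, // placeholder metadata field
    };
    runTree.tags = [...(runTree.tags ?? []), "api"];

    return `Answering "${question}" for ${userId}`;
  },
  { name: "handleRequest" }
);

await handleRequest("user-123", "What is LangSmith?");
```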