tessl/npm-langsmith

tessl install tessl/npm-langsmith@0.4.3

Describes: npmpkg:npm/langsmith@0.4.x

TypeScript client SDK for the LangSmith LLM tracing, evaluation, and monitoring platform.

docs/guides/decision-trees.md

Decision Trees

Quick decision guides for choosing between similar APIs, patterns, and approaches.

Overview

This guide helps coding agents make optimal choices when multiple LangSmith APIs or patterns can solve the same problem. Each decision tree provides a systematic way to select the right approach based on specific requirements.

Choosing Tracing Approach

Need to trace code execution?
│
├─ Tracing a function?
│  ├─ Simple automatic tracing? → Use traceable()
│  ├─ Need dynamic metadata? → Use traceable() + getCurrentRunTree()
│  ├─ Need runtime config changes? → Use traceable().withConfig()
│  └─ Already have traceable function? → Check with isTraceableFunction()
│
├─ Tracing non-function code?
│  ├─ Manual control needed? → Use RunTree class
│  ├─ Custom events? → Use RunTree + addEvent()
│  └─ Complex hierarchies? → Use RunTree + createChild()
│
├─ Using an AI SDK?
│  ├─ OpenAI SDK? → Use wrapOpenAI()
│  ├─ Anthropic SDK? → Use wrapAnthropic()
│  ├─ Vercel AI SDK? → Use wrapAISDK()
│  ├─ LangChain? → Use getLangchainCallbacks()
│  └─ Custom/other SDK? → Use wrapSDK()
│
└─ Distributed across services?
   ├─ HTTP-based services? → Use RunTree.toHeaders() / fromHeaders()
   ├─ Need W3C context? → Use OpenTelemetry integration
   └─ Multiple projects? → Use RunTree replicas

Quick Reference:

  • Simple function tracing: traceable(fn, config)
  • Manual tracing: new RunTree(config)
  • SDK auto-tracing: wrapOpenAI(), wrapAnthropic(), etc.
  • Cross-service: toHeaders() / fromHeaders()
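
A minimal sketch of the first two branches: traceable() for your own function and wrapOpenAI() for SDK calls. The model name is illustrative and the wrapped call is a hypothetical example, not a prescribed pattern.

import { traceable } from "langsmith/traceable";
import { wrapOpenAI } from "langsmith/wrappers";
import OpenAI from "openai";

// Every call through the wrapped client is traced automatically
const openai = wrapOpenAI(new OpenAI());

// traceable() wraps your own function; nested calls become child runs
const answerQuestion = traceable(
  async (question: string) => {
    const completion = await openai.chat.completions.create({
      model: "gpt-4o-mini", // illustrative model name
      messages: [{ role: "user", content: question }],
    });
    return completion.choices[0].message.content;
  },
  { name: "answer-question", run_type: "chain" }
);

await answerQuestion("What does LangSmith trace?");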

Choosing Example Creation Method

Need to create examples in dataset?
│
├─ Creating single example?
│  ├─ Key-value format? → createExample({ dataset_id, inputs, outputs })
│  ├─ LLM text completion? → createLLMExample(input, generation, options)
│  └─ Chat message format? → createChatExample(messages, response, options)
│
├─ Creating multiple examples?
│  ├─ Uniform structure, simple data?
│  │  └─ Use createExamples({ inputs: [], outputs: [] }) [Separate Arrays]
│  │
│  ├─ Per-example metadata/config?
│  │  └─ Use createExamples({ examples: [{...}] }) [Examples Array]
│  │
│  ├─ Large files or binary data?
│  │  └─ Use uploadExamplesMultipart({ examples: [...] })
│  │
│  └─ From CSV file?
│     └─ Use uploadCsv({ csvFile, inputKeys, outputKeys })
│
└─ From production runs?
   └─ Use createExample({ source_run_id, useSourceRunIO: true })

Decision Factors:

| Method | Best For | Structure | Attachments |
| --- | --- | --- | --- |
| createExample() | Single example | Any format | Via dataset_id |
| createExamples() (arrays) | Bulk, uniform | Parallel arrays | No |
| createExamples() (objects) | Bulk, varied | Object array | Yes (via objects) |
| createLLMExample() | Text completion | String in/out | Via options |
| createChatExample() | Chat conversations | Message arrays | Via options |
| uploadExamplesMultipart() | Large/binary | Any with files | Yes |
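
A sketch contrasting the two bulk forms, assuming an existing dataset ID and the argument shapes shown in the tree above; the datasetId key placement is an assumption and exact key names vary between SDK versions.

import { Client } from "langsmith";

const client = new Client();
const datasetId = "<your-dataset-id>"; // placeholder

// Separate Arrays form: uniform rows, matched by position
await client.createExamples({
  datasetId,
  inputs: [{ question: "What is LangSmith?" }, { question: "What is a run?" }],
  outputs: [{ answer: "An observability platform for LLM apps." }, { answer: "One traced unit of work." }],
});

// Examples Array form: per-example metadata
await client.createExamples({
  datasetId,
  examples: [
    {
      inputs: { question: "What is a trace?" },
      outputs: { answer: "A tree of related runs." },
      metadata: { source: "docs" },
    },
  ],
});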

Choosing Evaluation Approach

Need to evaluate LLM application?
│
├─ Single model on dataset?
│  ├─ Custom evaluators? → evaluate(target, { data, evaluators })
│  ├─ Test framework integration?
│  │  ├─ Using Jest? → import { test } from "langsmith/jest"
│  │  └─ Using Vitest? → import { test } from "langsmith/vitest"
│  └─ Quick script? → evaluate() with inline evaluators
│
├─ Compare multiple models/configs?
│  ├─ Side-by-side comparison? → evaluateComparative(experiments, options)
│  ├─ A/B test with humans? → createComparativeExperiment() + annotation queue
│  └─ Sequential experiments? → Run evaluate() multiple times
│
├─ Production monitoring?
│  ├─ Automated scoring? → Use Feedback API + createFeedback()
│  ├─ Human review? → Annotation queues
│  └─ LLM-as-judge? → Custom evaluator calling LLM
│
└─ Regression testing?
   ├─ In test suite? → Jest/Vitest integration
   └─ CI/CD pipeline? → evaluate() in test scripts

Quick Reference:

  • Basic evaluation: evaluate(target, { data: "dataset-name", evaluators: [...] })
  • A/B testing: evaluateComparative([exp1, exp2], { comparativeEvaluators })
  • Test integration: import { test } from "langsmith/jest"
  • Production feedback: createFeedback(run_id, key, { score })
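
A minimal evaluate() sketch with one inline evaluator. The dataset name "my-dataset" is assumed to exist, the target is stubbed, and the object-style evaluator arguments follow current SDK docs, so adjust to your installed version.

import { evaluate } from "langsmith/evaluation";

// Inline evaluator: compares target output to the reference example
const exactMatch = async ({ outputs, referenceOutputs }: {
  outputs?: Record<string, any>;
  referenceOutputs?: Record<string, any>;
}) => ({
  key: "exact_match",
  score: outputs?.answer === referenceOutputs?.answer ? 1 : 0,
});

await evaluate(
  // Target: the system under test, stubbed here
  async (inputs: Record<string, any>) => ({ answer: `stub: ${inputs.question}` }),
  {
    data: "my-dataset",          // assumed dataset name
    evaluators: [exactMatch],
    experimentPrefix: "baseline",
  }
);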

Choosing Privacy/Security Approach

Need to protect sensitive data?
│
├─ Hide everything?
│  └─ Use hideInputs: true, hideOutputs: true
│
├─ Selective field hiding?
│  └─ Use functions: hideInputs: (inputs) => { const {secret, ...safe} = inputs; return safe; }
│
├─ Pattern-based PII removal?
│  ├─ Emails, SSNs, phones? → createAnonymizer([{ pattern: /email regex/, replace: "[EMAIL]" }])
│  ├─ API keys, tokens? → createAnonymizer([{ pattern: /sk-.*/, replace: "[KEY]" }])
│  └─ Custom patterns? → createAnonymizer([{ pattern: /.../, replace: "..." }])
│
├─ Path-based selective anonymization?
│  └─ createAnonymizer(rules, { paths: ["inputs.user.email"] })
│
├─ Structural anonymization?
│  └─ Use processor-based: createAnonymizer((node, path) => {...})
│
└─ Public feedback collection?
   └─ Use createPresignedFeedbackToken() - no API key needed

Decision Matrix:

| Requirement | Approach | Method |
| --- | --- | --- |
| Hide all inputs | Boolean flag | hideInputs: true |
| Hide specific fields | Function filter | hideInputs: (i) => filter(i) |
| Remove PII patterns | Regex rules | createAnonymizer([rules]) |
| Path-specific | Anonymizer with paths | createAnonymizer(rules, {paths}) |
| Complex logic | Processor function | createAnonymizer(processor) |
| Public feedback | Presigned tokens | createPresignedFeedbackToken() |
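
A sketch combining regex-based anonymization with a function-based hideInputs filter; the regex patterns and the ssn field are illustrative only.

import { Client } from "langsmith";
import { createAnonymizer } from "langsmith/anonymizer";

// Regex rules scrub PII-like patterns anywhere in traced payloads
const anonymizer = createAnonymizer([
  { pattern: /[\w.+-]+@[\w-]+\.[\w.]+/g, replace: "[EMAIL]" },
  { pattern: /sk-[A-Za-z0-9]+/g, replace: "[API_KEY]" },
]);

const client = new Client({
  anonymizer,
  // Function filter: drop one known-sensitive field before upload
  hideInputs: (inputs) => {
    const { ssn, ...safe } = inputs as Record<string, unknown>; // ssn is a hypothetical field
    return safe;
  },
});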

Choosing Performance Configuration

Optimizing for production?
│
├─ High-volume tracing (>1000 traces/min)?
│  ├─ Enable sampling → tracingSamplingRate: 0.1 (10%)
│  ├─ Enable batching → autoBatchTracing: true
│  ├─ Increase concurrency → traceBatchConcurrency: 10
│  └─ Increase batch size → batchSizeBytesLimit: 50_000_000
│
├─ Serverless/short-lived?
│  ├─ Always flush → await client.awaitPendingTraceBatches()
│  ├─ Consider blocking mode → blockOnRootRunFinalization: true
│  └─ Manual flush mode → manualFlushMode: true + flush()
│
├─ Memory-constrained?
│  ├─ Reduce batch size → batchSizeBytesLimit: 10_000_000
│  ├─ Limit operations per batch → batchSizeLimit: 50
│  └─ Lower memory limit → maxIngestMemoryBytes: 500_000_000
│
├─ Low-latency requirements?
│  ├─ Disable batching → autoBatchTracing: false
│  └─ Reduce timeout → timeout_ms: 5000
│
└─ Development/debugging?
   ├─ No batching → autoBatchTracing: false
   ├─ Blocking mode → blockOnRootRunFinalization: true
   └─ Debug logging → debug: true

Configuration Presets:

// High-volume production
const productionClient = new Client({
  tracingSamplingRate: 0.1,
  autoBatchTracing: true,
  batchSizeBytesLimit: 50_000_000,
  traceBatchConcurrency: 10,
  hideInputs: (i) => redactPII(i)
});

// Serverless (Lambda, Cloud Functions)
const serverlessClient = new Client({
  autoBatchTracing: true,
  blockOnRootRunFinalization: false
});
// Always: await client.awaitPendingTraceBatches() before return

// Development/Testing
const devClient = new Client({
  autoBatchTracing: false,
  blockOnRootRunFinalization: true,
  debug: true
});

// Memory-constrained
const lightweightClient = new Client({
  batchSizeBytesLimit: 10_000_000,
  batchSizeLimit: 50,
  maxIngestMemoryBytes: 500_000_000
});

Choosing Dataset Operations

Working with datasets?
│
├─ Creating dataset?
│  └─ createDataset({ datasetName, dataType })
│
├─ Adding examples?
│  └─ See "Choosing Example Creation Method" above
│
├─ Finding similar examples?
│  ├─ First time? → indexDataset() then similarExamples()
│  └─ Already indexed? → similarExamples(inputs, datasetId)
│
├─ Versioning dataset?
│  ├─ Create version → Add examples (auto-versioned)
│  ├─ Tag version → updateDatasetTag({ tag, asOf })
│  ├─ Read version → readDatasetVersion({ asOf })
│  └─ Compare versions → diffDatasetVersions({ fromVersion, toVersion })
│
├─ Organizing examples?
│  ├─ Create splits → updateDatasetSplits({ splitName, exampleIds })
│  ├─ List splits → listDatasetSplits()
│  └─ Remove from split → updateDatasetSplits({ remove: true })
│
├─ Sharing dataset?
│  ├─ Share publicly → shareDataset(datasetId)
│  ├─ Clone public → clonePublicDataset(shareToken)
│  ├─ Unshare → unshareDataset(datasetId)
│  └─ Read shared → readSharedDataset(shareToken)
│
└─ Exporting dataset?
   ├─ For OpenAI fine-tuning → readDatasetOpenaiFinetuning()
   └─ As CSV → List examples and format manually
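
A hedged sketch of the create-then-version path. createDataset is shown in the (name, options) form; createExample and updateDatasetTag follow the shapes used elsewhere in this guide, so verify against your SDK version.

import { Client } from "langsmith";

const client = new Client();

// Create a key-value dataset and add one example
const dataset = await client.createDataset("support-questions", {
  description: "Curated support questions",
  dataType: "kv",
});

await client.createExample({
  dataset_id: dataset.id,
  inputs: { question: "How do I reset my password?" },
  outputs: { answer: "Use the reset link on the login page." },
});

// Pin the current state so evaluations can target a stable version
await client.updateDatasetTag({
  datasetId: dataset.id,
  tag: "prod",
  asOf: new Date(),
});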

Choosing Feedback Collection Method

Collecting feedback on runs?
│
├─ Direct API access available?
│  ├─ Simple score/comment → createFeedback(run_id, key, { score })
│  ├─ With correction → createFeedback(run_id, key, { score, correction })
│  └─ From evaluator → logEvaluationFeedback(params)
│
├─ Public/external collection?
│  ├─ Create token → createPresignedFeedbackToken(runId, key)
│  ├─ Share URL → token.url (users POST without auth)
│  └─ List tokens → listPresignedFeedbackTokens()
│
├─ Human review workflow?
│  ├─ Create queue → createAnnotationQueue()
│  ├─ Add runs → addRunsToAnnotationQueue()
│  ├─ Review → getRunFromAnnotationQueue()
│  └─ Submit → createFeedback() with queue context
│
└─ Automated/model feedback?
   ├─ From LLM judge → createFeedback(run_id, key, { feedbackSourceType: "model" })
   ├─ From API check → createFeedback(run_id, key, { feedbackSourceType: "api" })
   └─ From evaluation → Automatically logged by evaluate()
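
A sketch of the two direct paths: authenticated feedback plus a presigned token for end users. The run ID is a placeholder.

import { Client } from "langsmith";

const client = new Client();
const runId = "<traced-run-id>"; // placeholder

// Direct feedback when an API key is available
await client.createFeedback(runId, "helpfulness", {
  score: 0.9,
  comment: "Accurate and concise",
});

// Presigned token: end users can submit feedback without an API key
const token = await client.createPresignedFeedbackToken(runId, "user_rating");
console.log(token.url);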

Choosing Run Query Method

Need to query runs?
│
├─ Single run by ID?
│  ├─ Basic info → readRun(runId)
│  └─ With children → readRun(runId, { loadChildRuns: true })
│
├─ Multiple runs?
│  ├─ All in project → listRuns({ projectName })
│  ├─ Root runs only → listRuns({ isRoot: true })
│  ├─ By trace → listRuns({ traceId })
│  ├─ By parent → listRuns({ parentRunId })
│  ├─ Failed only → listRuns({ error: true })
│  ├─ Time range → listRuns({ startTime, endTime })
│  └─ Complex filter → listRuns({ filter: 'and(...)' })
│
├─ Grouped analytics?
│  ├─ By conversation → listGroupRuns({ groupBy: "metadata.conversation_id" })
│  ├─ By user → listGroupRuns({ groupBy: "metadata.user_id" })
│  └─ Custom grouping → listGroupRuns({ groupBy: "metadata.custom_field" })
│
├─ Just statistics?
│  └─ getRunStats({ projectName, filter })
│
└─ Public shared runs?
   └─ listSharedRuns({ shareToken })
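
A sketch of the two most common queries; listRuns returns an async iterable, so results stream as you iterate. The run ID and project name are placeholders.

import { Client } from "langsmith";

const client = new Client();

// Single run by ID, including child runs
const run = await client.readRun("<run-id>", { loadChildRuns: true });
console.log(run.name, run.child_runs?.length);

// Stream failed root runs from one project
for await (const failed of client.listRuns({
  projectName: "my-project",
  isRoot: true,
  error: true,
})) {
  console.log(failed.id, failed.name);
}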

Choosing Filter Strategy

Need to filter runs?
│
├─ Simple filters?
│  ├─ By project → projectName: "my-project"
│  ├─ By type → runType: "llm"
│  ├─ By error → error: true
│  ├─ By time → startTime/endTime: Date
│  └─ Root only → isRoot: true
│
├─ Complex conditions?
│  ├─ Single condition → filter: 'eq(status, "success")'
│  ├─ Comparison → filter: 'gte(latency, 1000)'
│  ├─ Multiple AND → filter: 'and(eq(error, null), gte(latency, 1000))'
│  ├─ Multiple OR → filter: 'or(eq(run_type, "llm"), eq(run_type, "chain"))'
│  ├─ Array contains → filter: 'has(tags, "production")'
│  └─ Text search → filter: 'search(name, "customer")'
│
├─ Trace-level filtering?
│  ├─ Filter root run → traceFilter: 'eq(name, "pipeline")'
│  ├─ Filter children → treeFilter: 'eq(run_type, "llm")'
│  └─ Both → Use traceFilter + treeFilter together
│
└─ Field selection?
   └─ select: ["id", "name", "start_time"]

Filter Comparators:

  • eq(field, value) - Equals
  • neq(field, value) - Not equals
  • gt(field, value) - Greater than
  • gte(field, value) - Greater than or equal
  • lt(field, value) - Less than
  • lte(field, value) - Less than or equal
  • has(array_field, value) - Array contains
  • search(field, text) - Text search
  • and(condition1, condition2, ...) - Logical AND
  • or(condition1, condition2, ...) - Logical OR
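
A sketch combining a simple parameter, a trace-level filter, and a composed filter string; field names such as latency follow the filter examples above.

import { Client } from "langsmith";

const client = new Client();

// Slow or errored LLM runs inside traces whose root run is named "pipeline"
for await (const run of client.listRuns({
  projectName: "my-project",
  traceFilter: 'eq(name, "pipeline")',
  filter: 'and(eq(run_type, "llm"), or(neq(error, null), gte(latency, 1000)))',
  select: ["id", "name", "run_type", "error"],
})) {
  console.log(run.id, run.name);
}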

Choosing Client Configuration Strategy

Configuring LangSmith client?
│
├─ Environment-based (recommended)?
│  └─ Use new Client() with LANGCHAIN_API_KEY, LANGCHAIN_PROJECT env vars
│
├─ Explicit configuration?
│  ├─ Basic → new Client({ apiKey, apiUrl })
│  ├─ With privacy → new Client({ hideInputs, hideOutputs })
│  ├─ With anonymization → new Client({ anonymizer })
│  └─ Full custom → new Client({ ...all options })
│
├─ Different configs per environment?
│  ├─ Dev → autoBatchTracing: false, debug: true
│  ├─ Staging → tracingSamplingRate: 0.5
│  └─ Production → tracingSamplingRate: 0.1, hideInputs: true
│
└─ Using proxy/custom networking?
   ├─ Global → overrideFetchImplementation(customFetch)
   └─ Per-client → new Client({ fetchImplementation: customFetch })
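
A sketch of environment-based versus explicit configuration; the NODE_ENV switch and the exact sampling values are illustrative.

import { Client } from "langsmith";

// Environment-based: picks up LANGCHAIN_API_KEY / LANGCHAIN_PROJECT
// (or their LANGSMITH_* equivalents) automatically
const defaultClient = new Client();

// Explicit, per-environment configuration
const isProd = process.env.NODE_ENV === "production";
const client = new Client({
  apiKey: process.env.LANGCHAIN_API_KEY,
  tracingSamplingRate: isProd ? 0.1 : 1.0,
  autoBatchTracing: isProd,
  ...(isProd ? {} : { debug: true }),
});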

Choosing Prompt Management Approach

Managing prompts?
│
├─ Creating new prompt?
│  └─ createPrompt("prompt-name", { description, tags })
│
├─ Versioning prompt?
│  ├─ New version → pushPrompt("name", { object, description })
│  ├─ Tag version → pushPrompt("name:tag", { object })
│  └─ View history → listCommits({ promptName })
│
├─ Using prompt in code?
│  ├─ Latest version → pullPrompt({ promptName })
│  ├─ Specific version → pullPrompt({ promptName, commit: "hash" })
│  ├─ Tagged version → pullPrompt({ promptName: "name:tag" })
│  └─ With caching → Use Cache with fetchFunc
│
└─ Sharing prompts?
   ├─ Make public → updatePrompt({ isPublic: true })
   ├─ Like prompt → likePrompt(promptName)
   └─ Check exists → promptExists(promptName)
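
A heavily hedged sketch following the method names in the tree above; exact push/pull signatures differ across SDK releases (some take a plain "name" or "name:tag" string for pulls), and the prompt payload is illustrative.

import { Client } from "langsmith";

const client = new Client();

// Push a new version, then pull the latest back by name
await client.pushPrompt("support-triage", {
  object: { template: "Classify this ticket: {ticket}" }, // illustrative payload
  description: "Triage classifier prompt",
});

const latest = await client.pullPrompt({ promptName: "support-triage" });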

Choosing Testing Framework

Need test-driven evaluation?
│
├─ Already using Jest?
│  └─ import { test, expect } from "langsmith/jest"
│
├─ Already using Vitest?
│  └─ import { test, expect } from "langsmith/vitest"
│     (requires reporter in vitest.config.ts)
│
├─ No test framework?
│  ├─ Want test framework features → Choose Jest or Vitest
│  └─ Just evaluate → Use evaluate() directly
│
└─ Custom test harness?
   └─ Use Client API directly with evaluate()

Framework Comparison:

| Feature | Jest | Vitest | Direct evaluate() |
| --- | --- | --- | --- |
| Test per example | ✓ | ✓ | Manual loop |
| Custom matchers | ✓ | ✓ | N/A |
| Parallel execution | ✓ | ✓ (faster) | Custom control |
| Watch mode | ✓ | ✓ | N/A |
| Setup required | Minimal | Config file | None |
| Best for | React, Node | Vite, modern | Scripts, custom |
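
A minimal Jest-integration sketch. The inputs/referenceOutputs parameter shape and the ls.logFeedback helper follow current LangSmith docs, so verify them against your installed version; the stubbed output stands in for your application call.

import * as ls from "langsmith/jest";

// Each ls.test case is synced to LangSmith as a dataset example and traced run
ls.describe("support bot", () => {
  ls.test(
    "answers the password-reset question",
    {
      inputs: { question: "How do I reset my password?" },
      referenceOutputs: { answer: "Use the reset link on the login page." },
    },
    async ({ inputs, referenceOutputs }) => {
      const output = { answer: "Use the reset link on the login page." }; // call your app with inputs here
      ls.logFeedback({
        key: "exact_match",
        score: output.answer === referenceOutputs?.answer ? 1 : 0,
      });
      expect(output.answer).toBeTruthy();
    }
  );
});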

Choosing Dataset Sharing Method

Need to share dataset?
│
├─ Within organization?
│  └─ Normal sharing: shareDataset(datasetId)
│
├─ Public sharing?
│  ├─ Share → shareDataset(datasetId, customShareId)
│  ├─ Get share URL → Response contains share_token
│  └─ Others clone → clonePublicDataset(shareToken)
│
├─ Reading shared dataset?
│  ├─ Dataset metadata → readSharedDataset(shareToken)
│  └─ Examples → listSharedExamples(shareToken)
│
└─ Collaboration?
   ├─ Share with custom ID → shareDataset(datasetId, "team-qa-set")
   └─ Version control → Use dataset versioning + sharing

Choosing Annotation Queue Strategy

Need human review?
│
├─ Quality assurance?
│  ├─ Random sampling → createAnnotationQueue() + random selection
│  └─ Edge cases → Filter runs then addRunsToAnnotationQueue()
│
├─ Model comparison?
│  ├─ Side-by-side → createComparativeExperiment() + queue
│  └─ Sequential → Add runs from different experiments
│
├─ Training data collection?
│  └─ Annotation queue + feedback with corrections
│
└─ Active learning?
   ├─ Low confidence → Filter by metadata.confidence then add to queue
   └─ High error rate → Filter by error then add to queue
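
A sketch of the quality-assurance branch, assuming the createAnnotationQueue and addRunsToAnnotationQueue signatures shown here; the project name and queue name are illustrative.

import { Client } from "langsmith";

const client = new Client();

// Create a review queue, then feed it runs that errored in production
const queue = await client.createAnnotationQueue({
  name: "qa-edge-cases",
  description: "Failed production runs for human review",
});

const failedRunIds: string[] = [];
for await (const run of client.listRuns({
  projectName: "my-project",
  isRoot: true,
  error: true,
})) {
  failedRunIds.push(run.id);
}

await client.addRunsToAnnotationQueue(queue.id, failedRunIds);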

Choosing Between Similar Methods

Run Management: create vs update vs batch

Creating/updating runs?
├─ Single run, manual → createRun() then updateRun()
├─ Many runs → batchIngestRuns({ post: [...], patch: [...] })
└─ Very large batch → multipartIngestRuns()
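
A sketch of the single-run path (createRun followed by updateRun); the caller supplies the ID and timestamps, and the run name and payloads are illustrative.

import { randomUUID } from "node:crypto";
import { Client } from "langsmith";

const client = new Client();

// Manual lifecycle: create the run, then patch in the outcome
const runId = randomUUID();
await client.createRun({
  id: runId,
  name: "manual-step",
  run_type: "chain",
  inputs: { query: "hello" },
  start_time: Date.now(),
});
await client.updateRun(runId, {
  outputs: { result: "done" },
  end_time: Date.now(),
});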

Feedback: create vs presigned vs evaluate

Collecting feedback?
├─ Direct API access → createFeedback()
├─ No API key → createPresignedFeedbackToken()
├─ From evaluator → logEvaluationFeedback()
└─ Automatic from eval → Use evaluate() with evaluators

Tracing: traceable vs RunTree vs wrappers

Adding tracing?
├─ Own functions → traceable()
├─ Third-party SDK → wrappers (wrapOpenAI, etc.)
├─ Non-function code → RunTree
└─ Framework (LangChain) → getLangchainCallbacks()

Advanced Decision: When to Use Multiple Clients

Need multiple clients?
│
├─ Different projects?
│  └─ One client per project: new Client({ projectName })
│
├─ Different privacy settings?
│  ├─ Public client → new Client({ hideInputs: false })
│  └─ Private client → new Client({ hideInputs: true })
│
├─ Different sampling rates?
│  ├─ Dev (100%) → new Client({ tracingSamplingRate: 1.0 })
│  └─ Prod (10%) → new Client({ tracingSamplingRate: 0.1 })
│
└─ Different workspaces?
   └─ new Client({ workspaceId: "workspace-123" })

When NOT to use multiple clients:

  • Same project, same config → Reuse single client
  • Just different metadata → Use traceable config, not new client

Context Management Decision

Managing run tree context?
│
├─ Within traceable function?
│  ├─ Access context → getCurrentRunTree()
│  ├─ Optional access → getCurrentRunTree(true)
│  └─ From function → traceableFn.getCurrentRunTree()
│
├─ Need to set context?
│  └─ withRunTree(runTree, () => {...})
│
├─ Check if traceable?
│  └─ isTraceableFunction(fn)
│
└─ Need ROOT marker?
   └─ import { ROOT } from "langsmith/traceable"
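
A sketch of reading the active run tree from inside a traceable() function; the function body is illustrative.

import { traceable, getCurrentRunTree } from "langsmith/traceable";

const answer = traceable(
  async (question: string) => {
    // Inside a traceable call, the active run tree is available from context
    const runTree = getCurrentRunTree();
    console.log("trace:", runTree.trace_id, "run:", runTree.id);
    return `echo: ${question}`;
  },
  { name: "answer" }
);

await answer("hello");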

Related Documentation

  • Setup Guide - Initial configuration decisions
  • Tracing Guide - Tracing approach details
  • Evaluation Guide - Evaluation strategy details
  • API Reference - Complete method reference
  • Quick Reference - Common patterns
  • Anti-Patterns - What to avoid