tessl/npm-langsmith

tessl install tessl/npm-langsmith@0.4.3

Describes: npmpkg:npm/langsmith@0.4.x

TypeScript client SDK for the LangSmith LLM tracing, evaluation, and monitoring platform.

docs/guides/decision-trees.md

Decision Trees

Quick decision guides for choosing between similar APIs, patterns, and approaches.

Overview

This guide helps coding agents make optimal choices when multiple LangSmith APIs or patterns can solve the same problem. Each decision tree provides a systematic way to select the right approach based on specific requirements.

Choosing Tracing Approach

Need to trace code execution?
│
├─ Tracing a function?
│  ├─ Simple automatic tracing? → Use traceable()
│  ├─ Need dynamic metadata? → Use traceable() + getCurrentRunTree()
│  ├─ Need runtime config changes? → Use traceable().withConfig()
│  └─ Already have traceable function? → Check with isTraceableFunction()
│
├─ Tracing non-function code?
│  ├─ Manual control needed? → Use RunTree class
│  ├─ Custom events? → Use RunTree + addEvent()
│  └─ Complex hierarchies? → Use RunTree + createChild()
│
├─ Using an AI SDK?
│  ├─ OpenAI SDK? → Use wrapOpenAI()
│  ├─ Anthropic SDK? → Use wrapAnthropic()
│  ├─ Vercel AI SDK? → Use wrapAISDK()
│  ├─ LangChain? → Use getLangchainCallbacks()
│  └─ Custom/other SDK? → Use wrapSDK()
│
└─ Distributed across services?
   ├─ HTTP-based services? → Use RunTree.toHeaders() / fromHeaders()
   ├─ Need W3C context? → Use OpenTelemetry integration
   └─ Multiple projects? → Use RunTree replicas

Quick Reference:

  • Simple function tracing: traceable(fn, config)
  • Manual tracing: new RunTree(config)
  • SDK auto-tracing: wrapOpenAI(), wrapAnthropic(), etc.
  • Cross-service: toHeaders() / fromHeaders()
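
A minimal sketch of the first two branches: traceable() for your own function and wrapOpenAI() for SDK calls. The model name is illustrative and the wrapped call is a hypothetical example, not a prescribed pattern.

import { traceable } from "langsmith/traceable";
import { wrapOpenAI } from "langsmith/wrappers";
import OpenAI from "openai";

// Every call through the wrapped client is traced automatically
const openai = wrapOpenAI(new OpenAI());

// traceable() wraps your own function; nested calls become child runs
const answerQuestion = traceable(
  async (question: string) => {
    const completion = await openai.chat.completions.create({
      model: "gpt-4o-mini", // illustrative model name
      messages: [{ role: "user", content: question }],
    });
    return completion.choices[0].message.content;
  },
  { name: "answer-question", run_type: "chain" }
);

await answerQuestion("What does LangSmith trace?");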

Choosing Example Creation Method

Need to create examples in dataset?
│
├─ Creating single example?
│  ├─ Key-value format? → createExample({ dataset_id, inputs, outputs })
│  ├─ LLM text completion? → createLLMExample(input, generation, options)
│  └─ Chat message format? → createChatExample(messages, response, options)
│
├─ Creating multiple examples?
│  ├─ Uniform structure, simple data?
│  │  └─ Use createExamples({ inputs: [], outputs: [] }) [Separate Arrays]
│  │
│  ├─ Per-example metadata/config?
│  │  └─ Use createExamples({ examples: [{...}] }) [Examples Array]
│  │
│  ├─ Large files or binary data?
│  │  └─ Use uploadExamplesMultipart({ examples: [...] })
│  │
│  └─ From CSV file?
│     └─ Use uploadCsv({ csvFile, inputKeys, outputKeys })
│
└─ From production runs?
   └─ Use createExample({ source_run_id, useSourceRunIO: true })

Decision Factors:

| Method | Best For | Structure | Attachments |
| --- | --- | --- | --- |
| createExample() | Single example | Any format | Via dataset_id |
| createExamples() (arrays) | Bulk, uniform | Parallel arrays | No |
| createExamples() (objects) | Bulk, varied | Object array | Yes (via objects) |
| createLLMExample() | Text completion | String in/out | Via options |
| createChatExample() | Chat conversations | Message arrays | Via options |
| uploadExamplesMultipart() | Large/binary | Any with files | Yes |
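
A sketch contrasting the two bulk forms, assuming an existing dataset ID and the argument shapes shown in the tree above; the datasetId key placement is an assumption and exact key names vary between SDK versions.

import { Client } from "langsmith";

const client = new Client();
const datasetId = "<your-dataset-id>"; // placeholder

// Separate Arrays form: uniform rows, matched by position
await client.createExamples({
  datasetId,
  inputs: [{ question: "What is LangSmith?" }, { question: "What is a run?" }],
  outputs: [{ answer: "An observability platform for LLM apps." }, { answer: "One traced unit of work." }],
});

// Examples Array form: per-example metadata
await client.createExamples({
  datasetId,
  examples: [
    {
      inputs: { question: "What is a trace?" },
      outputs: { answer: "A tree of related runs." },
      metadata: { source: "docs" },
    },
  ],
});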

Choosing Evaluation Approach

Need to evaluate LLM application?
│
├─ Single model on dataset?
│  ├─ Custom evaluators? → evaluate(target, { data, evaluators })
│  ├─ Test framework integration?
│  │  ├─ Using Jest? → import { test } from "langsmith/jest"
│  │  └─ Using Vitest? → import { test } from "langsmith/vitest"
│  └─ Quick script? → evaluate() with inline evaluators
│
├─ Compare multiple models/configs?
│  ├─ Side-by-side comparison? → evaluateComparative(experiments, options)
│  ├─ A/B test with humans? → createComparativeExperiment() + annotation queue
│  └─ Sequential experiments? → Run evaluate() multiple times
│
├─ Production monitoring?
│  ├─ Automated scoring? → Use Feedback API + createFeedback()
│  ├─ Human review? → Annotation queues
│  └─ LLM-as-judge? → Custom evaluator calling LLM
│
└─ Regression testing?
   ├─ In test suite? → Jest/Vitest integration
   └─ CI/CD pipeline? → evaluate() in test scripts

Quick Reference:

  • Basic evaluation: evaluate(target, { data: "dataset-name", evaluators: [...] })
  • A/B testing: evaluateComparative([exp1, exp2], { comparativeEvaluators })
  • Test integration: import { test } from "langsmith/jest"
  • Production feedback: createFeedback(run_id, key, { score })
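
A minimal evaluate() sketch with one inline evaluator. The dataset name "my-dataset" is assumed to exist, the target is stubbed, and the object-style evaluator arguments follow current SDK docs, so adjust to your installed version.

import { evaluate } from "langsmith/evaluation";

// Inline evaluator: compares target output to the reference example
const exactMatch = async ({ outputs, referenceOutputs }: {
  outputs?: Record<string, any>;
  referenceOutputs?: Record<string, any>;
}) => ({
  key: "exact_match",
  score: outputs?.answer === referenceOutputs?.answer ? 1 : 0,
});

await evaluate(
  // Target: the system under test, stubbed here
  async (inputs: Record<string, any>) => ({ answer: `stub: ${inputs.question}` }),
  {
    data: "my-dataset",          // assumed dataset name
    evaluators: [exactMatch],
    experimentPrefix: "baseline",
  }
);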

Choosing Privacy/Security Approach

Need to protect sensitive data?
│
├─ Hide everything?
│  └─ Use hideInputs: true, hideOutputs: true
│
├─ Selective field hiding?
│  └─ Use functions: hideInputs: (inputs) => { const {secret, ...safe} = inputs; return safe; }
│
├─ Pattern-based PII removal?
│  ├─ Emails, SSNs, phones? → createAnonymizer([{ pattern: /email regex/, replace: "[EMAIL]" }])
│  ├─ API keys, tokens? → createAnonymizer([{ pattern: /sk-.*/, replace: "[KEY]" }])
│  └─ Custom patterns? → createAnonymizer([{ pattern: /.../, replace: "..." }])
│
├─ Path-based selective anonymization?
│  └─ createAnonymizer(rules, { paths: ["inputs.user.email"] })
│
├─ Structural anonymization?
│  └─ Use processor-based: createAnonymizer((node, path) => {...})
│
└─ Public feedback collection?
   └─ Use createPresignedFeedbackToken() - no API key needed

Decision Matrix:

| Requirement | Approach | Method |
| --- | --- | --- |
| Hide all inputs | Boolean flag | hideInputs: true |
| Hide specific fields | Function filter | hideInputs: (i) => filter(i) |
| Remove PII patterns | Regex rules | createAnonymizer([rules]) |
| Path-specific | Anonymizer with paths | createAnonymizer(rules, {paths}) |
| Complex logic | Processor function | createAnonymizer(processor) |
| Public feedback | Presigned tokens | createPresignedFeedbackToken() |
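
A sketch combining regex-based anonymization with a function-based hideInputs filter; the regex patterns and the ssn field are illustrative only.

import { Client } from "langsmith";
import { createAnonymizer } from "langsmith/anonymizer";

// Regex rules scrub PII-like patterns anywhere in traced payloads
const anonymizer = createAnonymizer([
  { pattern: /[\w.+-]+@[\w-]+\.[\w.]+/g, replace: "[EMAIL]" },
  { pattern: /sk-[A-Za-z0-9]+/g, replace: "[API_KEY]" },
]);

const client = new Client({
  anonymizer,
  // Function filter: drop one known-sensitive field before upload
  hideInputs: (inputs) => {
    const { ssn, ...safe } = inputs as Record<string, unknown>; // ssn is a hypothetical field
    return safe;
  },
});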

Choosing Performance Configuration

Optimizing for production?
│
├─ High-volume tracing (>1000 traces/min)?
│  ├─ Enable sampling → tracingSamplingRate: 0.1 (10%)
│  ├─ Enable batching → autoBatchTracing: true
│  ├─ Increase concurrency → traceBatchConcurrency: 10
│  └─ Increase batch size → batchSizeBytesLimit: 50_000_000
│
├─ Serverless/short-lived?
│  ├─ Always flush → await client.awaitPendingTraceBatches()
│  ├─ Consider blocking mode → blockOnRootRunFinalization: true
│  └─ Manual flush mode → manualFlushMode: true + flush()
│
├─ Memory-constrained?
│  ├─ Reduce batch size → batchSizeBytesLimit: 10_000_000
│  ├─ Limit operations per batch → batchSizeLimit: 50
│  └─ Lower memory limit → maxIngestMemoryBytes: 500_000_000
│
├─ Low-latency requirements?
│  ├─ Disable batching → autoBatchTracing: false
│  └─ Reduce timeout → timeout_ms: 5000
│
└─ Development/debugging?
   ├─ No batching → autoBatchTracing: false
   ├─ Blocking mode → blockOnRootRunFinalization: true
   └─ Debug logging → debug: true

Configuration Presets:

// High-volume production
const productionClient = new Client({
  tracingSamplingRate: 0.1,
  autoBatchTracing: true,
  batchSizeBytesLimit: 50_000_000,
  traceBatchConcurrency: 10,
  hideInputs: (i) => redactPII(i)
});

// Serverless (Lambda, Cloud Functions)
const serverlessClient = new Client({
  autoBatchTracing: true,
  blockOnRootRunFinalization: false
});
// Always: await client.awaitPendingTraceBatches() before return

// Development/Testing
const devClient = new Client({
  autoBatchTracing: false,
  blockOnRootRunFinalization: true,
  debug: true
});

// Memory-constrained
const lightweightClient = new Client({
  batchSizeBytesLimit: 10_000_000,
  batchSizeLimit: 50,
  maxIngestMemoryBytes: 500_000_000
});

Choosing Dataset Operations

Working with datasets?
│
├─ Creating dataset?
│  └─ createDataset({ datasetName, dataType })
│
├─ Adding examples?
│  └─ See "Choosing Example Creation Method" above
│
├─ Finding similar examples?
│  ├─ First time? → indexDataset() then similarExamples()
│  └─ Already indexed? → similarExamples(inputs, datasetId)
│
├─ Versioning dataset?
│  ├─ Create version → Add examples (auto-versioned)
│  ├─ Tag version → updateDatasetTag({ tag, asOf })
│  ├─ Read version → readDatasetVersion({ asOf })
│  └─ Compare versions → diffDatasetVersions({ fromVersion, toVersion })
│
├─ Organizing examples?
│  ├─ Create splits → updateDatasetSplits({ splitName, exampleIds })
│  ├─ List splits → listDatasetSplits()
│  └─ Remove from split → updateDatasetSplits({ remove: true })
│
├─ Sharing dataset?
│  ├─ Share publicly → shareDataset(datasetId)
│  ├─ Clone public → clonePublicDataset(shareToken)
│  ├─ Unshare → unshareDataset(datasetId)
│  └─ Read shared → readSharedDataset(shareToken)
│
└─ Exporting dataset?
   ├─ For OpenAI fine-tuning → readDatasetOpenaiFinetuning()
   └─ As CSV → List examples and format manually
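
A hedged sketch of the create-then-version path. createDataset is shown in the (name, options) form; createExample and updateDatasetTag follow the shapes used elsewhere in this guide, so verify against your SDK version.

import { Client } from "langsmith";

const client = new Client();

// Create a key-value dataset and add one example
const dataset = await client.createDataset("support-questions", {
  description: "Curated support questions",
  dataType: "kv",
});

await client.createExample({
  dataset_id: dataset.id,
  inputs: { question: "How do I reset my password?" },
  outputs: { answer: "Use the reset link on the login page." },
});

// Pin the current state so evaluations can target a stable version
await client.updateDatasetTag({
  datasetId: dataset.id,
  tag: "prod",
  asOf: new Date(),
});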

Choosing Feedback Collection Method

Collecting feedback on runs?
│
├─ Direct API access available?
│  ├─ Simple score/comment → createFeedback(run_id, key, { score })
│  ├─ With correction → createFeedback(run_id, key, { score, correction })
│  └─ From evaluator → logEvaluationFeedback(params)
│
├─ Public/external collection?
│  ├─ Create token → createPresignedFeedbackToken(runId, key)
│  ├─ Share URL → token.url (users POST without auth)
│  └─ List tokens → listPresignedFeedbackTokens()
│
├─ Human review workflow?
│  ├─ Create queue → createAnnotationQueue()
│  ├─ Add runs → addRunsToAnnotationQueue()
│  ├─ Review → getRunFromAnnotationQueue()
│  └─ Submit → createFeedback() with queue context
│
└─ Automated/model feedback?
   ├─ From LLM judge → createFeedback(run_id, key, { feedbackSourceType: "model" })
   ├─ From API check → createFeedback(run_id, key, { feedbackSourceType: "api" })
   └─ From evaluation → Automatically logged by evaluate()
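
A sketch of the two direct paths: authenticated feedback plus a presigned token for end users. The run ID is a placeholder.

import { Client } from "langsmith";

const client = new Client();
const runId = "<traced-run-id>"; // placeholder

// Direct feedback when an API key is available
await client.createFeedback(runId, "helpfulness", {
  score: 0.9,
  comment: "Accurate and concise",
});

// Presigned token: end users can submit feedback without an API key
const token = await client.createPresignedFeedbackToken(runId, "user_rating");
console.log(token.url);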

Choosing Run Query Method

Need to query runs?
│
├─ Single run by ID?
│  ├─ Basic info → readRun(runId)
│  └─ With children → readRun(runId, { loadChildRuns: true })
│
├─ Multiple runs?
│  ├─ All in project → listRuns({ projectName })
│  ├─ Root runs only → listRuns({ isRoot: true })
│  ├─ By trace → listRuns({ traceId })
│  ├─ By parent → listRuns({ parentRunId })
│  ├─ Failed only → listRuns({ error: true })
│  ├─ Time range → listRuns({ startTime, endTime })
│  └─ Complex filter → listRuns({ filter: 'and(...)' })
│
├─ Grouped analytics?
│  ├─ By conversation → listGroupRuns({ groupBy: "metadata.conversation_id" })
│  ├─ By user → listGroupRuns({ groupBy: "metadata.user_id" })
│  └─ Custom grouping → listGroupRuns({ groupBy: "metadata.custom_field" })
│
├─ Just statistics?
│  └─ getRunStats({ projectName, filter })
│
└─ Public shared runs?
   └─ listSharedRuns({ shareToken })
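
A sketch of the two most common queries; listRuns returns an async iterable, so results stream as you iterate. The run ID and project name are placeholders.

import { Client } from "langsmith";

const client = new Client();

// Single run by ID, including child runs
const run = await client.readRun("<run-id>", { loadChildRuns: true });
console.log(run.name, run.child_runs?.length);

// Stream failed root runs from one project
for await (const failed of client.listRuns({
  projectName: "my-project",
  isRoot: true,
  error: true,
})) {
  console.log(failed.id, failed.name);
}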

Choosing Filter Strategy

Need to filter runs?
│
├─ Simple filters?
│  ├─ By project → projectName: "my-project"
│  ├─ By type → runType: "llm"
│  ├─ By error → error: true
│  ├─ By time → startTime/endTime: Date
│  └─ Root only → isRoot: true
│
├─ Complex conditions?
│  ├─ Single condition → filter: 'eq(status, "success")'
│  ├─ Comparison → filter: 'gte(latency, 1000)'
│  ├─ Multiple AND → filter: 'and(eq(error, null), gte(latency, 1000))'
│  ├─ Multiple OR → filter: 'or(eq(run_type, "llm"), eq(run_type, "chain"))'
│  ├─ Array contains → filter: 'has(tags, "production")'
│  └─ Text search → filter: 'search(name, "customer")'
│
├─ Trace-level filtering?
│  ├─ Filter root run → traceFilter: 'eq(name, "pipeline")'
│  ├─ Filter children → treeFilter: 'eq(run_type, "llm")'
│  └─ Both → Use traceFilter + treeFilter together
│
└─ Field selection?
   └─ select: ["id", "name", "start_time"]

Filter Comparators:

  • eq(field, value) - Equals
  • neq(field, value) - Not equals
  • gt(field, value) - Greater than
  • gte(field, value) - Greater than or equal
  • lt(field, value) - Less than
  • lte(field, value) - Less than or equal
  • has(array_field, value) - Array contains
  • search(field, text) - Text search
  • and(condition1, condition2, ...) - Logical AND
  • or(condition1, condition2, ...) - Logical OR
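
A sketch combining a simple parameter, a trace-level filter, and a composed filter string; field names such as latency follow the filter examples above.

import { Client } from "langsmith";

const client = new Client();

// Slow or errored LLM runs inside traces whose root run is named "pipeline"
for await (const run of client.listRuns({
  projectName: "my-project",
  traceFilter: 'eq(name, "pipeline")',
  filter: 'and(eq(run_type, "llm"), or(neq(error, null), gte(latency, 1000)))',
  select: ["id", "name", "run_type", "error"],
})) {
  console.log(run.id, run.name);
}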

Choosing Client Configuration Strategy

Configuring LangSmith client?
│
├─ Environment-based (recommended)?
│  └─ Use new Client() with LANGCHAIN_API_KEY, LANGCHAIN_PROJECT env vars
│
├─ Explicit configuration?
│  ├─ Basic → new Client({ apiKey, apiUrl })
│  ├─ With privacy → new Client({ hideInputs, hideOutputs })
│  ├─ With anonymization → new Client({ anonymizer })
│  └─ Full custom → new Client({ ...all options })
│
├─ Different configs per environment?
│  ├─ Dev → autoBatchTracing: false, debug: true
│  ├─ Staging → tracingSamplingRate: 0.5
│  └─ Production → tracingSamplingRate: 0.1, hideInputs: true
│
└─ Using proxy/custom networking?
   ├─ Global → overrideFetchImplementation(customFetch)
   └─ Per-client → new Client({ fetchImplementation: customFetch })
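
A sketch of environment-based versus explicit configuration; the NODE_ENV switch and the exact sampling values are illustrative.

import { Client } from "langsmith";

// Environment-based: picks up LANGCHAIN_API_KEY / LANGCHAIN_PROJECT
// (or their LANGSMITH_* equivalents) automatically
const defaultClient = new Client();

// Explicit, per-environment configuration
const isProd = process.env.NODE_ENV === "production";
const client = new Client({
  apiKey: process.env.LANGCHAIN_API_KEY,
  tracingSamplingRate: isProd ? 0.1 : 1.0,
  autoBatchTracing: isProd,
  ...(isProd ? {} : { debug: true }),
});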

Choosing Prompt Management Approach

Managing prompts?
│
├─ Creating new prompt?
│  └─ createPrompt("prompt-name", { description, tags })
│
├─ Versioning prompt?
│  ├─ New version → pushPrompt("name", { object, description })
│  ├─ Tag version → pushPrompt("name:tag", { object })
│  └─ View history → listCommits({ promptName })
│
├─ Using prompt in code?
│  ├─ Latest version → pullPrompt({ promptName })
│  ├─ Specific version → pullPrompt({ promptName, commit: "hash" })
│  ├─ Tagged version → pullPrompt({ promptName: "name:tag" })
│  └─ With caching → Use Cache with fetchFunc
│
└─ Sharing prompts?
   ├─ Make public → updatePrompt({ isPublic: true })
   ├─ Like prompt → likePrompt(promptName)
   └─ Check exists → promptExists(promptName)
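
A heavily hedged sketch following the method names in the tree above; exact push/pull signatures differ across SDK releases (some take a plain "name" or "name:tag" string for pulls), and the prompt payload is illustrative.

import { Client } from "langsmith";

const client = new Client();

// Push a new version, then pull the latest back by name
await client.pushPrompt("support-triage", {
  object: { template: "Classify this ticket: {ticket}" }, // illustrative payload
  description: "Triage classifier prompt",
});

const latest = await client.pullPrompt({ promptName: "support-triage" });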

Choosing Testing Framework

Need test-driven evaluation?
│
├─ Already using Jest?
│  └─ import { test, expect } from "langsmith/jest"
│
├─ Already using Vitest?
│  └─ import { test, expect } from "langsmith/vitest"
│     (requires reporter in vitest.config.ts)
│
├─ No test framework?
│  ├─ Want test framework features → Choose Jest or Vitest
│  └─ Just evaluate → Use evaluate() directly
│
└─ Custom test harness?
   └─ Use Client API directly with evaluate()

Framework Comparison:

| Feature | Jest | Vitest | Direct evaluate() |
| --- | --- | --- | --- |
| Test per example | ✓ | ✓ | Manual loop |
| Custom matchers | ✓ | ✓ | N/A |
| Parallel execution | ✓ | ✓ (faster) | Custom control |
| Watch mode | ✓ | ✓ | N/A |
| Setup required | Minimal | Config file | None |
| Best for | React, Node | Vite, modern | Scripts, custom |
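
A minimal Jest-integration sketch. The inputs/referenceOutputs parameter shape and the ls.logFeedback helper follow current LangSmith docs, so verify them against your installed version; the stubbed output stands in for your application call.

import * as ls from "langsmith/jest";

// Each ls.test case is synced to LangSmith as a dataset example and traced run
ls.describe("support bot", () => {
  ls.test(
    "answers the password-reset question",
    {
      inputs: { question: "How do I reset my password?" },
      referenceOutputs: { answer: "Use the reset link on the login page." },
    },
    async ({ inputs, referenceOutputs }) => {
      const output = { answer: "Use the reset link on the login page." }; // call your app with inputs here
      ls.logFeedback({
        key: "exact_match",
        score: output.answer === referenceOutputs?.answer ? 1 : 0,
      });
      expect(output.answer).toBeTruthy();
    }
  );
});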

Choosing Dataset Sharing Method

Need to share dataset?
│
├─ Within organization?
│  └─ Normal sharing: shareDataset(datasetId)
│
├─ Public sharing?
│  ├─ Share → shareDataset(datasetId, customShareId)
│  ├─ Get share URL → Response contains share_token
│  └─ Others clone → clonePublicDataset(shareToken)
│
├─ Reading shared dataset?
│  ├─ Dataset metadata → readSharedDataset(shareToken)
│  └─ Examples → listSharedExamples(shareToken)
│
└─ Collaboration?
   ├─ Share with custom ID → shareDataset(datasetId, "team-qa-set")
   └─ Version control → Use dataset versioning + sharing

Choosing Annotation Queue Strategy

Need human review?
│
├─ Quality assurance?
│  ├─ Random sampling → createAnnotationQueue() + random selection
│  └─ Edge cases → Filter runs then addRunsToAnnotationQueue()
│
├─ Model comparison?
│  ├─ Side-by-side → createComparativeExperiment() + queue
│  └─ Sequential → Add runs from different experiments
│
├─ Training data collection?
│  └─ Annotation queue + feedback with corrections
│
└─ Active learning?
   ├─ Low confidence → Filter by metadata.confidence then add to queue
   └─ High error rate → Filter by error then add to queue
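
A sketch of the quality-assurance branch, assuming the createAnnotationQueue and addRunsToAnnotationQueue signatures shown here; the project name and queue name are illustrative.

import { Client } from "langsmith";

const client = new Client();

// Create a review queue, then feed it runs that errored in production
const queue = await client.createAnnotationQueue({
  name: "qa-edge-cases",
  description: "Failed production runs for human review",
});

const failedRunIds: string[] = [];
for await (const run of client.listRuns({
  projectName: "my-project",
  isRoot: true,
  error: true,
})) {
  failedRunIds.push(run.id);
}

await client.addRunsToAnnotationQueue(queue.id, failedRunIds);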

Choosing Between Similar Methods

Run Management: create vs update vs batch

Creating/updating runs?
├─ Single run, manual → createRun() then updateRun()
├─ Many runs → batchIngestRuns({ post: [...], patch: [...] })
└─ Very large batch → multipartIngestRuns()
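
A sketch of the single-run path (createRun followed by updateRun); the caller supplies the ID and timestamps, and the run name and payloads are illustrative.

import { randomUUID } from "node:crypto";
import { Client } from "langsmith";

const client = new Client();

// Manual lifecycle: create the run, then patch in the outcome
const runId = randomUUID();
await client.createRun({
  id: runId,
  name: "manual-step",
  run_type: "chain",
  inputs: { query: "hello" },
  start_time: Date.now(),
});
await client.updateRun(runId, {
  outputs: { result: "done" },
  end_time: Date.now(),
});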

Feedback: create vs presigned vs evaluate

Collecting feedback?
├─ Direct API access → createFeedback()
├─ No API key → createPresignedFeedbackToken()
├─ From evaluator → logEvaluationFeedback()
└─ Automatic from eval → Use evaluate() with evaluators

Tracing: traceable vs RunTree vs wrappers

Adding tracing?
├─ Own functions → traceable()
├─ Third-party SDK → wrappers (wrapOpenAI, etc.)
├─ Non-function code → RunTree
└─ Framework (LangChain) → getLangchainCallbacks()

Advanced Decision: When to Use Multiple Clients

Need multiple clients?
│
├─ Different projects?
│  └─ One client per project: new Client({ projectName })
│
├─ Different privacy settings?
│  ├─ Public client → new Client({ hideInputs: false })
│  └─ Private client → new Client({ hideInputs: true })
│
├─ Different sampling rates?
│  ├─ Dev (100%) → new Client({ tracingSamplingRate: 1.0 })
│  └─ Prod (10%) → new Client({ tracingSamplingRate: 0.1 })
│
└─ Different workspaces?
   └─ new Client({ workspaceId: "workspace-123" })

When NOT to use multiple clients:

  • Same project, same config → Reuse single client
  • Just different metadata → Use traceable config, not new client

Context Management Decision

Managing run tree context?
│
├─ Within traceable function?
│  ├─ Access context → getCurrentRunTree()
│  ├─ Optional access → getCurrentRunTree(true)
│  └─ From function → traceableFn.getCurrentRunTree()
│
├─ Need to set context?
│  └─ withRunTree(runTree, () => {...})
│
├─ Check if traceable?
│  └─ isTraceableFunction(fn)
│
└─ Need ROOT marker?
   └─ import { ROOT } from "langsmith/traceable"
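
A sketch of reading the active run tree from inside a traceable() function; the function body is illustrative.

import { traceable, getCurrentRunTree } from "langsmith/traceable";

const answer = traceable(
  async (question: string) => {
    // Inside a traceable call, the active run tree is available from context
    const runTree = getCurrentRunTree();
    console.log("trace:", runTree.trace_id, "run:", runTree.id);
    return `echo: ${question}`;
  },
  { name: "answer" }
);

await answer("hello");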

Related Documentation

  • Setup Guide - Initial configuration decisions
  • Tracing Guide - Tracing approach details
  • Evaluation Guide - Evaluation strategy details
  • API Reference - Complete method reference
  • Quick Reference - Common patterns
  • Anti-Patterns - What to avoid