Describes: npmpkg:npm/langsmith@0.4.x

tessl/npm-langsmith

tessl install tessl/npm-langsmith@0.4.3

TypeScript client SDK for the LangSmith LLM tracing, evaluation, and monitoring platform.

docs/concepts/core-concepts.md

Core Concepts

Understanding the fundamental concepts in LangSmith: Projects, Runs, Datasets, Examples, and Feedback.

Projects

Projects (also called Sessions or TracerSessions) organize your traces and runs into logical groups.

What are Projects?

Projects are containers for organizing related traces:

  • Group traces by environment (dev, staging, production)
  • Separate different features or use cases
  • Organize by time period or experiment
  • Track specific deployments or versions

Working with Projects

import { Client } from "langsmith";

const client = new Client();

// Create project
const project = await client.createProject({
  projectName: "my-chatbot-v1",
  description: "Production chatbot deployment",
  metadata: { version: "1.0.0", env: "production" }
});

// Read project (use a distinct variable name to avoid redeclaring `project`)
const existing = await client.readProject({
  projectName: "my-chatbot-v1"
});

// List projects
for await (const project of client.listProjects({ limit: 100 })) {
  console.log(project.name);
}
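
The client also exposes helpers for checking and cleaning up projects; a minimal sketch, assuming hasProject and deleteProject accept a project name as shown:

// Check whether a project exists before creating it
if (!(await client.hasProject({ projectName: "my-chatbot-v1" }))) {
  await client.createProject({ projectName: "my-chatbot-v1" });
}

// Remove a project that is no longer needed
await client.deleteProject({ projectName: "old-experiment" });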

Default Project

Set default project via environment variable:

export LANGCHAIN_PROJECT=my-default-project

Or in code:

const client = new Client();
// Falls back to the project named "default" when LANGCHAIN_PROJECT is not set
const defaultProject = process.env.LANGCHAIN_PROJECT ?? "default";

Runs

Runs are individual traces of function executions, LLM calls, or operations.

What are Runs?

Runs capture:

  • Input and output data
  • Execution time and latency
  • Errors and exceptions
  • Metadata and tags
  • Hierarchical relationships (parent/child)
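
All of this is recorded on the run and can be read back through the client; a minimal sketch of listing recent runs in a project (the project name is illustrative):

import { Client } from "langsmith";

const client = new Client();

// Iterate over runs in a project and inspect what each one captured
for await (const run of client.listRuns({
  projectName: "my-chatbot-v1",
  limit: 10
})) {
  console.log(run.name, run.run_type);
  console.log(run.inputs, run.outputs);        // input and output data
  console.log(run.start_time, run.end_time);   // timing
  console.log(run.error);                      // error, if any
  console.log(run.parent_run_id);              // hierarchy
}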

Run Types

type RunType =
  | "llm"        // Direct language model call
  | "chain"      // Sequence of operations
  | "tool"       // Individual tool/function
  | "retriever"  // Document retrieval
  | "embedding"  // Embedding generation
  | "prompt"     // Prompt formatting
  | "parser";    // Output parsing

Creating Runs

import { traceable } from "langsmith/traceable";

// Automatic via traceable
const myFunction = traceable(
  async (input: string) => processInput(input),
  { name: "my-function", run_type: "chain" }
);

await myFunction("test");  // Creates run automatically
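
Runs can also be created explicitly with RunTree when you need manual control over a run's lifecycle; a minimal sketch (the name, inputs, and outputs are illustrative):

import { RunTree } from "langsmith";

// Build the run by hand instead of wrapping a function with traceable
const run = new RunTree({
  name: "manual-operation",
  run_type: "chain",
  inputs: { query: "hello" },
  project_name: "my-chatbot-v1"
});

await run.postRun();              // register the run
const outputs = { result: "HELLO" };
await run.end(outputs);           // record outputs and the end time
await run.patchRun();             // upload the final state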

Hierarchical Runs

Runs can have parent-child relationships:

// Declare the child operations first so the parent can call them
const child1 = traceable(
  async (x: string) => x.toUpperCase(),
  { name: "uppercase", run_type: "tool" }
);

const child2 = traceable(
  async (x: string) => x + "!",
  { name: "add-exclamation", run_type: "tool" }
);

const parent = traceable(async (input: string) => {
  const step1 = await child1(input);  // Child run
  const step2 = await child2(step1);  // Child run
  return step2;
}, { name: "parent-operation" });
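
Calling the parent produces a single trace containing the nested child runs:

await parent("hello");  // one parent-operation run with two child tool runs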

Datasets

Datasets are collections of examples used for testing and evaluation.

What are Datasets?

Datasets contain:

  • Input examples
  • Expected outputs (optional)
  • Metadata for organization
  • Version tracking
  • Split assignments (train/test/val)

Data Types

type DataType =
  | "kv"      // Key-value data (most flexible)
  | "llm"     // LLM input/output format
  | "chat";   // Chat message format

Creating Datasets

import { Client } from "langsmith";

const client = new Client();

const dataset = await client.createDataset({
  datasetName: "customer-support-qa",
  description: "Q&A pairs for customer support",
  dataType: "kv"
});

await client.createExamples({
  datasetId: dataset.id,
  inputs: [
    { question: "How do I reset my password?" },
    { question: "What are your business hours?" }
  ],
  outputs: [
    { answer: "Click 'Forgot Password' on the login page." },
    { answer: "We're open Monday-Friday, 9am-5pm EST." }
  ]
});

Dataset Versioning

// Create version snapshot
const version = await client.createDatasetVersion({
  datasetName: "qa-dataset",
  name: "v1.0.0",
  description: "Initial release version"
});

// Compare versions
const diff = await client.diffDatasetVersions({
  datasetName: "qa-dataset",
  fromVersion: "v1.0.0",
  toVersion: "v1.1.0"
});

console.log("Examples added:", diff.examples_added.length);
console.log("Examples modified:", diff.examples_modified.length);

Examples

Examples are individual data points within datasets.

What are Examples?

Examples consist of:

  • Input data (required)
  • Output data (optional, for evaluation)
  • Metadata
  • Split assignment (train/test/validation)
  • Source run reference

Working with Examples

import { Client } from "langsmith";

const client = new Client();

// Create single example
const example = await client.createExample({
  dataset_id: dataset.id,
  inputs: { question: "What is LangSmith?" },
  outputs: { answer: "LangSmith is a platform..." },
  metadata: { category: "product-info" }
});

// Bulk create
await client.createExamples({
  datasetName: "qa-dataset",
  inputs: [
    { question: "What is 2+2?" },
    { question: "What is 3+3?" }
  ],
  outputs: [
    { answer: "4" },
    { answer: "6" }
  ]
});

// List examples
for await (const example of client.listExamples({
  datasetName: "qa-dataset",
  limit: 100
})) {
  console.log(example.inputs, example.outputs);
}
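
Per-example metadata and split assignments can be attached at creation time and used to filter listings; a sketch assuming createExamples accepts parallel metadata and splits arrays and that listExamples can filter by split:

// Create examples with metadata and split assignments
await client.createExamples({
  datasetName: "qa-dataset",
  inputs: [
    { question: "How do I cancel an order?" },
    { question: "Do you ship internationally?" }
  ],
  outputs: [
    { answer: "Open Orders and choose Cancel." },
    { answer: "Yes, to most countries." }
  ],
  metadata: [
    { category: "orders" },
    { category: "shipping" }
  ],
  splits: ["train", "test"]
});

// List only the test split
for await (const example of client.listExamples({
  datasetName: "qa-dataset",
  splits: ["test"]
})) {
  console.log(example.inputs);
}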

Feedback

Feedback represents evaluative information about a run's performance.

What is Feedback?

Feedback can come from:

  • Human feedback: Manual annotations and corrections
  • Model feedback: LLM-as-judge evaluations
  • API feedback: Automated feedback from external systems
  • App feedback: End-user ratings and comments

Feedback Types

Feedback supports:

  • Quantitative scores: Numeric ratings (0-1), booleans
  • Qualitative values: Text comments, categorical labels
  • Corrections: Suggested improvements
  • Metadata: Source information, evaluator details

Creating Feedback

import { Client } from "langsmith";

const client = new Client();

// Thumbs up/down
await client.createFeedback(runId, "user_rating", {
  score: 1,  // 1 = thumbs up, 0 = thumbs down
  comment: "Great response!",
});

// Numeric score
await client.createFeedback(runId, "accuracy", {
  score: 0.95,
  comment: "Highly accurate",
});

// With correction
await client.createFeedback(runId, "correctness", {
  score: 0,
  correction: {
    outputs: { answer: "The correct answer is..." },
  },
});

// Model-generated feedback
await client.createFeedback(runId, "coherence", {
  score: 0.88,
  feedbackSourceType: "model",
  sourceRunId: judgeRunId,
});

Presigned Feedback Tokens

Allow external systems to submit feedback without API keys:

const token = await client.createPresignedFeedbackToken({
  run_id: runId,
  feedback_key: "user_rating",
  expires_in: 86400  // 24 hours
});

// Share token.url with users
// They can POST feedback without authentication
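
For illustration, an external system could then submit feedback with a plain HTTP POST to the token's URL; the payload shape here is an assumption, not a documented contract:

// Hypothetical consumer of token.url; no LangSmith API key required
await fetch(token.url, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ score: 1, comment: "Helpful answer" })
});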

Relationships

How Concepts Connect

Project
  └── Run (root)
       ├── Run (child)
       ├── Run (child)
       │    └── Run (grandchild)
       └── Feedback
            ├── Feedback entry 1
            └── Feedback entry 2

Dataset
  ├── Example 1
  ├── Example 2
  └── Example 3

Evaluation
  ├── Uses Dataset
  ├── Creates Runs
  └── Generates Feedback

Example Workflow

import { Client } from "langsmith";
import { evaluate } from "langsmith/evaluation";
import { traceable } from "langsmith/traceable";

const client = new Client();

// 1. Create project (implicit via environment or config)
const projectName = "my-app";

// 2. Trace runs to the project
const myBot = traceable(
  async (input) => processInput(input),
  { project_name: projectName }
);

await myBot("test");  // Creates run in project

// 3. Create dataset
const dataset = await client.createDataset({
  datasetName: "test-set"
});

await client.createExamples({
  datasetId: dataset.id,
  inputs: [{ question: "test" }],
  outputs: [{ answer: "result" }]
});

// 4. Run evaluation (creates runs and feedback)
const results = await evaluate(myBot, {
  data: "test-set",
  evaluators: [evaluator]
});

// 5. Collect user feedback on production runs
await client.createFeedback(runId, "user_satisfaction", {
  score: 1,
});

Best Practices

Project Organization

# Separate by environment
LANGCHAIN_PROJECT=dev-chatbot
LANGCHAIN_PROJECT=staging-chatbot
LANGCHAIN_PROJECT=production-chatbot

# Separate by feature
LANGCHAIN_PROJECT=feature-translation
LANGCHAIN_PROJECT=feature-summarization

# Separate by experiment
LANGCHAIN_PROJECT=experiment-gpt4-baseline
LANGCHAIN_PROJECT=experiment-claude-comparison
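
The same separation can also be applied per run inside one process by passing project_name to traceable, as shown earlier in the workflow example:

const translate = traceable(
  async (text: string) => translateText(text),  // placeholder application logic
  { name: "translate", project_name: "feature-translation" }
);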

Run Naming

// Good: Descriptive names
{ name: "summarize-document", run_type: "chain" }
{ name: "retrieve-context", run_type: "retriever" }
{ name: "openai-chat", run_type: "llm" }

// Bad: Generic names
{ name: "func1", run_type: "chain" }
{ name: "process", run_type: "chain" }

Dataset Management

// Version datasets for reproducibility
datasetName: "qa-eval-v1.0.0"
datasetName: "qa-eval-v1.1.0"

// Use descriptive names
datasetName: "customer-support-qa"
datasetName: "translation-test-set"

// Include metadata
metadata: {
  created_by: "data-team",
  purpose: "regression-testing",
  date: "2024-01-15"
}

Feedback Keys

// Good: Consistent, descriptive keys
"correctness"
"helpfulness"
"response_quality"
"safety_compliance"
"user_satisfaction"

// Avoid: Generic or ambiguous keys
"feedback1"
"rating"
"score"

Related Documentation

  • Setup Guide - Get started with LangSmith
  • Tracing Guide - Trace your applications
  • Evaluation Guide - Evaluate with datasets
  • Client API - Complete API reference
  • Datasets API - Dataset management
  • Feedback API - Feedback collection