tessl install tessl/npm-langsmith@0.4.3
TypeScript client SDK for the LangSmith LLM tracing, evaluation, and monitoring platform.
This guide covers the fundamental concepts in LangSmith: Projects, Runs, Datasets, Examples, and Feedback.
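Every example below constructs a Client. By default the client reads its configuration from environment variables; it can also be configured explicitly. A minimal sketch (the constructor options and variable names shown are the commonly documented ones; verify them against your SDK version):
import { Client } from "langsmith";
// Typically configured via environment variables:
//   LANGCHAIN_TRACING_V2, LANGCHAIN_API_KEY, LANGCHAIN_ENDPOINT, LANGCHAIN_PROJECT
// Or passed explicitly to the constructor:
const client = new Client({
  apiKey: process.env.LANGCHAIN_API_KEY,
  apiUrl: "https://api.smith.langchain.com",
});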
Projects (also called Sessions or TracerSessions) are containers that organize related traces and runs into logical groups:
import { Client } from "langsmith";
const client = new Client();
// Create project
const project = await client.createProject({
projectName: "my-chatbot-v1",
description: "Production chatbot deployment",
metadata: { version: "1.0.0", env: "production" }
});
// Read project
const existing = await client.readProject({
projectName: "my-chatbot-v1"
});
// List projects
for await (const project of client.listProjects({ limit: 100 })) {
console.log(project.name);
}
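Projects can also be updated or deleted after creation. A minimal sketch, assuming the client exposes updateProject and deleteProject with the parameter shapes shown (verify against your SDK version):
// Update an existing project's name and description
await client.updateProject(project.id, {
  name: "my-chatbot-v2",
  description: "Updated chatbot deployment",
});
// Delete a project by name
await client.deleteProject({ projectName: "my-chatbot-v1" });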
Set default project via environment variable:
export LANGCHAIN_PROJECT=my-default-project
Or in code:
const client = new Client();
const defaultProject = getDefaultProjectName();
Runs are individual traces of function executions, LLM calls, or operations.
Each run has a run type describing the kind of operation it captures:
type RunType =
| "llm" // Direct language model call
| "chain" // Sequence of operations
| "tool" // Individual tool/function
| "retriever" // Document retrieval
| "embedding" // Embedding generation
| "prompt" // Prompt formatting
| "parser"; // Output parsingimport { traceable } from "langsmith/traceable";
// Automatic via traceable
const myFunction = traceable(
async (input: string) => processInput(input),
{ name: "my-function", run_type: "chain" }
);
await myFunction("test"); // Creates run automatically
Runs can have parent-child relationships:
const child1 = traceable(
  async (x: string) => x.toUpperCase(),
  { name: "uppercase", run_type: "tool" }
);
const child2 = traceable(
  async (x: string) => x + "!",
  { name: "add-exclamation", run_type: "tool" }
);
const parent = traceable(async (input: string) => {
  const step1 = await child1(input); // Child run
  const step2 = await child2(step1); // Child run
  return step2;
}, { name: "parent-operation" });
Datasets are collections of examples used for testing and evaluation.
Datasets have a data type that determines the format of their examples:
type DataType =
| "kv" // Key-value data (most flexible)
| "llm" // LLM input/output format
| "chat"; // Chat message formatimport { Client } from "langsmith";
const client = new Client();
const dataset = await client.createDataset({
datasetName: "customer-support-qa",
description: "Q&A pairs for customer support",
dataType: "kv"
});
await client.createExamples({
datasetId: dataset.id,
inputs: [
{ question: "How do I reset my password?" },
{ question: "What are your business hours?" }
],
outputs: [
{ answer: "Click 'Forgot Password' on the login page." },
{ answer: "We're open Monday-Friday, 9am-5pm EST." }
]
});
// Create version snapshot
const version = await client.createDatasetVersion({
datasetName: "qa-dataset",
name: "v1.0.0",
description: "Initial release version"
});
// Compare versions
const diff = await client.diffDatasetVersions({
datasetName: "qa-dataset",
fromVersion: "v1.0.0",
toVersion: "v1.1.0"
});
console.log("Examples added:", diff.examples_added.length);
console.log("Examples modified:", diff.examples_modified.length);Examples are individual data points within datasets.
Examples are individual data points within datasets.
Examples consist of inputs, optional reference outputs, and metadata:
import { Client } from "langsmith";
const client = new Client();
// Create single example
const example = await client.createExample({
dataset_id: dataset.id,
inputs: { question: "What is LangSmith?" },
outputs: { answer: "LangSmith is a platform..." },
metadata: { category: "product-info" }
});
// Bulk create
await client.createExamples({
datasetName: "qa-dataset",
inputs: [
{ question: "What is 2+2?" },
{ question: "What is 3+3?" }
],
outputs: [
{ answer: "4" },
{ answer: "6" }
]
});
// List examples
for await (const example of client.listExamples({
datasetName: "qa-dataset",
limit: 100
})) {
console.log(example.inputs, example.outputs);
}
Feedback represents evaluative information about a run's performance.
Feedback can come from end users, human annotators, or automated evaluators such as LLM judges.
Feedback supports numeric scores, free-text comments, corrections, and a source type that records where it came from.
import { Client } from "langsmith";
const client = new Client();
// Thumbs up/down
await client.createFeedback(runId, "user_rating", {
  score: 1, // 1 = thumbs up, 0 = thumbs down
comment: "Great response!",
});
// Numeric score
await client.createFeedback(runId, "accuracy", {
score: 0.95,
comment: "Highly accurate",
});
// With correction
await client.createFeedback(runId, "correctness", {
score: 0,
correction: {
outputs: { answer: "The correct answer is..." },
},
});
// Model-generated feedback
await client.createFeedback(runId, "coherence", {
score: 0.88,
feedback_source_type: "model",
source_run_id: judgeRunId,
});
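Recorded feedback can be read back for analysis. A sketch, assuming the client exposes a listFeedback iterator that filters by run ids:
for await (const feedback of client.listFeedback({ runIds: [runId] })) {
  console.log(feedback.key, feedback.score, feedback.comment);
}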
Allow external systems to submit feedback without API keys:
const token = await client.createPresignedFeedbackToken({
run_id: runId,
feedback_key: "user_rating",
expires_in: 86400 // 24 hours
});
// Share token.url with users
// They can POST feedback without authentication
These concepts form a hierarchy:
Project
└── Run (root)
    ├── Run (child)
    ├── Run (child)
    │   └── Run (grandchild)
    └── Feedback
        ├── Feedback entry 1
        └── Feedback entry 2
Dataset
├── Example 1
├── Example 2
└── Example 3
Evaluation
├── Uses Dataset
├── Creates Runs
└── Generates Feedback
A typical end-to-end workflow ties these concepts together:
import { Client } from "langsmith";
import { evaluate } from "langsmith/evaluation";
import { traceable } from "langsmith/traceable";
const client = new Client();
// 1. Create project (implicit via environment or config)
const projectName = "my-app";
// 2. Trace runs to the project
const myBot = traceable(
async (input) => processInput(input),
{ project_name: projectName }
);
await myBot("test"); // Creates run in project
// 3. Create dataset
const dataset = await client.createDataset({
datasetName: "test-set"
});
await client.createExamples({
datasetId: dataset.id,
inputs: [{ question: "test" }],
outputs: [{ answer: "result" }]
});
// 4. Run evaluation (creates runs and feedback)
const results = await evaluate(myBot, {
data: "test-set",
evaluators: [evaluator]
});
// 5. Collect user feedback on production runs
await client.createFeedback(runId, "user_satisfaction", {
score: 1,
});
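The evaluator referenced in step 4 is not defined above. A minimal custom evaluator might look like the following sketch, assuming evaluators may be plain functions that receive the run and its reference example and return a key/score result:
import type { Run, Example } from "langsmith/schemas";

// Compare the run's answer to the reference answer stored on the dataset example
const evaluator = async (run: Run, example?: Example) => ({
  key: "exact_match",
  score: run.outputs?.answer === example?.outputs?.answer ? 1 : 0,
});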
// Separate by environment
LANGCHAIN_PROJECT=dev-chatbot
LANGCHAIN_PROJECT=staging-chatbot
LANGCHAIN_PROJECT=production-chatbot
// Separate by feature
LANGCHAIN_PROJECT=feature-translation
LANGCHAIN_PROJECT=feature-summarization
// Separate by experiment
LANGCHAIN_PROJECT=experiment-gpt4-baseline
LANGCHAIN_PROJECT=experiment-claude-comparison
// Good: Descriptive names
{ name: "summarize-document", run_type: "chain" }
{ name: "retrieve-context", run_type: "retriever" }
{ name: "openai-chat", run_type: "llm" }
// Bad: Generic names
{ name: "func1", run_type: "chain" }
{ name: "process", run_type: "chain" }// Version datasets for reproducibility
datasetName: "qa-eval-v1.0.0"
datasetName: "qa-eval-v1.1.0"
// Use descriptive names
datasetName: "customer-support-qa"
datasetName: "translation-test-set"
// Include metadata
metadata: {
created_by: "data-team",
purpose: "regression-testing",
date: "2024-01-15"
}
// Good: Consistent, descriptive keys
"correctness"
"helpfulness"
"response_quality"
"safety_compliance"
"user_satisfaction"
// Avoid: Generic or ambiguous keys
"feedback1"
"rating"
"score"