tessl install tessl/npm-langsmith@0.4.3
TypeScript client SDK for the LangSmith LLM tracing, evaluation, and monitoring platform.
This guide covers the fundamental concepts in LangSmith: Projects, Runs, Datasets, Examples, and Feedback.
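Every example below constructs a Client. By default the client reads its configuration from environment variables; it can also be configured explicitly. A minimal sketch (the constructor options and variable names shown are the commonly documented ones; verify them against your SDK version):
import { Client } from "langsmith";
// Typically configured via environment variables:
//   LANGCHAIN_TRACING_V2, LANGCHAIN_API_KEY, LANGCHAIN_ENDPOINT, LANGCHAIN_PROJECT
// Or passed explicitly to the constructor:
const client = new Client({
  apiKey: process.env.LANGCHAIN_API_KEY,
  apiUrl: "https://api.smith.langchain.com",
});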
Projects (also called Sessions or TracerSessions) are containers that organize related traces and runs into logical groups:
import { Client } from "langsmith";
const client = new Client();
// Create project
const project = await client.createProject({
projectName: "my-chatbot-v1",
description: "Production chatbot deployment",
metadata: { version: "1.0.0", env: "production" }
});
// Read project
const existing = await client.readProject({
projectName: "my-chatbot-v1"
});
// List projects
for await (const project of client.listProjects({ limit: 100 })) {
console.log(project.name);
}
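Projects can also be updated or deleted after creation. A minimal sketch, assuming the client exposes updateProject and deleteProject with the parameter shapes shown (verify against your SDK version):
// Update an existing project's name and description
await client.updateProject(project.id, {
  name: "my-chatbot-v2",
  description: "Updated chatbot deployment",
});
// Delete a project by name
await client.deleteProject({ projectName: "my-chatbot-v1" });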
Set default project via environment variable:
export LANGCHAIN_PROJECT=my-default-project
Or in code:
const client = new Client();
const defaultProject = getDefaultProjectName();
Runs are individual traces of function executions, LLM calls, or operations.
Each run has a run type describing the kind of operation it captures:
type RunType =
| "llm" // Direct language model call
| "chain" // Sequence of operations
| "tool" // Individual tool/function
| "retriever" // Document retrieval
| "embedding" // Embedding generation
| "prompt" // Prompt formatting
| "parser"; // Output parsingimport { traceable } from "langsmith/traceable";
// Automatic via traceable
const myFunction = traceable(
async (input: string) => processInput(input),
{ name: "my-function", run_type: "chain" }
);
await myFunction("test"); // Creates run automatically
Runs can have parent-child relationships:
const child1 = traceable(
  async (x: string) => x.toUpperCase(),
  { name: "uppercase", run_type: "tool" }
);
const child2 = traceable(
  async (x: string) => x + "!",
  { name: "add-exclamation", run_type: "tool" }
);
const parent = traceable(async (input: string) => {
  const step1 = await child1(input); // Child run
  const step2 = await child2(step1); // Child run
  return step2;
}, { name: "parent-operation" });
Datasets are collections of examples used for testing and evaluation.
Datasets have a data type that determines the format of their examples:
type DataType =
| "kv" // Key-value data (most flexible)
| "llm" // LLM input/output format
| "chat"; // Chat message formatimport { Client } from "langsmith";
const client = new Client();
const dataset = await client.createDataset({
datasetName: "customer-support-qa",
description: "Q&A pairs for customer support",
dataType: "kv"
});
await client.createExamples({
datasetId: dataset.id,
inputs: [
{ question: "How do I reset my password?" },
{ question: "What are your business hours?" }
],
outputs: [
{ answer: "Click 'Forgot Password' on the login page." },
{ answer: "We're open Monday-Friday, 9am-5pm EST." }
]
});
// Create version snapshot
const version = await client.createDatasetVersion({
datasetName: "qa-dataset",
name: "v1.0.0",
description: "Initial release version"
});
// Compare versions
const diff = await client.diffDatasetVersions({
datasetName: "qa-dataset",
fromVersion: "v1.0.0",
toVersion: "v1.1.0"
});
console.log("Examples added:", diff.examples_added.length);
console.log("Examples modified:", diff.examples_modified.length);Examples are individual data points within datasets.
Examples are individual data points within datasets.
Examples consist of inputs, optional reference outputs, and metadata:
import { Client } from "langsmith";
const client = new Client();
// Create single example
const example = await client.createExample({
dataset_id: dataset.id,
inputs: { question: "What is LangSmith?" },
outputs: { answer: "LangSmith is a platform..." },
metadata: { category: "product-info" }
});
// Bulk create
await client.createExamples({
datasetName: "qa-dataset",
inputs: [
{ question: "What is 2+2?" },
{ question: "What is 3+3?" }
],
outputs: [
{ answer: "4" },
{ answer: "6" }
]
});
// List examples
for await (const example of client.listExamples({
datasetName: "qa-dataset",
limit: 100
})) {
console.log(example.inputs, example.outputs);
}
Feedback represents evaluative information about a run's performance.
Feedback can come from end users, human annotators, or automated evaluators such as LLM judges.
Feedback supports numeric scores, free-text comments, corrections, and a source type that records where it came from.
import { Client } from "langsmith";
const client = new Client();
// Thumbs up/down
await client.createFeedback(runId, "user_rating", {
  score: 1, // 1 = thumbs up, 0 = thumbs down
comment: "Great response!",
});
// Numeric score
await client.createFeedback(runId, "accuracy", {
score: 0.95,
comment: "Highly accurate",
});
// With correction
await client.createFeedback(runId, "correctness", {
score: 0,
correction: {
outputs: { answer: "The correct answer is..." },
},
});
// Model-generated feedback
await client.createFeedback(runId, "coherence", {
score: 0.88,
feedback_source_type: "model",
source_run_id: judgeRunId,
});
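Recorded feedback can be read back for analysis. A sketch, assuming the client exposes a listFeedback iterator that filters by run ids:
for await (const feedback of client.listFeedback({ runIds: [runId] })) {
  console.log(feedback.key, feedback.score, feedback.comment);
}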
Allow external systems to submit feedback without API keys:
const token = await client.createPresignedFeedbackToken({
run_id: runId,
feedback_key: "user_rating",
expires_in: 86400 // 24 hours
});
// Share token.url with users
// They can POST feedback without authentication
These concepts form a hierarchy:
Project
└── Run (root)
    ├── Run (child)
    ├── Run (child)
    │   └── Run (grandchild)
    └── Feedback
        ├── Feedback entry 1
        └── Feedback entry 2
Dataset
├── Example 1
├── Example 2
└── Example 3
Evaluation
├── Uses Dataset
├── Creates Runs
└── Generates Feedback
A typical end-to-end workflow ties these concepts together:
import { Client } from "langsmith";
import { evaluate } from "langsmith/evaluation";
import { traceable } from "langsmith/traceable";
const client = new Client();
// 1. Create project (implicit via environment or config)
const projectName = "my-app";
// 2. Trace runs to the project
const myBot = traceable(
async (input) => processInput(input),
{ project_name: projectName }
);
await myBot("test"); // Creates run in project
// 3. Create dataset
const dataset = await client.createDataset({
datasetName: "test-set"
});
await client.createExamples({
datasetId: dataset.id,
inputs: [{ question: "test" }],
outputs: [{ answer: "result" }]
});
// 4. Run evaluation (creates runs and feedback)
const results = await evaluate(myBot, {
data: "test-set",
evaluators: [evaluator]
});
// 5. Collect user feedback on production runs
await client.createFeedback(runId, "user_satisfaction", {
score: 1,
});
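The evaluator referenced in step 4 is not defined above. A minimal custom evaluator might look like the following sketch, assuming evaluators may be plain functions that receive the run and its reference example and return a key/score result:
import type { Run, Example } from "langsmith/schemas";

// Compare the run's answer to the reference answer stored on the dataset example
const evaluator = async (run: Run, example?: Example) => ({
  key: "exact_match",
  score: run.outputs?.answer === example?.outputs?.answer ? 1 : 0,
});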
// Separate by environment
LANGCHAIN_PROJECT=dev-chatbot
LANGCHAIN_PROJECT=staging-chatbot
LANGCHAIN_PROJECT=production-chatbot
// Separate by feature
LANGCHAIN_PROJECT=feature-translation
LANGCHAIN_PROJECT=feature-summarization
// Separate by experiment
LANGCHAIN_PROJECT=experiment-gpt4-baseline
LANGCHAIN_PROJECT=experiment-claude-comparison
// Good: Descriptive names
{ name: "summarize-document", run_type: "chain" }
{ name: "retrieve-context", run_type: "retriever" }
{ name: "openai-chat", run_type: "llm" }
// Bad: Generic names
{ name: "func1", run_type: "chain" }
{ name: "process", run_type: "chain" }// Version datasets for reproducibility
datasetName: "qa-eval-v1.0.0"
datasetName: "qa-eval-v1.1.0"
// Use descriptive names
datasetName: "customer-support-qa"
datasetName: "translation-test-set"
// Include metadata
metadata: {
created_by: "data-team",
purpose: "regression-testing",
date: "2024-01-15"
}
// Good: Consistent, descriptive keys
"correctness"
"helpfulness"
"response_quality"
"safety_compliance"
"user_satisfaction"
// Avoid: Generic or ambiguous keys
"feedback1"
"rating"
"score"