
# Langfuse Client

Langfuse Client (@langfuse/client) is a comprehensive API client for Langfuse, an observability platform for LLM applications. It provides core abstractions for prompt management, dataset operations, experiment execution, scoring, and media handling. The package is designed for universal JavaScript environments including browsers, Node.js, Edge Functions, and other JavaScript runtimes.

## Package Information

- **Package Name**: @langfuse/client
- **Package Type**: npm
- **Language**: TypeScript
- **Installation**: `npm install @langfuse/client @langfuse/tracing @opentelemetry/api`
- **Version**: 4.2.0
- **Dependencies**:
  - `@langfuse/core` (workspace)
  - `@langfuse/tracing` (workspace)
  - `@opentelemetry/api` (peer dependency)
  - `mustache`

## Core Imports

```typescript { .api }
import { LangfuseClient } from '@langfuse/client';
```

CommonJS:

```javascript
const { LangfuseClient } = require('@langfuse/client');
```

Specific imports:

```typescript { .api }
import {
  LangfuseClient,
  TextPromptClient,
  ChatPromptClient,
  createEvaluatorFromAutoevals,
  // Type imports
  type ExperimentParams,
  type ExperimentResult,
  type ExperimentTask,
  type ExperimentItem,
  type ExperimentItemResult,
  type ExperimentTaskParams,
  type Evaluator,
  type EvaluatorParams,
  type Evaluation,
  type RunEvaluator,
  type RunEvaluatorParams,
  type FetchedDataset,
  type RunExperimentOnDataset,
  type LinkDatasetItemFunction,
  type ChatMessageOrPlaceholder,
  type ChatMessageWithPlaceholders,
  type LangchainMessagesPlaceholder,
  type ChatMessageType
} from '@langfuse/client';
```

## Basic Usage

```typescript
import { LangfuseClient } from '@langfuse/client';

// Initialize client with credentials
const langfuse = new LangfuseClient({
  publicKey: 'pk_...',
  secretKey: 'sk_...',
  baseUrl: 'https://cloud.langfuse.com' // optional, default shown
});

// Alternatively, rely on environment variables
// (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_BASE_URL):
// const langfuse = new LangfuseClient();

// Fetch and compile a prompt
const prompt = await langfuse.prompt.get('my-prompt');
const compiled = prompt.compile({ variable: 'value' });

// Get a dataset and run an experiment
const dataset = await langfuse.dataset.get('my-dataset');
const result = await dataset.runExperiment({
  name: 'Model Evaluation',
  task: async ({ input }) => myModel.generate(input),
  evaluators: [myEvaluator]
});

// Create scores (batched automatically)
langfuse.score.create({
  name: 'quality',
  value: 0.85,
  traceId: 'trace-123'
});

// Flush pending data before exit
await langfuse.flush();
```

## Architecture

Langfuse Client is built around several key components:

- **LangfuseClient**: Main entry point providing access to all managers and the API client
- **Managers**: Specialized managers for different capabilities (PromptManager, DatasetManager, etc.)
- **Prompt Clients**: Type-specific clients for text and chat prompts with compilation support
- **Experiment Framework**: Comprehensive system for running experiments with evaluation
- **Batching & Caching**: Automatic batching for scores and intelligent caching for prompts
- **OpenTelemetry Integration**: Built-in support for distributed tracing via OTel spans

## Capabilities

### Client Initialization

The main LangfuseClient class provides centralized access to all Langfuse functionality and direct API access for advanced use cases.

```typescript { .api }
class LangfuseClient {
  constructor(params?: LangfuseClientParams);

  // Manager access
  readonly api: LangfuseAPIClient;
  readonly prompt: PromptManager;
  readonly dataset: DatasetManager;
  readonly score: ScoreManager;
  readonly media: MediaManager;
  readonly experiment: ExperimentManager;

  // Utility methods
  flush(): Promise<void>;
  shutdown(): Promise<void>;
  getTraceUrl(traceId: string): Promise<string>;
}

interface LangfuseClientParams {
  publicKey?: string;
  secretKey?: string;
  baseUrl?: string;
  timeout?: number;
  additionalHeaders?: Record<string, string>;
}
```

[Client Initialization](./client.md)

### Prompt Management

Fetch, create, and manage prompts with built-in caching, version control, and variable substitution. Supports both text and chat prompts with LangChain compatibility.

```typescript { .api }
class PromptManager {
  get(name: string, options?: { version?: number; label?: string; cacheTtlSeconds?: number; fallback?: string | ChatMessage[]; maxRetries?: number; type?: "chat" | "text"; fetchTimeoutMs?: number }): Promise<TextPromptClient | ChatPromptClient>;
  create(body: CreatePromptRequest): Promise<TextPromptClient | ChatPromptClient>;
  update(params: { name: string; version: number; newLabels: string[] }): Promise<Prompt>;
}

class TextPromptClient {
  readonly name: string;
  readonly version: number;
  readonly prompt: string;
  readonly config: unknown;
  readonly labels: string[];
  readonly tags: string[];
  readonly isFallback: boolean;

  compile(variables?: Record<string, string>): string;
  getLangchainPrompt(): string;
  toJSON(): string;
}

class ChatPromptClient {
  readonly name: string;
  readonly version: number;
  readonly prompt: ChatMessageWithPlaceholders[];
  readonly config: unknown;
  readonly labels: string[];
  readonly tags: string[];
  readonly isFallback: boolean;

  compile(
    variables?: Record<string, string>,
    placeholders?: Record<string, any>
  ): (ChatMessageOrPlaceholder | any)[];
  getLangchainPrompt(options?: { placeholders?: Record<string, any> }): (ChatMessage | LangchainMessagesPlaceholder | any)[];
  toJSON(): string;
}
```

[Prompt Management](./prompts.md)
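Prompt compilation performs mustache-style `{{variable}}` substitution (the package depends on `mustache`). The sketch below illustrates only that substitution behavior locally; `compileSketch` and the template string are illustrative helpers, not part of the SDK:

```typescript
// Illustrative only: mimics the {{variable}} substitution that
// TextPromptClient.compile performs via mustache.
function compileSketch(template: string, variables: Record<string, string>): string {
  // Replace each {{key}} with its value, or the empty string if missing
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match, key) => variables[key] ?? '');
}

const template = 'Summarize the following text in {{tone}} tone: {{text}}';
const compiled = compileSketch(template, { tone: 'formal', text: 'Hello world' });
// compiled === 'Summarize the following text in formal tone: Hello world'
```

In real code, the template lives in Langfuse and is fetched via `langfuse.prompt.get(...)` before calling `compile`.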

### Dataset Operations

Retrieve datasets with all items, link dataset items to traces for experiment tracking, and run experiments directly on datasets.

```typescript { .api }
class DatasetManager {
  get(name: string, options?: { fetchItemsPageSize: number }): Promise<FetchedDataset>;
}

type FetchedDataset = Dataset & {
  items: (DatasetItem & { link: LinkDatasetItemFunction })[];
  runExperiment: RunExperimentOnDataset;
};

type LinkDatasetItemFunction = (
  obj: { otelSpan: Span },
  runName: string,
  runArgs?: { description?: string; metadata?: any }
) => Promise<DatasetRunItem>;

type RunExperimentOnDataset = (
  params: Omit<ExperimentParams<any, any, Record<string, any>>, "data">
) => Promise<ExperimentResult<any, any, Record<string, any>>>;
```

[Dataset Operations](./datasets.md)

### Score Management

Create and manage scores for traces and observations with automatic batching for efficient API usage.

```typescript { .api }
class ScoreManager {
  create(data: ScoreBody): void;
  observation(observation: { otelSpan: Span }, data: Omit<ScoreBody, "traceId" | "sessionId" | "observationId" | "datasetRunId">): void;
  trace(observation: { otelSpan: Span }, data: Omit<ScoreBody, "traceId" | "sessionId" | "observationId" | "datasetRunId">): void;
  activeObservation(data: Omit<ScoreBody, "traceId" | "sessionId" | "observationId" | "datasetRunId">): void;
  activeTrace(data: Omit<ScoreBody, "traceId" | "sessionId" | "observationId" | "datasetRunId">): void;
  flush(): Promise<void>;
  shutdown(): Promise<void>;
}
```

[Score Management](./scores.md)

### Media Reference Resolution

Resolve media reference strings in objects by fetching media content and converting to base64 data URIs.

```typescript { .api }
class MediaManager {
  resolveReferences<T>(params: LangfuseMediaResolveMediaReferencesParams<T>): Promise<T>;
  static parseReferenceString(referenceString: string): ParsedMediaReference;
}

type LangfuseMediaResolveMediaReferencesParams<T> = {
  obj: T;
  resolveWith: "base64DataUri";
  maxDepth?: number;
};
```

[Media Management](./media.md)
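A media reference string packs the media id, content type, and source into one token. The format shown below (`@@@langfuseMedia:type=...|id=...|source=...@@@`) is an assumption based on Langfuse's media token convention — verify against the media docs. An illustrative parser, not the SDK's `parseReferenceString`:

```typescript
// Assumed token format:
// "@@@langfuseMedia:type=<contentType>|id=<mediaId>|source=<source>@@@"
interface ParsedMediaReferenceSketch {
  mediaId: string;
  source: string;
  contentType: string;
}

function parseReferenceSketch(referenceString: string): ParsedMediaReferenceSketch {
  // Strip the surrounding markers, then split the pipe-delimited key=value fields
  const inner = referenceString
    .replace(/^@@@langfuseMedia:/, '')
    .replace(/@@@$/, '');
  const fields = Object.fromEntries(
    inner.split('|').map((kv) => kv.split('=') as [string, string])
  );
  return { mediaId: fields['id'], source: fields['source'], contentType: fields['type'] };
}

const ref = parseReferenceSketch(
  '@@@langfuseMedia:type=image/jpeg|id=media-123|source=bytes@@@'
);
// ref.mediaId === 'media-123', ref.contentType === 'image/jpeg'
```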

### Experiment Execution

Run comprehensive experiments on datasets or custom data with automatic tracing, evaluation, and result formatting.

```typescript { .api }
class ExperimentManager {
  run<Input, ExpectedOutput, Metadata extends Record<string, any>>(
    config: ExperimentParams<Input, ExpectedOutput, Metadata>
  ): Promise<ExperimentResult<Input, ExpectedOutput, Metadata>>;
}

type ExperimentParams<Input, ExpectedOutput, Metadata> = {
  name: string;
  runName?: string;
  description?: string;
  metadata?: Record<string, any>;
  data: ExperimentItem<Input, ExpectedOutput, Metadata>[];
  task: ExperimentTask<Input, ExpectedOutput, Metadata>;
  evaluators?: Evaluator<Input, ExpectedOutput, Metadata>[];
  runEvaluators?: RunEvaluator<Input, ExpectedOutput, Metadata>[];
  maxConcurrency?: number;
};

type ExperimentResult<Input, ExpectedOutput, Metadata> = {
  runName: string;
  datasetRunId?: string;
  datasetRunUrl?: string;
  itemResults: ExperimentItemResult<Input, ExpectedOutput, Metadata>[];
  runEvaluations: Evaluation[];
  format: (options?: { includeItemResults?: boolean }) => Promise<string>;
};
```

[Experiment Execution](./experiments.md)
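Because evaluators are plain async functions, they can be unit-tested in isolation before being passed to `experiment.run`. A minimal exact-match evaluator — the types are inlined here to keep the sketch self-contained; in real code import `Evaluator` and `EvaluatorParams` from `@langfuse/client`:

```typescript
// Inlined shapes mirroring the Evaluation / EvaluatorParams types.
type Evaluation = { name: string; value: number | string; comment?: string };
type EvaluatorParams = { input: any; output: any; expectedOutput?: any };

// Scores 1 when the task output matches the expected output exactly, else 0
const exactMatch = async ({ output, expectedOutput }: EvaluatorParams): Promise<Evaluation> => ({
  name: 'exact_match',
  value: output === expectedOutput ? 1 : 0
});

// Sanity-check the evaluator directly:
const evaluation = await exactMatch({ input: '2+2', output: '4', expectedOutput: '4' });
// evaluation.value === 1
```

The same function can then be listed in `evaluators: [exactMatch]` when running an experiment.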

### AutoEvals Integration

Convert AutoEvals library evaluators to Langfuse-compatible evaluators for seamless integration.

```typescript { .api }
function createEvaluatorFromAutoevals<E extends CallableFunction>(
  autoevalEvaluator: E,
  params?: Params<E>
): Evaluator;
```

[AutoEvals Integration](./autoevals-adapter.md)
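AutoEvals scorers return a result with a `score` field, while Langfuse evaluators return `Evaluation` objects with a `value` field. The sketch below shows the gist of that mapping with a stub scorer; the shapes and `adaptSketch` are assumptions for illustration — the real `createEvaluatorFromAutoevals` also forwards extra scorer arguments via `params`:

```typescript
// Assumed autoevals result shape: { name: string; score: number | null }
type AutoevalsScore = { name: string; score: number | null };
type Evaluation = { name: string; value: number | string; comment?: string };

// Wraps an autoevals-style scorer into a Langfuse-style evaluator
function adaptSketch(
  scorer: (args: { input: any; output: any; expected?: any }) => Promise<AutoevalsScore>
) {
  return async (params: { input: any; output: any; expectedOutput?: any }): Promise<Evaluation> => {
    // autoevals uses `expected`; Langfuse uses `expectedOutput`
    const result = await scorer({
      input: params.input,
      output: params.output,
      expected: params.expectedOutput
    });
    return { name: result.name, value: result.score ?? 0 };
  };
}

// Stub scorer standing in for a real autoevals scorer (e.g. Levenshtein):
const stubScorer = async ({ output, expected }: { input: any; output: any; expected?: any }) =>
  ({ name: 'stub', score: output === expected ? 1 : 0 });
const evaluator = adaptSketch(stubScorer);
```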

## Core Types

### Common Types from @langfuse/core

```typescript { .api }
// Dataset types
interface Dataset {
  id: string;
  name: string;
  description?: string;
  metadata?: any;
  projectId: string;
  createdAt: string;
  updatedAt: string;
}

interface DatasetItem {
  id: string;
  datasetId: string;
  input: any;
  expectedOutput?: any;
  metadata?: any;
  sourceTraceId?: string;
  sourceObservationId?: string;
  status: string;
  createdAt: string;
  updatedAt: string;
}

interface DatasetRunItem {
  id: string;
  datasetRunId: string;
  datasetRunName: string;
  datasetItemId: string;
  traceId: string;
  observationId?: string;
  createdAt: string;
  updatedAt: string;
}

// Score type
interface ScoreBody {
  id?: string;
  name: string;
  value: number | string;
  traceId?: string;
  observationId?: string;
  sessionId?: string;
  datasetRunId?: string;
  comment?: string;
  metadata?: any;
  dataType?: 'NUMERIC' | 'CATEGORICAL' | 'BOOLEAN';
  environment?: string;
}

// Chat message types
interface ChatMessage {
  role: string;
  content: string;
}

// Prompt types
type Prompt = Prompt.Text | Prompt.Chat;

namespace Prompt {
  interface Text {
    name: string;
    version: number;
    type: 'text';
    prompt: string;
    config: unknown;
    labels: string[];
    tags: string[];
    commitMessage?: string | null;
  }

  interface Chat {
    name: string;
    version: number;
    type: 'chat';
    prompt: ChatMessageWithPlaceholders[];
    config: unknown;
    labels: string[];
    tags: string[];
    commitMessage?: string | null;
  }
}

// Placeholder types
enum ChatMessageType {
  ChatMessage = "chatmessage",
  Placeholder = "placeholder"
}

interface ChatMessageWithPlaceholders {
  type: "chatmessage" | "placeholder";
  role?: string;
  content?: string;
  name?: string;
}

type ChatMessageOrPlaceholder =
  | ChatMessage
  | { type: "placeholder"; name: string };

interface LangchainMessagesPlaceholder {
  variableName: string;
  optional?: boolean;
}

// Experiment types
type ExperimentItem<Input = any, ExpectedOutput = any, Metadata extends Record<string, any> = Record<string, any>> =
  | {
      input?: Input;
      expectedOutput?: ExpectedOutput;
      metadata?: Metadata;
    }
  | DatasetItem;

type ExperimentTaskParams<Input = any, ExpectedOutput = any, Metadata extends Record<string, any> = Record<string, any>> =
  ExperimentItem<Input, ExpectedOutput, Metadata>;

type ExperimentTask<Input = any, ExpectedOutput = any, Metadata extends Record<string, any> = Record<string, any>> = (
  params: ExperimentTaskParams<Input, ExpectedOutput, Metadata>
) => Promise<any>;

type Evaluation = Pick<ScoreBody, "name" | "value" | "comment" | "metadata" | "dataType">;

interface EvaluatorParams<Input = any, ExpectedOutput = any, Metadata extends Record<string, any> = Record<string, any>> {
  input: Input;
  output: any;
  expectedOutput?: ExpectedOutput;
  metadata?: Metadata;
}

type Evaluator<Input = any, ExpectedOutput = any, Metadata extends Record<string, any> = Record<string, any>> = (
  params: EvaluatorParams<Input, ExpectedOutput, Metadata>
) => Promise<Evaluation[] | Evaluation>;

interface RunEvaluatorParams<Input = any, ExpectedOutput = any, Metadata extends Record<string, any> = Record<string, any>> {
  itemResults: ExperimentItemResult<Input, ExpectedOutput, Metadata>[];
}

type RunEvaluator<Input = any, ExpectedOutput = any, Metadata extends Record<string, any> = Record<string, any>> = (
  params: RunEvaluatorParams<Input, ExpectedOutput, Metadata>
) => Promise<Evaluation[] | Evaluation>;

type ExperimentItemResult<Input = any, ExpectedOutput = any, Metadata extends Record<string, any> = Record<string, any>> = {
  item: ExperimentItem<Input, ExpectedOutput, Metadata>;
  output: any;
  evaluations: Evaluation[];
  traceId?: string;
  datasetRunId?: string;
};

// Media types
interface ParsedMediaReference {
  mediaId: string;
  source: string;
  contentType: string;
}
```
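Since `ExperimentItem` is a union of an inline `{ input, expectedOutput, metadata }` object and a full `DatasetItem`, consuming code sometimes needs to tell the two apart. One way is to check for the `id`/`datasetId` fields that only `DatasetItem` carries; `isDatasetItem` below is an illustrative helper, not part of the SDK:

```typescript
// Minimal shapes for the sketch; the real types are defined above.
type InlineItem = { input?: any; expectedOutput?: any; metadata?: any };
type DatasetItemLike = InlineItem & { id: string; datasetId: string };
type ExperimentItemSketch = InlineItem | DatasetItemLike;

// Type guard: dataset-backed items carry ids, inline items do not
function isDatasetItem(item: ExperimentItemSketch): item is DatasetItemLike {
  return 'id' in item && 'datasetId' in item;
}

const inline: ExperimentItemSketch = { input: 'What is 2+2?', expectedOutput: '4' };
const fromDataset: ExperimentItemSketch = { id: 'item-1', datasetId: 'ds-1', input: 'Hi' };
// isDatasetItem(inline) === false; isDatasetItem(fromDataset) === true
```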

## Environment Variables

The client supports configuration via environment variables:

- `LANGFUSE_PUBLIC_KEY`: Public API key
- `LANGFUSE_SECRET_KEY`: Secret API key
- `LANGFUSE_BASE_URL` (or `LANGFUSE_BASEURL`): Langfuse instance URL
- `LANGFUSE_TIMEOUT`: Request timeout in seconds
- `LANGFUSE_FLUSH_AT`: Number of scores to batch before flushing (default: 10)
- `LANGFUSE_FLUSH_INTERVAL`: Flush interval in seconds (default: 1)
- `LANGFUSE_TRACING_ENVIRONMENT`: Default environment tag for traces
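For local development these can be set in the shell (placeholder values — substitute your own keys):

```shell
export LANGFUSE_PUBLIC_KEY="pk_..."
export LANGFUSE_SECRET_KEY="sk_..."
export LANGFUSE_BASE_URL="https://cloud.langfuse.com"
export LANGFUSE_FLUSH_AT="10"       # batch size before auto-flush
export LANGFUSE_FLUSH_INTERVAL="1"  # seconds between flushes
```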

## OpenTelemetry Integration

This package integrates with OpenTelemetry for distributed tracing. Score methods accept OpenTelemetry `Span` objects to automatically link scores to traces and observations. The experiment framework uses OpenTelemetry for automatic tracing of task executions.

```typescript
import type { Span } from '@opentelemetry/api';

// Link a dataset item to a span
await datasetItem.link({ otelSpan: span }, 'experiment-run-1');

// Score an observation using its span
langfuse.score.observation({ otelSpan: span }, {
  name: 'accuracy',
  value: 0.95
});
```

## Error Handling

Methods that fetch data (like `prompt.get()`, `dataset.get()`) support fallback mechanisms:

```typescript
// Prompt with fallback
const prompt = await langfuse.prompt.get('my-prompt', {
  type: 'text',
  fallback: 'Default prompt text: {{variable}}'
});

// If the fetch fails, the fallback content is used
// and prompt.isFallback will be true
```

Experiment evaluators handle failures gracefully: failed evaluators are logged but don't stop the experiment.

```typescript
const result = await langfuse.experiment.run({
  name: 'Test',
  data: items,
  task: myTask,
  evaluators: [
    goodEvaluator,   // works fine
    brokenEvaluator, // fails but is logged
    anotherEvaluator // still runs
  ]
});
// result.itemResults contains evaluations from successful evaluators
```

## Lifecycle Management

Always flush pending data before application exit:

```typescript
// Option 1: Manual flush
await langfuse.flush();

// Option 2: Graceful shutdown (flushes all managers)
await langfuse.shutdown();
```

Scores are batched automatically but can be flushed manually:

```typescript
langfuse.score.create({ name: 'quality', value: 0.8, traceId: 'abc' });
langfuse.score.create({ name: 'latency', value: 120, traceId: 'abc' });

// Force immediate send
await langfuse.score.flush();
```

## Deprecated APIs

The package maintains v3 compatibility with deprecated methods:

- `getPrompt()` → Use `prompt.get()`
- `createPrompt()` → Use `prompt.create()`
- `updatePrompt()` → Use `prompt.update()`
- `getDataset()` → Use `dataset.get()`
- `fetchTrace()` → Use `api.trace.get()`
- `fetchTraces()` → Use `api.trace.list()`
- `fetchObservation()` → Use `api.observations.get()`
- `fetchObservations()` → Use `api.observations.getMany()`
- `fetchSessions()` → Use `api.sessions.get()`
- `getDatasetRun()` → Use `api.datasets.getRun()`
- `getDatasetRuns()` → Use `api.datasets.getRuns()`
- `createDataset()` → Use `api.datasets.create()`
- `getDatasetItem()` → Use `api.datasetItems.get()`
- `createDatasetItem()` → Use `api.datasetItems.create()`
- `fetchMedia()` → Use `api.media.get()`
- `resolveMediaReferences()` → Use `media.resolveReferences()`

All deprecated methods are maintained for backward compatibility, but the new manager-based API is recommended for new code.