**tessl/pypi-langfuse**

Comprehensive Python SDK for AI application observability and experimentation with OpenTelemetry-based tracing, automatic instrumentation, and dataset management.

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: `pkg:pypi/langfuse@3.7.x`

To install, run:

```
npx @tessl/cli install tessl/pypi-langfuse@3.7.0
```

# Langfuse

A comprehensive Python SDK for AI application observability and experimentation. Langfuse provides automatic tracing of LLM applications, experiment management with evaluation capabilities, dataset handling, and prompt template management - all built on OpenTelemetry standards for seamless integration with existing observability infrastructure.

## Package Information

- **Package Name**: langfuse
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install langfuse`
- **Version**: 3.7.0
- **License**: MIT

## Core Imports

```python
from langfuse import Langfuse, observe, get_client
```

For specialized functionality:

```python
# Experiment system
from langfuse import Evaluation

# Span types
from langfuse import (
    LangfuseSpan, LangfuseGeneration, LangfuseEvent,
    LangfuseAgent, LangfuseTool, LangfuseChain
)

# OpenAI integration (drop-in replacement)
from langfuse.openai import OpenAI, AsyncOpenAI

# LangChain integration
from langfuse.langchain import CallbackHandler
```

## Basic Usage

```python
import openai

from langfuse import Langfuse, observe, Evaluation

# Initialize client
langfuse = Langfuse(
    public_key="your-public-key",
    secret_key="your-secret-key",
    host="https://cloud.langfuse.com"  # or your self-hosted URL
)

# Simple tracing with decorator
@observe(as_type="generation")
def generate_response(prompt: str) -> str:
    # Your LLM call here
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Manual span creation
with langfuse.start_as_current_span(name="process-query") as span:
    result = process_data()
    span.update(output=result)
    span.score(name="accuracy", value=0.95)

# Experiments with evaluators
def accuracy_evaluator(*, input, output, expected_output=None, **kwargs):
    is_correct = output.strip().lower() == expected_output.strip().lower()
    return Evaluation(
        name="accuracy",
        value=1.0 if is_correct else 0.0,
        comment="Exact match" if is_correct else "No match"
    )

result = langfuse.run_experiment(
    name="Capital Cities Test",
    data=[{"input": "Capital of France?", "expected_output": "Paris"}],
    # Task functions receive the experiment item as a keyword argument
    task=lambda *, item, **kwargs: generate_response(item["input"]),
    evaluators=[accuracy_evaluator]
)
```

## Architecture

Langfuse is built around four core concepts that work together to provide comprehensive observability:

### Tracing Foundation

Built on **OpenTelemetry**, providing industry-standard distributed tracing with hierarchical span relationships. Every operation creates spans that capture timing, inputs, outputs, and metadata, enabling detailed performance analysis and debugging.

### Observation Types

**Specialized span types** for AI/LLM applications including generations (for model calls), agents (for reasoning), tools (for external calls), chains (for workflows), and evaluators (for quality assessment). Each type captures relevant metadata and provides appropriate visualizations.

### Automatic Instrumentation

**Decorator-based tracing** with the `@observe` decorator automatically instruments Python functions, supporting both synchronous and asynchronous operations with proper context propagation and error handling.
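The pattern behind decorator-based tracing can be pictured with a short stdlib-only sketch: wrap the function, open a span recording its input, make that span the current context so nested calls attach as children, and restore the parent on exit. This is a conceptual stand-in, not the SDK's implementation; `observe_sketch` and `FINISHED` are invented names.

```python
import contextvars
import functools

# The "current span" context; stands in for OpenTelemetry context propagation.
_current_span = contextvars.ContextVar("current_span", default=None)

# Completed spans, in finish order (children finish before parents).
FINISHED = []

def observe_sketch(func=None, *, name=None):
    """Minimal conceptual stand-in for a tracing decorator."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            span = {
                "name": name or f.__name__,
                "parent": _current_span.get(),  # link to enclosing span, if any
                "input": {"args": args, "kwargs": kwargs},
            }
            token = _current_span.set(span)     # make this span current
            try:
                span["output"] = f(*args, **kwargs)
                return span["output"]
            finally:
                FINISHED.append(span)
                _current_span.reset(token)      # restore the parent context
        return wrapper
    # Support both @observe_sketch and @observe_sketch(name=...)
    return decorator(func) if func is not None else decorator

@observe_sketch
def child(x):
    return x * 2

@observe_sketch
def parent(x):
    return child(x) + 1
```

Because `contextvars` is also coroutine-aware, the same mechanic extends naturally to async functions.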

### Experiment Framework

**Built-in experimentation** system for running evaluations on datasets with automatic tracing, supporting both local data and Langfuse-managed datasets with comprehensive result formatting and analysis.

## Capabilities

### Core Tracing and Observability

Fundamental tracing functionality for instrumenting AI applications with automatic span creation, context propagation, and detailed performance monitoring.

```python { .api }
class Langfuse:
    def start_span(self, name: str, **kwargs) -> LangfuseSpan: ...
    def start_as_current_span(self, *, name: str, **kwargs) -> ContextManager[LangfuseSpan]: ...
    def start_observation(self, *, name: str, as_type: str, **kwargs) -> Union[LangfuseSpan, LangfuseGeneration, ...]: ...
    def start_as_current_observation(self, *, name: str, as_type: str, **kwargs) -> ContextManager[...]: ...
    def create_event(self, *, name: str, **kwargs) -> LangfuseEvent: ...
    def flush(self) -> None: ...
    def shutdown(self) -> None: ...

def observe(func=None, *, name: str = None, as_type: str = None, **kwargs) -> Callable: ...

def get_client(*, public_key: str = None) -> Langfuse: ...
```

[Core Tracing](./core-tracing.md)
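The `start_as_current_span` contract can be illustrated with a stdlib-only sketch: the yielded span is "current" for the duration of the `with` block, so spans opened inside it attach as children, and the previous context is restored on exit. The names here (`start_as_current_span_sketch`, `_stack`) are invented for illustration and are not SDK APIs.

```python
import contextlib
import contextvars

# Tuple-stack of open spans; stands in for the OpenTelemetry context.
_stack = contextvars.ContextVar("span_stack", default=())

@contextlib.contextmanager
def start_as_current_span_sketch(*, name):
    """Conceptual stand-in: the yielded span is 'current' inside the
    with-block, so nested spans attach to it as children."""
    stack = _stack.get()
    span = {"name": name, "parent": stack[-1] if stack else None}
    token = _stack.set(stack + (span,))
    try:
        yield span
    finally:
        _stack.reset(token)  # pop back to the enclosing span
```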

### Specialized Observation Types

Dedicated span types for different AI application components, each optimized for specific use cases with appropriate metadata and visualization.

```python { .api }
class LangfuseGeneration:
    # Specialized for LLM calls with model metrics
    def update(self, *, model: str = None, usage_details: Dict[str, int] = None,
               cost_details: Dict[str, float] = None, **kwargs) -> "LangfuseGeneration": ...

class LangfuseAgent:
    # For agent reasoning blocks
    pass

class LangfuseTool:
    # For external tool calls (APIs, databases)
    pass

class LangfuseChain:
    # For connecting application steps
    pass

class LangfuseRetriever:
    # For data retrieval operations
    pass
```

[Observation Types](./observation-types.md)

### Experiment Management

Comprehensive system for running experiments on datasets with automatic evaluation, result aggregation, and detailed reporting capabilities.

```python { .api }
class Evaluation:
    def __init__(self, *, name: str, value: Union[int, float, str, bool, None],
                 comment: str = None, metadata: Dict[str, Any] = None): ...

class ExperimentResult:
    def format(self, *, include_item_results: bool = False) -> str: ...

    # Attributes
    name: str
    item_results: List[ExperimentItemResult]
    run_evaluations: List[Evaluation]

def run_experiment(*, name: str, data: List[Any], task: Callable,
                   evaluators: List[Callable] = None, **kwargs) -> ExperimentResult: ...
```

[Experiments](./experiments.md)
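The core loop of an experiment run (task per item, then evaluators per output) can be sketched in plain Python. This is illustrative only; the real `run_experiment` also traces each item, uploads results, and handles async tasks. `run_experiment_sketch` is an invented name.

```python
from typing import Any, Callable, Dict, List

def run_experiment_sketch(*, name: str, data: List[Dict[str, Any]],
                          task: Callable, evaluators: List[Callable]) -> Dict[str, Any]:
    """Run the task on every item, then apply each evaluator to the output.
    No tracing or upload happens here, unlike the real run_experiment."""
    item_results = []
    for item in data:
        output = task(item=item)
        evaluations = [
            evaluator(input=item.get("input"), output=output,
                      expected_output=item.get("expected_output"))
            for evaluator in evaluators
        ]
        item_results.append({"item": item, "output": output,
                             "evaluations": evaluations})
    return {"name": name, "item_results": item_results}
```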

### Dataset Management

Tools for creating, managing, and running experiments on datasets with support for both local data and Langfuse-hosted datasets.

```python { .api }
class DatasetClient:
    def run_experiment(self, *, name: str, task: Callable, **kwargs) -> ExperimentResult: ...

    # Attributes
    id: str
    name: str
    items: List[DatasetItemClient]

class DatasetItemClient:
    # Attributes
    input: Any
    expected_output: Any
    metadata: Any

class Langfuse:
    def get_dataset(self, name: str) -> DatasetClient: ...
    def create_dataset(self, *, name: str, **kwargs) -> DatasetClient: ...
    def create_dataset_item(self, *, dataset_name: str, **kwargs) -> DatasetItemClient: ...
```

[Dataset Management](./datasets.md)

### Prompt Management

Template management system supporting both text and chat-based prompts with variable interpolation and LangChain integration.

```python { .api }
class TextPromptClient:
    def compile(self, **kwargs) -> str: ...
    def get_langchain_prompt(self) -> Any: ...

    # Attributes
    name: str
    version: int
    prompt: str

class ChatPromptClient:
    def compile(self, **kwargs) -> List[Dict[str, str]]: ...
    def get_langchain_prompt(self) -> Any: ...

    # Attributes
    name: str
    version: int
    prompt: List[Dict[str, Any]]

class Langfuse:
    def get_prompt(self, name: str, version: int = None, **kwargs) -> Union[TextPromptClient, ChatPromptClient]: ...
    def create_prompt(self, *, name: str, prompt: Union[str, List[Dict]], **kwargs) -> Union[TextPromptClient, ChatPromptClient]: ...
```

[Prompt Management](./prompts.md)
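Langfuse prompt templates use `{{variable}}` placeholders, and `compile` substitutes keyword arguments into them. A simplified stand-in for the text case shows the idea; `compile_sketch` is an invented helper, and the real client also handles chat-message lists and version fallbacks.

```python
import re

def compile_sketch(template: str, **variables) -> str:
    """Simplified stand-in for TextPromptClient.compile: substitute
    {{name}} placeholders with keyword-argument values, leaving
    unknown placeholders untouched."""
    def substitute(match):
        key = match.group(1)
        return str(variables[key]) if key in variables else match.group(0)
    return re.sub(r"\{\{\s*([^{}]+?)\s*\}\}", substitute, template)
```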

### Scoring and Evaluation

System for adding scores and evaluations to traces and observations, supporting numeric, categorical, and boolean score types.

```python { .api }
class LangfuseObservationWrapper:
    def score(self, *, name: str, value: Union[float, str],
              data_type: str = None, comment: str = None) -> None: ...
    def score_trace(self, *, name: str, value: Union[float, str],
                    data_type: str = None, comment: str = None) -> None: ...

class Langfuse:
    def create_score(self, *, name: str, value: str, trace_id: str = None,
                     observation_id: str = None, **kwargs) -> None: ...
```

[Scoring](./scoring.md)
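The three score data types correspond to different Python value shapes. An illustrative normalizer (an invented helper, not part of the SDK) shows the mapping:

```python
from typing import Union

def normalize_score(value: Union[float, str, bool], data_type: str) -> Union[float, str]:
    """Illustrative mapping of Python values onto the three score data types."""
    if data_type == "NUMERIC":
        return float(value)           # e.g. accuracy 0.95
    if data_type == "BOOLEAN":
        return 1.0 if value else 0.0  # truthy/falsy collapses to 1/0
    if data_type == "CATEGORICAL":
        return str(value)             # free-form label, e.g. "hallucination"
    raise ValueError(f"unknown ScoreDataType: {data_type}")
```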

### Integration Support

Pre-built integrations for popular AI frameworks with automatic instrumentation and minimal configuration required.

```python { .api }
# OpenAI Integration (drop-in replacement)
from langfuse.openai import OpenAI, AsyncOpenAI, AzureOpenAI

# LangChain Integration
from langfuse.langchain import CallbackHandler

class CallbackHandler:
    def __init__(self, *, public_key: str = None, secret_key: str = None, **kwargs): ...
```

[Integrations](./integrations.md)

### Media and Advanced Features

Support for media uploads, data masking, multi-project setups, and advanced configuration options.

```python { .api }
class LangfuseMedia:
    def __init__(self, *, obj: object = None, base64_data_uri: str = None,
                 content_type: str = None, **kwargs): ...

class Langfuse:
    def get_trace_url(self, trace_id: str) -> str: ...
    def auth_check(self) -> bool: ...
    def create_trace_id(self) -> str: ...
    def get_current_trace_id(self) -> str: ...
```

[Advanced Features](./advanced.md)

## Types

```python { .api }
# Core Types
SpanLevel = Literal["DEBUG", "DEFAULT", "WARNING", "ERROR"]
ScoreDataType = Literal["NUMERIC", "CATEGORICAL", "BOOLEAN"]
ObservationTypeLiteral = Literal["span", "generation", "event", "agent", "tool", "chain", "retriever", "embedding", "evaluator", "guardrail"]

# Experiment Types
LocalExperimentItem = TypedDict('LocalExperimentItem', {
    'input': Any,
    'expected_output': Any,
    'metadata': Optional[Dict[str, Any]]
}, total=False)

ExperimentItem = Union[LocalExperimentItem, DatasetItemClient]

# Function Protocols
class TaskFunction(Protocol):
    def __call__(self, *, item: ExperimentItem, **kwargs) -> Union[Any, Awaitable[Any]]: ...

class EvaluatorFunction(Protocol):
    def __call__(self, *, input: Any, output: Any, expected_output: Any = None,
                 metadata: Dict[str, Any] = None, **kwargs) -> Union[Evaluation, List[Evaluation], Awaitable[Union[Evaluation, List[Evaluation]]]]: ...
```
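As a concrete illustration of the `EvaluatorFunction` protocol, here is a runnable evaluator using a local dataclass as a stand-in for langfuse's `Evaluation` class (mirroring the constructor shown above, so the sketch runs without the SDK installed); `keyword_coverage` is an invented example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union

@dataclass
class Evaluation:
    """Local stand-in mirroring the Evaluation constructor shown above."""
    name: str
    value: Union[int, float, str, bool, None]
    comment: Optional[str] = None
    metadata: Optional[Dict[str, Any]] = None

def keyword_coverage(*, input: Any, output: Any, expected_output: Any = None,
                     metadata: Optional[Dict[str, Any]] = None,
                     **kwargs) -> List[Evaluation]:
    """Score what fraction of the expected keywords appear in the output."""
    keywords = str(expected_output or "").lower().split()
    hits = [k for k in keywords if k in str(output).lower()]
    coverage = len(hits) / len(keywords) if keywords else 0.0
    return [Evaluation(name="keyword_coverage", value=coverage,
                       comment=f"{len(hits)}/{len(keywords)} keywords found")]
```

Evaluators may return a single `Evaluation` or a list, and may be async; this one returns a one-element list.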