# Langfuse

A comprehensive Python SDK for AI application observability and experimentation. Langfuse provides automatic tracing of LLM applications, experiment management with evaluation capabilities, dataset handling, and prompt template management, all built on OpenTelemetry standards for seamless integration with existing observability infrastructure.

## Package Information

- **Package Name**: langfuse
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install langfuse`
- **Version**: 3.7.0
- **License**: MIT

## Core Imports

```python
from langfuse import Langfuse, observe, get_client
```

For specialized functionality:

```python
# Experiment system
from langfuse import Evaluation

# Span types
from langfuse import (
    LangfuseSpan, LangfuseGeneration, LangfuseEvent,
    LangfuseAgent, LangfuseTool, LangfuseChain,
)

# OpenAI integration (drop-in replacement)
from langfuse.openai import OpenAI, AsyncOpenAI

# LangChain integration
from langfuse.langchain import CallbackHandler
```

## Basic Usage

```python
from langfuse import Langfuse, observe, Evaluation
from langfuse.openai import openai  # drop-in OpenAI wrapper; calls are traced automatically

# Initialize client
langfuse = Langfuse(
    public_key="your-public-key",
    secret_key="your-secret-key",
    host="https://cloud.langfuse.com"  # or your self-hosted URL
)

# Simple tracing with decorator
@observe(as_type="generation")
def generate_response(prompt: str) -> str:
    # Your LLM call here
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Manual span creation
with langfuse.start_as_current_span(name="process-query") as span:
    result = process_data()  # your application logic
    span.update(output=result)
    span.score(name="accuracy", value=0.95)

# Experiments with evaluators
def accuracy_evaluator(*, input, output, expected_output=None, **kwargs):
    is_correct = (
        expected_output is not None
        and output.strip().lower() == expected_output.strip().lower()
    )
    return Evaluation(
        name="accuracy",
        value=1.0 if is_correct else 0.0,
        comment="Exact match" if is_correct else "No match"
    )

# Task functions receive each dataset item as the `item` keyword argument
def capital_task(*, item, **kwargs):
    return generate_response(item["input"])

result = langfuse.run_experiment(
    name="Capital Cities Test",
    data=[{"input": "Capital of France?", "expected_output": "Paris"}],
    task=capital_task,
    evaluators=[accuracy_evaluator]
)
```

## Architecture

Langfuse is built around four core concepts that work together to provide comprehensive observability:

### Tracing Foundation
Built on **OpenTelemetry**, providing industry-standard distributed tracing with hierarchical span relationships. Every operation creates spans that capture timing, inputs, outputs, and metadata, enabling detailed performance analysis and debugging.

### Observation Types
**Specialized span types** for AI/LLM applications including generations (for model calls), agents (for reasoning), tools (for external calls), chains (for workflows), and evaluators (for quality assessment). Each type captures relevant metadata and provides appropriate visualizations.

### Automatic Instrumentation
**Decorator-based tracing** with the `@observe` decorator automatically instruments Python functions, supporting both synchronous and asynchronous operations with proper context propagation and error handling.

### Experiment Framework
**Built-in experimentation** system for running evaluations on datasets with automatic tracing, supporting both local data and Langfuse-managed datasets with comprehensive result formatting and analysis.

## Capabilities

### Core Tracing and Observability

Fundamental tracing functionality for instrumenting AI applications with automatic span creation, context propagation, and detailed performance monitoring.

```python { .api }
class Langfuse:
    def start_span(self, name: str, **kwargs) -> LangfuseSpan: ...
    def start_as_current_span(self, *, name: str, **kwargs) -> ContextManager[LangfuseSpan]: ...
    def start_observation(self, *, name: str, as_type: str, **kwargs) -> Union[LangfuseSpan, LangfuseGeneration, ...]: ...
    def start_as_current_observation(self, *, name: str, as_type: str, **kwargs) -> ContextManager[...]: ...
    def create_event(self, *, name: str, **kwargs) -> LangfuseEvent: ...
    def flush(self) -> None: ...
    def shutdown(self) -> None: ...

def observe(func=None, *, name: str = None, as_type: str = None, **kwargs) -> Callable: ...
def get_client(*, public_key: str = None) -> Langfuse: ...
```

[Core Tracing](./core-tracing.md)

### Specialized Observation Types

Dedicated span types for different AI application components, each optimized for specific use cases with appropriate metadata and visualization.

```python { .api }
class LangfuseGeneration:
    # Specialized for LLM calls with model metrics
    def update(self, *, model: str = None, usage_details: Dict[str, int] = None,
               cost_details: Dict[str, float] = None, **kwargs) -> "LangfuseGeneration": ...

class LangfuseAgent:
    # For agent reasoning blocks
    pass

class LangfuseTool:
    # For external tool calls (APIs, databases)
    pass

class LangfuseChain:
    # For connecting application steps
    pass

class LangfuseRetriever:
    # For data retrieval operations
    pass
```

[Observation Types](./observation-types.md)

### Experiment Management

Comprehensive system for running experiments on datasets with automatic evaluation, result aggregation, and detailed reporting capabilities.

```python { .api }
class Evaluation:
    def __init__(self, *, name: str, value: Union[int, float, str, bool, None],
                 comment: str = None, metadata: Dict[str, Any] = None): ...

class ExperimentResult:
    def format(self, *, include_item_results: bool = False) -> str: ...

    # Attributes
    name: str
    item_results: List[ExperimentItemResult]
    run_evaluations: List[Evaluation]

def run_experiment(*, name: str, data: List[Any], task: Callable,
                   evaluators: List[Callable] = None, **kwargs) -> ExperimentResult: ...
```

[Experiments](./experiments.md)

### Dataset Management

Tools for creating, managing, and running experiments on datasets with support for both local data and Langfuse-hosted datasets.

```python { .api }
class DatasetClient:
    def run_experiment(self, *, name: str, task: Callable, **kwargs) -> ExperimentResult: ...

    # Attributes
    id: str
    name: str
    items: List[DatasetItemClient]

class DatasetItemClient:
    # Attributes
    input: Any
    expected_output: Any
    metadata: Any

class Langfuse:
    def get_dataset(self, name: str) -> DatasetClient: ...
    def create_dataset(self, *, name: str, **kwargs) -> DatasetClient: ...
    def create_dataset_item(self, *, dataset_name: str, **kwargs) -> DatasetItemClient: ...
```

[Dataset Management](./datasets.md)

### Prompt Management

Template management system supporting both text and chat-based prompts with variable interpolation and LangChain integration.

```python { .api }
class TextPromptClient:
    def compile(self, **kwargs) -> str: ...
    def get_langchain_prompt(self) -> Any: ...

    # Attributes
    name: str
    version: int
    prompt: str

class ChatPromptClient:
    def compile(self, **kwargs) -> List[Dict[str, str]]: ...
    def get_langchain_prompt(self) -> Any: ...

    # Attributes
    name: str
    version: int
    prompt: List[Dict[str, Any]]

class Langfuse:
    def get_prompt(self, name: str, version: int = None, **kwargs) -> Union[TextPromptClient, ChatPromptClient]: ...
    def create_prompt(self, *, name: str, prompt: Union[str, List[Dict]], **kwargs) -> Union[TextPromptClient, ChatPromptClient]: ...
```

[Prompt Management](./prompts.md)

### Scoring and Evaluation

System for adding scores and evaluations to traces and observations, supporting numeric, categorical, and boolean score types.

```python { .api }
class LangfuseObservationWrapper:
    def score(self, *, name: str, value: Union[float, str],
              data_type: str = None, comment: str = None) -> None: ...
    def score_trace(self, *, name: str, value: Union[float, str],
                    data_type: str = None, comment: str = None) -> None: ...

class Langfuse:
    def create_score(self, *, name: str, value: Union[float, str], trace_id: str = None,
                     observation_id: str = None, **kwargs) -> None: ...
```

[Scoring](./scoring.md)

### Integration Support

Pre-built integrations for popular AI frameworks with automatic instrumentation and minimal configuration required.

```python { .api }
# OpenAI Integration (drop-in replacement)
from langfuse.openai import OpenAI, AsyncOpenAI, AzureOpenAI

# LangChain Integration
from langfuse.langchain import CallbackHandler

class CallbackHandler:
    def __init__(self, *, public_key: str = None, secret_key: str = None, **kwargs): ...
```

[Integrations](./integrations.md)

### Media and Advanced Features

Support for media uploads, data masking, multi-project setups, and advanced configuration options.

```python { .api }
class LangfuseMedia:
    def __init__(self, *, obj: object = None, base64_data_uri: str = None,
                 content_type: str = None, **kwargs): ...

class Langfuse:
    def get_trace_url(self, trace_id: str) -> str: ...
    def auth_check(self) -> bool: ...
    def create_trace_id(self) -> str: ...
    def get_current_trace_id(self) -> str: ...
```

[Advanced Features](./advanced.md)
## Types
285
286
```python { .api }
287
# Core Types
288
SpanLevel = Literal["DEBUG", "DEFAULT", "WARNING", "ERROR"]
289
ScoreDataType = Literal["NUMERIC", "CATEGORICAL", "BOOLEAN"]
290
ObservationTypeLiteral = Literal["span", "generation", "event", "agent", "tool", "chain", "retriever", "embedding", "evaluator", "guardrail"]
291
292
# Experiment Types
293
LocalExperimentItem = TypedDict('LocalExperimentItem', {
294
'input': Any,
295
'expected_output': Any,
296
'metadata': Optional[Dict[str, Any]]
297
}, total=False)
298
299
ExperimentItem = Union[LocalExperimentItem, DatasetItemClient]
300
301
# Function Protocols
302
class TaskFunction(Protocol):
303
def __call__(self, *, item: ExperimentItem, **kwargs) -> Union[Any, Awaitable[Any]]: ...
304
305
class EvaluatorFunction(Protocol):
306
def __call__(self, *, input: Any, output: Any, expected_output: Any = None,
307
metadata: Dict[str, Any] = None, **kwargs) -> Union[Evaluation, List[Evaluation], Awaitable[Union[Evaluation, List[Evaluation]]]]: ...
308
```