# MLflow

MLflow is an open-source developer platform designed to build AI/LLM applications and models with confidence. It provides a comprehensive solution for managing the complete machine learning lifecycle, including experiment tracking, model management, deployment, and observability. The platform offers specialized features for both LLM/GenAI developers (tracing/observability, LLM evaluation, prompt management, version tracking) and traditional data scientists (experiment tracking, model registry, deployment tools).

## Package Information

- **Package Name**: mlflow
- **Language**: Python
- **Installation**: `pip install mlflow`
- **Documentation**: https://mlflow.org/docs/latest/index.html

## Core Imports

```python
import mlflow
```

For client API access:

```python
from mlflow import MlflowClient
client = MlflowClient()
```

For specific modules:

```python
import mlflow.tracking
import mlflow.models
import mlflow.data
import mlflow.tracing
```

## Basic Usage

```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
import numpy as np

# Set tracking URI and experiment (assumes a tracking server is running at this address)
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("my-experiment")

# Generate sample data
X = np.random.rand(100, 4)
y = np.random.rand(100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Start MLflow run with context manager
with mlflow.start_run():
    # Enable autologging for sklearn
    mlflow.sklearn.autolog()

    # Train model
    model = RandomForestRegressor(n_estimators=50, random_state=42)
    model.fit(X_train, y_train)

    # Log custom parameters and metrics
    mlflow.log_param("custom_param", "value")
    mlflow.log_metric("custom_metric", 0.85)

    # Log model manually (optional with autolog)
    mlflow.sklearn.log_model(model, "model")

    # Get run info (inside the with block, while the run is still active)
    run = mlflow.active_run()
    print(f"Run ID: {run.info.run_id}")
```

## Architecture

MLflow's modular architecture supports the complete ML lifecycle:

- **Tracking**: Experiment and run management with parameter, metric, and artifact logging
- **Models**: Universal model format with deployment capabilities across platforms
- **Model Registry**: Centralized model store with versioning and stage transitions
- **Projects**: Reproducible ML code packaging with dependency management
- **Tracing**: Distributed tracing for LLM/GenAI applications with observability
- **Data**: Dataset tracking and lineage with multiple format support
- **Evaluation**: Model performance assessment with built-in metrics and custom evaluators

The platform integrates natively with 25+ ML frameworks and 15+ LLM/GenAI libraries, providing automatic logging capabilities and standardized model formats for seamless deployment across infrastructures.

## Capabilities

### Tracking and Experiment Management

Core functionality for tracking experiments, runs, parameters, metrics, and artifacts. Provides both a fluent API for interactive use and a client API for programmatic access.

```python { .api }
def start_run(run_id=None, experiment_id=None, run_name=None, nested=False, tags=None, description=None): ...
def end_run(status=None): ...
def log_param(key, value): ...
def log_metric(key, value, step=None, timestamp=None, synchronous=None): ...
def log_artifact(local_path, artifact_path=None, synchronous=None): ...
def log_outputs(outputs, artifact_path=None): ...
def log_assessment(assessment, request_id=None, run_id=None, timestamp_ms=None): ...
def log_feedback(feedback, request_id=None, run_id=None): ...
def create_experiment(name, artifact_location=None, tags=None): ...
def set_experiment(experiment_name=None, experiment_id=None): ...
```

[Tracking and Experiments](./tracking.md)

### Client API

Lower-level programmatic interface providing direct access to MLflow's REST API with comprehensive methods for managing experiments, runs, models, and artifacts.

```python { .api }
class MlflowClient:
    def __init__(self, tracking_uri=None, registry_uri=None): ...
    def create_experiment(self, name, artifact_location=None, tags=None): ...
    def get_run(self, run_id): ...
    def search_runs(self, experiment_ids, filter_string="", run_view_type=ViewType.ACTIVE_ONLY, max_results=SEARCH_MAX_RESULTS_DEFAULT, order_by=None, page_token=None): ...
    def log_batch(self, run_id, metrics=None, params=None, tags=None): ...
```

[Client API](./client.md)

### Model Management

Comprehensive model lifecycle management including logging, loading, evaluation, and deployment with support for multiple ML frameworks and custom models.

```python { .api }
def log_model(model, artifact_path, **kwargs): ...
def load_model(model_uri, dst_path=None, **kwargs): ...
def evaluate(model=None, data=None, targets=None, model_type=None, evaluators=None, evaluator_config=None, **kwargs): ...
def register_model(model_uri, name, await_registration_for=DEFAULT_AWAIT_MAX_SLEEP_SECONDS, tags=None, **kwargs): ...
```

[Models](./models.md)

### Data Management

Dataset tracking and lineage capabilities supporting multiple data formats including pandas, numpy, Spark, Delta, and HuggingFace datasets with comprehensive metadata management.

```python { .api }
def from_pandas(df, source=None, targets=None, name=None, digest=None, predictions=None): ...
def from_numpy(features, source=None, targets=None, name=None, digest=None, predictions=None): ...
def from_spark(df, source=None, targets=None, name=None, digest=None, predictions=None): ...
def log_input(dataset, context=None, tags=None): ...
def log_inputs(datasets, tags=None): ...
```

[Data Management](./data.md)

### Tracing and Observability

Distributed tracing system for LLM/GenAI applications providing observability, debugging, and performance monitoring with span management and assessment capabilities.

```python { .api }
def trace(name=None, span_type=None, inputs=None, attributes=None): ...
def start_span(name, span_type=None, inputs=None, parent_id=None, attributes=None): ...
def get_trace(request_id): ...
def search_traces(experiment_ids=None, filter_string="", max_results=None, order_by=None, run_id=None): ...
def log_assessment(assessment, request_id=None, run_id=None, timestamp_ms=None): ...
```

[Tracing and Observability](./tracing.md)

### Configuration and System Management

System configuration including tracking URIs, model registry settings, system metrics, and authentication management for MLflow deployments.

```python { .api }
def set_tracking_uri(uri): ...
def get_tracking_uri(): ...
def set_registry_uri(uri): ...
def enable_system_metrics_logging(): ...
def disable_system_metrics_logging(): ...
def login(): ...
```

[Configuration](./configuration.md)

### GenAI and LLM Integration

Specialized capabilities for LLM/GenAI workflows including prompt management, LLM evaluation, and integration with popular AI frameworks and libraries.

```python { .api }
def load_prompt(model_name, model_version=None, model_alias=None): ...
def register_prompt(prompt, name, version=None, tags=None, description=None, metadata=None): ...
def search_prompts(filter_string=None, max_results=None, order_by=None, page_token=None): ...
```

[GenAI and LLM](./genai.md)

### ML Framework Integrations

Native integrations with 25+ ML frameworks providing automatic logging, model serialization, and deployment capabilities with framework-specific optimizations.

```python { .api }
# Popular integrations (via lazy loading)
import mlflow.sklearn
import mlflow.pytorch
import mlflow.tensorflow
import mlflow.keras
import mlflow.xgboost
import mlflow.lightgbm
import mlflow.transformers
```

[Framework Integrations](./frameworks.md)

### MLflow Projects

Reproducible ML project execution with environment management, parameter validation, and multi-backend support for local, cloud, and containerized workflows.

```python { .api }
import mlflow.projects

def run(uri, entry_point="main", version=None, parameters=None, backend="local", backend_config=None, synchronous=True, **kwargs): ...

class SubmittedRun:
    run_id: str
    def wait(self) -> bool: ...
    def get_status(self) -> str: ...
    def cancel(self): ...
```

[MLflow Projects](./projects.md)

## Types

```python { .api }
from mlflow.entities import Experiment, Run, RunInfo, RunData, Metric, Param, RegisteredModel, ModelVersion
from mlflow.tracking.fluent import ActiveRun
from mlflow.client import MlflowClient
from mlflow.exceptions import MlflowException

class Experiment:
    experiment_id: str
    name: str
    artifact_location: str
    lifecycle_stage: str
    tags: Dict[str, str]

class Run:
    info: RunInfo
    data: RunData

class RunInfo:
    run_id: str
    experiment_id: str
    status: str
    start_time: int
    end_time: int
    artifact_uri: str

class ActiveRun:
    info: RunInfo
    data: RunData

class MlflowException(Exception):
    error_code: str
    message: str
```