# Materializers

ZenML's built-in materializers serialize and deserialize Python objects. A materializer converts an artifact between its in-memory Python object and its stored format, which enables automatic artifact persistence and lineage tracking.

## Capabilities

### Built-In Materializer

```python { .api }
class BuiltInMaterializer:
    """
    Materializer for built-in Python types.

    Handles: int, float, str, bool, None
    (bytes are handled by BytesMaterializer)

    Automatically used for these types without explicit configuration.
    """
```

Import from:

```python
from zenml.materializers import BuiltInMaterializer
```

### Built-In Container Materializer

```python { .api }
class BuiltInContainerMaterializer:
    """
    Materializer for container types.

    Handles: list, dict, tuple, set

    Uses JSON serialization for storage.
    Automatically used for these types.
    """
```

Import from:

```python
from zenml.materializers import BuiltInContainerMaterializer
```

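The container docstring above mentions JSON storage. As a rough illustration of what that implies, the sketch below round-trips containers through Python's standard `json` module. It is not ZenML's implementation, and `roundtrip` is a hypothetical helper; it only shows that plain JSON has no tuple or set type, so restoring those containers needs extra bookkeeping beyond a bare `json.dumps`.

```python
import json

def roundtrip(value):
    """Serialize a value to JSON text and parse it back (hypothetical helper)."""
    return json.loads(json.dumps(value))

# Lists and dicts with string keys survive unchanged.
assert roundtrip([1, 2, 3]) == [1, 2, 3]
assert roundtrip({"a": [1, 2]}) == {"a": [1, 2]}

# Tuples come back as lists, because JSON only has arrays.
assert roundtrip((1, 2)) == [1, 2]

# Sets are rejected by json.dumps unless they are converted first.
try:
    json.dumps({1, 2})
except TypeError:
    print("sets need converting (for example to a sorted list) before json.dumps")
```
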
### Bytes Materializer

```python { .api }
class BytesMaterializer:
    """
    Materializer for bytes objects.

    Stores bytes directly without additional encoding.
    """
```

Import from:

```python
from zenml.materializers import BytesMaterializer
```

### Cloudpickle Materializer

```python { .api }
class CloudpickleMaterializer:
    """
    Materializer using cloudpickle for serialization.

    Handles most Python objects including functions, lambdas, and classes.
    Uses cloudpickle, which is more flexible than standard pickle.

    Useful for complex objects that don't have specialized materializers.
    """
```

Import from:

```python
from zenml.materializers import CloudpickleMaterializer
```

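To make the comparison with standard pickle concrete, the following sketch probes what the standard-library `pickle` module accepts. It uses only the standard library (cloudpickle itself is not imported), and `can_std_pickle` is a hypothetical helper name. Lambdas are one of the cases where standard pickle gives up, which is the gap cloudpickle is meant to cover.

```python
import pickle

def can_std_pickle(obj) -> bool:
    """Return True if the standard-library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

square = lambda x: x * x  # anonymous function with no importable name

print(can_std_pickle([1, 2, 3]))  # True: plain data
print(can_std_pickle(len))        # True: importable builtin, pickled by reference
print(can_std_pickle(square))     # False: pickle cannot look up "<lambda>" by name
```
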
### In-Memory Materializer

```python { .api }
class InMemoryMaterializer:
    """
    Materializer that keeps artifacts in memory.

    Does not persist to disk. Useful for temporary data
    that should not be saved.
    """
```

Import from:

```python
from zenml.materializers import InMemoryMaterializer
```

### Path Materializer

```python { .api }
class PathMaterializer:
    """
    Materializer for pathlib.Path objects.

    Stores the path as a string and reconstructs Path object on load.
    """
```

Import from:

```python
from zenml.materializers import PathMaterializer
```

### Pydantic Materializer

```python { .api }
class PydanticMaterializer:
    """
    Materializer for Pydantic models.

    Serializes Pydantic models to JSON and deserializes back.
    Preserves model validation and structure.

    Automatically used for Pydantic model subclasses.
    """
```

Import from:

```python
from zenml.materializers import PydanticMaterializer
```

### Service Materializer

```python { .api }
class ServiceMaterializer:
    """
    Materializer for ZenML services.

    Handles persistence of service configurations and state.
    Used for model deployment services and other long-running processes.
    """
```

Import from:

```python
from zenml.materializers import ServiceMaterializer
```

### Structured String Materializer

```python { .api }
class StructuredStringMaterializer:
    """
    Materializer for structured string types.

    Handles: HTMLString, MarkdownString, CSVString, JSONString

    Preserves the string content and type information.
    """
```

Import from:

```python
from zenml.materializers import StructuredStringMaterializer
```

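One way to picture "preserves the string content and type information" is a `str` subclass whose subtype is recorded in the stored file name. The sketch below is purely illustrative: `HtmlText`, `MarkdownText`, `EXTENSIONS`, `filename_for`, and `load_as` are hypothetical names, and the extension mapping is an assumption rather than ZenML's documented storage layout.

```python
class HtmlText(str):
    """Hypothetical stand-in for an HTML-typed string."""

class MarkdownText(str):
    """Hypothetical stand-in for a Markdown-typed string."""

# Assumed mapping from string subtype to a file extension.
EXTENSIONS = {HtmlText: ".html", MarkdownText: ".md"}

def filename_for(value: str) -> str:
    """Pick a file name whose extension records the string's subtype."""
    return "output" + EXTENSIONS[type(value)]

def load_as(data_type: type, content: str) -> str:
    """Rebuild the typed string from the raw stored content."""
    return data_type(content)

report = HtmlText("<h1>Report</h1>")
print(filename_for(report))  # output.html

restored = load_as(HtmlText, str(report))
assert isinstance(restored, HtmlText) and restored == report
```
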
### UUID Materializer

```python { .api }
class UUIDMaterializer:
    """
    Materializer for UUID objects.

    Stores UUID as string and reconstructs UUID object on load.
    """
```

Import from:

```python
from zenml.materializers import UUIDMaterializer
```

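The string round trip described for UUIDs can be reproduced with the standard `uuid` module alone. This is a minimal sketch of the idea, not the materializer's actual code; the variable names are illustrative.

```python
import uuid

original = uuid.uuid4()

# "Save": the canonical 36-character text form of the UUID.
stored = str(original)

# "Load": rebuild the UUID object from the stored text.
restored = uuid.UUID(stored)

assert restored == original
assert isinstance(restored, uuid.UUID)
```
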
## Integration Materializers

ZenML integrations provide additional materializers for framework-specific types:

- **NumPy**: NumPy array materializers
- **Pandas**: DataFrame, Series materializers
- **PyTorch**: Tensor, Module, DataLoader materializers
- **TensorFlow**: Tensor, Model materializers
- **Scikit-learn**: Model materializers
- **XGBoost**: Booster, DMatrix materializers
- **LightGBM**: Booster, Dataset materializers
- **HuggingFace**: Tokenizer, Model, Dataset materializers
- **Pillow**: Image materializers
- **PyArrow**: Table materializers

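Conceptually, an integration extends a mapping from Python types to materializers, and a step's type annotation is then used to pick one. The sketch below models that idea with a plain dictionary and a walk over the type's method resolution order. Everything here is hypothetical: `REGISTRY`, `find_materializer`, `FakeArray`, and the catch-all `object` entry are assumptions for illustration, not ZenML APIs.

```python
from collections import OrderedDict

# Hypothetical registry: Python type -> materializer name (illustration only).
REGISTRY = {
    int: "BuiltInMaterializer",
    dict: "BuiltInContainerMaterializer",
    object: "CloudpickleMaterializer",  # assumed catch-all fallback
}

def find_materializer(data_type: type) -> str:
    """Walk the method resolution order and return the first registered match."""
    for base in data_type.__mro__:
        if base in REGISTRY:
            return REGISTRY[base]
    raise KeyError(f"No materializer registered for {data_type!r}")

class FakeArray:
    """Stand-in for a framework type such as an array class."""

class CustomModel:
    pass

# An "integration" adds an entry for its own framework type.
REGISTRY[FakeArray] = "FakeArrayMaterializer"

print(find_materializer(dict))         # BuiltInContainerMaterializer
print(find_materializer(OrderedDict))  # BuiltInContainerMaterializer (dict subclass)
print(find_materializer(FakeArray))    # FakeArrayMaterializer
print(find_materializer(CustomModel))  # CloudpickleMaterializer (fallback)
```

Walking the MRO means subclasses inherit the materializer of their nearest registered base class, so the most specific registration wins.
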
## Usage Examples

### Automatic Materialization

```python
from zenml import step

@step
def process_data(data: list) -> dict:
    """Built-in types use automatic materializers."""
    return {"processed": data, "count": len(data)}

# BuiltInContainerMaterializer automatically handles list and dict
```

### Custom Materializer for Step Output

```python
from zenml import step
from zenml.materializers import CloudpickleMaterializer

class CustomModel:
    def __init__(self, weights):
        self.weights = weights

@step(output_materializers=CloudpickleMaterializer)
def train_custom_model(data: list) -> CustomModel:
    """Use cloudpickle for custom class."""
    return CustomModel(weights=[0.1, 0.2, 0.3])
```

### Pydantic Model Materialization

```python
from zenml import step
from pydantic import BaseModel

class ModelMetrics(BaseModel):
    accuracy: float
    precision: float
    recall: float
    f1_score: float

@step
def evaluate_model(data: list) -> ModelMetrics:
    """Pydantic models automatically use PydanticMaterializer."""
    return ModelMetrics(
        accuracy=0.95,
        precision=0.93,
        recall=0.97,
        f1_score=0.95
    )

@step
def report_metrics(metrics: ModelMetrics):
    """Pydantic model automatically deserialized."""
    print(f"Accuracy: {metrics.accuracy}")
    print(f"F1: {metrics.f1_score}")
```

### Different Materializers for Multiple Outputs

```python
from typing import Annotated, Tuple

from pydantic import BaseModel
from zenml import step
from zenml.materializers import CloudpickleMaterializer, PydanticMaterializer

class Config(BaseModel):
    learning_rate: float

class CustomModel:
    pass

# The dictionary keys must match the step's output names, so each
# output is named explicitly with Annotated.
@step(
    output_materializers={
        "model": CloudpickleMaterializer,
        "config": PydanticMaterializer,
    }
)
def train_with_config(
    data: list,
) -> Tuple[Annotated[CustomModel, "model"], Annotated[Config, "config"]]:
    """Different materializers for different outputs."""
    model = CustomModel()
    config = Config(learning_rate=0.001)
    return model, config
```

### In-Memory Artifacts

```python
from zenml import step
from zenml.materializers import InMemoryMaterializer

@step(output_materializers=InMemoryMaterializer)
def generate_temp_data() -> dict:
    """Temporary data not persisted to storage."""
    return {"temp": "data", "should_not_save": True}
```

### Structured String Types

```python
from zenml import step
from zenml.types import HTMLString, MarkdownString, CSVString

@step
def generate_report() -> HTMLString:
    """Generate HTML report."""
    html = HTMLString("<html><body><h1>Report</h1></body></html>")
    return html

@step
def generate_markdown() -> MarkdownString:
    """Generate Markdown documentation."""
    md = MarkdownString("# Title\n\nThis is content.")
    return md

@step
def export_csv() -> CSVString:
    """Export as CSV."""
    csv = CSVString("name,value\nitem1,100\nitem2,200")
    return csv
```

### Cloudpickle for Complex Objects

```python
from zenml import step
from zenml.materializers import CloudpickleMaterializer

@step(output_materializers=CloudpickleMaterializer)
def create_pipeline_config() -> dict:
    """Complex object with functions."""
    def preprocess(x):
        return x * 2

    return {
        "preprocessor": preprocess,  # Function
        "params": {"learning_rate": 0.001},
        "nested": {"deep": {"value": 42}}
    }

@step
def use_config(config: dict):
    """Use the complex config."""
    preprocessor = config["preprocessor"]
    result = preprocessor(5)
    print(f"Result: {result}")
```

### Integration Materializers

```python
# NumPy arrays (requires the NumPy integration: `zenml integration install numpy`)
from zenml import step
import numpy as np

@step
def process_array(data: list) -> np.ndarray:
    """NumPy array automatically materialized."""
    return np.array(data)

# Pandas DataFrames (requires the Pandas integration: `zenml integration install pandas`)
import pandas as pd

@step
def process_dataframe(data: dict) -> pd.DataFrame:
    """DataFrame automatically materialized."""
    return pd.DataFrame(data)

# PyTorch models (requires the PyTorch integration: `zenml integration install pytorch`)
import torch

@step
def train_pytorch_model(data: list) -> torch.nn.Module:
    """PyTorch model automatically materialized."""
    model = torch.nn.Linear(10, 1)
    return model
```

### Custom Materializer Example

```python
import json
import os
from typing import Type

from zenml.enums import ArtifactType
from zenml.materializers.base_materializer import BaseMaterializer

class MyCustomClass:
    def __init__(self, data: dict):
        self.data = data

class MyCustomMaterializer(BaseMaterializer):
    """Custom materializer for MyCustomClass."""

    ASSOCIATED_TYPES = (MyCustomClass,)
    # Must be a member of the ArtifactType enum, not a free-form string
    ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA

    def load(self, data_type: Type[MyCustomClass]) -> MyCustomClass:
        """Load from artifact store."""
        path = os.path.join(self.uri, "data.json")
        with self.artifact_store.open(path, "r") as f:
            data = json.load(f)
        return MyCustomClass(data)

    def save(self, obj: MyCustomClass) -> None:
        """Save to artifact store."""
        path = os.path.join(self.uri, "data.json")
        with self.artifact_store.open(path, "w") as f:
            json.dump(obj.data, f)

# Use custom materializer
from zenml import step

@step(output_materializers=MyCustomMaterializer)
def create_custom_object() -> MyCustomClass:
    return MyCustomClass({"key": "value"})
```

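Before wiring a custom materializer into a pipeline, it can help to check that its save and load logic round-trips an object. The sketch below does this without ZenML, using a local temporary directory in place of the artifact store. `LocalJsonMaterializer` and `roundtrip` are hypothetical stand-ins that only mirror the shape of the save/load pair above; they are not part of any ZenML API.

```python
import json
import os
import tempfile

class MyCustomClass:
    def __init__(self, data: dict):
        self.data = data

class LocalJsonMaterializer:
    """Hypothetical stand-in that mimics the save/load pair on a local folder."""

    def __init__(self, uri: str):
        self.uri = uri  # directory that holds the artifact files

    def save(self, obj: MyCustomClass) -> None:
        with open(os.path.join(self.uri, "data.json"), "w") as f:
            json.dump(obj.data, f)

    def load(self) -> MyCustomClass:
        with open(os.path.join(self.uri, "data.json"), "r") as f:
            return MyCustomClass(json.load(f))

def roundtrip(obj: MyCustomClass) -> MyCustomClass:
    """Save to a temporary directory and load the object back."""
    with tempfile.TemporaryDirectory() as tmp:
        materializer = LocalJsonMaterializer(tmp)
        materializer.save(obj)
        return materializer.load()

restored = roundtrip(MyCustomClass({"key": "value"}))
assert restored.data == {"key": "value"}
```
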