# Algebra and ONNX Operators

The algebra module provides an ONNX operator creation system and mixin classes that add ONNX capabilities to scikit-learn models. It supports direct ONNX operator composition, sklearn integration, and custom ONNX-based transformations that can be integrated into scikit-learn pipelines.

## Capabilities

### ONNX Operator Creation

Core class for creating and manipulating ONNX operators programmatically.

```python { .api }
class OnnxOperator:
    """
    Main class for creating ONNX operators programmatically.

    Enables direct construction of ONNX computational graphs using
    a Python-based API that mirrors ONNX operator specifications.
    """

    def __init__(self, op_type, *inputs, **kwargs):
        """
        Create an ONNX operator instance.

        Parameters:
        - op_type: str, ONNX operator type (e.g., 'MatMul', 'Add', 'Relu')
        - inputs: Variable, input variables for the operator
        - kwargs: Additional operator attributes and parameters
        """

    def to_onnx(self, inputs=None, outputs=None, target_opset=None):
        """
        Generate ONNX model from operator graph.

        Parameters:
        - inputs: list, input specifications for the model
        - outputs: list, output specifications for the model
        - target_opset: int, target ONNX opset version

        Returns:
        - ModelProto: Complete ONNX model
        """

    def add_to(self, scope, container):
        """
        Add operator to conversion container.

        Parameters:
        - scope: Scope, conversion scope context
        - container: Container, conversion container for operators
        """
```
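
To make the idea of composing operators into a graph concrete, here is a minimal, library-free sketch. `ToyOp` is a hypothetical stand-in written for this illustration only; it is not `OnnxOperator` and says nothing about how skl2onnx stores its graph. It only shows how passing one operator as the input of another yields an ordered list of nodes.

```python
# Toy illustration only: ToyOp is a hypothetical class, not part of skl2onnx.
class ToyOp:
    def __init__(self, op_type, *inputs, **attrs):
        self.op_type = op_type
        # Strings name graph inputs; ToyOp instances are upstream nodes.
        self.inputs = inputs
        self.attrs = attrs

    def nodes(self):
        """Return op types in topological order (producers before consumers)."""
        ordered, seen = [], set()

        def visit(op):
            if id(op) in seen:
                return
            seen.add(id(op))
            for inp in op.inputs:
                if isinstance(inp, ToyOp):
                    visit(inp)
            ordered.append(op.op_type)

        visit(self)
        return ordered


matmul = ToyOp("MatMul", "X", "W")
add = ToyOp("Add", matmul, "b")
relu = ToyOp("Relu", add)
print(relu.nodes())  # ['MatMul', 'Add', 'Relu']
```

Calling `nodes()` on the last operator is enough to recover the whole chain, which mirrors why `to_onnx` is invoked on the final operator in the usage examples.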

### ONNX Operator Mixin

Mixin class that adds ONNX operator capabilities to scikit-learn models.

```python { .api }
class OnnxOperatorMixin:
    """
    Mixin class for adding ONNX operator capabilities to sklearn models.

    When combined with sklearn estimators, enables direct use of ONNX
    operators within sklearn pipelines and provides seamless conversion
    to ONNX format.

    Import from: from skl2onnx.algebra.onnx_operator_mixin import OnnxOperatorMixin
    """

    def to_onnx(self, X=None, name=None, options=None, white_op=None,
                black_op=None, final_types=None, target_opset=None, verbose=0):
        """
        Convert enhanced model to ONNX format.

        Parameters:
        - X: array-like, sample input for type inference (optional)
        - name: str, name for the ONNX model (optional)
        - options: dict, conversion options (optional)
        - white_op: list, whitelist of allowed operators (optional)
        - black_op: list, blacklist of forbidden operators (optional)
        - final_types: list, expected output types for validation (optional)
        - target_opset: int, target ONNX opset version (optional)
        - verbose: int, verbosity level (default 0)

        Returns:
        - ModelProto: ONNX model representation
        """

    def onnx_graph(self, **kwargs):
        """
        Generate ONNX graph representation of the model.

        Parameters:
        - kwargs: Additional parameters for graph generation

        Returns:
        - GraphProto: ONNX graph representation
        """
```

### Custom ONNX Transformers

Pre-built ONNX-based transformers that can be used directly in sklearn pipelines.

```python { .api }
class CastTransformer:
    """
    Transformer for type casting operations using ONNX Cast operator.

    Converts input data types to specified output types, useful for
    ensuring type compatibility in mixed-precision pipelines.
    """

    def __init__(self, dtype=None):
        """
        Initialize cast transformer.

        Parameters:
        - dtype: numpy.dtype, target data type for casting
        """

    def fit(self, X, y=None):
        """Fit the transformer (no-op for casting)."""
        return self

    def transform(self, X):
        """Apply type casting to input data."""
        pass

class ReplaceTransformer:
    """
    Transformer for value replacement using ONNX operators.

    Replaces specified values in input data with new values,
    useful for handling missing values or categorical mappings.
    """

    def __init__(self, replace_dict=None):
        """
        Initialize replace transformer.

        Parameters:
        - replace_dict: dict, mapping of old values to new values
        """

    def fit(self, X, y=None):
        """Fit the transformer and learn replacement mappings."""
        return self

    def transform(self, X):
        """Apply value replacements to input data."""
        pass

class WOETransformer:
    """
    Weight of Evidence transformer using ONNX operators.

    Computes Weight of Evidence encoding for categorical variables,
    commonly used in credit scoring and risk modeling applications.
    """

    def __init__(self, positive_class=1):
        """
        Initialize WOE transformer.

        Parameters:
        - positive_class: Value representing positive class for WOE calculation
        """

    def fit(self, X, y):
        """Fit WOE transformer and compute evidence weights."""
        return self

    def transform(self, X):
        """Apply WOE transformation to categorical features."""
        pass
```

### Custom ONNX Regressors

ONNX-based regression models with type casting capabilities.

```python { .api }
class CastRegressor:
    """
    Regressor with built-in type casting capabilities.

    Wraps any sklearn regressor and adds automatic type casting
    for inputs and outputs, ensuring ONNX compatibility.
    """

    def __init__(self, regressor, dtype=None):
        """
        Initialize cast regressor.

        Parameters:
        - regressor: sklearn regressor instance to wrap
        - dtype: numpy.dtype, target data type for casting
        """

    def fit(self, X, y):
        """Fit the underlying regressor with type casting."""
        return self

    def predict(self, X):
        """Predict with automatic input/output type casting."""
        pass
```
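
The sketch below uses only the standard library to show why an explicit cast can change results: two values that differ in float64 may collapse to the same float32 value, so a comparison against a threshold can flip after casting. The helper `to_float32` and the sample values are assumptions made for this illustration and are not part of skl2onnx.

```python
import struct


def to_float32(value):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


threshold = 0.1
x = 0.1 + 1e-12  # distinguishable from the threshold in float64 only

# In float64 the value is strictly above the threshold.
print(x > threshold)                          # True

# After casting both sides to float32 they become the same number.
print(to_float32(x) > to_float32(threshold))  # False

# The cast itself also moves the value slightly.
print(to_float32(0.1) == 0.1)                 # False
```

Applying the same cast during training and inference keeps both sides of such comparisons consistent, which is the kind of mismatch a casting wrapper is meant to avoid.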

### Enhanced Text Processing

ONNX-compatible text processing transformers with conversion tracing.

```python { .api }
class TraceableCountVectorizer:
    """
    Enhanced CountVectorizer with ONNX conversion tracing capabilities.

    Extends sklearn's CountVectorizer with detailed logging and tracing
    of the conversion process for debugging and optimization.
    """

    def __init__(self, **kwargs):
        """
        Initialize traceable count vectorizer.

        Parameters:
        - kwargs: Parameters passed to underlying CountVectorizer
        """

    def fit(self, X, y=None):
        """Fit vectorizer with conversion tracing."""
        return self

    def transform(self, X):
        """Transform text with tracing support."""
        pass

    def get_conversion_trace(self):
        """Get detailed conversion trace information."""
        pass

class TraceableTfidfVectorizer:
    """
    Enhanced TfidfVectorizer with ONNX conversion tracing capabilities.

    Extends sklearn's TfidfVectorizer with detailed logging and tracing
    of the conversion process for debugging and optimization.
    """

    def __init__(self, **kwargs):
        """
        Initialize traceable TF-IDF vectorizer.

        Parameters:
        - kwargs: Parameters passed to underlying TfidfVectorizer
        """

    def fit(self, X, y=None):
        """Fit vectorizer with conversion tracing."""
        return self

    def transform(self, X):
        """Transform text with tracing support."""
        pass

    def get_conversion_trace(self):
        """Get detailed conversion trace information."""
        pass
```

## Usage Examples

### Creating Custom ONNX Operators

```python
from skl2onnx.algebra import OnnxOperator
from skl2onnx.common.data_types import FloatTensorType
import numpy as np

# Create input variables
X = np.random.randn(10, 5).astype(np.float32)
input_type = FloatTensorType([None, 5])

# Create simple linear transformation: Y = X @ W + b
W = np.random.randn(5, 3).astype(np.float32)
b = np.random.randn(3).astype(np.float32)

# Define ONNX operators
matmul_op = OnnxOperator('MatMul', 'X', W, name='linear_transform')
add_op = OnnxOperator('Add', matmul_op, b, name='add_bias')

# Generate ONNX model
onnx_model = add_op.to_onnx(
    inputs=[('X', input_type)],
    outputs=[('Y', FloatTensorType([None, 3]))],
    target_opset=18
)
```

### Using ONNX Operator Mixin

```python
from skl2onnx import wrap_as_onnx_mixin
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

# Create and train model
X, y = make_regression(n_samples=100, n_features=10, random_state=42)
model = LinearRegression()
model.fit(X, y)

# Enhance with ONNX capabilities
enhanced_model = wrap_as_onnx_mixin(model, target_opset=18)

# Now the model has ONNX methods
onnx_model = enhanced_model.to_onnx(X, name="enhanced_linear_regression")

# Can also generate graph representation
onnx_graph = enhanced_model.onnx_graph()
```

### Custom ONNX Transformers in Pipelines

```python
from skl2onnx import to_onnx
from skl2onnx.sklapi import CastTransformer, ReplaceTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
import numpy as np

# Create pipeline with ONNX transformers
pipeline = Pipeline([
    ('cast_input', CastTransformer(dtype=np.float32)),
    ('replace_missing', ReplaceTransformer({-999: 0.0})),
    ('scaler', StandardScaler()),
    ('regressor', RandomForestRegressor(n_estimators=10))
])

# Fit pipeline
X_train = np.random.randn(100, 5)
X_train[X_train < -2] = -999  # Add missing value indicators
y_train = np.random.randn(100)

pipeline.fit(X_train, y_train)

# Convert entire pipeline to ONNX
onnx_pipeline = to_onnx(pipeline, X_train.astype(np.float32))
```

### Weight of Evidence Encoding

```python
from skl2onnx.sklapi import WOETransformer
import pandas as pd

# Create categorical data
data = pd.DataFrame({
    'category': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B'],
    'target': [1, 0, 1, 1, 0, 0, 1, 1]
})

# Apply WOE transformation
woe_transformer = WOETransformer(positive_class=1)
woe_transformer.fit(data[['category']], data['target'])
woe_encoded = woe_transformer.transform(data[['category']])

print("WOE encoded features:", woe_encoded)
```
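
For readers unfamiliar with the encoding itself, the following library-free sketch computes Weight of Evidence by hand. It assumes one common convention, WOE(c) = ln((pos_c / pos_total) / (neg_c / neg_total)), and applies no smoothing; `weight_of_evidence` and the sample data are hypothetical and do not describe how `WOETransformer` is implemented.

```python
import math
from collections import Counter


def weight_of_evidence(categories, targets, positive_class=1):
    """Return {category: WOE} using ln((pos_c/pos_total) / (neg_c/neg_total))."""
    pos = Counter(c for c, t in zip(categories, targets) if t == positive_class)
    neg = Counter(c for c, t in zip(categories, targets) if t != positive_class)
    pos_total, neg_total = sum(pos.values()), sum(neg.values())

    woe = {}
    for category in set(categories):
        # Without smoothing, a category missing one class has undefined WOE.
        if pos[category] == 0 or neg[category] == 0:
            raise ValueError(f"category {category!r} lacks one of the classes")
        woe[category] = math.log(
            (pos[category] / pos_total) / (neg[category] / neg_total)
        )
    return woe


# Hypothetical data: A has 3 positives and 1 negative, B has the reverse.
cats = ["A", "A", "A", "A", "B", "B", "B", "B"]
ys = [1, 1, 1, 0, 1, 0, 0, 0]
woe = weight_of_evidence(cats, ys)
print(woe["A"], woe["B"])  # ln(3) and ln(1/3)
```

A positive value marks a category that is over-represented among positives, a negative value the opposite. Production code usually adds smoothing so that categories seen with only one class do not raise an error.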

### Enhanced Text Processing with Tracing

```python
from skl2onnx import to_onnx
from skl2onnx.sklapi import TraceableCountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

# Create text processing pipeline with tracing
text_pipeline = Pipeline([
    ('vectorizer', TraceableCountVectorizer(max_features=1000, stop_words='english')),
    ('classifier', LogisticRegression())
])

# Sample text data
texts = [
    "This is a positive example",
    "This is a negative example",
    "Another positive text sample",
    "Another negative text sample"
]
labels = [1, 0, 1, 0]

# Fit pipeline
text_pipeline.fit(texts, labels)

# Get conversion trace
vectorizer = text_pipeline.named_steps['vectorizer']
trace_info = vectorizer.get_conversion_trace()
print("Conversion trace information:", trace_info)

# Convert to ONNX
onnx_text_model = to_onnx(text_pipeline, texts)
```

### Complex ONNX Operator Composition

```python
from onnx import helper, TensorProto
from skl2onnx.algebra import OnnxOperator
from skl2onnx.common.data_types import FloatTensorType
import numpy as np

# Create complex mathematical operation: sigmoid(X @ W + b)
X_shape = [None, 10]
W_shape = [10, 5]

# Define computation graph
matmul = OnnxOperator('MatMul', 'X', 'W')
add_bias = OnnxOperator('Add', matmul, 'b')
sigmoid = OnnxOperator('Sigmoid', add_bias, output_names=['Y'])

# Create initializer values for W and b
W_init = np.random.randn(*W_shape).astype(np.float32)
b_init = np.random.randn(5).astype(np.float32)

# Generate ONNX model
onnx_model = sigmoid.to_onnx(
    inputs=[('X', FloatTensorType(X_shape))],
    outputs=[('Y', FloatTensorType([None, 5]))],
    target_opset=18
)

# Add initializers manually if needed
W_tensor = helper.make_tensor('W', TensorProto.FLOAT, W_shape, W_init.flatten())
b_tensor = helper.make_tensor('b', TensorProto.FLOAT, [5], b_init)
onnx_model.graph.initializer.extend([W_tensor, b_tensor])
```
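
When checking a composed graph such as `sigmoid(X @ W + b)`, it helps to have an independent reference for the expected numbers. The following is a minimal pure-Python sketch of the same computation; the helper names and the small matrices are assumptions chosen for this illustration, and nothing here calls skl2onnx or an ONNX runtime.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(matrix, bias):
    """Add a bias vector to every row (broadcast over the first axis)."""
    return [[v + bv for v, bv in zip(row, bias)] for row in matrix]


def sigmoid(matrix):
    """Apply the logistic function element-wise."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in matrix]


X = [[1.0, 2.0], [0.0, -1.0]]
W = [[0.5, -1.0], [0.25, 0.0]]
b = [0.0, 1.0]

# X @ W + b is [[1.0, 0.0], [-0.25, 1.0]] before the sigmoid is applied.
Y = sigmoid(add_bias(matmul(X, W), b))
print(Y)
```

Comparing these values with the output of the exported model on the same inputs, within a float32 tolerance, is a quick way to confirm that the operators were wired in the intended order.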

### Custom Regressor with Type Casting

```python
from skl2onnx import to_onnx
from skl2onnx.sklapi import CastRegressor
from sklearn.ensemble import RandomForestRegressor
import numpy as np

# Create base regressor
base_regressor = RandomForestRegressor(n_estimators=20, random_state=42)

# Wrap with type casting capabilities
cast_regressor = CastRegressor(base_regressor, dtype=np.float32)

# Train with automatic casting
X_train = np.random.randn(100, 8).astype(np.float64)  # Double precision input
y_train = np.random.randn(100).astype(np.float64)

cast_regressor.fit(X_train, y_train)

# Predictions automatically cast to specified type
X_test = np.random.randn(20, 8).astype(np.float64)
predictions = cast_regressor.predict(X_test)
print(f"Prediction dtype: {predictions.dtype}")  # Will be float32

# Convert to ONNX
onnx_cast_model = to_onnx(cast_regressor, X_test.astype(np.float32))
```

## Advanced ONNX Operator Patterns

### Conditional Operations

```python
from skl2onnx.algebra import OnnxOperator
from skl2onnx.common.data_types import FloatTensorType
import numpy as np

# Create conditional logic: result = X where X > 0.5, otherwise Y
# The threshold is a float32 constant so that it matches the input type
threshold = np.array([0.5], dtype=np.float32)
condition_op = OnnxOperator('Greater', 'X', threshold)
where_op = OnnxOperator('Where', condition_op, 'X', 'Y', output_names=['result'])

# Generate model
conditional_model = where_op.to_onnx(
    inputs=[('X', FloatTensorType([None, 1])), ('Y', FloatTensorType([None, 1]))],
    outputs=[('result', FloatTensorType([None, 1]))],
    target_opset=18
)
```
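
The element-wise behaviour of the `Greater` and `Where` pair can be written out in a few lines of plain Python. This is a reference sketch of the intended semantics only; `where_greater` is a hypothetical helper and the inputs are flat lists rather than tensors.

```python
def where_greater(x, y, threshold=0.5):
    """Pick x[i] where x[i] > threshold, otherwise y[i]."""
    return [xi if xi > threshold else yi for xi, yi in zip(x, y)]


result = where_greater([0.2, 0.9, 0.5], [10.0, 20.0, 30.0])
print(result)  # [10.0, 0.9, 30.0]
```

The comparison is strict, so a value exactly equal to the threshold falls through to `Y`.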

### Reduction Operations

```python
from skl2onnx.algebra import OnnxOperator
from skl2onnx.common.data_types import FloatTensorType
import numpy as np

# Create reduction operation: mean along axis 1
# From opset 18, ReduceMean takes `axes` as an int64 input tensor, not an attribute
axes = np.array([1], dtype=np.int64)
reduce_mean_op = OnnxOperator('ReduceMean', 'X', axes, keepdims=1,
                              output_names=['mean_result'])

reduction_model = reduce_mean_op.to_onnx(
    inputs=[('X', FloatTensorType([None, 10]))],
    outputs=[('mean_result', FloatTensorType([None, 1]))],
    target_opset=18
)
```
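
As a reference for the expected output shape, the sketch below reproduces a mean over axis 1 with `keepdims=1` in plain Python: each input row collapses to a one-element row, so `[N, 10]` becomes `[N, 1]`. The helper name and the sample rows are assumptions for illustration.

```python
def reduce_mean_axis1_keepdims(rows):
    """Mean over axis 1, keeping the reduced axis as length 1."""
    return [[sum(row) / len(row)] for row in rows]


out = reduce_mean_axis1_keepdims([[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]])
print(out)  # [[2.0], [6.0]]
```

With `keepdims=0` the reduced axis would be dropped instead, giving a flat result of shape `[N]`.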

## Integration Guidelines

### Mixin Usage Patterns
- **Enhance existing models** with `wrap_as_onnx_mixin` for ONNX capabilities
- **Combine with pipelines** for end-to-end ONNX conversion
- **Use in ensemble methods** for heterogeneous model combinations

### Custom Transformer Best Practices
- **Implement sklearn interface** (fit/transform methods)
- **Support ONNX conversion** through proper operator usage
- **Handle edge cases** like empty inputs or missing values
- **Provide clear documentation** for custom parameters
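
The practices above can be sketched with a small, dependency-free transformer. `ClipTransformer` is a hypothetical example written for this illustration; in a real project it would typically subclass `sklearn.base.BaseEstimator` and `sklearn.base.TransformerMixin`, and it would still need a registered converter (for instance one built around the ONNX `Clip` operator) before it could be exported.

```python
class ClipTransformer:
    """Hypothetical transformer that clips every value into [lower, upper].

    Parameters:
    - lower: float, smallest value allowed in the output
    - upper: float, largest value allowed in the output
    """

    def __init__(self, lower=0.0, upper=1.0):
        if lower > upper:
            raise ValueError("lower must not exceed upper")
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        """Nothing to learn; return self so calls can be chained."""
        return self

    def transform(self, X):
        """Clip each value; an empty input yields an empty output."""
        return [[min(max(v, self.lower), self.upper) for v in row] for row in X]


clipper = ClipTransformer(lower=0.0, upper=1.0).fit([])
print(clipper.transform([[-1.0, 0.5, 2.0]]))  # [[0.0, 0.5, 1.0]]
print(clipper.transform([]))                  # []
```

Keeping the transformation expressible as a single well-known operator makes the later ONNX converter short and easy to verify against the Python behaviour.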

### Performance Optimization
- **Use appropriate data types** for target deployment environment
- **Minimize operator count** in custom graphs
- **Consider memory layout** for optimal inference performance
- **Profile custom operators** against sklearn equivalents
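
A minimal way to act on the profiling advice is to time two implementations on identical input with the standard library. In the sketch below, `baseline` and `candidate` are hypothetical stand-ins for, say, an sklearn `predict` call and an ONNX runtime call; `time_per_call` is likewise an assumed helper, not part of any library.

```python
import timeit


def time_per_call(fn, *args, repeat=5, number=200):
    """Return the best observed wall-clock seconds per call of fn(*args)."""
    timings = timeit.repeat(lambda: fn(*args), repeat=repeat, number=number)
    return min(timings) / number


def baseline(values):
    # Stand-in for the reference implementation.
    return [v * 2.0 for v in values]


def candidate(values):
    # Stand-in for the implementation under test.
    return list(map(lambda v: v * 2.0, values))


data = [float(i) for i in range(1000)]

# Check that both produce the same result before comparing speed.
assert baseline(data) == candidate(data)

t_base = time_per_call(baseline, data)
t_cand = time_per_call(candidate, data)
print(f"baseline: {t_base:.2e} s/call, candidate: {t_cand:.2e} s/call")
```

Taking the minimum over several repeats reduces the influence of background noise; measuring on batch sizes that match the deployment workload gives the most relevant comparison.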