Tessl Tile for pypi/skl2onnx@1.19.0

# Algebra and ONNX Operators

ONNX operator creation system and mixin classes for adding ONNX capabilities to scikit-learn models. The algebra module supports direct composition of ONNX operators, integration with scikit-learn, and custom ONNX-based transformations that plug into scikit-learn pipelines.

## Capabilities

### ONNX Operator Creation

Core class for creating and manipulating ONNX operators programmatically.

```python { .api }
class OnnxOperator:
    """
    Main class for creating ONNX operators programmatically.

    Enables direct construction of ONNX computational graphs using
    a Python-based API that mirrors ONNX operator specifications.
    """

    def __init__(self, op_type, *inputs, **kwargs):
        """
        Create an ONNX operator instance.

        Parameters:
        - op_type: str, ONNX operator type (e.g., 'MatMul', 'Add', 'Relu')
        - inputs: Variable, input variables for the operator
        - kwargs: Additional operator attributes and parameters
        """

    def to_onnx(self, inputs=None, outputs=None, target_opset=None):
        """
        Generate ONNX model from operator graph.

        Parameters:
        - inputs: list, input specifications for the model
        - outputs: list, output specifications for the model
        - target_opset: int, target ONNX opset version

        Returns:
        - ModelProto: Complete ONNX model
        """

    def add_to(self, scope, container):
        """
        Add operator to conversion container.

        Parameters:
        - scope: Scope, conversion scope context
        - container: Container, conversion container for operators
        """
```
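
To make the idea of composing operators into a graph concrete, here is a minimal, dependency-free sketch. `ToyOp` and `topological_order` are hypothetical names invented for this illustration and are not part of skl2onnx; the sketch only shows how nested operator objects can be flattened into an ordered list of nodes, which is roughly what any graph builder has to do before emitting a model.

```python
class ToyOp:
    """Hypothetical operator node (not the skl2onnx class)."""

    def __init__(self, op_type, *inputs, name=None):
        self.op_type = op_type
        # Strings name graph inputs; ToyOp instances are upstream nodes.
        self.inputs = inputs
        self.name = name or op_type.lower()


def topological_order(root):
    """Return nodes so that each node appears after the nodes it consumes."""
    ordered, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return  # a shared node is emitted only once
        seen.add(id(node))
        for item in node.inputs:
            if isinstance(item, ToyOp):
                visit(item)
        ordered.append(node)

    visit(root)
    return ordered


# relu(X @ W + b), written as nested operator objects
matmul = ToyOp("MatMul", "X", "W")
add = ToyOp("Add", matmul, "b")
relu = ToyOp("Relu", add)

plan = [node.op_type for node in topological_order(relu)]
print(plan)
```

Only the final operator needs to be kept: every upstream node is reachable from it, which is why the examples in this document call `to_onnx` on the last operator of the chain.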

### ONNX Operator Mixin

Mixin class that adds ONNX operator capabilities to scikit-learn models.

```python { .api }
class OnnxOperatorMixin:
    """
    Mixin class for adding ONNX operator capabilities to sklearn models.

    When combined with sklearn estimators, enables direct use of ONNX
    operators within sklearn pipelines and provides conversion
    to ONNX format.

    Import with: from skl2onnx.algebra.onnx_operator_mixin import OnnxOperatorMixin
    """

    def to_onnx(self, X=None, name=None, options=None, white_op=None,
                black_op=None, final_types=None, target_opset=None, verbose=0):
        """
        Convert enhanced model to ONNX format.

        Parameters:
        - X: array-like, sample input for type inference (optional)
        - name: str, name for the ONNX model (optional)
        - options: dict, conversion options (optional)
        - white_op: list, whitelist of allowed operators (optional)
        - black_op: list, blacklist of forbidden operators (optional)
        - final_types: list, expected output types for validation (optional)
        - target_opset: int, target ONNX opset version (optional)
        - verbose: int, verbosity level (default 0)

        Returns:
        - ModelProto: ONNX model representation
        """

    def onnx_graph(self, **kwargs):
        """
        Generate ONNX graph representation of the model.

        Parameters:
        - kwargs: Additional parameters for graph generation

        Returns:
        - GraphProto: ONNX graph representation
        """
```
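
The mixin pattern itself can be sketched without any ONNX dependency. The snippet below is an illustration under stated assumptions, not the library's implementation: `ToOnnxMixin`, `wrap_with_mixin`, and `TinyModel` are hypothetical names, and the `to_onnx` body is a placeholder that only proves the method is reachable on the wrapped object.

```python
class ToOnnxMixin:
    """Hypothetical mixin; a real one would build an ONNX model in to_onnx."""

    def to_onnx(self):
        # Placeholder return value: names the wrapped estimator class.
        return f"<onnx export of {type(self).__bases__[-1].__name__}>"


def wrap_with_mixin(model, mixin=ToOnnxMixin):
    """Return a copy of `model` whose class also inherits from `mixin`."""
    base = type(model)
    wrapped_cls = type(f"Onnx{base.__name__}", (mixin, base), {})
    wrapped = wrapped_cls.__new__(wrapped_cls)
    wrapped.__dict__.update(model.__dict__)  # shallow copy of the fitted state
    return wrapped


class TinyModel:
    """Stand-in for a fitted estimator."""

    def __init__(self, coef):
        self.coef_ = coef

    def predict(self, x):
        return self.coef_ * x


enhanced = wrap_with_mixin(TinyModel(coef=2.0))
print(type(enhanced).__name__, enhanced.predict(3.0), enhanced.to_onnx())
```

The wrapped object keeps the original behaviour (`predict`) and still passes `isinstance` checks against the original class, while gaining the extra method from the mixin.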

### Custom ONNX Transformers

Pre-built ONNX-based transformers that can be used directly in sklearn pipelines.

```python { .api }
class CastTransformer:
    """
    Transformer for type casting operations using ONNX Cast operator.

    Converts input data types to specified output types, useful for
    ensuring type compatibility in mixed-precision pipelines.
    """

    def __init__(self, dtype=None):
        """
        Initialize cast transformer.

        Parameters:
        - dtype: numpy.dtype, target data type for casting
        """

    def fit(self, X, y=None):
        """Fit the transformer (no-op for casting)."""
        return self

    def transform(self, X):
        """Apply type casting to input data."""
        pass

class ReplaceTransformer:
    """
    Transformer for value replacement using ONNX operators.

    Replaces specified values in input data with new values,
    useful for handling missing values or categorical mappings.
    """

    def __init__(self, replace_dict=None):
        """
        Initialize replace transformer.

        Parameters:
        - replace_dict: dict, mapping of old values to new values
        """

    def fit(self, X, y=None):
        """Fit the transformer and learn replacement mappings."""
        return self

    def transform(self, X):
        """Apply value replacements to input data."""
        pass

class WOETransformer:
    """
    Weight of Evidence transformer using ONNX operators.

    Computes Weight of Evidence encoding for categorical variables,
    commonly used in credit scoring and risk modeling applications.
    """

    def __init__(self, positive_class=1):
        """
        Initialize WOE transformer.

        Parameters:
        - positive_class: Value representing positive class for WOE calculation
        """

    def fit(self, X, y):
        """Fit WOE transformer and compute evidence weights."""
        return self

    def transform(self, X):
        """Apply WOE transformation to categorical features."""
        pass
```

### Custom ONNX Regressors

ONNX-based regression models with type casting capabilities.

```python { .api }
class CastRegressor:
    """
    Regressor with built-in type casting capabilities.

    Wraps any sklearn regressor and adds automatic type casting
    for inputs and outputs, ensuring ONNX compatibility.
    """

    def __init__(self, regressor, dtype=None):
        """
        Initialize cast regressor.

        Parameters:
        - regressor: sklearn regressor instance to wrap
        - dtype: numpy.dtype, target data type for casting
        """

    def fit(self, X, y):
        """Fit the underlying regressor with type casting."""
        return self

    def predict(self, X):
        """Predict with automatic input/output type casting."""
        pass
```

### Enhanced Text Processing

ONNX-compatible text processing transformers with conversion tracing.

```python { .api }
class TraceableCountVectorizer:
    """
    Enhanced CountVectorizer with ONNX conversion tracing capabilities.

    Extends sklearn's CountVectorizer with detailed logging and tracing
    of the conversion process for debugging and optimization.
    """

    def __init__(self, **kwargs):
        """
        Initialize traceable count vectorizer.

        Parameters:
        - kwargs: Parameters passed to underlying CountVectorizer
        """

    def fit(self, X, y=None):
        """Fit vectorizer with conversion tracing."""
        return self

    def transform(self, X):
        """Transform text with tracing support."""
        pass

    def get_conversion_trace(self):
        """Get detailed conversion trace information."""
        pass

class TraceableTfidfVectorizer:
    """
    Enhanced TfidfVectorizer with ONNX conversion tracing capabilities.

    Extends sklearn's TfidfVectorizer with detailed logging and tracing
    of the conversion process for debugging and optimization.
    """

    def __init__(self, **kwargs):
        """
        Initialize traceable TF-IDF vectorizer.

        Parameters:
        - kwargs: Parameters passed to underlying TfidfVectorizer
        """

    def fit(self, X, y=None):
        """Fit vectorizer with conversion tracing."""
        return self

    def transform(self, X):
        """Transform text with tracing support."""
        pass

    def get_conversion_trace(self):
        """Get detailed conversion trace information."""
        pass
```

## Usage Examples

### Creating Custom ONNX Operators

```python
from skl2onnx.algebra import OnnxOperator
from skl2onnx.common.data_types import FloatTensorType
import numpy as np

# Create input variables
X = np.random.randn(10, 5).astype(np.float32)
input_type = FloatTensorType([None, 5])

# Create simple linear transformation: Y = X @ W + b
W = np.random.randn(5, 3).astype(np.float32)
b = np.random.randn(3).astype(np.float32)

# Define ONNX operators
matmul_op = OnnxOperator('MatMul', 'X', W, name='linear_transform')
add_op = OnnxOperator('Add', matmul_op, b, name='add_bias')

# Generate ONNX model
onnx_model = add_op.to_onnx(
    inputs=[('X', input_type)],
    outputs=[('Y', FloatTensorType([None, 3]))],
    target_opset=18
)
```

### Using ONNX Operator Mixin

```python
from skl2onnx import wrap_as_onnx_mixin
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

# Create and train model
X, y = make_regression(n_samples=100, n_features=10, random_state=42)
model = LinearRegression()
model.fit(X, y)

# Enhance with ONNX capabilities
enhanced_model = wrap_as_onnx_mixin(model, target_opset=18)

# Now the model has ONNX methods
onnx_model = enhanced_model.to_onnx(X, name="enhanced_linear_regression")

# Can also generate graph representation
onnx_graph = enhanced_model.onnx_graph()
```

### Custom ONNX Transformers in Pipelines

```python
from skl2onnx import to_onnx
from skl2onnx.sklapi import CastTransformer, ReplaceTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
import numpy as np

# Create pipeline with ONNX transformers
pipeline = Pipeline([
    ('cast_input', CastTransformer(dtype=np.float32)),
    ('replace_missing', ReplaceTransformer({-999: 0.0})),
    ('scaler', StandardScaler()),
    ('regressor', RandomForestRegressor(n_estimators=10))
])

# Fit pipeline
X_train = np.random.randn(100, 5)
X_train[X_train < -2] = -999  # Add missing value indicators
y_train = np.random.randn(100)

pipeline.fit(X_train, y_train)

# Convert entire pipeline to ONNX
onnx_pipeline = to_onnx(pipeline, X_train.astype(np.float32))
```

### Weight of Evidence Encoding

```python
from skl2onnx.sklapi import WOETransformer
import pandas as pd

# Create categorical data
data = pd.DataFrame({
    'category': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B'],
    'target': [1, 0, 1, 1, 0, 0, 1, 1]
})

# Apply WOE transformation
woe_transformer = WOETransformer(positive_class=1)
woe_transformer.fit(data[['category']], data['target'])
woe_encoded = woe_transformer.transform(data[['category']])

print("WOE encoded features:", woe_encoded)
```
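
For readers unfamiliar with the encoding, the textbook definition of Weight of Evidence for a category is the natural log of (share of all positives falling in that category) divided by (share of all negatives falling in that category). The sketch below computes it in plain Python on the same data. It is an illustration of the general formula, not a description of what `WOETransformer` does internally; the function name `weight_of_evidence` and the `smoothing` parameter are assumptions introduced here.

```python
import math
from collections import defaultdict


def weight_of_evidence(categories, targets, positive_class=1, smoothing=0.5):
    """Textbook WOE per category: ln(share of positives / share of negatives).

    `smoothing` is an assumption added for this sketch so that a category
    with no positives or no negatives does not hit log(0) or divide by zero.
    """
    pos = defaultdict(int)
    neg = defaultdict(int)
    for cat, target in zip(categories, targets):
        if target == positive_class:
            pos[cat] += 1
        else:
            neg[cat] += 1

    total_pos = sum(pos.values())
    total_neg = sum(neg.values())
    if total_pos == 0 or total_neg == 0:
        raise ValueError("both classes must be present to compute WOE")

    woe = {}
    for cat in sorted(set(categories)):
        share_pos = (pos[cat] + smoothing) / total_pos
        share_neg = (neg[cat] + smoothing) / total_neg
        woe[cat] = math.log(share_pos / share_neg)
    return woe


categories = ["A", "B", "C", "A", "B", "C", "A", "B"]
targets = [1, 0, 1, 1, 0, 0, 1, 1]
woe = weight_of_evidence(categories, targets)
for cat, value in woe.items():
    print(cat, round(value, 3))
```

Category `A` only ever appears with the positive class, so it receives the largest value; `B` is mostly negative and receives the smallest.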

### Enhanced Text Processing with Tracing

```python
from skl2onnx import to_onnx
from skl2onnx.common.data_types import StringTensorType
from skl2onnx.sklapi import TraceableCountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

# Create text processing pipeline with tracing
text_pipeline = Pipeline([
    ('vectorizer', TraceableCountVectorizer(max_features=1000, stop_words='english')),
    ('classifier', LogisticRegression())
])

# Sample text data
texts = [
    "This is a positive example",
    "This is a negative example",
    "Another positive text sample",
    "Another negative text sample"
]
labels = [1, 0, 1, 0]

# Fit pipeline
text_pipeline.fit(texts, labels)

# Get conversion trace
vectorizer = text_pipeline.named_steps['vectorizer']
trace_info = vectorizer.get_conversion_trace()
print("Conversion trace information:", trace_info)

# Convert to ONNX. The input type is declared explicitly because a plain
# Python list of strings carries no tensor type to infer from.
onnx_text_model = to_onnx(
    text_pipeline,
    initial_types=[('text', StringTensorType([None, 1]))]
)
```

### Complex ONNX Operator Composition

```python
from onnx import helper, TensorProto
from skl2onnx.algebra import OnnxOperator
from skl2onnx.common.data_types import FloatTensorType
import numpy as np

# Create complex mathematical operation: sigmoid(X @ W + b)
X_shape = [None, 10]
W_shape = [10, 5]

# Define computation graph
matmul = OnnxOperator('MatMul', 'X', 'W')
add_bias = OnnxOperator('Add', matmul, 'b')
sigmoid = OnnxOperator('Sigmoid', add_bias, output_names=['Y'])

# Create weight and bias values for the initializers
W_init = np.random.randn(*W_shape).astype(np.float32)
b_init = np.random.randn(5).astype(np.float32)

# Generate ONNX model
onnx_model = sigmoid.to_onnx(
    inputs=[('X', FloatTensorType(X_shape))],
    outputs=[('Y', FloatTensorType([None, 5]))],
    target_opset=18
)

# Add initializers manually if needed
W_tensor = helper.make_tensor('W', TensorProto.FLOAT, W_shape, W_init.flatten())
b_tensor = helper.make_tensor('b', TensorProto.FLOAT, [5], b_init)
onnx_model.graph.initializer.extend([W_tensor, b_tensor])
```
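
When composing a graph such as `sigmoid(X @ W + b)` by hand, it helps to have an independent reference result to compare the exported model's output against. The following is a minimal sketch of such a reference in plain Python, using nested lists instead of arrays so that it has no dependencies; `dense_sigmoid` is a hypothetical helper name, not a skl2onnx function.

```python
import math


def dense_sigmoid(X, W, b):
    """Compute sigmoid(X @ W + b) for row-major nested lists."""
    n_in, n_out = len(W), len(b)
    result = []
    for row in X:
        if len(row) != n_in:
            raise ValueError("each row of X must have len(W) entries")
        out_row = []
        for j in range(n_out):
            # MatMul + Add for one output cell, then Sigmoid
            z = sum(row[i] * W[i][j] for i in range(n_in)) + b[j]
            out_row.append(1.0 / (1.0 + math.exp(-z)))
        result.append(out_row)
    return result


X = [[1.0, 2.0], [0.0, 0.0]]
W = [[0.5, -1.0], [0.25, 1.0]]
b = [0.0, -1.0]
Y = dense_sigmoid(X, W, b)
print(Y)
```

The output has one row per input row and one column per column of `W`, matching the `[None, 5]` output shape declared in the example above (here with 2 columns instead of 5).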

### Custom Regressor with Type Casting

```python
from skl2onnx import to_onnx
from skl2onnx.sklapi import CastRegressor
from sklearn.ensemble import RandomForestRegressor
import numpy as np

# Create base regressor
base_regressor = RandomForestRegressor(n_estimators=20, random_state=42)

# Wrap with type casting capabilities
cast_regressor = CastRegressor(base_regressor, dtype=np.float32)

# Train with automatic casting
X_train = np.random.randn(100, 8).astype(np.float64)  # Double precision input
y_train = np.random.randn(100).astype(np.float64)

cast_regressor.fit(X_train, y_train)

# Predictions automatically cast to specified type
X_test = np.random.randn(20, 8).astype(np.float64)
predictions = cast_regressor.predict(X_test)
print(f"Prediction dtype: {predictions.dtype}")  # Expected to be float32

# Convert to ONNX
onnx_cast_model = to_onnx(cast_regressor, X_test.astype(np.float32))
```
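
Why casting matters can be shown with the standard library alone. The sketch below rounds a double-precision value to single precision and back, then compares both against a threshold, the kind of comparison a decision tree split performs. The helper name `to_float32` is an assumption made for this illustration.

```python
import struct


def to_float32(value):
    """Round a Python float (float64) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


x = 0.1
x32 = to_float32(x)
error = abs(x32 - x)
print(f"float64: {x!r}  float32: {x32!r}  error: {error:.3e}")

# A threshold comparison can change outcome after the cast.
threshold = 0.1
print(x <= threshold, x32 <= threshold)
```

The rounding error is tiny, but it is enough to send a sample down a different branch when a feature value sits right at a split threshold, which is one reason to make the cast explicit in both training and inference.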

## Advanced ONNX Operator Patterns

### Conditional Operations

```python
from skl2onnx.algebra import OnnxOperator
from skl2onnx.common.data_types import FloatTensorType
import numpy as np

# Create conditional logic: output = X if X > 0.5 else Y
# The threshold is passed as a float32 array so its type matches X.
threshold = np.array([0.5], dtype=np.float32)
condition_op = OnnxOperator('Greater', 'X', threshold)
where_op = OnnxOperator('Where', condition_op, 'X', 'Y', output_names=['result'])

# Generate model
conditional_model = where_op.to_onnx(
    inputs=[('X', FloatTensorType([None, 1])), ('Y', FloatTensorType([None, 1]))],
    outputs=[('result', FloatTensorType([None, 1]))],
    target_opset=18
)
```
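
The elementwise semantics of the `Greater` followed by `Where` combination can be sketched in plain Python. This is only a reference for what the graph is expected to compute on flat lists; `where_greater` is a hypothetical helper name, and broadcasting is deliberately left out.

```python
def where_greater(x, y, threshold=0.5):
    """Elementwise: pick x[i] when x[i] > threshold, otherwise y[i]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    condition = [xi > threshold for xi in x]  # Greater
    return [xi if c else yi for c, xi, yi in zip(condition, x, y)]  # Where


result = where_greater([0.2, 0.9, 0.5], [10.0, 20.0, 30.0])
print(result)
```

Note that the comparison is strict, so a value exactly equal to the threshold falls through to the second input.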
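
### Reduction Operations

```python
from skl2onnx.algebra import OnnxOperator
from skl2onnx.common.data_types import FloatTensorType

# Create reduction operations: mean along axis
# Note: in the ONNX specification, ReduceMean takes `axes` as an input
# tensor rather than an attribute from opset 18 onward, so this attribute
# form may require a lower target_opset.
reduce_mean_op = OnnxOperator('ReduceMean', 'X', axes=[1], keepdims=1,
                              output_names=['mean_result'])

reduction_model = reduce_mean_op.to_onnx(
    inputs=[('X', FloatTensorType([None, 10]))],
    outputs=[('mean_result', FloatTensorType([None, 1]))],
    target_opset=18
)
```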
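
The declared output shape `[None, 1]` follows from reducing over axis 1 while keeping the reduced dimension. A minimal plain-Python sketch of that behaviour, assuming non-empty rows and using the hypothetical helper name `reduce_mean_axis1`:

```python
def reduce_mean_axis1(rows, keepdims=True):
    """Mean over axis 1 of a 2-D nested list (rows are assumed non-empty)."""
    means = [sum(row) / len(row) for row in rows]
    # keepdims=True keeps a length-1 second axis: shape (n, 1) instead of (n,)
    return [[m] for m in means] if keepdims else means


kept = reduce_mean_axis1([[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]])
flat = reduce_mean_axis1([[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]], keepdims=False)
print(kept, flat)
```

With `keepdims=1` each row collapses to a single-element row, which is why the output tensor type in the example above is two-dimensional.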

## Integration Guidelines

### Mixin Usage Patterns
- **Enhance existing models** with `wrap_as_onnx_mixin` for ONNX capabilities
- **Combine with pipelines** for end-to-end ONNX conversion
- **Use in ensemble methods** for heterogeneous model combinations

### Custom Transformer Best Practices
- **Implement sklearn interface** (fit/transform methods)
- **Support ONNX conversion** through proper operator usage
- **Handle edge cases** like empty inputs or missing values
- **Provide clear documentation** for custom parameters
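
As a rough illustration of the practices listed above, here is a dependency-free sketch of a transformer that follows the fit/transform convention by duck typing. `ClipTransformer` is a hypothetical class written for this example; it does not inherit from scikit-learn base classes and has no registered ONNX converter. Clipping was chosen because it has a direct ONNX counterpart (the `Clip` operator).

```python
class ClipTransformer:
    """Hypothetical transformer that clips values into [lower, upper]."""

    def __init__(self, lower=0.0, upper=1.0):
        # Parameters are stored unchanged, following the sklearn convention.
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        if self.lower > self.upper:
            raise ValueError("lower must not exceed upper")
        return self  # stateless: nothing is learned from X

    def transform(self, X):
        # An empty input yields an empty output rather than an error.
        return [[min(max(v, self.lower), self.upper) for v in row] for row in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


clipped = ClipTransformer(lower=-1.0, upper=1.0).fit_transform(
    [[-3.0, 0.5], [2.0, -1.0]]
)
print(clipped)
```

Validating parameters in `fit` rather than `__init__` mirrors the usual scikit-learn convention and keeps the constructor free of side effects.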

### Performance Optimization
- **Use appropriate data types** for target deployment environment
- **Minimize operator count** in custom graphs
- **Consider memory layout** for optimal inference performance
- **Profile custom operators** against sklearn equivalents
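
Profiling two implementations against each other can be done with the standard library. The sketch below is a generic timing helper, not an skl2onnx utility: `compare_runtime` and `manual_sum` are hypothetical names, and the two lambdas stand in for, say, a scikit-learn `predict` call and an ONNX runtime session call on the same input.

```python
import timeit


def compare_runtime(candidates, number=200, repeat=3):
    """Return the best per-call time in seconds for each named callable."""
    results = {}
    for name, func in candidates.items():
        timings = timeit.repeat(func, number=number, repeat=repeat)
        results[name] = min(timings) / number  # best run is the least noisy
    return results


def manual_sum(values):
    total = 0
    for v in values:
        total += v
    return total


data = list(range(1000))
timings = compare_runtime({
    "builtin_sum": lambda: sum(data),
    "manual_loop": lambda: manual_sum(data),
})
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.2f} microseconds per call")
```

Taking the minimum over several repeats reduces the influence of background load; absolute numbers will differ between machines, so only the relative comparison is meaningful.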
