# Algebra and ONNX Operators

The algebra module provides an ONNX operator creation system and mixin classes that add ONNX capabilities to scikit-learn models. It supports composing ONNX operators directly, integrating them with scikit-learn, and building custom ONNX-based transformations that plug into scikit-learn pipelines.

## Capabilities

### ONNX Operator Creation

Core class for creating and manipulating ONNX operators programmatically.

```python { .api }
class OnnxOperator:
    """
    Base class for creating ONNX operators programmatically.

    Enables direct construction of ONNX computational graphs using
    a Python-based API that mirrors ONNX operator specifications.
    Each ONNX operator has a generated subclass (for example OnnxAdd,
    OnnxMatMul) in skl2onnx.algebra.onnx_ops that fixes the operator type.
    """

    def __init__(self, *inputs, op_version=None, output_names=None, **kwargs):
        """
        Create an ONNX operator instance.

        Parameters:
        - inputs: graph input names (str), numpy arrays (stored as
          initializers), or other OnnxOperator instances
        - op_version: int, opset version the operator must follow
        - output_names: list of str, names of the operator outputs (optional)
        - kwargs: ONNX attributes of the operator (e.g. keepdims=1)
        """

    def to_onnx(self, inputs=None, outputs=None, target_opset=None):
        """
        Generate an ONNX model from the operator graph.

        Parameters:
        - inputs: list of (name, type) tuples describing the model inputs
        - outputs: list of (name, type) tuples describing the model outputs
        - target_opset: int, target ONNX opset version

        Returns:
        - ModelProto: complete ONNX model
        """

    def add_to(self, scope, container):
        """
        Add the operator to a conversion container.

        Parameters:
        - scope: Scope, conversion scope context
        - container: Container, conversion container for operators
        """
```
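
Operators compose into a graph: each operator records its type, its inputs, and its attributes, and converting the final operator walks back through its inputs to emit one node per operator. The pure-Python sketch below illustrates that traversal with a hypothetical `Op` class and `to_nodes` function. It does not call skl2onnx and is not its internal implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class Op:
    """Hypothetical stand-in for an operator: type, inputs, attributes."""
    op_type: str
    inputs: List[Union["Op", str]]
    attrs: Dict[str, Any] = field(default_factory=dict)


def to_nodes(root: Op) -> List[Dict[str, Any]]:
    """Flatten an operator tree into topologically ordered node records."""
    nodes: List[Dict[str, Any]] = []
    names: Dict[int, str] = {}

    def visit(op: Op) -> str:
        # Reuse the output name if this operator was already emitted.
        if id(op) in names:
            return names[id(op)]
        # Emit inputs first so every node appears after its producers.
        in_names = [visit(i) if isinstance(i, Op) else i for i in op.inputs]
        out = f"{op.op_type.lower()}_{len(nodes)}"
        nodes.append({"op_type": op.op_type, "inputs": in_names,
                      "outputs": [out], "attrs": op.attrs})
        names[id(op)] = out
        return out

    visit(root)
    return nodes


# Y = Sigmoid(MatMul(X, W) + b); graph inputs are referenced by name
graph = Op("Sigmoid", [Op("Add", [Op("MatMul", ["X", "W"]), "b"])])
for node in to_nodes(graph):
    print(node["op_type"], node["inputs"], "->", node["outputs"])
```

Because already-visited operators are cached in `names`, an operator reused by two consumers is emitted as a single node.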

### ONNX Operator Mixin

Mixin class that adds ONNX operator capabilities to scikit-learn models.

```python { .api }
class OnnxOperatorMixin:
    """
    Mixin class for adding ONNX operator capabilities to sklearn models.

    When combined with sklearn estimators, enables direct use of ONNX
    operators within sklearn pipelines and adds methods that convert
    the model to ONNX format.

    Import from: from skl2onnx.algebra.onnx_operator_mixin import OnnxOperatorMixin
    """

    def to_onnx(self, X=None, name=None, options=None, white_op=None,
                black_op=None, final_types=None, target_opset=None, verbose=0):
        """
        Convert the wrapped model to ONNX format.

        Parameters:
        - X: array-like, sample input used to infer the input types (optional)
        - name: str, name of the ONNX model (optional)
        - options: dict, conversion options (optional)
        - white_op: list, whitelist of operators the converters may use (optional)
        - black_op: list, blacklist of operators the converters must not use (optional)
        - final_types: list of (name, type) tuples that override the output
          names and types (optional)
        - target_opset: int, target ONNX opset version (optional)
        - verbose: int, verbosity level (default 0)

        Returns:
        - ModelProto: ONNX model representation
        """

    def onnx_graph(self, **kwargs):
        """
        Generate an ONNX graph representation of the model.

        Parameters:
        - kwargs: additional parameters for graph generation

        Returns:
        - GraphProto: ONNX graph representation
        """
```
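
One way a helper such as `wrap_as_onnx_mixin` can attach mixin methods to an already-fitted estimator is to build a new class that inherits from both the mixin and the estimator's class, then copy the fitted state into an instance of it. The sketch below shows that pattern with hypothetical `DescribeMixin`, `MeanModel`, and `wrap_with_mixin` names; it illustrates the technique and is not skl2onnx's code.

```python
class DescribeMixin:
    """Hypothetical mixin adding one extra method to any estimator."""

    def describe(self):
        fitted = sorted(k for k in vars(self) if k.endswith("_"))
        return f"{type(self).__name__} fitted attributes: {fitted}"


class MeanModel:
    """Tiny stand-in estimator: predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def wrap_with_mixin(model, mixin):
    # Build a subclass that inherits from the mixin and the model's class.
    cls = type(f"Wrapped{type(model).__name__}", (mixin, type(model)), {})
    wrapped = cls.__new__(cls)
    # Copy the fitted state so the wrapped object needs no re-training.
    wrapped.__dict__.update(vars(model))
    return wrapped


model = MeanModel().fit([[0], [1], [2]], [1.0, 2.0, 3.0])
wrapped = wrap_with_mixin(model, DescribeMixin)
print(wrapped.predict([[5]]))  # [2.0]
print(wrapped.describe())      # WrappedMeanModel fitted attributes: ['mean_']
```

Putting the mixin first in the bases means its methods take precedence, while everything else resolves to the original estimator class.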

### Custom ONNX Transformers

Prebuilt scikit-learn transformers from `skl2onnx.sklapi` that can be used directly in pipelines and converted to ONNX together with them.

```python { .api }
class CastTransformer:
    """
    Transformer for type casting operations using ONNX Cast operator.

    Converts input data types to specified output types, useful for
    ensuring type compatibility in mixed-precision pipelines.
    """

    def __init__(self, dtype=None):
        """
        Initialize cast transformer.

        Parameters:
        - dtype: numpy.dtype, target data type for casting
        """

    def fit(self, X, y=None):
        """Fit the transformer (no-op for casting)."""
        return self

    def transform(self, X):
        """Apply type casting to input data."""
        pass


class ReplaceTransformer:
    """
    Transformer for value replacement using ONNX operators.

    Replaces specified values in input data with new values,
    useful for handling missing values or categorical mappings.
    """

    def __init__(self, replace_dict=None):
        """
        Initialize replace transformer.

        Parameters:
        - replace_dict: dict, mapping of old values to new values
        """

    def fit(self, X, y=None):
        """Fit the transformer and learn replacement mappings."""
        return self

    def transform(self, X):
        """Apply value replacements to input data."""
        pass


class WOETransformer:
    """
    Weight of Evidence transformer using ONNX operators.

    Computes Weight of Evidence encoding for categorical variables,
    commonly used in credit scoring and risk modeling applications.
    """

    def __init__(self, positive_class=1):
        """
        Initialize WOE transformer.

        Parameters:
        - positive_class: Value representing positive class for WOE calculation
        """

    def fit(self, X, y):
        """Fit WOE transformer and compute evidence weights."""
        return self

    def transform(self, X):
        """Apply WOE transformation to categorical features."""
        pass
```
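
The numerical effect of the casting and replacement transformers can be written as plain NumPy, which is handy as a reference when checking a converted graph. In this sketch, `cast_reference` and `replace_reference` are hypothetical helpers, and replacement keys are assumed to be matched by exact equality against the original values, so replacements never chain.

```python
import numpy as np


def cast_reference(X, dtype=np.float32):
    """Reference for casting: same values, new dtype (like ONNX Cast)."""
    return np.asarray(X).astype(dtype)


def replace_reference(X, replace_dict):
    """Reference for value replacement, matching values by exact equality."""
    X = np.asarray(X)
    out = X.copy()
    for old, new in replace_dict.items():
        # Compare against the original array so replacements do not chain.
        out[X == old] = new
    return out


X = np.array([[1.5, -999.0], [-999.0, 2.25]])
Xc = cast_reference(X)
print(Xc.dtype)  # float32
print(replace_reference(Xc, {-999: 0.0}))
```

Equality matching cannot find NaN (`np.nan == np.nan` is false), so replacing NaN values would need `np.isnan` instead.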

### Custom ONNX Regressors

Regressor wrappers from `skl2onnx.sklapi` that add type casting capabilities.

```python { .api }
class CastRegressor:
    """
    Regressor with built-in type casting capabilities.

    Wraps any sklearn regressor and adds automatic type casting
    for inputs and outputs, ensuring ONNX compatibility.
    """

    def __init__(self, regressor, dtype=None):
        """
        Initialize cast regressor.

        Parameters:
        - regressor: sklearn regressor instance to wrap
        - dtype: numpy.dtype, target data type for casting
        """

    def fit(self, X, y):
        """Fit the underlying regressor with type casting."""
        return self

    def predict(self, X):
        """Predict with automatic input/output type casting."""
        pass
```
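
The wrapping pattern behind a casting regressor fits in a few lines: delegate `fit` to the wrapped estimator and cast the output of `predict`. `CastingWrapper` and `MeanRegressor` below are hypothetical names, and the sketch only casts predictions; it makes no assumption about how `CastRegressor` handles its inputs.

```python
import numpy as np


class MeanRegressor:
    """Stand-in regressor that predicts the training mean as float64."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_, dtype=np.float64)


class CastingWrapper:
    """Hypothetical wrapper: fits the inner model, casts its predictions."""

    def __init__(self, estimator, dtype=np.float32):
        self.estimator = estimator
        self.dtype = dtype

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        # Cast the float64 predictions to the requested dtype.
        return self.estimator.predict(X).astype(self.dtype)


X = np.random.default_rng(0).standard_normal((4, 3))
y = np.array([1.0, 2.0, 3.0, 4.0])
pred = CastingWrapper(MeanRegressor()).fit(X, y).predict(X)
print(pred.dtype)  # float32
```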

### Enhanced Text Processing

ONNX-compatible text processing transformers with conversion tracing.

```python { .api }
class TraceableCountVectorizer:
    """
    Enhanced CountVectorizer with ONNX conversion tracing capabilities.

    Extends sklearn's CountVectorizer with detailed logging and tracing
    of the conversion process for debugging and optimization.
    """

    def __init__(self, **kwargs):
        """
        Initialize traceable count vectorizer.

        Parameters:
        - kwargs: Parameters passed to underlying CountVectorizer
        """

    def fit(self, X, y=None):
        """Fit vectorizer with conversion tracing."""
        return self

    def transform(self, X):
        """Transform text with tracing support."""
        pass

    def get_conversion_trace(self):
        """Get detailed conversion trace information."""
        pass


class TraceableTfidfVectorizer:
    """
    Enhanced TfidfVectorizer with ONNX conversion tracing capabilities.

    Extends sklearn's TfidfVectorizer with detailed logging and tracing
    of the conversion process for debugging and optimization.
    """

    def __init__(self, **kwargs):
        """
        Initialize traceable TF-IDF vectorizer.

        Parameters:
        - kwargs: Parameters passed to underlying TfidfVectorizer
        """

    def fit(self, X, y=None):
        """Fit vectorizer with conversion tracing."""
        return self

    def transform(self, X):
        """Transform text with tracing support."""
        pass

    def get_conversion_trace(self):
        """Get detailed conversion trace information."""
        pass
```

## Usage Examples

### Creating Custom ONNX Operators

```python
import numpy as np
from skl2onnx.algebra.onnx_ops import OnnxAdd, OnnxMatMul
from skl2onnx.common.data_types import FloatTensorType

TARGET_OPSET = 18

# Weights of a simple linear transformation: Y = X @ W + b
rng = np.random.default_rng(0)
W = rng.standard_normal((5, 3)).astype(np.float32)
b = rng.standard_normal(3).astype(np.float32)

# Each ONNX operator has a generated class in skl2onnx.algebra.onnx_ops.
# 'X' refers to a graph input by name; numpy arrays become initializers.
matmul_op = OnnxMatMul('X', W, op_version=TARGET_OPSET)
add_op = OnnxAdd(matmul_op, b, op_version=TARGET_OPSET, output_names=['Y'])

# Generate the ONNX model
onnx_model = add_op.to_onnx(
    inputs=[('X', FloatTensorType([None, 5]))],
    outputs=[('Y', FloatTensorType([None, 3]))],
    target_opset=TARGET_OPSET,
)
```
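
Before deploying a generated model, it helps to have a NumPy reference for what the graph should compute. This sketch recomputes `Y = X @ W + b` on random data of the same shapes and checks that the bias broadcasts over the batch dimension as expected. Comparing the reference with an ONNX runtime's output (for example via `np.testing.assert_allclose`) depends on your setup and is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 5)).astype(np.float32)
W = rng.standard_normal((5, 3)).astype(np.float32)
b = rng.standard_normal(3).astype(np.float32)

# MatMul then Add: b (shape [3]) broadcasts across the 10 rows.
Y_ref = X @ W + b
print(Y_ref.shape, Y_ref.dtype)  # (10, 3) float32

# Row-by-row check that the broadcast Add matches an explicit loop.
Y_loop = np.stack([X[i] @ W + b for i in range(X.shape[0])])
np.testing.assert_allclose(Y_ref, Y_loop, rtol=1e-5, atol=1e-5)
```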

### Using ONNX Operator Mixin

```python
import numpy as np
from skl2onnx import wrap_as_onnx_mixin
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Create and train the model
X, y = make_regression(n_samples=100, n_features=10, random_state=42)
X = X.astype(np.float32)  # float32 inputs map to FloatTensorType
model = LinearRegression()
model.fit(X, y)

# Wrap the fitted model to add ONNX methods
enhanced_model = wrap_as_onnx_mixin(model, target_opset=18)

# The wrapped model now exposes to_onnx()
onnx_model = enhanced_model.to_onnx(X, name="enhanced_linear_regression")

# It can also generate a graph representation
onnx_graph = enhanced_model.onnx_graph()
```

### Custom ONNX Transformers in Pipelines

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from skl2onnx import to_onnx
from skl2onnx.sklapi import CastTransformer, ReplaceTransformer

# Create a pipeline mixing ONNX-aware transformers and standard estimators
pipeline = Pipeline([
    ('cast_input', CastTransformer(dtype=np.float32)),
    ('replace_missing', ReplaceTransformer({-999: 0.0})),
    ('scaler', StandardScaler()),
    ('regressor', RandomForestRegressor(n_estimators=10, random_state=0)),
])

# Training data with -999 used as a missing-value marker
rng = np.random.default_rng(0)
X_train = rng.standard_normal((100, 5))
X_train[X_train < -2] = -999
y_train = rng.standard_normal(100)

pipeline.fit(X_train, y_train)

# Convert the entire pipeline; the float32 sample fixes the ONNX input type
onnx_pipeline = to_onnx(pipeline, X_train.astype(np.float32))
```

### Weight of Evidence Encoding

```python
import pandas as pd
from skl2onnx.sklapi import WOETransformer

# Create categorical data
data = pd.DataFrame({
    'category': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B'],
    'target': [1, 0, 1, 1, 0, 0, 1, 1],
})

# Apply WOE transformation
woe_transformer = WOETransformer(positive_class=1)
woe_transformer.fit(data[['category']], data['target'])
woe_encoded = woe_transformer.transform(data[['category']])

print("WOE encoded features:", woe_encoded)
```
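
The encoding itself is a per-category log ratio. Conventions differ between references (some put non-events in the numerator), so this plain-Python sketch, built around a hypothetical `woe_table` helper, assumes WOE = ln(P(category | event) / P(category | non-event)). It also adds smoothing (`alpha`) so that category A, which has no non-events in the sample data, still gets a finite weight. It is not necessarily how `WOETransformer` computes or stores its weights.

```python
import math
from collections import Counter


def woe_table(categories, targets, positive_class=1, alpha=0.5):
    """Weight of Evidence per category with additive smoothing alpha."""
    events = Counter(c for c, t in zip(categories, targets) if t == positive_class)
    non_events = Counter(c for c, t in zip(categories, targets) if t != positive_class)
    levels = sorted(set(categories))
    # Smoothed totals so that the per-category shares still sum to 1.
    total_e = sum(events.values()) + alpha * len(levels)
    total_n = sum(non_events.values()) + alpha * len(levels)
    return {
        c: math.log(((events[c] + alpha) / total_e) / ((non_events[c] + alpha) / total_n))
        for c in levels
    }


categories = ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B']
targets = [1, 0, 1, 1, 0, 0, 1, 1]
woe = woe_table(categories, targets)
encoded = [woe[c] for c in categories]
print({c: round(v, 3) for c, v in woe.items()})
```

A positive weight marks a category over-represented among positive targets (A), and a negative weight marks one over-represented among negatives (B).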

### Enhanced Text Processing with Tracing

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from skl2onnx import to_onnx
from skl2onnx.common.data_types import StringTensorType
from skl2onnx.sklapi import TraceableCountVectorizer

# Create a text processing pipeline with tracing
text_pipeline = Pipeline([
    ('vectorizer', TraceableCountVectorizer(max_features=1000, stop_words='english')),
    ('classifier', LogisticRegression()),
])

# Sample text data
texts = [
    "This is a positive example",
    "This is a negative example",
    "Another positive text sample",
    "Another negative text sample",
]
labels = [1, 0, 1, 0]

# Fit the pipeline
text_pipeline.fit(texts, labels)

# Get conversion trace information
vectorizer = text_pipeline.named_steps['vectorizer']
trace_info = vectorizer.get_conversion_trace()
print("Conversion trace information:", trace_info)

# Convert to ONNX, declaring the input explicitly as a string tensor
# with one text per row
onnx_text_model = to_onnx(
    text_pipeline,
    initial_types=[('input', StringTensorType([None, 1]))],
)
```

### Complex ONNX Operator Composition

```python
import numpy as np
from skl2onnx.algebra.onnx_ops import OnnxAdd, OnnxMatMul, OnnxSigmoid
from skl2onnx.common.data_types import FloatTensorType

TARGET_OPSET = 18

# Complex mathematical operation: Y = sigmoid(X @ W + b)
X_shape = [None, 10]
W_shape = [10, 5]

# Weights and bias; numpy arrays passed as operator inputs are stored in
# the model as initializers, so no manual initializer handling is needed
rng = np.random.default_rng(0)
W_init = rng.standard_normal(W_shape).astype(np.float32)
b_init = rng.standard_normal(5).astype(np.float32)

# Define the computation graph
matmul = OnnxMatMul('X', W_init, op_version=TARGET_OPSET)
add_bias = OnnxAdd(matmul, b_init, op_version=TARGET_OPSET)
sigmoid = OnnxSigmoid(add_bias, op_version=TARGET_OPSET, output_names=['Y'])

# Generate the ONNX model
onnx_model = sigmoid.to_onnx(
    inputs=[('X', FloatTensorType(X_shape))],
    outputs=[('Y', FloatTensorType([None, 5]))],
    target_opset=TARGET_OPSET,
)
```
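
A NumPy reference for `sigmoid(X @ W + b)` makes it easy to check a converted graph numerically. The sketch uses a numerically stable form of the logistic function that never evaluates `exp` on a large positive argument; `stable_sigmoid` and `dense_sigmoid_reference` are illustrative names, not library functions.

```python
import numpy as np


def stable_sigmoid(z):
    """Logistic function computed without overflowing exp() for large |z|."""
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For negative z, exp(z) is small, so use exp(z) / (1 + exp(z)).
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out


def dense_sigmoid_reference(X, W, b):
    """Reference for Sigmoid(Add(MatMul(X, W), b))."""
    return stable_sigmoid(X @ W + b)


rng = np.random.default_rng(0)
X = rng.standard_normal((4, 10)).astype(np.float32)
W = rng.standard_normal((10, 5)).astype(np.float32)
b = rng.standard_normal(5).astype(np.float32)
Y = dense_sigmoid_reference(X, W, b)
print(Y.shape)  # (4, 5)
```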

### Custom Regressor with Type Casting

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from skl2onnx import to_onnx
from skl2onnx.sklapi import CastRegressor

# Create the base regressor
base_regressor = RandomForestRegressor(n_estimators=20, random_state=42)

# Wrap it with type casting capabilities
cast_regressor = CastRegressor(base_regressor, dtype=np.float32)

# Train on double-precision (float64) data
rng = np.random.default_rng(0)
X_train = rng.standard_normal((100, 8))
y_train = rng.standard_normal(100)

cast_regressor.fit(X_train, y_train)

# Predictions are cast to the specified dtype
X_test = rng.standard_normal((20, 8))
predictions = cast_regressor.predict(X_test)
print(f"Prediction dtype: {predictions.dtype}")  # float32

# Convert to ONNX
onnx_cast_model = to_onnx(cast_regressor, X_test.astype(np.float32))
```

## Advanced ONNX Operator Patterns

### Conditional Operations

```python
import numpy as np
from skl2onnx.algebra.onnx_ops import OnnxGreater, OnnxWhere
from skl2onnx.common.data_types import FloatTensorType

TARGET_OPSET = 18

# Conditional logic, element-wise: result = X if X > 0.5 else Y
threshold = np.array([0.5], dtype=np.float32)
condition_op = OnnxGreater('X', threshold, op_version=TARGET_OPSET)
where_op = OnnxWhere(condition_op, 'X', 'Y', op_version=TARGET_OPSET,
                     output_names=['result'])

# Generate the model
conditional_model = where_op.to_onnx(
    inputs=[('X', FloatTensorType([None, 1])), ('Y', FloatTensorType([None, 1]))],
    outputs=[('result', FloatTensorType([None, 1]))],
    target_opset=TARGET_OPSET,
)
```
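
In NumPy terms the conditional graph computes `np.where(X > 0.5, X, Y)` element by element. The short reference below can be compared against the converted model's output; the condition depends on `X` only, so `Y` is returned wherever `X <= 0.5`, including the boundary value 0.5 itself.

```python
import numpy as np

X = np.array([[0.2], [0.5], [0.9]], dtype=np.float32)
Y = np.array([[-1.0], [-2.0], [-3.0]], dtype=np.float32)

# Greater(X, 0.5) gives a boolean mask; Where picks from X or Y per element.
result = np.where(X > 0.5, X, Y)
print(result.ravel())
```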

### Reduction Operations

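```python
import numpy as np
from skl2onnx.algebra.onnx_ops import OnnxReduceMean
from skl2onnx.common.data_types import FloatTensorType

TARGET_OPSET = 18

# Mean along axis 1. Since opset 18, ReduceMean takes axes as an int64
# input tensor instead of an attribute.
axes = np.array([1], dtype=np.int64)
reduce_mean_op = OnnxReduceMean('X', axes, keepdims=1, op_version=TARGET_OPSET,
                                output_names=['mean_result'])

reduction_model = reduce_mean_op.to_onnx(
    inputs=[('X', FloatTensorType([None, 10]))],
    outputs=[('mean_result', FloatTensorType([None, 1]))],
    target_opset=TARGET_OPSET,
)
```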
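
The expected output of this reduction can be checked with NumPy's `mean(axis=1, keepdims=True)`, which corresponds to `ReduceMean` over axis 1 with `keepdims=1`: an `[N, 10]` input yields an `[N, 1]` output.

```python
import numpy as np

X = np.arange(20, dtype=np.float32).reshape(2, 10)
# keepdims=True keeps the reduced axis with length 1, like keepdims=1 in ONNX.
mean_result = X.mean(axis=1, keepdims=True)
print(mean_result.shape)  # (2, 1)
```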

## Integration Guidelines

### Mixin Usage Patterns

- **Enhance existing models** with `wrap_as_onnx_mixin` to add ONNX conversion methods
- **Combine with pipelines** for end-to-end ONNX conversion
- **Use in ensemble methods** for heterogeneous model combinations

### Custom Transformer Best Practices

- **Implement the sklearn interface** (`fit`/`transform` methods)
- **Support ONNX conversion** by registering a shape calculator and converter for the new class (for example with `skl2onnx.update_registered_converter`)
- **Handle edge cases** like empty inputs or missing values
- **Provide clear documentation** for custom parameters
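
These practices can be sketched as a small transformer: `fit` learns state and returns `self`, and `transform` validates the feature count, passes empty batches through, and handles NaN explicitly. `MeanImputer` is a hypothetical example written against NumPy only; real code would normally inherit from scikit-learn's `BaseEstimator` and `TransformerMixin` to get `get_params`, cloning, and `fit_transform`.

```python
import warnings

import numpy as np


class MeanImputer:
    """Hypothetical transformer: replaces NaN with per-column training means."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=np.float64)
        if X.ndim != 2 or X.shape[0] == 0:
            raise ValueError("fit expects a non-empty 2-D array")
        # nanmean warns on all-NaN columns; those columns fall back to 0.0.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", category=RuntimeWarning)
            means = np.nanmean(X, axis=0)
        self.means_ = np.where(np.isnan(means), 0.0, means)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=np.float64)
        if X.ndim != 2 or X.shape[1] != self.means_.shape[0]:
            raise ValueError("transform expects shape (n_samples, n_features)")
        if X.shape[0] == 0:
            return X.copy()  # empty batch: nothing to impute
        return np.where(np.isnan(X), self.means_, X)


X = np.array([[1.0, np.nan], [3.0, np.nan], [np.nan, np.nan]])
imp = MeanImputer().fit(X)
print(imp.transform(X))
print(imp.transform(np.empty((0, 2))).shape)  # (0, 2)
```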

### Performance Optimization

- **Use appropriate data types** for the target deployment environment
- **Minimize operator count** in custom graphs
- **Consider memory layout** for optimal inference performance
- **Profile custom operators** against their sklearn equivalents
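
When comparing float32 ONNX outputs with float64 scikit-learn outputs, it helps to know the error introduced by the cast alone before choosing tolerances. This NumPy sketch measures it on random data (sizes and seed are arbitrary). With round-to-nearest, the maximum relative error should not exceed float32's unit roundoff, half of `np.finfo(np.float32).eps` (about 6e-8); errors after arithmetic in the graph will usually be larger.

```python
import numpy as np

rng = np.random.default_rng(42)
X64 = rng.standard_normal((1000, 8))
X32 = X64.astype(np.float32)

# Relative rounding error from the cast alone (float32 keeps ~7 significant digits).
rel_err = np.abs(X32.astype(np.float64) - X64) / np.abs(X64)
print(f"max relative error: {rel_err.max():.2e}")
print(f"float32 machine epsilon: {np.finfo(np.float32).eps:.2e}")
```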