# Helper Utilities


Investigation and integration utilities for debugging conversions, comparing outputs between scikit-learn and ONNX models, analyzing pipeline structures, and integrating custom ONNX graphs. These utilities support development, testing, and troubleshooting of ONNX conversions.


## Capabilities


### Investigation and Debugging


Tools for analyzing conversion processes, collecting intermediate results, and debugging conversion issues.


```python { .api }
def collect_intermediate_steps(model, X=None, target_opset=None):
    """
    Collect intermediate outputs during conversion process for debugging.

    Provides detailed information about shape inference, operator creation,
    and conversion steps to help diagnose conversion issues.

    Parameters:
    - model: scikit-learn model to analyze
    - X: array-like, sample input data for type inference (optional)
    - target_opset: int, target ONNX opset version (optional)

    Returns:
    - dict: Detailed conversion information including:
        - 'shapes': Shape inference results for each step
        - 'operators': Generated ONNX operators
        - 'variables': Variable names and types
        - 'topology': Model topology structure
    """

def compare_objects(sklearn_output, onnx_output, decimal=5):
    """
    Compare outputs between scikit-learn and ONNX models.

    Validates conversion accuracy by comparing predictions from original
    sklearn model with converted ONNX model outputs.

    Parameters:
    - sklearn_output: array-like, output from sklearn model
    - onnx_output: array-like, output from ONNX model
    - decimal: int, number of decimal places for comparison (default 5)

    Returns:
    - bool: True if outputs match within specified precision

    Raises:
    - AssertionError: If outputs don't match within tolerance
    - ValueError: If output shapes or types are incompatible
    """

def enumerate_pipeline_models(model):
    """
    Enumerate all models within a pipeline or ensemble.

    Recursively discovers all sub-models in complex pipelines,
    feature unions, and ensemble models for analysis or debugging.

    Parameters:
    - model: scikit-learn model, pipeline, or ensemble

    Returns:
    - list: List of tuples (model_name, model_instance, path)
      where path indicates the location within the pipeline structure
    """
```
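To make the `decimal` parameter more concrete, the following is a minimal standard-library sketch of a decimal-place comparison. It assumes the convention used by NumPy's `assert_almost_equal`, where two values match when their absolute difference is below `1.5 * 10**(-decimal)`. The helper name `almost_equal_nested` is hypothetical and not part of skl2onnx, whose actual comparison logic may differ.

```python
from numbers import Real


def almost_equal_nested(expected, actual, decimal=5):
    """Raise AssertionError if two nested sequences differ beyond `decimal` places.

    Hypothetical helper for illustration only; not part of skl2onnx.
    """
    # Assumed tolerance rule, borrowed from numpy.testing.assert_almost_equal.
    tolerance = 1.5 * 10 ** (-decimal)
    if isinstance(expected, Real) and isinstance(actual, Real):
        if abs(expected - actual) >= tolerance:
            raise AssertionError(
                f"{expected!r} and {actual!r} differ by more than {tolerance:g}"
            )
        return True
    # Anything that is not a pair of numbers is treated as a sequence.
    if len(expected) != len(actual):
        raise ValueError(f"length mismatch: {len(expected)} != {len(actual)}")
    for exp_item, act_item in zip(expected, actual):
        almost_equal_nested(exp_item, act_item, decimal)
    return True


# Values standing in for sklearn (float64) and ONNX (float32) probabilities.
sklearn_like = [[0.25, 0.75], [0.9, 0.1]]
onnx_like = [[0.2500004, 0.7499996], [0.9000003, 0.0999997]]
print(almost_equal_nested(sklearn_like, onnx_like, decimal=5))
```

Because ONNX Runtime typically computes in float32 while scikit-learn uses float64, a tolerance of four or five decimal places is usually a more realistic target than exact equality.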


### Integration Utilities


Functions for integrating custom ONNX graphs and extending existing models.


```python { .api }
def add_onnx_graph(onx, to_add, inputs, outputs):
    """
    Add a custom ONNX graph to an existing ONNX model.

    Enables integration of custom operators or preprocessing/postprocessing
    steps by merging ONNX graphs while maintaining proper variable connections.

    Parameters:
    - onx: ModelProto, existing ONNX model
    - to_add: GraphProto or ModelProto, graph/model to add
    - inputs: list, input variable names for connection
    - outputs: list, output variable names for connection

    Returns:
    - ModelProto: Modified ONNX model with integrated graph

    Raises:
    - ValueError: If input/output connections are invalid
    - TypeError: If graph types are incompatible
    """
```


### Performance and Benchmarking


Utilities for measuring and comparing performance between sklearn and ONNX models.


```python { .api }
def measure_time(stmt, context, repeat=10, number=50, div_by_number=False):
    """
    Measure execution time for model operations.

    Provides accurate timing measurements for comparing sklearn vs ONNX
    model performance, including statistical analysis of multiple runs.

    Parameters:
    - stmt: str, statement to time (e.g., 'model.predict(X)')
    - context: dict, variable context dictionary for statement execution
    - repeat: int, number of timing runs for statistical analysis (default 10)
    - number: int, number of executions per timing run (default 50)
    - div_by_number: bool, divide timing results by number of executions (default False)

    Returns:
    - dict: Timing results including:
        - 'average': Average execution time
        - 'deviation': Standard deviation
        - 'min_exec': Minimum execution time
        - 'max_exec': Maximum execution time
        - 'repeat': Number of repeat runs
        - 'number': Number of executions per run
    """
```
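The relationship between `repeat`, `number`, and `div_by_number` can be sketched with the standard library alone. The function below, `measure_time_sketch`, is a hypothetical illustration built on `timeit` and `statistics`; it returns the same keys as described above but is not the library's implementation.

```python
import statistics
import timeit


def measure_time_sketch(stmt, context, repeat=10, number=50, div_by_number=False):
    """Time `stmt` with timeit and summarize the runs (illustrative only)."""
    timer = timeit.Timer(stmt, globals=context)
    # Each entry is the total time of `number` consecutive executions.
    runs = timer.repeat(repeat=repeat, number=number)
    if div_by_number:
        # Convert per-run totals into per-execution times.
        runs = [total / number for total in runs]
    return {
        "average": statistics.fmean(runs),
        "deviation": statistics.pstdev(runs),
        "min_exec": min(runs),
        "max_exec": max(runs),
        "repeat": repeat,
        "number": number,
    }


result = measure_time_sketch(
    "sum(values)",
    {"values": list(range(1000))},
    repeat=5,
    number=20,
    div_by_number=True,
)
print(sorted(result))
```

With `div_by_number=False`, each reported time covers `number` executions, so two measurements are only comparable when they use the same `number`.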


## Usage Examples


### Debugging Conversion Issues


```python
from skl2onnx.helpers.investigate import collect_intermediate_steps
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Create model
X, y = make_classification(n_samples=100, n_features=10, random_state=42)
model = RandomForestClassifier(n_estimators=5, random_state=42)
model.fit(X, y)

# Collect detailed conversion information
debug_info = collect_intermediate_steps(model, X, target_opset=18)

# Analyze the results
print("Shape inference results:")
for step, shapes in debug_info['shapes'].items():
    print(f" {step}: {shapes}")

print("\nGenerated operators:")
for i, op in enumerate(debug_info['operators']):
    print(f" {i}: {op.op_type} ({op.inputs} -> {op.outputs})")

print("\nVariable information:")
for name, var_info in debug_info['variables'].items():
    print(f" {name}: {var_info}")
```


### Validating Conversion Accuracy


```python
from skl2onnx.helpers.investigate import compare_objects
from skl2onnx import to_onnx
import onnxruntime as rt
import numpy as np

# Reuses `model` and `X` from the previous example.
# Convert with a float32 sample so the ONNX input type matches the data fed
# below, and disable ZipMap so probabilities come back as a plain array.
X32 = X.astype(np.float32)
onnx_model = to_onnx(model, X32, options={id(model): {'zipmap': False}})

# Get sklearn predictions
sklearn_pred = model.predict_proba(X)

# Get ONNX predictions (output 0 is the label, output 1 the probabilities)
sess = rt.InferenceSession(onnx_model.SerializeToString())
input_name = sess.get_inputs()[0].name
onnx_pred = sess.run(None, {input_name: X32})[1]

# Compare outputs
try:
    match = compare_objects(sklearn_pred, onnx_pred, decimal=4)
    print("Conversion validated: outputs match within tolerance")
except AssertionError as e:
    print(f"Conversion issue detected: {e}")
```


### Analyzing Pipeline Structure


```python
from skl2onnx.helpers.investigate import enumerate_pipeline_models
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest
from sklearn.ensemble import RandomForestClassifier

# Create complex pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('selector', SelectKBest(k=5)),
    ('classifier', RandomForestClassifier(n_estimators=10))
])
pipeline.fit(X, y)

# Enumerate all models in pipeline
models = enumerate_pipeline_models(pipeline)

print("Pipeline structure:")
for name, model_instance, path in models:
    print(f" {path}: {name} ({type(model_instance).__name__})")
```
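The idea of recursive discovery can be sketched without scikit-learn. The example below uses hypothetical stand-in classes (`Step`, `Chain`) and a hypothetical generator, `enumerate_models_sketch`, that walks any object exposing a `steps` list of `(name, model)` pairs. It only illustrates the traversal; the library's own function may handle more container types and report paths differently.

```python
class Step:
    """Stand-in for a leaf estimator (hypothetical, not a scikit-learn class)."""


class Chain:
    """Stand-in for a pipeline-like container holding (name, model) steps."""

    def __init__(self, steps):
        self.steps = steps


def enumerate_models_sketch(model, name="root", path=()):
    """Yield (name, model, path) for `model` and every nested sub-model, depth first."""
    yield name, model, path
    # Leaf models have no `steps` attribute, so the loop body is skipped for them.
    for index, (child_name, child) in enumerate(getattr(model, "steps", [])):
        yield from enumerate_models_sketch(child, child_name, path + (index,))


nested = Chain([
    ("scaler", Step()),
    ("inner", Chain([("selector", Step()), ("classifier", Step())])),
])
for name, instance, path in enumerate_models_sketch(nested):
    print(path, name, type(instance).__name__)
```

Here the path is a tuple of step indices from the root, so `(1, 0)` denotes the first step of the second step.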


### Adding Custom ONNX Operations


```python
from skl2onnx.helpers.integration import add_onnx_graph
from skl2onnx import to_onnx
import onnx
from onnx import helper, TensorProto

# Convert base model
base_model = to_onnx(model, X)

# Create custom preprocessing graph
custom_inputs = [helper.make_tensor_value_info('input', TensorProto.FLOAT, [None, 10])]
custom_outputs = [helper.make_tensor_value_info('processed', TensorProto.FLOAT, [None, 10])]

# Custom operation: multiply by constant
multiply_node = helper.make_node(
    'Mul',
    inputs=['input', 'scale_factor'],
    outputs=['processed'],
    name='custom_scaling'
)

# Create scale factor initializer
scale_factor = helper.make_tensor(
    'scale_factor',
    TensorProto.FLOAT,
    [1],
    [2.0]  # Scale factor value
)

custom_graph = helper.make_graph(
    [multiply_node],
    'custom_preprocessing',
    custom_inputs,
    custom_outputs,
    [scale_factor]
)

# Integrate custom graph with base model
enhanced_model = add_onnx_graph(
    base_model,
    custom_graph,
    inputs=['input'],
    outputs=['processed']
)
```


### Performance Benchmarking


```python
from skl2onnx.tutorial import measure_time
import onnxruntime as rt
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from skl2onnx import to_onnx

# Create and train model
X_test = np.random.randn(1000, 10).astype(np.float32)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_test[:100], np.random.randint(0, 2, 100))

# Convert to ONNX
onnx_model = to_onnx(model, X_test[:1])
sess = rt.InferenceSession(onnx_model.SerializeToString())
input_name = sess.get_inputs()[0].name

# Measure sklearn performance
sklearn_context = {
    'model': model,
    'X_test': X_test
}
sklearn_times = measure_time(
    'model.predict_proba(X_test)',
    context=sklearn_context,
    number=10,
    repeat=5
)

# Measure ONNX performance
onnx_context = {
    'sess': sess,
    'input_name': input_name,
    'X_test': X_test
}
onnx_times = measure_time(
    'sess.run(None, {input_name: X_test})',
    context=onnx_context,
    number=10,
    repeat=5
)

print(f"Sklearn average time: {sklearn_times['average']:.4f}s (±{sklearn_times['deviation']:.4f})")
print(f"ONNX average time: {onnx_times['average']:.4f}s (±{onnx_times['deviation']:.4f})")
print(f"Speedup: {sklearn_times['average'] / onnx_times['average']:.2f}x")
```


### Advanced Pipeline Analysis


```python
# Analyze complex nested pipeline
from skl2onnx.helpers.investigate import enumerate_pipeline_models
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Create complex pipeline with column transformer
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), [0, 1, 2]),
    ('cat', OneHotEncoder(), [3, 4])
])

complex_pipeline = Pipeline([
    ('preprocessing', preprocessor),
    ('classifier', RandomForestClassifier())
])

# Enumerate all components
all_models = enumerate_pipeline_models(complex_pipeline)

print("Complex pipeline analysis:")
for name, instance, path in all_models:
    print(f" {path}: {name}")
    if hasattr(instance, 'get_params'):
        key_params = {k: v for k, v in instance.get_params().items()
                      if not k.endswith('__') and not callable(v)}
        print(f" Key parameters: {key_params}")
```


## Debugging Guidelines


### Common Investigation Patterns

1. **Shape Mismatches**: Use `collect_intermediate_steps` to trace shape inference
2. **Type Errors**: Check data type consistency with `compare_objects`
3. **Pipeline Issues**: Use `enumerate_pipeline_models` to understand structure
4. **Performance Problems**: Use `measure_time` for systematic benchmarking


### Troubleshooting Tips

- **Enable verbose logging** during conversion for detailed information
- **Compare intermediate outputs** at each pipeline stage
- **Validate with simple test cases** before complex scenarios
- **Check ONNX opset compatibility** for target deployment environment
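
As one way to act on the verbose-logging tip, the sketch below raises the log level with Python's standard `logging` module. The logger name `"skl2onnx"` is an assumption about how the converter names its logger; adjust it if your installed version uses a different name.

```python
import logging

# Send log records to stderr with a compact format.
logging.basicConfig(level=logging.INFO, format="%(name)s %(levelname)s %(message)s")

# Assumption: the converter emits records under a logger named "skl2onnx".
converter_logger = logging.getLogger("skl2onnx")
converter_logger.setLevel(logging.DEBUG)

# Run the conversion after this point so that debug records are captured.
converter_logger.debug("verbose logging enabled")
```

Setting the level on the named logger rather than on the root logger keeps debug output from other libraries out of the trace.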


### Integration Best Practices

- **Test custom graphs separately** before integration
- **Validate variable connections** between graph components
- **Consider performance implications** of additional operations
- **Document custom modifications** for maintainability