Tessl Tile for pypi/skl2onnx@1.19.0

# Data Types and Type System

A type system for ONNX conversion that maps Python/NumPy data types to ONNX types, with automatic inference and shape validation. It keeps data representation accurate and compatible between scikit-learn models and ONNX runtime environments.

## Capabilities

### Base Type Classes

Foundation classes for the type system hierarchy. They provide the common functionality and structure shared by all data types.

```python { .api }
class DataType:
    """
    Base class for all data types in the conversion system.

    Provides common interface for type operations and validation.
    """

class TensorType(DataType):
    """
    Base class for tensor data types.

    Represents multi-dimensional arrays with shape and element type information.
    """
```

### Container Types

Types for complex data structures that contain multiple values or nested data.

```python { .api }
class SequenceType(DataType):
    """
    Represents sequence data containing ordered collections of elements.

    Parameters:
    - element_type: DataType, type of elements in the sequence
    """

class DictionaryType(DataType):
    """
    Represents dictionary/map data with key-value pairs.

    Parameters:
    - key_type: DataType, type of dictionary keys
    - value_type: DataType, type of dictionary values
    """
```

### Scalar Types

Simple data types representing single values without dimensions.

```python { .api }
class FloatType(DataType):
    """32-bit floating point scalar type."""

class Int64Type(DataType):
    """64-bit signed integer scalar type."""

class StringType(DataType):
    """String scalar type."""
```

### Tensor Types

Multi-dimensional array types supporting various numeric and string data representations.

#### Integer Tensor Types

```python { .api }
class Int8TensorType(TensorType):
    """
    8-bit signed integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Int16TensorType(TensorType):
    """
    16-bit signed integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Int32TensorType(TensorType):
    """
    32-bit signed integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Int64TensorType(TensorType):
    """
    64-bit signed integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt8TensorType(TensorType):
    """
    8-bit unsigned integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt16TensorType(TensorType):
    """
    16-bit unsigned integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt32TensorType(TensorType):
    """
    32-bit unsigned integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt64TensorType(TensorType):
    """
    64-bit unsigned integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """
```
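
The bit widths listed above determine which values each integer type can hold. As a minimal sketch, the plain-Python helper below computes those ranges and picks the narrowest listed type for a value range. The table and the names `int_range` and `smallest_integer_type` are hypothetical and are not part of skl2onnx; the class names are used only as labels.

```python
# Hypothetical helper for illustration; not part of skl2onnx.
# Each entry: (class name from the listing above, bit width, signed?)
INTEGER_TENSOR_TYPES = [
    ("Int8TensorType", 8, True),
    ("UInt8TensorType", 8, False),
    ("Int16TensorType", 16, True),
    ("UInt16TensorType", 16, False),
    ("Int32TensorType", 32, True),
    ("UInt32TensorType", 32, False),
    ("Int64TensorType", 64, True),
    ("UInt64TensorType", 64, False),
]


def int_range(bits, signed):
    """Return the inclusive (min, max) range of an integer of this width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def smallest_integer_type(lo, hi):
    """Return the name of the narrowest listed type that can hold [lo, hi]."""
    for name, bits, signed in INTEGER_TENSOR_TYPES:
        type_min, type_max = int_range(bits, signed)
        if type_min <= lo and hi <= type_max:
            return name
    raise ValueError(f"no listed integer type can hold [{lo}, {hi}]")


print(int_range(8, True))                 # (-128, 127)
print(smallest_integer_type(0, 255))      # UInt8TensorType
print(smallest_integer_type(-1, 40_000))  # Int32TensorType
```

A narrow type saves memory, but operator support varies by data type, so the narrowest type that fits is not always the one a converter or runtime accepts.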

#### Floating Point Tensor Types

```python { .api }
class Float16TensorType(TensorType):
    """
    16-bit floating point tensor type (half precision).

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class FloatTensorType(TensorType):
    """
    32-bit floating point tensor type (single precision).

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class DoubleTensorType(TensorType):
    """
    64-bit floating point tensor type (double precision).

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """
```
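
To make the difference between half, single, and double precision concrete, the sketch below stores the value `0.1` at each width using only the standard library `struct` module and reports the rounding error. It does not use skl2onnx. The helper name `round_trip` is hypothetical, and the class names serve only as labels for the three IEEE 754 widths.

```python
import struct

# struct format codes for IEEE 754 half, single and double precision.
FORMATS = {
    "Float16TensorType": "<e",
    "FloatTensorType": "<f",
    "DoubleTensorType": "<d",
}


def round_trip(value, fmt):
    """Store a Python float at the given width and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


for name, fmt in FORMATS.items():
    stored = round_trip(0.1, fmt)
    # The error shrinks as the width grows; it is zero for double precision
    # because Python floats are already 64-bit.
    print(f"{name:<18} stored={stored!r} error={abs(stored - 0.1):.3e}")
```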

#### Other Tensor Types

```python { .api }
class BooleanTensorType(TensorType):
    """
    Boolean tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class StringTensorType(TensorType):
    """
    String tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Complex64TensorType(TensorType):
    """
    64-bit complex number tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Complex128TensorType(TensorType):
    """
    128-bit complex number tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """
```

### Type Inference Functions

Automatic type detection and conversion utilities that analyze Python/NumPy objects to determine appropriate ONNX types.

```python { .api }
def guess_data_type(data_type):
    """
    Infer ONNX data type from Python/NumPy type.

    Parameters:
    - data_type: Python type, NumPy dtype, or data sample

    Returns:
    - DataType: Appropriate ONNX data type
    """

def guess_numpy_type(data_type):
    """
    Convert data type to NumPy equivalent.

    Parameters:
    - data_type: DataType instance

    Returns:
    - numpy.dtype: Equivalent NumPy data type
    """

def guess_proto_type(data_type):
    """
    Convert data type to ONNX protobuf type.

    Parameters:
    - data_type: DataType instance

    Returns:
    - int: ONNX protobuf type identifier
    """

def guess_tensor_type(data_type):
    """
    Convert scalar type to tensor type.

    Parameters:
    - data_type: DataType instance

    Returns:
    - TensorType: Corresponding tensor type
    """

def copy_type(data_type):
    """
    Create a copy of existing data type.

    Parameters:
    - data_type: DataType instance to copy

    Returns:
    - DataType: Copy of the input type
    """
```
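
Conceptually, inferring a tensor type from a NumPy dtype is a lookup by element kind and width. The sketch below shows that idea with a plain dictionary from NumPy dtype names to the class names documented above. It is a hypothetical illustration, not skl2onnx's implementation; the real functions may cover a different set of dtypes and return type instances rather than names.

```python
# Hypothetical lookup table for illustration; NOT skl2onnx's implementation.
# Keys are NumPy dtype names, values are class names documented above.
DTYPE_NAME_TO_TENSOR_TYPE = {
    "bool": "BooleanTensorType",
    "int8": "Int8TensorType",
    "int16": "Int16TensorType",
    "int32": "Int32TensorType",
    "int64": "Int64TensorType",
    "uint8": "UInt8TensorType",
    "uint16": "UInt16TensorType",
    "uint32": "UInt32TensorType",
    "uint64": "UInt64TensorType",
    "float16": "Float16TensorType",
    "float32": "FloatTensorType",
    "float64": "DoubleTensorType",
    "complex64": "Complex64TensorType",
    "complex128": "Complex128TensorType",
}


def tensor_type_name(dtype_name):
    """Return the documented tensor type name for a NumPy dtype name."""
    try:
        return DTYPE_NAME_TO_TENSOR_TYPE[dtype_name]
    except KeyError:
        raise TypeError(f"no tensor type listed for dtype {dtype_name!r}") from None


print(tensor_type_name("float32"))  # FloatTensorType
print(tensor_type_name("int64"))    # Int64TensorType
```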

### Automatic Type Inference from Data

```python { .api }
def guess_initial_types(X, initial_types=None):
    """
    Automatically infer initial types from input data.

    Parameters:
    - X: array-like, input data sample
    - initial_types: list, existing type specifications (optional)

    Returns:
    - list: List of (name, type) tuples for model inputs
    """
```

## Usage Examples

### Basic Type Creation

```python
from skl2onnx.common.data_types import (
    FloatTensorType, Int64TensorType, StringTensorType, BooleanTensorType
)

# Create tensor types with explicit shapes
float_input = FloatTensorType([None, 10])  # Variable batch size, 10 features
int_labels = Int64TensorType([None])       # Variable length label vector
string_features = StringTensorType([None, 5])  # Variable batch, 5 string features
bool_mask = BooleanTensorType([None, 10])  # Boolean mask tensor
```

### Dynamic and Fixed Shapes

```python
from skl2onnx.common.data_types import FloatTensorType

# Dynamic shapes (None for variable dimensions)
dynamic_input = FloatTensorType([None, None])  # Fully dynamic 2D tensor
batch_dynamic = FloatTensorType([None, 100])   # Variable batch, fixed features

# Fixed shapes
fixed_input = FloatTensorType([32, 64])  # Fixed 32x64 tensor
image_input = FloatTensorType([1, 3, 224, 224])  # Single RGB image
```
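
The meaning of `None` in a declared shape can be sketched as a wildcard match against the concrete shape of the data. The helper below is a plain-Python illustration with a hypothetical name, `shape_matches`. It is not a skl2onnx function and does not describe how any runtime validates shapes.

```python
def shape_matches(declared, actual):
    """Return True if a concrete shape fits a declared shape.

    A `None` entry in `declared` accepts any size in that position;
    every other entry must match exactly, and the ranks must agree.
    """
    if len(declared) != len(actual):
        return False
    return all(d is None or d == a for d, a in zip(declared, actual))


print(shape_matches([None, 100], (32, 100)))  # True: any batch size is fine
print(shape_matches([None, 100], (32, 64)))   # False: feature count differs
print(shape_matches([32, 64], (32, 64)))      # True: fixed shape, exact match
print(shape_matches([None, None], (7,)))      # False: rank differs
```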

### Automatic Type Inference

```python
import numpy as np
from skl2onnx.common.data_types import guess_data_type

try:
    # Location of this helper in the upstream source tree
    from skl2onnx.algebra.type_helper import guess_initial_types
except ImportError:
    from skl2onnx.common.data_types import guess_initial_types

# Infer the type from a NumPy array (the array carries both dtype and shape)
X = np.random.randn(100, 20).astype(np.float32)
inferred_type = guess_data_type(X)
print(inferred_type)  # list of (name, type) pairs holding a FloatTensorType

# Automatically create initial types from data
initial_types = guess_initial_types(X, None)
print(initial_types)  # [('X', FloatTensorType([None, 20]))]
```

### Type Conversion and Validation

```python
from skl2onnx.common.data_types import (
    FloatTensorType, guess_numpy_type, guess_proto_type, copy_type
)

# Create a tensor type
tensor_type = FloatTensorType([None, 10])

# Convert to NumPy equivalent
numpy_dtype = guess_numpy_type(tensor_type)
print(numpy_dtype)  # NumPy float32 type

# Get ONNX protobuf type
proto_type = guess_proto_type(tensor_type)
print(proto_type)  # ONNX TensorProto type ID (TensorProto.FLOAT is 1)

# Create a copy
type_copy = copy_type(tensor_type)
```

### Complex Data Types

```python
from skl2onnx.common.data_types import (
    DictionaryType, FloatTensorType, SequenceType, StringType
)

# Sequence of float tensors
sequence_type = SequenceType(FloatTensorType([None, 5]))

# Dictionary with string keys and float values
dict_type = DictionaryType(StringType(), FloatTensorType([None]))
```

### Multi-Input Type Specifications

```python
from skl2onnx.common.data_types import (
    FloatTensorType, Int64TensorType, StringTensorType
)

# Multiple inputs with different types
initial_types = [
    ('numerical_features', FloatTensorType([None, 20])),
    ('categorical_features', Int64TensorType([None, 5])),
    ('text_features', StringTensorType([None, 1]))
]
```
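
Each name in such a list is expected to become the name of a model input, so a duplicate or empty name would be ambiguous. The sketch below checks a list of `(name, type)` pairs before conversion. `check_input_names` is a hypothetical helper, not a skl2onnx function, and the string placeholders stand in for real type objects so that the example runs without the library.

```python
def check_input_names(initial_types):
    """Return the input names, raising ValueError on empty or duplicate ones."""
    seen = set()
    for name, _type in initial_types:
        if not isinstance(name, str) or not name:
            raise ValueError(f"input name must be a non-empty string, got {name!r}")
        if name in seen:
            raise ValueError(f"duplicate input name: {name!r}")
        seen.add(name)
    return [name for name, _type in initial_types]


# Placeholder strings stand in for FloatTensorType(...) and friends.
spec = [
    ("numerical_features", "FloatTensorType([None, 20])"),
    ("categorical_features", "Int64TensorType([None, 5])"),
    ("text_features", "StringTensorType([None, 1])"),
]
print(check_input_names(spec))
# ['numerical_features', 'categorical_features', 'text_features']
```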

### Precision Control

```python
from skl2onnx.common.data_types import (
    DoubleTensorType, Float16TensorType, FloatTensorType,
    Int8TensorType, Int64TensorType
)

# Different precision levels
half_precision = Float16TensorType([None, 10])    # Memory efficient
single_precision = FloatTensorType([None, 10])    # Standard precision
double_precision = DoubleTensorType([None, 10])   # High precision

# Integer precision levels
small_ints = Int8TensorType([None])     # -128 to 127
large_ints = Int64TensorType([None])    # Full 64-bit range
```
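
The memory cost of each precision level follows directly from the element width. As a rough sketch, the helper below computes the size in bytes of a dense tensor once the dynamic dimension is fixed to a concrete batch size. The table and the name `tensor_nbytes` are hypothetical and not part of skl2onnx; the byte counts come from the bit widths stated in the type listings.

```python
from math import prod

# Bytes per element, derived from the documented bit widths.
BYTES_PER_ELEMENT = {
    "Float16TensorType": 2,
    "FloatTensorType": 4,
    "DoubleTensorType": 8,
    "Int8TensorType": 1,
    "Int64TensorType": 8,
}


def tensor_nbytes(type_name, shape):
    """Bytes needed for a dense tensor of a concrete shape (no None entries)."""
    if any(dim is None for dim in shape):
        raise ValueError("shape must be concrete to compute a size")
    return BYTES_PER_ELEMENT[type_name] * prod(shape)


batch = (1024, 10)  # a [None, 10] input with a batch of 1024 rows
for name in ("Float16TensorType", "FloatTensorType", "DoubleTensorType"):
    print(name, tensor_nbytes(name, batch))
# Float16TensorType 20480
# FloatTensorType 40960
# DoubleTensorType 81920
```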

## Type System Guidelines

### Shape Specification
- Use `None` for variable/dynamic dimensions
- Specify exact values for fixed dimensions
- Consider batch dimension variability (typically first dimension is `None`)

### Data Type Selection
- **FloatTensorType**: Most common for numerical features and model outputs
- **Int64TensorType**: Integer labels, indices, categorical data
- **StringTensorType**: Text data, categorical strings
- **BooleanTensorType**: Binary masks, boolean features

### Performance Considerations
- **Float32** (FloatTensorType): Best balance of precision and performance
- **Float16** (Float16TensorType): Memory efficient but reduced precision
- **Float64** (DoubleTensorType): High precision but increased memory usage
- Use appropriate integer types based on value ranges to optimize memory

### Compatibility Notes
- ONNX runtime support varies by data type and operator
- Some operators may require specific input types
- Consider target deployment environment limitations
