# Data Types and Type System

A type system for ONNX conversion that maps Python and NumPy data types to ONNX types, with automatic type inference and shape validation. It keeps data representation consistent and compatible between scikit-learn models and ONNX runtime environments.

## Capabilities

### Base Type Classes

Foundation classes of the type hierarchy. They provide the functionality and structure shared by all data types.

```python { .api }
class DataType:
    """
    Base class for all data types in the conversion system.

    Provides common interface for type operations and validation.
    """

class TensorType(DataType):
    """
    Base class for tensor data types.

    Represents multi-dimensional arrays with shape and element type information.
    """
```

### Container Types

Types for complex data structures that contain multiple values or nested data.

```python { .api }
class SequenceType(DataType):
    """
    Represents sequence data containing ordered collections of elements.

    Parameters:
    - element_type: DataType, type of elements in the sequence
    """

class DictionaryType(DataType):
    """
    Represents dictionary/map data with key-value pairs.

    Parameters:
    - key_type: DataType, type of dictionary keys
    - value_type: DataType, type of dictionary values
    """
```

### Scalar Types

Simple data types representing single values without dimensions.

```python { .api }
class FloatType(DataType):
    """32-bit floating point scalar type."""

class Int64Type(DataType):
    """64-bit signed integer scalar type."""

class StringType(DataType):
    """String scalar type."""
```

### Tensor Types

Multi-dimensional array types supporting various numeric and string data representations.

#### Integer Tensor Types

```python { .api }
class Int8TensorType(TensorType):
    """
    8-bit signed integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Int16TensorType(TensorType):
    """
    16-bit signed integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Int32TensorType(TensorType):
    """
    32-bit signed integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Int64TensorType(TensorType):
    """
    64-bit signed integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt8TensorType(TensorType):
    """
    8-bit unsigned integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt16TensorType(TensorType):
    """
    16-bit unsigned integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt32TensorType(TensorType):
    """
    32-bit unsigned integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt64TensorType(TensorType):
    """
    64-bit unsigned integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """
```
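
The value range each integer type can hold follows directly from its bit width and signedness. The sketch below is plain Python with no skl2onnx dependency; `int_range` and `INTEGER_RANGES` are hypothetical helpers, not part of the library, and may be useful when picking the narrowest type that fits a dataset.

```python
from typing import Tuple


def int_range(bits: int, signed: bool = True) -> Tuple[int, int]:
    """Return the (min, max) values representable with the given bit width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# Hypothetical lookup keyed by the class names documented above.
INTEGER_RANGES = {
    "Int8TensorType": int_range(8),
    "Int16TensorType": int_range(16),
    "Int32TensorType": int_range(32),
    "Int64TensorType": int_range(64),
    "UInt8TensorType": int_range(8, signed=False),
    "UInt16TensorType": int_range(16, signed=False),
    "UInt32TensorType": int_range(32, signed=False),
    "UInt64TensorType": int_range(64, signed=False),
}

for name, (low, high) in INTEGER_RANGES.items():
    print(f"{name}: {low} to {high}")
```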

#### Floating Point Tensor Types

```python { .api }
class Float16TensorType(TensorType):
    """
    16-bit floating point tensor type (half precision).

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class FloatTensorType(TensorType):
    """
    32-bit floating point tensor type (single precision).

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class DoubleTensorType(TensorType):
    """
    64-bit floating point tensor type (double precision).

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """
```
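
To get a feel for what the three precisions mean in practice, the sketch below round-trips a Python float through 16-, 32-, and 64-bit IEEE 754 encodings using only the standard library `struct` module. It does not touch skl2onnx; `round_trip` is a hypothetical helper introduced for illustration.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Pack a Python float into the given struct format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


value = 0.1
half = round_trip(value, "<e")    # 16-bit, the width of Float16TensorType elements
single = round_trip(value, "<f")  # 32-bit, the width of FloatTensorType elements
double = round_trip(value, "<d")  # 64-bit, the width of DoubleTensorType elements

# Absolute error introduced by storing 0.1 at each width.
half_error = abs(half - value)
single_error = abs(single - value)
double_error = abs(double - value)

print(half_error, single_error, double_error)
```

Python floats are already 64-bit, so the double round trip is lossless, while the narrower formats introduce progressively larger rounding error.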

#### Other Tensor Types

```python { .api }
class BooleanTensorType(TensorType):
    """
    Boolean tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class StringTensorType(TensorType):
    """
    String tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Complex64TensorType(TensorType):
    """
    64-bit complex number tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Complex128TensorType(TensorType):
    """
    128-bit complex number tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """
```

### Type Inference Functions

Automatic type detection and conversion utilities that analyze Python/NumPy objects to determine appropriate ONNX types.

```python { .api }
def guess_data_type(data_type):
    """
    Infer ONNX data type from Python/NumPy type.

    Parameters:
    - data_type: Python type, NumPy dtype, or data sample

    Returns:
    - DataType: Appropriate ONNX data type
    """

def guess_numpy_type(data_type):
    """
    Convert data type to NumPy equivalent.

    Parameters:
    - data_type: DataType instance

    Returns:
    - numpy.dtype: Equivalent NumPy data type
    """

def guess_proto_type(data_type):
    """
    Convert data type to ONNX protobuf type.

    Parameters:
    - data_type: DataType instance

    Returns:
    - int: ONNX protobuf type identifier
    """

def guess_tensor_type(data_type):
    """
    Convert scalar type to tensor type.

    Parameters:
    - data_type: DataType instance

    Returns:
    - TensorType: Corresponding tensor type
    """

def copy_type(data_type):
    """
    Create a copy of existing data type.

    Parameters:
    - data_type: DataType instance to copy

    Returns:
    - DataType: Copy of the input type
    """
```
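
Conceptually, type inference amounts to a lookup from an element type to one of the tensor classes listed earlier. The sketch below illustrates that idea with a plain dictionary built from the bit widths stated in the docstrings above. It is not how skl2onnx implements `guess_data_type`; `DTYPE_NAME_TO_TENSOR_TYPE` and `tensor_type_name` are hypothetical names, and the table maps NumPy dtype names to class names as strings so the example runs without any dependency.

```python
# Hypothetical lookup table: NumPy dtype name -> documented tensor class name.
DTYPE_NAME_TO_TENSOR_TYPE = {
    "bool": "BooleanTensorType",
    "int8": "Int8TensorType",
    "int16": "Int16TensorType",
    "int32": "Int32TensorType",
    "int64": "Int64TensorType",
    "uint8": "UInt8TensorType",
    "uint16": "UInt16TensorType",
    "uint32": "UInt32TensorType",
    "uint64": "UInt64TensorType",
    "float16": "Float16TensorType",
    "float32": "FloatTensorType",
    "float64": "DoubleTensorType",
    "complex64": "Complex64TensorType",
    "complex128": "Complex128TensorType",
}


def tensor_type_name(dtype_name: str) -> str:
    """Return the tensor class name for a dtype name, or raise TypeError."""
    try:
        return DTYPE_NAME_TO_TENSOR_TYPE[dtype_name]
    except KeyError:
        raise TypeError(f"no tensor type known for dtype {dtype_name!r}") from None


print(tensor_type_name("float32"))
print(tensor_type_name("int64"))
```

Failing loudly on an unknown dtype, rather than falling back to a default, keeps silent precision changes out of the converted model.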

### Automatic Type Inference from Data

```python { .api }
def guess_initial_types(X, initial_types=None):
    """
    Automatically infer initial types from input data.

    Parameters:
    - X: array-like, input data sample
    - initial_types: list, existing type specifications (optional)

    Returns:
    - list: List of (name, type) tuples for model inputs
    """
```

## Usage Examples

### Basic Type Creation

```python
from skl2onnx.common.data_types import (
    FloatTensorType, Int64TensorType, StringTensorType, BooleanTensorType
)

# Create tensor types with explicit shapes
float_input = FloatTensorType([None, 10])      # Variable batch size, 10 features
int_labels = Int64TensorType([None])           # Variable length label vector
string_features = StringTensorType([None, 5])  # Variable batch, 5 string features
bool_mask = BooleanTensorType([None, 10])      # Boolean mask tensor
```

### Dynamic and Fixed Shapes

```python
from skl2onnx.common.data_types import FloatTensorType

# Dynamic shapes (None for variable dimensions)
dynamic_input = FloatTensorType([None, None])    # Fully dynamic 2D tensor
batch_dynamic = FloatTensorType([None, 100])     # Variable batch, fixed features

# Fixed shapes
fixed_input = FloatTensorType([32, 64])          # Fixed 32x64 tensor
image_input = FloatTensorType([1, 3, 224, 224])  # Single RGB image
```

### Automatic Type Inference

```python
import numpy as np
from skl2onnx.common.data_types import guess_data_type, guess_initial_types

# Infer type from NumPy array
X = np.random.randn(100, 20).astype(np.float32)
inferred_type = guess_data_type(X.dtype)
print(inferred_type)  # FloatTensorType

# Automatically create initial types from data
initial_types = guess_initial_types(X)
print(initial_types)  # [('X', FloatTensorType([None, 20]))]
```

### Type Conversion and Validation

```python
from skl2onnx.common.data_types import (
    FloatTensorType, guess_numpy_type, guess_proto_type, copy_type
)

# Create a tensor type
tensor_type = FloatTensorType([None, 10])

# Convert to NumPy equivalent
numpy_dtype = guess_numpy_type(tensor_type)
print(numpy_dtype)  # float32

# Get ONNX protobuf type
proto_type = guess_proto_type(tensor_type)
print(proto_type)  # ONNX TensorProto type ID

# Create a copy
type_copy = copy_type(tensor_type)
```

### Complex Data Types

```python
from skl2onnx.common.data_types import (
    DictionaryType, FloatTensorType, SequenceType, StringType
)

# Sequence of float tensors
sequence_type = SequenceType(FloatTensorType([None, 5]))

# Dictionary with string keys and float values
dict_type = DictionaryType(StringType(), FloatTensorType([None]))
```

### Multi-Input Type Specifications

```python
from skl2onnx.common.data_types import (
    FloatTensorType, Int64TensorType, StringTensorType
)

# Multiple inputs with different types, given as (name, type) tuples
initial_types = [
    ('numerical_features', FloatTensorType([None, 20])),
    ('categorical_features', Int64TensorType([None, 5])),
    ('text_features', StringTensorType([None, 1]))
]
```

### Precision Control

```python
from skl2onnx.common.data_types import (
    DoubleTensorType, Float16TensorType, FloatTensorType,
    Int8TensorType, Int64TensorType
)

# Different precision levels
half_precision = Float16TensorType([None, 10])   # Memory efficient
single_precision = FloatTensorType([None, 10])   # Standard precision
double_precision = DoubleTensorType([None, 10])  # High precision

# Integer precision levels
small_ints = Int8TensorType([None])   # -128 to 127
large_ints = Int64TensorType([None])  # Full 64-bit range
```

## Type System Guidelines

### Shape Specification
- Use `None` for variable/dynamic dimensions
- Specify exact values for fixed dimensions
- Consider batch dimension variability (typically first dimension is `None`)
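
The shape convention above can be read as a simple matching rule: a declared dimension of `None` accepts any size, while a fixed dimension must match exactly. The sketch below spells that rule out in plain Python. `shape_matches` is a hypothetical helper written for illustration, not a skl2onnx function, and the library's own validation may differ.

```python
from typing import Optional, Sequence


def shape_matches(declared: Sequence[Optional[int]], actual: Sequence[int]) -> bool:
    """Check a concrete shape against a declared shape with None wildcards."""
    # The number of dimensions must agree before comparing sizes.
    if len(declared) != len(actual):
        return False
    # None accepts any size; an integer must match exactly.
    return all(d is None or d == a for d, a in zip(declared, actual))


print(shape_matches([None, 10], (32, 10)))   # batch of 32, 10 features
print(shape_matches([None, 10], (32, 9)))    # wrong feature count
print(shape_matches([None, 10], (32,)))      # wrong number of dimensions
```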

### Data Type Selection
- **FloatTensorType**: Most common for numerical features and model outputs
- **Int64TensorType**: Integer labels, indices, categorical data
- **StringTensorType**: Text data, categorical strings
- **BooleanTensorType**: Binary masks, boolean features

### Performance Considerations
- **Float32** (FloatTensorType): Best balance of precision and performance
- **Float16**: Memory efficient but reduced precision
- **Float64**: High precision but increased memory usage
- Use appropriate integer types based on value ranges to optimize memory
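
The memory trade-off can be estimated as the number of elements times the bytes per element. The sketch below does that arithmetic for a dense tensor, ignoring any container or runtime overhead. `BYTES_PER_ELEMENT` and `tensor_bytes` are hypothetical names introduced here, with byte sizes derived from the bit widths of the types described earlier.

```python
from math import prod  # available in Python 3.8+
from typing import Sequence

# Bytes per element, derived from each type's bit width.
BYTES_PER_ELEMENT = {
    "float16": 2,
    "float32": 4,
    "float64": 8,
    "int8": 1,
    "int64": 8,
}


def tensor_bytes(shape: Sequence[int], element_type: str) -> int:
    """Estimate the raw storage of a dense tensor with a fully known shape."""
    return prod(shape) * BYTES_PER_ELEMENT[element_type]


shape = (1000, 20)  # 1000 samples, 20 features
for element_type in ("float16", "float32", "float64"):
    print(element_type, tensor_bytes(shape, element_type), "bytes")
```

Each step up in floating point precision doubles the raw storage for the same shape.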

### Compatibility Notes
- ONNX runtime support varies by data type and operator
- Some operators may require specific input types
- Consider target deployment environment limitations