# Data Types and Type System

A type system for ONNX conversion that maps Python/NumPy data types to ONNX types, with automatic type inference and shape validation. It keeps data representations consistent between scikit-learn models and ONNX runtime environments.

## Capabilities

### Base Type Classes

Foundation classes for the type system hierarchy that provide common functionality and structure for all data types.

```python { .api }
class DataType:
    """
    Base class for all data types in the conversion system.

    Provides common interface for type operations and validation.
    """

class TensorType(DataType):
    """
    Base class for tensor data types.

    Represents multi-dimensional arrays with shape and element type information.
    """
```

### Container Types

Types for complex data structures that contain multiple values or nested data.

```python { .api }
class SequenceType(DataType):
    """
    Represents sequence data containing ordered collections of elements.

    Parameters:
    - element_type: DataType, type of elements in the sequence
    """

class DictionaryType(DataType):
    """
    Represents dictionary/map data with key-value pairs.

    Parameters:
    - key_type: DataType, type of dictionary keys
    - value_type: DataType, type of dictionary values
    """
```

### Scalar Types

Simple data types representing single values without dimensions.

```python { .api }
class FloatType(DataType):
    """32-bit floating point scalar type."""

class Int64Type(DataType):
    """64-bit signed integer scalar type."""

class StringType(DataType):
    """String scalar type."""
```

### Tensor Types

Multi-dimensional array types supporting various numeric and string data representations.

#### Integer Tensor Types

```python { .api }
class Int8TensorType(TensorType):
    """
    8-bit signed integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Int16TensorType(TensorType):
    """
    16-bit signed integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Int32TensorType(TensorType):
    """
    32-bit signed integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Int64TensorType(TensorType):
    """
    64-bit signed integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt8TensorType(TensorType):
    """
    8-bit unsigned integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt16TensorType(TensorType):
    """
    16-bit unsigned integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt32TensorType(TensorType):
    """
    32-bit unsigned integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt64TensorType(TensorType):
    """
    64-bit unsigned integer tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """
```

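The value range behind each integer tensor type follows from its bit width and signedness. The sketch below is plain two's-complement arithmetic using only the standard library; it does not call skl2onnx, and `int_range`, `CANDIDATES`, and `smallest_fitting_type` are hypothetical helper names introduced for illustration.

```python
from typing import List, Optional, Tuple


def int_range(bits: int, signed: bool) -> Tuple[int, int]:
    """Return the inclusive (min, max) range of an integer with the given width."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


# Class names come from the listing above, ordered from narrowest to widest,
# with the signed variant first at each width (an arbitrary choice for this sketch).
CANDIDATES: List[Tuple[str, int, int]] = [
    (f"{prefix}{bits}TensorType", *int_range(bits, signed))
    for bits in (8, 16, 32, 64)
    for prefix, signed in (("Int", True), ("UInt", False))
]


def smallest_fitting_type(lo: int, hi: int) -> Optional[str]:
    """Return the first listed type whose range covers [lo, hi], or None."""
    for name, tmin, tmax in CANDIDATES:
        if tmin <= lo and hi <= tmax:
            return name
    return None


print(int_range(8, signed=True))         # (-128, 127)
print(smallest_fitting_type(0, 200))     # UInt8TensorType
print(smallest_fitting_type(-1, 40000))  # Int32TensorType
```

Whether a narrower type is usable in practice also depends on which types the target operators accept, so treat the result as a starting point rather than a recommendation.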

#### Floating Point Tensor Types

```python { .api }
class Float16TensorType(TensorType):
    """
    16-bit floating point tensor type (half precision).

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class FloatTensorType(TensorType):
    """
    32-bit floating point tensor type (single precision).

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class DoubleTensorType(TensorType):
    """
    64-bit floating point tensor type (double precision).

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """
```

#### Other Tensor Types

```python { .api }
class BooleanTensorType(TensorType):
    """
    Boolean tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class StringTensorType(TensorType):
    """
    String tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Complex64TensorType(TensorType):
    """
    64-bit complex number tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Complex128TensorType(TensorType):
    """
    128-bit complex number tensor type.

    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """
```

### Type Inference Functions

Automatic type detection and conversion utilities that analyze Python/NumPy objects to determine appropriate ONNX types.

```python { .api }
def guess_data_type(data_type):
    """
    Infer ONNX data type from Python/NumPy type.

    Parameters:
    - data_type: Python type, NumPy dtype, or data sample

    Returns:
    - DataType: Appropriate ONNX data type
    """

def guess_numpy_type(data_type):
    """
    Convert data type to NumPy equivalent.

    Parameters:
    - data_type: DataType instance

    Returns:
    - numpy.dtype: Equivalent NumPy data type
    """

def guess_proto_type(data_type):
    """
    Convert data type to ONNX protobuf type.

    Parameters:
    - data_type: DataType instance

    Returns:
    - int: ONNX protobuf type identifier
    """

def guess_tensor_type(data_type):
    """
    Convert scalar type to tensor type.

    Parameters:
    - data_type: DataType instance

    Returns:
    - TensorType: Corresponding tensor type
    """

def copy_type(data_type):
    """
    Create a copy of existing data type.

    Parameters:
    - data_type: DataType instance to copy

    Returns:
    - DataType: Copy of the input type
    """
```

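To make the idea of dtype-based inference concrete, here is a minimal sketch of a lookup from NumPy dtype names to the tensor class names listed on this page. It is an illustration only, not the library's implementation: `DTYPE_NAME_TO_TENSOR_TYPE` and `tensor_type_name_for` are hypothetical names, and the table works on strings so that it runs without NumPy or skl2onnx installed. String dtypes are left out of the sketch.

```python
# Hypothetical lookup table: NumPy dtype name -> tensor class name from this page.
DTYPE_NAME_TO_TENSOR_TYPE = {
    "float16": "Float16TensorType",
    "float32": "FloatTensorType",
    "float64": "DoubleTensorType",
    "int8": "Int8TensorType",
    "int16": "Int16TensorType",
    "int32": "Int32TensorType",
    "int64": "Int64TensorType",
    "uint8": "UInt8TensorType",
    "uint16": "UInt16TensorType",
    "uint32": "UInt32TensorType",
    "uint64": "UInt64TensorType",
    "bool": "BooleanTensorType",
    "complex64": "Complex64TensorType",
    "complex128": "Complex128TensorType",
}


def tensor_type_name_for(dtype_name: str) -> str:
    """Return the tensor class name for a dtype name, or raise TypeError."""
    try:
        return DTYPE_NAME_TO_TENSOR_TYPE[dtype_name]
    except KeyError:
        raise TypeError(f"no tensor type listed for dtype {dtype_name!r}") from None


print(tensor_type_name_for("float32"))  # FloatTensorType
print(tensor_type_name_for("float64"))  # DoubleTensorType
```

Failing loudly on an unknown dtype is a deliberate choice in this sketch: a silent fallback to a default type would hide precision or range mismatches until inference time.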

### Automatic Type Inference from Data

```python { .api }
def guess_initial_types(X, initial_types=None):
    """
    Automatically infer initial types from input data.

    Parameters:
    - X: array-like, input data sample
    - initial_types: list, existing type specifications (optional)

    Returns:
    - list: List of (name, type) tuples for model inputs
    """
```

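One way to picture the shape part of this inference is to keep the feature dimensions of the sample and leave the batch dimension open. The sketch below assumes that convention, which matches the `[None, 20]`-style shapes used throughout this page; `batch_dynamic_shape` is a hypothetical helper, not a skl2onnx function.

```python
from typing import List, Optional, Sequence


def batch_dynamic_shape(sample_shape: Sequence[int]) -> List[Optional[int]]:
    """Replace the first (batch) dimension with None and keep the others fixed."""
    if len(sample_shape) == 0:
        raise ValueError("a scalar sample has no batch dimension")
    return [None, *sample_shape[1:]]


# A sample of 100 rows with 20 features yields a batch-agnostic declared shape.
print(batch_dynamic_shape((100, 20)))       # [None, 20]
print(batch_dynamic_shape((1, 3, 224, 224)))  # [None, 3, 224, 224]
```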

## Usage Examples

### Basic Type Creation

```python
from skl2onnx.common.data_types import (
    FloatTensorType, Int64TensorType, StringTensorType, BooleanTensorType
)

# Create tensor types with explicit shapes
float_input = FloatTensorType([None, 10])        # Variable batch size, 10 features
int_labels = Int64TensorType([None])             # Variable length label vector
string_features = StringTensorType([None, 5])    # Variable batch, 5 string features
bool_mask = BooleanTensorType([None, 10])        # Boolean mask tensor
```

### Dynamic and Fixed Shapes

```python
from skl2onnx.common.data_types import FloatTensorType

# Dynamic shapes (None for variable dimensions)
dynamic_input = FloatTensorType([None, None])    # Fully dynamic 2D tensor
batch_dynamic = FloatTensorType([None, 100])     # Variable batch, fixed features

# Fixed shapes
fixed_input = FloatTensorType([32, 64])          # Fixed 32x64 tensor
image_input = FloatTensorType([1, 3, 224, 224])  # Single RGB image
```

### Automatic Type Inference

```python
import numpy as np
from skl2onnx.common.data_types import guess_data_type, guess_initial_types

# Infer type from NumPy array
X = np.random.randn(100, 20).astype(np.float32)
inferred_type = guess_data_type(X.dtype)
print(inferred_type)  # FloatTensorType

# Automatically create initial types from data
initial_types = guess_initial_types(X)
print(initial_types)  # [('X', FloatTensorType([None, 20]))]
```

### Type Conversion and Validation

```python
from skl2onnx.common.data_types import (
    FloatTensorType, guess_numpy_type, guess_proto_type, copy_type
)

# Create a tensor type
tensor_type = FloatTensorType([None, 10])

# Convert to NumPy equivalent
numpy_dtype = guess_numpy_type(tensor_type)
print(numpy_dtype)  # float32

# Get ONNX protobuf type
proto_type = guess_proto_type(tensor_type)
print(proto_type)  # ONNX TensorProto type ID

# Create a copy
type_copy = copy_type(tensor_type)
```

### Complex Data Types

```python
from skl2onnx.common.data_types import (
    SequenceType, DictionaryType, FloatTensorType, StringType
)

# Sequence of float tensors
sequence_type = SequenceType(FloatTensorType([None, 5]))

# Dictionary with string keys and float values
dict_type = DictionaryType(StringType(), FloatTensorType([None]))
```

### Multi-Input Type Specifications

```python
from skl2onnx.common.data_types import (
    FloatTensorType, Int64TensorType, StringTensorType
)

# Multiple inputs with different types
initial_types = [
    ('numerical_features', FloatTensorType([None, 20])),
    ('categorical_features', Int64TensorType([None, 5])),
    ('text_features', StringTensorType([None, 1]))
]
```

### Precision Control

```python
from skl2onnx.common.data_types import (
    Float16TensorType, FloatTensorType, DoubleTensorType,
    Int8TensorType, Int64TensorType
)

# Different precision levels
half_precision = Float16TensorType([None, 10])   # Memory efficient
single_precision = FloatTensorType([None, 10])   # Standard precision
double_precision = DoubleTensorType([None, 10])  # High precision

# Integer precision levels
small_ints = Int8TensorType([None])   # -128 to 127
large_ints = Int64TensorType([None])  # Full 64-bit range
```

## Type System Guidelines

### Shape Specification

- Use `None` for variable/dynamic dimensions
- Specify exact values for fixed dimensions
- Consider batch dimension variability (typically first dimension is `None`)

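The shape rules above amount to treating `None` as a wildcard when a concrete array is checked against a declared shape. A minimal sketch of that check, using only the standard library (`shape_matches` is a hypothetical helper, not part of skl2onnx):

```python
from typing import Optional, Sequence


def shape_matches(declared: Sequence[Optional[int]], actual: Sequence[int]) -> bool:
    """Return True if a concrete shape satisfies a declared shape.

    A declared dimension of None accepts any size; an integer must match exactly.
    """
    if len(declared) != len(actual):
        return False  # rank mismatch can never be satisfied
    return all(d is None or d == a for d, a in zip(declared, actual))


print(shape_matches([None, 10], (32, 10)))    # True: any batch size is accepted
print(shape_matches([None, 10], (32, 11)))    # False: feature count is fixed at 10
print(shape_matches([None, None], (5, 7)))    # True: fully dynamic 2D shape
print(shape_matches([32, 64], (32, 64, 1)))   # False: rank differs
```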

### Data Type Selection

- **FloatTensorType**: Most common for numerical features and model outputs
- **Int64TensorType**: Integer labels, indices, categorical data
- **StringTensorType**: Text data, categorical strings
- **BooleanTensorType**: Binary masks, boolean features

### Performance Considerations

- **Float32** (FloatTensorType): Best balance of precision and performance
- **Float16** (Float16TensorType): Memory efficient but reduced precision
- **Float64** (DoubleTensorType): High precision but increased memory usage
- Use appropriate integer types based on value ranges to optimize memory

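The memory trade-off can be estimated with simple arithmetic: element count times bytes per element. The sketch below covers only the raw buffer size for a few of the types named on this page and ignores any runtime overhead; `BYTES_PER_ELEMENT` and `tensor_nbytes` are hypothetical names used for illustration.

```python
import math
from typing import Sequence

# Bytes per element, derived from the bit widths stated for each type.
BYTES_PER_ELEMENT = {
    "Float16TensorType": 2,
    "FloatTensorType": 4,
    "DoubleTensorType": 8,
    "Int8TensorType": 1,
    "Int64TensorType": 8,
}


def tensor_nbytes(type_name: str, shape: Sequence[int]) -> int:
    """Raw buffer size in bytes for a fully specified (no None) shape."""
    return BYTES_PER_ELEMENT[type_name] * math.prod(shape)


# A batch of 1000 rows with 10 features at three float precisions.
for name in ("Float16TensorType", "FloatTensorType", "DoubleTensorType"):
    print(name, tensor_nbytes(name, (1000, 10)))
# Float16TensorType 20000
# FloatTensorType 40000
# DoubleTensorType 80000
```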

### Compatibility Notes

- ONNX runtime support varies by data type and operator
- Some operators may require specific input types
- Consider target deployment environment limitations