tessl/pypi-skl2onnx

Convert scikit-learn models to ONNX format for cross-platform inference and deployment

—

Pending

Overview

Eval results

Files

Data Types and Type System

Name: tessl/pypi-skl2onnx
Author: tessl

Comprehensive type system for ONNX conversion that maps Python/NumPy data types to ONNX types with automatic inference and shape validation. The type system ensures accurate data representation and compatibility between scikit-learn models and ONNX runtime environments.

Capabilities

Base Type Classes

Foundation classes for the type system hierarchy that provide common functionality and structure for all data types.

class DataType:
    """
    Base class for all data types in the conversion system.
    
    Provides common interface for type operations and validation.
    """

class TensorType(DataType):
    """
    Base class for tensor data types.
    
    Represents multi-dimensional arrays with shape and element type information.
    """

Container Types

Types for complex data structures that contain multiple values or nested data.

class SequenceType(DataType):
    """
    Represents sequence data containing ordered collections of elements.
    
    Parameters:
    - element_type: DataType, type of elements in the sequence
    """

class DictionaryType(DataType):
    """
    Represents dictionary/map data with key-value pairs.
    
    Parameters:
    - key_type: DataType, type of dictionary keys
    - value_type: DataType, type of dictionary values
    """

Scalar Types

Simple data types representing single values without dimensions.

class FloatType(DataType):
    """32-bit floating point scalar type."""

class Int64Type(DataType):
    """64-bit signed integer scalar type."""

class StringType(DataType):
    """String scalar type."""

Tensor Types

Multi-dimensional array types supporting various numeric and string data representations.

Integer Tensor Types

class Int8TensorType(TensorType):
    """
    8-bit signed integer tensor type.
    
    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Int16TensorType(TensorType):
    """
    16-bit signed integer tensor type.
    
    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Int32TensorType(TensorType):
    """
    32-bit signed integer tensor type.
    
    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Int64TensorType(TensorType):
    """
    64-bit signed integer tensor type.
    
    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt8TensorType(TensorType):
    """
    8-bit unsigned integer tensor type.
    
    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt16TensorType(TensorType):
    """
    16-bit unsigned integer tensor type.
    
    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt32TensorType(TensorType):
    """
    32-bit unsigned integer tensor type.
    
    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class UInt64TensorType(TensorType):
    """
    64-bit unsigned integer tensor type.
    
    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

Floating Point Tensor Types

class Float16TensorType(TensorType):
    """
    16-bit floating point tensor type (half precision).
    
    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class FloatTensorType(TensorType):
    """
    32-bit floating point tensor type (single precision).
    
    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class DoubleTensorType(TensorType):
    """
    64-bit floating point tensor type (double precision).
    
    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

Other Tensor Types

class BooleanTensorType(TensorType):
    """
    Boolean tensor type.
    
    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class StringTensorType(TensorType):
    """
    String tensor type.
    
    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Complex64TensorType(TensorType):
    """
    64-bit complex number tensor type.
    
    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

class Complex128TensorType(TensorType):
    """
    128-bit complex number tensor type.
    
    Parameters:
    - shape: list, tensor dimensions (None for dynamic dimensions)
    """

Type Inference Functions

Automatic type detection and conversion utilities that analyze Python/NumPy objects to determine appropriate ONNX types.

def guess_data_type(data_type):
    """
    Infer ONNX data type from Python/NumPy type.
    
    Parameters:
    - data_type: Python type, NumPy dtype, or data sample
    
    Returns:
    - DataType: Appropriate ONNX data type
    """

def guess_numpy_type(data_type):
    """
    Convert data type to NumPy equivalent.
    
    Parameters:
    - data_type: DataType instance
    
    Returns:
    - numpy.dtype: Equivalent NumPy data type
    """

def guess_proto_type(data_type):
    """
    Convert data type to ONNX protobuf type.
    
    Parameters:
    - data_type: DataType instance
    
    Returns:
    - int: ONNX protobuf type identifier
    """

def guess_tensor_type(data_type):
    """
    Convert scalar type to tensor type.
    
    Parameters:
    - data_type: DataType instance
    
    Returns:
    - TensorType: Corresponding tensor type
    """

def copy_type(data_type):
    """
    Create a copy of existing data type.
    
    Parameters:
    - data_type: DataType instance to copy
    
    Returns:
    - DataType: Copy of the input type
    """

Automatic Type Inference from Data

def guess_initial_types(X, initial_types=None):
    """
    Automatically infer initial types from input data.
    
    Parameters:
    - X: array-like, input data sample
    - initial_types: list, existing type specifications (optional)
    
    Returns:
    - list: List of (name, type) tuples for model inputs
    """

Usage Examples

Basic Type Creation

from skl2onnx.common.data_types import (
    FloatTensorType, Int64TensorType, StringTensorType, BooleanTensorType
)

# Create tensor types with explicit shapes
float_input = FloatTensorType([None, 10])  # Variable batch size, 10 features
int_labels = Int64TensorType([None])       # Variable length label vector
string_features = StringTensorType([None, 5])  # Variable batch, 5 string features
bool_mask = BooleanTensorType([None, 10])  # Boolean mask tensor

Dynamic and Fixed Shapes

# Dynamic shapes (None for variable dimensions)
dynamic_input = FloatTensorType([None, None])  # Fully dynamic 2D tensor
batch_dynamic = FloatTensorType([None, 100])   # Variable batch, fixed features

# Fixed shapes
fixed_input = FloatTensorType([32, 64])  # Fixed 32x64 tensor
image_input = FloatTensorType([1, 3, 224, 224])  # Single RGB image

Automatic Type Inference

import numpy as np
from skl2onnx.common.data_types import guess_data_type, guess_initial_types

# Infer type from NumPy array
X = np.random.randn(100, 20).astype(np.float32)
inferred_type = guess_data_type(X.dtype)
print(inferred_type)  # FloatTensorType

# Automatically create initial types from data
initial_types = guess_initial_types(X)
print(initial_types)  # [('X', FloatTensorType([None, 20]))]

Type Conversion and Validation

from skl2onnx.common.data_types import (
    guess_numpy_type, guess_proto_type, copy_type
)

# Create a tensor type
tensor_type = FloatTensorType([None, 10])

# Convert to NumPy equivalent
numpy_dtype = guess_numpy_type(tensor_type)
print(numpy_dtype)  # float32

# Get ONNX protobuf type
proto_type = guess_proto_type(tensor_type)
print(proto_type)  # ONNX TensorProto type ID

# Create a copy
type_copy = copy_type(tensor_type)

Complex Data Types

from skl2onnx.common.data_types import SequenceType, DictionaryType

# Sequence of float tensors
sequence_type = SequenceType(FloatTensorType([None, 5]))

# Dictionary with string keys and float values
dict_type = DictionaryType(StringType(), FloatTensorType([None]))

Multi-Input Type Specifications

# Multiple inputs with different types
initial_types = [
    ('numerical_features', FloatTensorType([None, 20])),
    ('categorical_features', Int64TensorType([None, 5])),
    ('text_features', StringTensorType([None, 1]))
]

Precision Control

# Different precision levels
half_precision = Float16TensorType([None, 10])    # Memory efficient
single_precision = FloatTensorType([None, 10])    # Standard precision
double_precision = DoubleTensorType([None, 10])   # High precision

# Integer precision levels
small_ints = Int8TensorType([None])     # -128 to 127
large_ints = Int64TensorType([None])    # Full 64-bit range

Type System Guidelines

Shape Specification

Use None for variable/dynamic dimensions
Specify exact values for fixed dimensions
Consider batch dimension variability (typically first dimension is None)

Data Type Selection

FloatTensorType: Most common for numerical features and model outputs
Int64TensorType: Integer labels, indices, categorical data
StringTensorType: Text data, categorical strings
BooleanTensorType: Binary masks, boolean features

Performance Considerations

Float32 (FloatTensorType): Best balance of precision and performance
Float16: Memory efficient but reduced precision
Float64: High precision but increased memory usage
Use appropriate integer types based on value ranges to optimize memory

Compatibility Notes

ONNX runtime support varies by data type and operator
Some operators may require specific input types
Consider target deployment environment limitations

Install with Tessl CLI

npx tessl i tessl/pypi-skl2onnx

docs