A comprehensive machine learning library providing supervised and unsupervised learning algorithms with consistent APIs and extensive tools for data preprocessing, model evaluation, and deployment.
Build a reusable preprocessing helper for mixed-type tabular data. Given lists of numeric and categorical columns and a pandas-like table, it should fit once and produce a dense numeric feature matrix for both training and future inference. Processing steps (in order): numeric columns are imputed with their column medians, scaled to zero mean and unit variance, expanded with degree-2 polynomial terms (squares plus pairwise interactions, no bias term), then concatenated with categorical columns that are imputed with their most frequent values and one-hot encoded. Column order is deterministic: scaled numerics first, then the polynomial expansions, then encoded categorical columns sorted by feature name and category label. Unseen categorical values at transform time must not raise errors and should yield all zeros for that feature's encoded slice.
Testable behaviors:
- `feature_names` reflects the expected column ordering.
- For an input row with rooms=2, sqft=800, the squared and interaction values correspond to those computed from the scaled numeric values.
```python
from typing import Sequence, Any

import numpy as np


class TabularPreprocessor:
    def __init__(self, numeric_features: Sequence[str], categorical_features: Sequence[str]): ...
    def fit(self, data: Any) -> "TabularPreprocessor": ...
    def transform(self, data: Any) -> np.ndarray: ...
    def fit_transform(self, data: Any) -> np.ndarray: ...
    @property
    def feature_names(self) -> list[str]: ...
```

Provides preprocessing transformers for imputation, scaling, encoding, and feature expansion.
Install with Tessl CLI

```shell
npx tessl i tessl/pypi-scikit-learn
```