A comprehensive machine learning library providing supervised and unsupervised learning algorithms with consistent APIs and extensive tools for data preprocessing, model evaluation, and deployment.
Build a reusable preprocessing helper for mixed-type tabular data. Given lists of numeric and categorical columns and a pandas-like table, it should fit once and produce a dense numeric feature matrix for both training and future inference. Processing steps (in order): numeric columns are imputed with their column medians, scaled to zero mean and unit variance, expanded with degree-2 polynomial terms (squares plus pairwise interactions, no bias term), then concatenated with categorical columns that are imputed with their most frequent values and one-hot encoded. Column order is deterministic: scaled numerics first, then the polynomial expansions, then encoded categorical columns sorted by feature name and category label. Unseen categorical values at transform time must not raise errors and should yield all zeros for that feature's encoded slice.
Testable behaviors:
- `feature_names` reflects the expected column ordering.
- For an input row with rooms=2, sqft=800, the squared and interaction values correspond to those computed from the scaled numeric values.
```python
from typing import Sequence, Any

import numpy as np


class TabularPreprocessor:
    def __init__(self, numeric_features: Sequence[str], categorical_features: Sequence[str]): ...
    def fit(self, data: Any) -> "TabularPreprocessor": ...
    def transform(self, data: Any) -> np.ndarray: ...
    def fit_transform(self, data: Any) -> np.ndarray: ...
    @property
    def feature_names(self) -> list[str]: ...
```

Provides preprocessing transformers for imputation, scaling, encoding, and feature expansion.
Install with Tessl CLI

```shell
npx tessl i tessl/pypi-scikit-learn
```