Workspace: tessl
Visibility: Public
Describes: pkg:pypi/scikit-learn@1.7.x

tessl/pypi-scikit-learn

tessl install tessl/pypi-scikit-learn@1.7.0

A comprehensive machine learning library providing supervised and unsupervised learning algorithms with consistent APIs and extensive tools for data preprocessing, model evaluation, and deployment.

Agent Success

Agent success rate when using this tile

87%

Improvement

Agent success rate improvement when using this tile compared to baseline

0.99x

Baseline

Agent success rate without this tile

88%

evals/scenario-8/task.md

Tabular Preprocessing Helper

Build a reusable preprocessing helper for mixed-type tabular data. Given lists of numeric and categorical columns and a pandas-like table, it should fit once and produce a dense numeric feature matrix for both training and future inference.

Processing steps, in order:

  • Numeric columns are imputed with their column medians, scaled to zero mean and unit variance, then expanded with degree-2 polynomial terms (squares plus pairwise interactions, no bias term).
  • Categorical columns are imputed with their most frequent values and one-hot encoded.
  • The numeric block is concatenated with the encoded categorical block.

Column order is deterministic: scaled numerics first, then the polynomial expansions, then encoded categorical columns sorted by feature name and category label. Unseen categorical values at transform time must not raise errors and should yield all zeros for that feature's encoded slice.
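The column ordering described above can be sketched in plain Python. This is only an illustration: the helper name `expected_feature_names`, the `city` and `style` columns, and the naming scheme (`a^2`, `a b`, `feature_label`, which mirrors what scikit-learn's transformers emit) are assumptions, not part of the task statement. The order of the degree-2 terms (square, interactions, next square) is likewise assumed to follow `PolynomialFeatures`.

```python
from itertools import combinations_with_replacement
from typing import Mapping, Sequence


def expected_feature_names(
    numeric: Sequence[str],
    categories: Mapping[str, Sequence[str]],
) -> list[str]:
    """Return output column names in the order the task describes."""
    # 1. Scaled numeric columns, in the order they were given.
    names = list(numeric)
    # 2. Degree-2 terms: squares and pairwise interactions, no bias column.
    for a, b in combinations_with_replacement(numeric, 2):
        names.append(f"{a}^2" if a == b else f"{a} {b}")
    # 3. One-hot columns, sorted by feature name, then by category label.
    for feature in sorted(categories):
        for label in sorted(categories[feature]):
            names.append(f"{feature}_{label}")
    return names


# Hypothetical categorical columns, deliberately given out of order.
names = expected_feature_names(
    ["rooms", "sqft"],
    {"style": ["ranch", "colonial"], "city": ["B", "A"]},
)
print(names)
```

A helper like this can serve as an oracle in tests: compare its result against the `feature_names` reported by the fitted preprocessor.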

Capabilities

Fit-time preprocessing

  • Fitting on a table that includes missing numeric entries and missing categorical entries produces a dense array where numeric columns are median-imputed then scaled, categorical columns are imputed with the most frequent value per column then one-hot encoded, and feature_names reflects the expected column ordering. @test
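As a rough illustration of the fit-time behaviour, the snippet below applies scikit-learn's `SimpleImputer` and `StandardScaler` to small arrays with missing entries. The data values and the single categorical column are made up for the example; only the strategies (median, most frequent) come from the task description.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric columns (rooms, sqft), one missing entry in each.
numeric = np.array(
    [[1.0, 600.0], [2.0, 800.0], [3.0, np.nan], [np.nan, 1000.0]]
)
# Hypothetical categorical column with one missing entry.
categorical = np.array([["A"], ["B"], [np.nan], ["A"]], dtype=object)

# Median imputation first, then scaling on the imputed values.
median_imputer = SimpleImputer(strategy="median")
numeric_filled = median_imputer.fit_transform(numeric)
numeric_scaled = StandardScaler().fit_transform(numeric_filled)

# Most-frequent imputation works on object (string) columns as well.
mode_imputer = SimpleImputer(strategy="most_frequent")
categorical_filled = mode_imputer.fit_transform(categorical)

print(median_imputer.statistics_)   # per-column medians learned at fit time
print(categorical_filled.ravel())   # missing entry replaced by the mode
```

Note that `StandardScaler` uses the population standard deviation (`ddof=0`), which matters when checking expected values by hand.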

Polynomial numeric expansion

  • After fitting on the training table, the transformed output includes degree-2 polynomial terms for numeric features (each squared term plus one interaction term per pair of features) appended after the scaled numeric columns; for the row rooms=2, sqft=800 the squared and interaction values are those computed from the scaled numeric values, not the raw ones. @test

Transforming new rows

  • Transforming new rows uses the fitted state: unseen categorical values do not raise errors and yield all-zero encodings for that feature, numeric values are scaled with training means and variances, and output column order matches feature_names. @test

Implementation

@generates

API

from typing import Sequence, Any
import numpy as np

class TabularPreprocessor:
    def __init__(self, numeric_features: Sequence[str], categorical_features: Sequence[str]): ...

    def fit(self, data: Any) -> "TabularPreprocessor": ...

    def transform(self, data: Any) -> np.ndarray: ...

    def fit_transform(self, data: Any) -> np.ndarray: ...

    @property
    def feature_names(self) -> list[str]: ...

Dependencies { .dependencies }

scikit-learn { .dependency }

Provides preprocessing transformers for imputation, scaling, encoding, and feature expansion.