tessl/pypi-scikit-learn

A comprehensive machine learning library providing supervised and unsupervised learning algorithms with consistent APIs and extensive tools for data preprocessing, model evaluation, and deployment.

0.98x

Probabilistic Dimensionality Reduction and Clustering

Name: tessl/pypi-scikit-learn
Rating: 0.87 (1 review)
Author: tessl

Design a small workflow that projects high-dimensional numeric samples into a lower-dimensional space, clusters them with soft assignments, and produces a reproducible 2D layout for visualization.

Capabilities

Fit workflow

Accepts a 2D numeric array with at least two features and standardizes each feature. Reduces dimensionality with a variance-preserving linear method, keeping the smallest number of components whose cumulative explained variance reaches at least 90%. Fits a probabilistic clustering model that supports soft assignments, choosing the cluster count from the supplied positive candidates by the lowest information criterion value. Uses the supplied random_state for every stochastic step, exposes the chosen cluster count after fitting, and returns self. @test
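One way the fit steps could be sketched with scikit-learn is shown below. The choice of `StandardScaler`, `PCA`, `GaussianMixture`, and BIC as the information criterion is an assumption, since the spec names only the behaviour and not the estimators. `fit_pipeline` is a hypothetical helper, not part of the required API, and the input data is synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def fit_pipeline(samples, cluster_counts, random_state=None):
    """Hypothetical helper: scale, reduce to ~90% variance, pick k by lowest BIC."""
    scaler = StandardScaler().fit(samples)
    scaled = scaler.transform(samples)

    # A float n_components with the full SVD solver keeps the fewest
    # components whose cumulative explained variance exceeds that fraction.
    reducer = PCA(n_components=0.90, svd_solver="full").fit(scaled)
    reduced = reducer.transform(scaled)

    best_model, best_bic = None, np.inf
    for k in cluster_counts:
        model = GaussianMixture(n_components=k, random_state=random_state)
        model.fit(reduced)
        bic = model.bic(reduced)  # assumption: BIC is the criterion; AIC would also fit the spec
        if bic < best_bic:
            best_model, best_bic = model, bic
    return scaler, reducer, best_model


# Synthetic data for illustration: two well-separated blobs in 6 dimensions.
rng = np.random.default_rng(0)
samples = np.vstack([rng.normal(0, 1, (60, 6)), rng.normal(8, 1, (60, 6))])
scaler, reducer, model = fit_pipeline(samples, [1, 2, 3], random_state=0)
print("components kept:", reducer.n_components_)
print("selected cluster count:", model.n_components)
```

In a `ClusterWorkflow` implementation, `model.n_components` is the value that `selected_cluster_count` would expose.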

Predict new samples

Using the fitted scaler, reducer, and clusterer, transforms new samples and returns hard labels together with the maximum assignment probability per sample. Raises an error if called before fitting. Both returned arrays have shape (n_samples,). @test
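The prediction step might look like the following sketch. It assumes the same estimator choices as above (`StandardScaler`, `PCA`, `GaussianMixture`), which the spec does not mandate, and `predict_soft` is a hypothetical helper name. It relies on `check_is_fitted` to reject unfitted estimators.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.exceptions import NotFittedError
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import check_is_fitted


def predict_soft(scaler, reducer, clusterer, samples):
    """Hypothetical helper: hard labels plus the top assignment probability."""
    for step in (scaler, reducer, clusterer):
        check_is_fitted(step)  # raises NotFittedError before fitting
    reduced = reducer.transform(scaler.transform(samples))
    proba = clusterer.predict_proba(reduced)  # shape (n_samples, n_clusters)
    return proba.argmax(axis=1), proba.max(axis=1)


# Synthetic training data and new samples, for illustration only.
rng = np.random.default_rng(1)
train = np.vstack([rng.normal(0, 1, (40, 4)), rng.normal(6, 1, (40, 4))])
new_samples = rng.normal(3, 3, (5, 4))

scaler = StandardScaler().fit(train)
reducer = PCA(n_components=0.90, svd_solver="full").fit(scaler.transform(train))
clusterer = GaussianMixture(n_components=2, random_state=0)
clusterer.fit(reducer.transform(scaler.transform(train)))

labels, confidence = predict_soft(scaler, reducer, clusterer, new_samples)

# Unfitted estimators are rejected.
try:
    predict_soft(StandardScaler(), PCA(), GaussianMixture(), new_samples)
    rejected = False
except NotFittedError:
    rejected = True
```

`NotFittedError` subclasses `ValueError`, so callers that catch `ValueError` would also catch it.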

2D embedding

Produces a deterministic 2D manifold embedding of the reduced training data that preserves neighborhood structure. Repeated calls with the same random seed yield identical embeddings; raises an error if invoked before fitting. Output shape is (n_samples, 2). @test
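A seeded t-SNE is one candidate for this embedding; the spec does not name the method, so treat the choice as an assumption. `embed_2d` is a hypothetical helper. The sketch uses `method="exact"` and `init="random"` to keep the small example simple and usable even when the reducer keeps a single component, and the perplexity heuristic is illustrative only.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_2d(reduced, random_state=0):
    """Hypothetical helper: seeded t-SNE layout of already-reduced training data."""
    # t-SNE requires perplexity to be smaller than the number of samples.
    perplexity = min(30.0, (reduced.shape[0] - 1) / 3.0)
    tsne = TSNE(
        n_components=2,
        perplexity=perplexity,
        init="random",    # works even if `reduced` has a single column
        method="exact",   # fine for small inputs; the default is Barnes-Hut
        random_state=random_state,
    )
    return tsne.fit_transform(reduced)


# Synthetic stand-in for the reduced training data.
rng = np.random.default_rng(2)
reduced = np.vstack([rng.normal(0, 1, (25, 5)), rng.normal(5, 1, (25, 5))])

first = embed_2d(reduced, random_state=0)
second = embed_2d(reduced, random_state=0)
print(first.shape)
```

Calling the helper twice with the same seed on the same machine should give matching layouts; a workflow class could also cache the first result to guarantee identical output on repeated calls.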

Invalid training data handling

Fitting raises a ValueError when the smallest requested cluster count exceeds the number of samples or when non-finite values are present in the training data. @test
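The checks described above could be performed up front, before any estimator is fitted. The following is a minimal sketch; `validate_training_data` is a hypothetical helper name and the error messages are illustrative.

```python
import numpy as np


def validate_training_data(samples, cluster_counts):
    """Hypothetical helper: raise ValueError for unusable training input."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim != 2 or samples.shape[1] < 2:
        raise ValueError("samples must be a 2D array with at least two features")
    if not np.isfinite(samples).all():
        raise ValueError("samples contain NaN or infinite values")

    counts = list(cluster_counts)
    if not counts or min(counts) < 1:
        raise ValueError("cluster_counts must be non-empty and positive")
    if min(counts) > samples.shape[0]:
        raise ValueError(
            f"smallest cluster count {min(counts)} exceeds "
            f"the number of samples {samples.shape[0]}"
        )
    return samples


def raises_value_error(samples, cluster_counts):
    """Return True if validation rejects the input."""
    try:
        validate_training_data(samples, cluster_counts)
    except ValueError:
        return True
    return False


good = np.arange(12, dtype=float).reshape(6, 2)
with_nan = good.copy()
with_nan[0, 0] = np.nan

print(raises_value_error(good, [2, 3]))      # valid input
print(raises_value_error(with_nan, [2, 3]))  # non-finite value present
print(raises_value_error(good, [7, 8]))      # smallest count exceeds 6 samples
```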

Implementation

@generates

API

import numpy as np
from typing import Sequence, Tuple

class ClusterWorkflow:
    def __init__(self, cluster_counts: Sequence[int], random_state: int | None = None): ...
    @property
    def selected_cluster_count(self) -> int: ...
    def fit(self, samples: np.ndarray) -> "ClusterWorkflow": ...
    def predict(self, samples: np.ndarray) -> Tuple[np.ndarray, np.ndarray]: ...
    def embedding_2d(self) -> np.ndarray: ...

Dependencies { .dependencies }

scikit-learn { .dependency }

Provides preprocessing, dimensionality reduction, mixture modelling, and manifold embedding utilities.

Install with Tessl CLI

npx tessl i tessl/pypi-scikit-learn