
tessl/pypi-scikit-learn

A comprehensive machine learning library providing supervised and unsupervised learning algorithms with consistent APIs and extensive tools for data preprocessing, model evaluation, and deployment.



Probabilistic Dimensionality Reduction and Clustering

Design a small workflow that projects high-dimensional numeric samples into a lower-dimensional space, clusters them with soft assignments, and produces a reproducible 2D layout for visualization.

Capabilities

Fit workflow

  • Accepts a 2D numeric array with at least two features and standardizes each feature. Reduces dimensionality with a variance-preserving linear method, keeping the smallest number of components whose cumulative explained variance reaches at least 90%. Fits a probabilistic clustering model that supports soft assignments, choosing the cluster count from the supplied positive candidates by the lowest information-criterion score. Uses the supplied random_state for every stochastic step, exposes the chosen cluster count after fitting, and returns self. @test
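One possible reading of the fit step is sketched below. It is not the reference implementation: it assumes PCA as the variance-preserving linear method, a Gaussian mixture as the soft clusterer, and BIC as the information criterion (AIC would fit the wording equally well). The helper name `fit_pipeline` and its `variance_target` parameter are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def fit_pipeline(samples, cluster_counts, random_state=None, variance_target=0.90):
    """Hypothetical helper: scale, reduce, then pick a mixture size by BIC."""
    scaler = StandardScaler().fit(samples)
    scaled = scaler.transform(samples)

    # Fit a full PCA once to read the cumulative explained-variance curve.
    cumulative = np.cumsum(PCA().fit(scaled).explained_variance_ratio_)
    # First index whose cumulative ratio is >= the target, converted to a count.
    n_components = int(np.searchsorted(cumulative, variance_target)) + 1
    n_components = min(n_components, cumulative.size)  # guard against rounding
    reducer = PCA(n_components=n_components, random_state=random_state).fit(scaled)
    reduced = reducer.transform(scaled)

    # Assumption: BIC is the information criterion; lower is better.
    best_model, best_bic = None, np.inf
    for k in cluster_counts:
        model = GaussianMixture(n_components=k, random_state=random_state).fit(reduced)
        bic = model.bic(reduced)
        if bic < best_bic:
            best_model, best_bic = model, bic
    return scaler, reducer, best_model


# Synthetic data for illustration only: three blobs in six dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=8.0, size=(3, 6))
samples = np.vstack([c + rng.normal(size=(50, 6)) for c in centers])
scaler, reducer, clusterer = fit_pipeline(
    samples, cluster_counts=[1, 2, 3, 4], random_state=0
)
print("components kept:", reducer.n_components_)
print("clusters chosen:", clusterer.n_components)
```

Reading the component count off the cumulative curve, rather than passing a fraction to `PCA(n_components=...)`, keeps the "at least 90%" boundary explicit.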

Predict new samples

  • Using the fitted scaler, reducer, and clusterer, transforms new samples and returns hard labels plus the maximum assignment probability per sample; rejects prediction attempts before fitting. Both returned arrays have shape (n_samples,). @test
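The prediction step could look roughly like the sketch below, assuming the three fitted objects are a `StandardScaler`, a `PCA`, and a `GaussianMixture`. The helper `predict_soft` is hypothetical, and relying on scikit-learn's `check_is_fitted` (which raises `NotFittedError`) is only one way to reject calls made before fitting.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.exceptions import NotFittedError
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import check_is_fitted


def predict_soft(scaler, reducer, clusterer, samples):
    """Hypothetical helper: hard labels plus the top membership probability."""
    # Raises NotFittedError if any stage has not been fitted yet.
    for estimator in (scaler, reducer, clusterer):
        check_is_fitted(estimator)

    reduced = reducer.transform(scaler.transform(samples))
    probabilities = clusterer.predict_proba(reduced)  # shape (n_samples, k)
    labels = probabilities.argmax(axis=1)             # shape (n_samples,)
    confidence = probabilities.max(axis=1)            # shape (n_samples,)
    return labels, confidence


# Illustration on random data; the estimator settings here are arbitrary.
rng = np.random.default_rng(1)
train = rng.normal(size=(80, 5))
scaler = StandardScaler().fit(train)
reducer = PCA(n_components=2).fit(scaler.transform(train))
clusterer = GaussianMixture(n_components=3, random_state=0).fit(
    reducer.transform(scaler.transform(train))
)

new_samples = rng.normal(size=(10, 5))
labels, confidence = predict_soft(scaler, reducer, clusterer, new_samples)

try:
    predict_soft(StandardScaler(), reducer, clusterer, new_samples)
    rejected = False
except NotFittedError:
    rejected = True
```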

2D embedding

  • Produces a deterministic 2D manifold embedding of the reduced training data that preserves neighborhood structure. Repeated calls with the same random seed yield identical embeddings; raises an error if invoked before fitting. Output shape is (n_samples, 2). @test
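A minimal sketch of the embedding step follows, assuming t-SNE as the manifold method (the spec does not name one, and other neighborhood-preserving embeddings would also qualify). The helper `embed_2d` and its perplexity heuristic are hypothetical. Reproducibility here depends on passing an integer seed; with `random_state=None` the layout would vary between calls.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_2d(reduced, random_state=0):
    """Hypothetical helper: seeded 2D t-SNE layout of already-reduced data."""
    n_samples = reduced.shape[0]
    # t-SNE requires perplexity < n_samples; this cap is an illustrative choice.
    perplexity = min(30.0, max(1.0, (n_samples - 1) / 3.0))
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=random_state)
    return tsne.fit_transform(reduced)


# Stand-in for the reduced training data kept from the fit step.
rng = np.random.default_rng(2)
reduced_training_data = rng.normal(size=(60, 4))

first = embed_2d(reduced_training_data, random_state=42)
second = embed_2d(reduced_training_data, random_state=42)
print(first.shape)
```

Because the embedding is computed from the training data, an implementation would need to retain the reduced training matrix at fit time so that `embedding_2d()` can be called without arguments.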

Invalid training data handling

  • Fitting raises a ValueError when the smallest requested cluster count exceeds the number of samples or when non-finite values are present in the training data. @test
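The validation rules above can be sketched with NumPy alone. The helper name `validate_training_data` is hypothetical, and the extra checks for array shape and positive candidates are an assumption drawn from the fit description rather than from this bullet.

```python
import numpy as np


def validate_training_data(samples, cluster_counts):
    """Hypothetical helper: raise ValueError for unusable training input."""
    data = np.asarray(samples, dtype=float)
    # Assumption: shape rules from the fit description are enforced here too.
    if data.ndim != 2 or data.shape[1] < 2:
        raise ValueError("samples must be a 2D array with at least two features")
    if not np.isfinite(data).all():
        raise ValueError("samples contain NaN or infinite values")

    counts = [int(k) for k in cluster_counts]
    if not counts or min(counts) < 1:
        raise ValueError("cluster_counts must be non-empty positive integers")
    if min(counts) > data.shape[0]:
        raise ValueError("smallest cluster count exceeds the number of samples")
    return data


def raises_value_error(samples, cluster_counts):
    """Return True when validation rejects the input."""
    try:
        validate_training_data(samples, cluster_counts)
    except ValueError:
        return True
    return False


good = np.arange(12, dtype=float).reshape(4, 3)
with_nan = good.copy()
with_nan[0, 0] = np.nan

print(raises_value_error(good, [2, 3]))      # valid input
print(raises_value_error(with_nan, [2, 3]))  # non-finite value present
print(raises_value_error(good, [5, 6]))      # smallest count 5 > 4 samples
```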

Implementation

@generates

API

import numpy as np
from typing import Sequence, Tuple

class ClusterWorkflow:
    def __init__(self, cluster_counts: Sequence[int], random_state: int | None = None): ...
    @property
    def selected_cluster_count(self) -> int: ...
    def fit(self, samples: np.ndarray) -> "ClusterWorkflow": ...
    def predict(self, samples: np.ndarray) -> Tuple[np.ndarray, np.ndarray]: ...
    def embedding_2d(self) -> np.ndarray: ...

Dependencies { .dependencies }

scikit-learn { .dependency }

Provides preprocessing, dimensionality reduction, mixture modelling, and manifold embedding utilities.
