Workspace: tessl
Visibility: Public
Describes: pkg:pypi/scikit-learn@1.7.x

tessl/pypi-scikit-learn

tessl install tessl/pypi-scikit-learn@1.7.0

A comprehensive machine learning library providing supervised and unsupervised learning algorithms with consistent APIs and extensive tools for data preprocessing, model evaluation, and deployment.

Agent Success: 87% (agent success rate when using this tile)
Improvement: 0.99x (agent success rate when using this tile compared to baseline)
Baseline: 88% (agent success rate without this tile)

evals/scenario-2/task.md

Probabilistic Dimensionality Reduction and Clustering

Design a small workflow that projects high-dimensional numeric samples into a lower-dimensional space, clusters them with soft assignments, and produces a reproducible 2D layout for visualization.

Capabilities

Fit workflow

  • Accepts a 2D numeric array with at least two features, standardizes features, then reduces dimensionality with a variance-preserving linear method to the minimal number of components that reach at least 90% explained variance. Fits a probabilistic clustering model that supports soft assignments, selecting the cluster count from the provided positive candidates using the lowest information criterion. Uses the supplied random_state for any stochastic steps, exposes the chosen cluster count after fitting, and returns self. @test
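
The fit steps above can be sketched as follows. This is only an illustration under stated assumptions: the task does not name the estimators, so PCA stands in for the variance-preserving linear method, a Gaussian mixture for the probabilistic clusterer, and BIC for the information criterion. The synthetic data and the candidate list are made up for the example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic samples standing in for real training data (illustrative only).
X, _ = make_blobs(n_samples=300, n_features=8, centers=3, random_state=0)

# Step 1: standardize each feature to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 2: a float n_components asks PCA for the fewest components whose
# cumulative explained variance ratio passes that fraction.
pca = PCA(n_components=0.90, svd_solver="full")
X_reduced = pca.fit_transform(X_scaled)

# Step 3: fit one mixture per candidate count and keep the lowest BIC
# (assumption: BIC is the criterion; AIC would use model.aic instead).
candidates = [2, 3, 4, 5]
models = [
    GaussianMixture(n_components=k, random_state=0).fit(X_reduced)
    for k in candidates
]
bics = [model.bic(X_reduced) for model in models]
best_model = models[int(np.argmin(bics))]
selected_cluster_count = best_model.n_components
```

Passing the same `random_state` to every stochastic estimator is what makes the selection repeatable across runs.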

Predict new samples

  • Using the fitted scaler, reducer, and clusterer, transforms new samples and returns hard labels plus the maximum assignment probability per sample; rejects prediction attempts before fitting. Shapes must be (n_samples,) for labels and probabilities. @test
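
A possible shape for the prediction path, again assuming `StandardScaler`, `PCA`, and `GaussianMixture` as the three fitted components (the task leaves the choice open). The sketch uses `check_is_fitted` to detect an unfitted model; how the real class signals that error is not specified here.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.exceptions import NotFittedError
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import check_is_fitted

# Synthetic data split into training and "new" samples (illustrative only).
X, _ = make_blobs(n_samples=200, n_features=6, centers=3, random_state=0)
X_train, X_new = X[:150], X[150:]

scaler = StandardScaler()
pca = PCA(n_components=0.90, svd_solver="full")
gmm = GaussianMixture(n_components=3, random_state=0)

# Before fitting, check_is_fitted raises NotFittedError for the mixture.
try:
    check_is_fitted(gmm)
    rejected_before_fit = False
except NotFittedError:
    rejected_before_fit = True

gmm.fit(pca.fit_transform(scaler.fit_transform(X_train)))

# New samples go through transform (not fit_transform) so the training
# statistics are reused unchanged.
Z_new = pca.transform(scaler.transform(X_new))
proba = gmm.predict_proba(Z_new)   # shape (n_samples, n_clusters)
labels = proba.argmax(axis=1)      # hard label per sample
max_proba = proba.max(axis=1)      # highest assignment probability per sample
```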

2D embedding

  • Produces a deterministic 2D manifold embedding of the reduced training data that preserves neighborhood structure. Repeated calls with the same random seed yield identical embeddings; raises an error if invoked before fitting. Output shape is (n_samples, 2). @test
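
One way to obtain a seed-reproducible 2D layout is t-SNE with a fixed `random_state`. That choice is an assumption, since the task only asks for a neighborhood-preserving manifold embedding. The data, seed, and perplexity below are illustrative, and `method="exact"` is used only because the example is tiny.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Small synthetic array standing in for the reduced training data.
Z, _ = make_blobs(n_samples=60, n_features=5, centers=3, random_state=0)

def embed_2d(data: np.ndarray, seed: int) -> np.ndarray:
    """Hypothetical helper: 2D t-SNE layout with a fixed seed."""
    tsne = TSNE(
        n_components=2,
        perplexity=15,      # must be smaller than the number of samples
        init="pca",
        method="exact",
        random_state=seed,
    )
    return tsne.fit_transform(data)

emb_a = embed_2d(Z, seed=42)
emb_b = embed_2d(Z, seed=42)   # same seed, same input
```

Because t-SNE has no `transform` method, the embedding applies only to the data it was fitted on, which matches the requirement to embed the reduced training data.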

Invalid training data handling

  • Fitting raises a ValueError when the smallest requested cluster count exceeds the number of samples or when non-finite values are present in the training data. @test
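
The two checks can be expressed with plain NumPy before any estimator is touched. `validate_training_data` is a hypothetical helper name, and the error messages are illustrative.

```python
import numpy as np
from typing import Sequence

def validate_training_data(samples, cluster_counts: Sequence[int]) -> np.ndarray:
    """Hypothetical helper: return samples as a float array or raise ValueError."""
    X = np.asarray(samples, dtype=float)
    if X.ndim != 2 or X.shape[1] < 2:
        raise ValueError("samples must be a 2D array with at least two features")
    if not np.isfinite(X).all():
        # Catches NaN as well as positive and negative infinity.
        raise ValueError("samples contain non-finite values")
    if min(cluster_counts) > X.shape[0]:
        raise ValueError("smallest cluster count exceeds the number of samples")
    return X

clean = validate_training_data([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]], [2, 3])
```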

Implementation

@generates

API

import numpy as np
from typing import Sequence, Tuple

class ClusterWorkflow:
    def __init__(self, cluster_counts: Sequence[int], random_state: int | None = None): ...
    @property
    def selected_cluster_count(self) -> int: ...
    def fit(self, samples: np.ndarray) -> "ClusterWorkflow": ...
    def predict(self, samples: np.ndarray) -> Tuple[np.ndarray, np.ndarray]: ...
    def embedding_2d(self) -> np.ndarray: ...

Dependencies { .dependencies }

scikit-learn { .dependency }

Provides preprocessing, dimensionality reduction, mixture modelling, and manifold embedding utilities.