tessl/pypi-scikit-learn

A comprehensive machine learning library providing supervised and unsupervised learning algorithms with consistent APIs and extensive tools for data preprocessing, model evaluation, and deployment.

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/scikit-learn@1.7.x
To install, run

npx @tessl/cli install tessl/pypi-scikit-learn@1.7.0

scikit-learn

scikit-learn is a comprehensive machine learning library for Python that provides simple and efficient tools for predictive data analysis. It features various classification, regression, and clustering algorithms including support vector machines, random forests, gradient boosting, k-means, and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

Package Information

Name: scikit-learn
Language: Python
Installation: pip install scikit-learn
Version: 1.7.1

Core Imports

import sklearn
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, classification_report

Basic Usage

Here's a simple example demonstrating scikit-learn's consistent API for machine learning:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")

Architecture

scikit-learn follows several key design principles:

Estimator Pattern

All learning algorithms follow the same interface:

  • fit(X, y) - Learn from training data
  • predict(X) - Make predictions on new data
  • transform(X) - Transform data (for transformers)
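
For example, a transformer such as StandardScaler exposes the same fit/transform surface (a minimal illustrative sketch, not taken from the tile's docs):

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# fit() learns the per-feature mean and standard deviation
scaler = StandardScaler()
scaler.fit(X)

# transform() applies the learned scaling; fit_transform() combines both steps
X_scaled = scaler.transform(X)
print(X_scaled.mean(axis=0).round(3))  # roughly zero for every feature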

Pipeline Architecture

Combine multiple processing steps:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', SVC())
])
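
Because the pipeline is itself an estimator, it is trained and applied with the usual fit/predict calls. A minimal illustrative sketch (the data loading here is added for completeness):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# fit() scales the training data, then trains the SVC on the scaled output
pipeline.fit(X_train, y_train)

# predict() and score() push new data through the same scaling before the SVC
print(pipeline.score(X_test, y_test))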

Consistent API Design

  • Estimators: All learning algorithms (classifiers, regressors, clusterers)
  • Transformers: Data preprocessing and feature engineering
  • Meta-estimators: Combine multiple estimators (ensembles, pipelines)
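
As a sketch of the meta-estimator idea (illustrative only), a VotingClassifier wraps several estimators yet keeps the standard fit/predict interface:

from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The meta-estimator fits its wrapped estimators and combines their votes
voting = VotingClassifier(estimators=[
    ('lr', LogisticRegression(max_iter=1000)),
    ('tree', DecisionTreeClassifier(random_state=0)),
])
voting.fit(X, y)
print(voting.predict(X[:3]))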

Core Capabilities

Supervised Learning

# Classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# Regression  
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
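
Each of these follows the estimator pattern shown above. A brief illustrative sketch using Ridge regression on the diabetes dataset:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ridge adds L2 regularization to ordinary least-squares regression
reg = Ridge(alpha=1.0)
reg.fit(X_train, y_train)
print(f"Test R^2: {reg.score(X_test, y_test):.3f}")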

Unsupervised Learning

# Clustering
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.mixture import GaussianMixture

# Dimensionality Reduction
from sklearn.decomposition import PCA, FastICA, NMF
from sklearn.manifold import TSNE, Isomap
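
Unsupervised estimators are fit on X alone. A minimal illustrative sketch with KMeans clustering and PCA projection on synthetic blobs:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# fit_predict() assigns each sample to one of the learned clusters
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# PCA is a transformer: fit_transform() projects onto the leading components
X_2d = PCA(n_components=2).fit_transform(X)
print(labels[:10], X_2d.shape)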

Data Preprocessing

# Scaling and Normalization
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Encoding
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Feature Engineering
from sklearn.preprocessing import PolynomialFeatures
from sklearn.feature_selection import SelectKBest, RFE
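
A short illustrative sketch of scaling and one-hot encoding (the toy arrays are made up for the example):

import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Scale numeric features to zero mean and unit variance
X_num = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_scaled = StandardScaler().fit_transform(X_num)

# Expand a categorical feature into indicator columns
X_cat = np.array([['red'], ['green'], ['red']])
X_onehot = OneHotEncoder(sparse_output=False).fit_transform(X_cat)
print(X_scaled)
print(X_onehot)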

Model Selection and Evaluation

# Cross-Validation
from sklearn.model_selection import cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.model_selection import KFold, StratifiedKFold, train_test_split

# Metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import mean_squared_error, r2_score, roc_auc_score
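
A brief illustrative sketch combining cross-validation with a small grid search over SVC hyperparameters:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 5-fold cross-validated accuracy for a fixed model configuration
scores = cross_val_score(SVC(), X, y, cv=5)
print(scores.mean())

# Exhaustive search over a small hyperparameter grid, refit on the best setting
grid = GridSearchCV(SVC(), param_grid={'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)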

Built-in Datasets

# Load toy datasets
from sklearn.datasets import load_iris, load_diabetes, load_wine, load_breast_cancer

# Generate synthetic data
from sklearn.datasets import make_classification, make_regression, make_blobs

# Fetch real-world datasets
from sklearn.datasets import fetch_20newsgroups, fetch_california_housing
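
Loaders return a Bunch object or, with return_X_y=True, plain arrays; the make_* helpers generate synthetic data with a requested shape. A brief illustrative sketch:

from sklearn.datasets import load_wine, make_classification

# Toy dataset as plain NumPy arrays
X, y = load_wine(return_X_y=True)
print(X.shape, y.shape)

# Synthetic classification problem with a controlled number of informative features
X_syn, y_syn = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
print(X_syn.shape)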

Performance Metrics and Visualization

# Classification metrics
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay

# Regression metrics  
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.metrics import PredictionErrorDisplay
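
The function-style metrics take arrays of true and predicted labels; the Display classes additionally need matplotlib for plotting. An illustrative sketch with made-up labels:

from sklearn.metrics import classification_report, confusion_matrix

y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

# Row i, column j counts samples of true class i predicted as class j
print(confusion_matrix(y_true, y_pred))

# Per-class precision, recall, and F1 in a single text report
print(classification_report(y_true, y_pred))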

Feature Extraction and Text Processing

# Text vectorization
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer

# Dictionary and hashing
from sklearn.feature_extraction import DictVectorizer, FeatureHasher

# Image processing
from sklearn.feature_extraction.image import img_to_graph, grid_to_graph
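
An illustrative sketch of TF-IDF vectorization, which turns raw strings into a sparse document-term matrix (the documents are made up):

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]

# fit_transform() builds the vocabulary and maps documents to TF-IDF weights
vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(docs)
print(X_tfidf.shape)
print(vectorizer.get_feature_names_out()[:5])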

Pipelines and Workflow Composition

# Pipeline construction
from sklearn.pipeline import Pipeline, make_pipeline, FeatureUnion

# Column-wise transformations
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.compose import TransformedTargetRegressor
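
An illustrative sketch of a ColumnTransformer applying different preprocessing to numeric and categorical columns (the DataFrame and its column names are made up):

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one numeric and one categorical column
df = pd.DataFrame({'age': [25, 32, 47], 'city': ['NY', 'LA', 'NY']})

# Scale the numeric column, one-hot encode the categorical one
preprocess = ColumnTransformer([
    ('num', StandardScaler(), ['age']),
    ('cat', OneHotEncoder(sparse_output=False), ['city']),
])
print(preprocess.fit_transform(df))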

Nearest Neighbors Algorithms

# Classification and regression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.neighbors import RadiusNeighborsClassifier, RadiusNeighborsRegressor

# Outlier detection and density estimation
from sklearn.neighbors import LocalOutlierFactor, KernelDensity
from sklearn.neighbors import NearestNeighbors, NearestCentroid
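
A minimal illustrative sketch of neighbor-based classification and raw neighbor queries:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

X, y = load_iris(return_X_y=True)

# Classification by majority vote among the 5 nearest training samples
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict(X[:3]))

# Raw queries: distances to and indices of the closest training points
nn = NearestNeighbors(n_neighbors=3).fit(X)
distances, indices = nn.kneighbors(X[:1])
print(distances, indices)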

Utilities and Configuration

# Core utilities
from sklearn.base import clone
from sklearn import get_config, set_config, config_context

# Version and system information
import sklearn
sklearn.__version__, sklearn.show_versions()
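
An illustrative sketch of the configuration helpers (the options shown are documented scikit-learn settings):

from sklearn import config_context, get_config, set_config
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Ask transformers to return pandas DataFrames instead of NumPy arrays
set_config(transform_output="pandas")
X, _ = load_iris(return_X_y=True, as_frame=True)
print(type(StandardScaler().fit_transform(X)))

# config_context() applies a setting only inside the with-block
with config_context(assume_finite=True):
    print(get_config()["assume_finite"])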

Version Information

import sklearn
print(sklearn.__version__)  # "1.7.1"

# Get system information
sklearn.show_versions()

Key Features

  • Consistent API: All algorithms follow the same interface patterns
  • Comprehensive: 300+ classes and 150+ functions covering classification, regression, clustering, preprocessing, and model selection
  • Well-tested: Extensive test suite ensuring reliability
  • Documentation: Comprehensive user guide and API reference
  • Community: Large, active community with regular releases
  • Integration: Works seamlessly with NumPy, SciPy, pandas, and matplotlib
  • Performance: Optimized implementations with optional parallelization

scikit-learn provides everything needed for machine learning workflows, from data preprocessing to model evaluation, making it the go-to library for machine learning in Python.