
tessl/pypi-scikit-learn-intelex

Intel Extension for Scikit-learn providing hardware-accelerated implementations of scikit-learn algorithms optimized for Intel CPUs and GPUs.

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/scikit-learn-intelex@2024.7.x

To install, run

npx @tessl/cli install tessl/pypi-scikit-learn-intelex@2024.7.0


Scikit-learn Intel Extension

Intel Extension for Scikit-learn provides hardware-accelerated implementations of scikit-learn algorithms optimized for Intel CPUs and GPUs. It acts as a drop-in replacement in existing scikit-learn applications: enabling it takes a two-line patch or a changed import, and the rest of the code stays the same. Speedups of up to 10-100x are reported for some algorithms, achieved through vector instructions and hardware-specific memory optimizations; actual gains depend on the algorithm, data size, and hardware.

Package Information

  • Package Name: scikit-learn-intelex
  • Language: Python
  • Installation: pip install scikit-learn-intelex
  • License: Apache 2.0

Core Imports

import sklearnex

For enabling optimizations globally:

from sklearnex import patch_sklearn
patch_sklearn()

Direct imports of optimized algorithms:

from sklearnex.ensemble import RandomForestClassifier
from sklearnex.linear_model import LinearRegression
from sklearnex.cluster import KMeans

Basic Usage

from sklearnex import patch_sklearn
patch_sklearn()

# Import estimators after patching so that supported ones resolve to the Intel-optimized versions
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Generate sample data
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Use accelerated Random Forest (same API as sklearn)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)
predictions = rf.predict(X)

print(f"Accuracy: {rf.score(X, y):.3f}")

Alternative approach using direct imports:

from sklearnex.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Directly use Intel-optimized implementation
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)
predictions = rf.predict(X)

Architecture

The package provides three main integration patterns:

  • Global Patching: Replace the supported sklearn implementations for the running Python process using patch_sklearn()
  • Direct Imports: Import specific Intel-optimized algorithms directly from sklearnex modules
  • Distributed Computing: Use SPMD (Single Program Multiple Data) variants for multi-node execution

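Optimized implementations keep the scikit-learn API, so existing code runs unchanged. Where a parameter combination is not supported by the optimized path, the extension falls back to the stock scikit-learn implementation. The size of the speedup varies by algorithm and workload.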
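A minimal sketch of the global patching pattern with a graceful fallback is shown below. The helper name `enable_acceleration` is hypothetical and not part of sklearnex; the sketch also assumes that `patch_sklearn` accepts a `verbose` keyword, as it does in upstream releases.

```python
def enable_acceleration(verbose=False):
    """Patch scikit-learn if sklearnex is installed; return True on success.

    Hypothetical helper for illustration; not part of the sklearnex API.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # sklearnex is not installed: stock scikit-learn stays in place
        return False
    # Assumption: `verbose` is a supported keyword of patch_sklearn
    patch_sklearn(verbose=verbose)
    return True


accelerated = enable_acceleration()
print("Intel optimizations active:", accelerated)

# Import estimators from sklearn only after this point, so that supported
# ones resolve to the accelerated classes when patching succeeded.
```

Calling the helper once at program start, before any `sklearn` estimator import, keeps the rest of the application identical whether or not the extension is available.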

Capabilities

Patching and Configuration

Core functions for enabling Intel optimizations globally and managing configuration settings. These functions control how scikit-learn algorithms are accelerated.

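def patch_sklearn(name=None, verbose=True, global_patch=False, preview=False): ...
def unpatch_sklearn(name=None, global_unpatch=False): ...
def sklearn_is_patched(name=None, return_map=False) -> bool: ...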
def get_patch_map() -> dict: ...
def get_patch_names() -> list: ...
def is_patched_instance(estimator) -> bool: ...
def set_config(**params): ...
def get_config() -> dict: ...
def get_hyperparameters() -> dict: ...
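The functions above can be combined to patch only selected estimators, inspect the patch state, and then restore stock scikit-learn. The following is a sketch under assumptions: it relies on `patch_sklearn` accepting a list of estimator names and a `verbose` keyword, and on `sklearn_is_patched` accepting a single name, as in upstream sklearnex releases. `selective_patch_demo` is a hypothetical name used only for illustration.

```python
def selective_patch_demo():
    """Patch two estimators, inspect the patch state, then restore scikit-learn."""
    try:
        from sklearnex import (
            get_patch_names,
            patch_sklearn,
            sklearn_is_patched,
            unpatch_sklearn,
        )
    except ImportError:
        return None  # sklearnex is not installed

    # Patch only the named estimators instead of everything that is supported
    # (assumption: a list of names and the `verbose` keyword are accepted)
    patch_sklearn(["SVC", "DBSCAN"], verbose=False)
    svc_patched = sklearn_is_patched("SVC")

    # Names that patch_sklearn() knows how to replace
    patchable = list(get_patch_names())

    # Restore the stock scikit-learn implementations
    unpatch_sklearn()
    return {"svc_patched": svc_patched, "n_patchable": len(patchable)}


state = selective_patch_demo()
print(state)
```

Selective patching can be useful when only one or two estimators dominate the runtime and the rest of the pipeline should stay on stock scikit-learn.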


Clustering Algorithms

High-performance implementations of clustering algorithms including K-means and DBSCAN with Intel hardware acceleration.

class KMeans:
    def __init__(self, n_clusters=8, **kwargs): ...
    def fit(self, X, y=None): ...
    def predict(self, X): ...

class DBSCAN:
    def __init__(self, eps=0.5, min_samples=5, **kwargs): ...
    def fit(self, X, y=None): ...
    def fit_predict(self, X, y=None): ...


Linear Models

Accelerated linear regression, logistic regression, and regularized models with Intel optimization for large datasets.

class LinearRegression:
    def __init__(self, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...

class LogisticRegression:
    def __init__(self, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class Ridge:
    def __init__(self, alpha=1.0, **kwargs): ...

class Lasso:
    def __init__(self, alpha=1.0, **kwargs): ...

class ElasticNet:
    def __init__(self, alpha=1.0, l1_ratio=0.5, **kwargs): ...

class IncrementalLinearRegression:
    def __init__(self, **kwargs): ...
    def partial_fit(self, X, y): ...


Ensemble Methods

Intel-accelerated ensemble algorithms including Random Forest and Extra Trees for both classification and regression.

class RandomForestClassifier:
    def __init__(self, n_estimators=100, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class RandomForestRegressor:
    def __init__(self, n_estimators=100, **kwargs): ...

class ExtraTreesClassifier:
    def __init__(self, n_estimators=100, **kwargs): ...

class ExtraTreesRegressor:
    def __init__(self, n_estimators=100, **kwargs): ...


Dimensionality Reduction

Principal Component Analysis with Intel acceleration for efficient dimensionality reduction on large datasets.

class PCA:
    def __init__(self, n_components=None, **kwargs): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def fit_transform(self, X, y=None): ...


Nearest Neighbors

Accelerated k-nearest neighbors algorithms for classification, regression, and unsupervised learning with optimized distance computations.

class KNeighborsClassifier:
    def __init__(self, n_neighbors=5, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class KNeighborsRegressor:
    def __init__(self, n_neighbors=5, **kwargs): ...

class NearestNeighbors:
    def __init__(self, n_neighbors=5, **kwargs): ...
    def fit(self, X, y=None): ...
    def kneighbors(self, X=None, n_neighbors=None, return_distance=True): ...

class LocalOutlierFactor:
    def __init__(self, n_neighbors=20, **kwargs): ...
    def fit_predict(self, X): ...


Support Vector Machines

Intel-optimized Support Vector Machine implementations for classification and regression with accelerated kernel computations.

class SVC:
    def __init__(self, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...

class SVR:
    def __init__(self, **kwargs): ...

class NuSVC:
    def __init__(self, **kwargs): ...

class NuSVR:
    def __init__(self, **kwargs): ...


Metrics and Model Selection

Performance metrics and data splitting utilities with Intel acceleration for large-scale evaluation.

def roc_auc_score(y_true, y_score, **kwargs): ...
def pairwise_distances(X, Y=None, metric='euclidean', **kwargs): ...
def train_test_split(*arrays, **options): ...


Basic Statistics and Manifold Learning

Statistical computations and manifold learning algorithms with Intel optimization.

class BasicStatistics:
    def __init__(self, **kwargs): ...
    def fit(self, X, y=None): ...

class IncrementalBasicStatistics:
    def __init__(self, **kwargs): ...
    def partial_fit(self, X, y=None): ...

class IncrementalEmpiricalCovariance:
    def __init__(self, **kwargs): ...
    def fit(self, X, y=None): ...
    def partial_fit(self, X, y=None): ...

class TSNE:
    def __init__(self, n_components=2, **kwargs): ...
    def fit_transform(self, X, y=None): ...


Model Builder API

Convert external gradient boosting models (XGBoost, LightGBM, CatBoost) to Intel oneDAL format for accelerated inference.

from daal4py.mb import GBTDAALBaseModel, convert_model

def convert_model(model): ...

class GBTDAALBaseModel:
    def __init__(self): ...
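As a rough illustration of the conversion workflow, the sketch below trains a small XGBoost classifier, converts its booster with `convert_model`, and measures how often the converted model agrees with the original. It assumes `xgboost`, `numpy`, and `scikit-learn` are installed alongside `daal4py`, and that the converted model exposes a `predict` method returning class labels. `xgboost_to_onedal_demo` is a hypothetical name.

```python
def xgboost_to_onedal_demo():
    """Train a small XGBoost model, convert it, and compare predictions."""
    try:
        import numpy as np
        import xgboost as xgb
        from daal4py.mb import convert_model
        from sklearn.datasets import make_classification
    except ImportError:
        return None  # an optional dependency is missing

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    clf = xgb.XGBClassifier(n_estimators=20, max_depth=3)
    clf.fit(X, y)

    # Convert the underlying Booster to a oneDAL-backed model
    onedal_model = convert_model(clf.get_booster())

    # Fraction of samples on which both models predict the same label
    # (assumption: predict() on the converted model returns class labels)
    onedal_pred = np.ravel(onedal_model.predict(X))
    return float(np.mean(onedal_pred == clf.predict(X)))


agreement = xgboost_to_onedal_demo()
print("Prediction agreement:", agreement)
```

Checking agreement against the source model before switching inference paths is a cheap safeguard, since conversion support can vary across XGBoost, LightGBM, and CatBoost versions.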


Advanced Features

Preview and SPMD (distributed) capabilities for cutting-edge algorithms and multi-node execution.

# Preview features (requires SKLEARNEX_PREVIEW environment variable)
from sklearnex.preview.covariance import EmpiricalCovariance
from sklearnex.preview.decomposition import IncrementalPCA

# SPMD distributed computing
from sklearnex.spmd.cluster import KMeans as SPMDKMeans
from sklearnex.spmd.linear_model import LinearRegression as SPMDLinearRegression

# Utility functions
from sklearnex.utils import get_namespace, _assert_all_finite


Environment Variables

  • OFF_ONEDAL_IFACE: Set to "1" to disable oneDAL interface
  • SKLEARNEX_PREVIEW: Enable preview features
  • DALROOT: Path to Intel oneDAL installation
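Environment variables of this kind are typically set before the package is imported. The sketch below sets `SKLEARNEX_PREVIEW` from Python and then attempts to load a preview estimator. It assumes that any non-empty value enables preview features ("1" is used here by convention) and that the variable is read when sklearnex is imported or patched. `load_preview_covariance` is a hypothetical helper.

```python
import os

# Assumption: any non-empty value enables preview features; "1" is a convention.
# setdefault keeps a value that was already exported in the shell.
os.environ.setdefault("SKLEARNEX_PREVIEW", "1")


def load_preview_covariance():
    """Return the preview EmpiricalCovariance class, or None if unavailable."""
    try:
        from sklearnex.preview.covariance import EmpiricalCovariance
    except ImportError:
        return None  # sklearnex (or its preview module) is not installed
    return EmpiricalCovariance


estimator_cls = load_preview_covariance()
print("Preview estimator available:", estimator_cls is not None)
```

The same effect can be had by exporting the variable in the shell before starting Python, which avoids ordering issues between the assignment and the first sklearnex import.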

Performance Notes

  • Speedups of up to 10-100x are possible on Intel hardware; the actual gain depends on the algorithm, dataset size, and CPU or GPU
  • Optimizations work best with larger datasets (>1000 samples)
  • All optimized algorithms maintain identical APIs to scikit-learn
  • Can be used as drop-in replacements in existing code
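Because speedups vary by workload, it may be worth measuring them on representative data. The following sketch times `fit` for stock and Intel-optimized KMeans; `time_fit` and `compare_kmeans` are hypothetical helpers, and the dataset size and estimator parameters are arbitrary choices for illustration.

```python
import time


def time_fit(make_estimator, X, y=None, repeats=3):
    """Return the best wall-clock fit time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        estimator = make_estimator()  # fresh estimator for each run
        start = time.perf_counter()
        estimator.fit(X, y)
        best = min(best, time.perf_counter() - start)
    return best


def compare_kmeans(n_samples=20000, n_features=20):
    """Return stock_time / optimized_time, or None if a package is missing."""
    try:
        from sklearn.cluster import KMeans as StockKMeans
        from sklearn.datasets import make_blobs
        from sklearnex.cluster import KMeans as FastKMeans
    except ImportError:
        return None

    X, _ = make_blobs(
        n_samples=n_samples, n_features=n_features, centers=8, random_state=0
    )
    stock = time_fit(
        lambda: StockKMeans(n_clusters=8, n_init=10, random_state=0), X
    )
    fast = time_fit(
        lambda: FastKMeans(n_clusters=8, n_init=10, random_state=0), X
    )
    return stock / fast


if __name__ == "__main__":
    print("Speedup factor (stock / optimized):", compare_kmeans())
```

The comparison only makes sense if `patch_sklearn()` has not been called in the same process, since otherwise the `sklearn.cluster` import would already resolve to the optimized class.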