
# scikit-learn


scikit-learn is a comprehensive machine learning library for Python that provides simple and efficient tools for predictive data analysis. It features various classification, regression, and clustering algorithms including support vector machines, random forests, gradient boosting, k-means, and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

## Package Information

- **Name**: scikit-learn
- **Language**: Python
- **Installation**: `pip install scikit-learn`
- **Version**: 1.7.1

## Core Imports

```python
import sklearn
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, classification_report
```

## Basic Usage

Here's a simple example demonstrating scikit-learn's consistent API for machine learning:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
```

## Architecture

scikit-learn follows several key design principles:

### Estimator Pattern
Estimators share a small set of methods, and which ones an object exposes depends on its role:
- `fit(X, y)` - Learn from training data (`y` is omitted for unsupervised estimators)
- `predict(X)` - Make predictions on new data (for predictors such as classifiers and regressors)
- `transform(X)` - Transform data (for transformers)
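
The three methods above can be seen side by side in a minimal sketch. The six-sample dataset is made up for illustration; `StandardScaler` plays the transformer role and `LogisticRegression` the predictor role.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 6 samples, 2 features, binary labels
X = np.array([[0.0, 1.0], [1.0, 0.5], [0.2, 0.8],
              [3.0, 4.0], [3.5, 3.8], [4.0, 4.2]])
y = np.array([0, 0, 0, 1, 1, 1])

# Transformer: fit() learns per-feature mean and std, transform() applies them
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

# Predictor: fit() learns coefficients, predict() returns class labels
clf = LogisticRegression()
clf.fit(X_scaled, y)
labels = clf.predict(X_scaled)

print(X_scaled.mean(axis=0))  # approximately zero for each feature
print(labels)
```

Transformers also offer `fit_transform(X)`, which combines the two calls on the same data.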

### Pipeline Architecture
Combine multiple processing steps:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', SVC())
])
```
```
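
A pipeline built this way behaves like a single estimator. The sketch below fits the same two steps on the iris dataset; the dataset choice and the stratified 80/20 split are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', SVC())
])

# fit() runs fit_transform on the scaler, then fit on the classifier
pipeline.fit(X_train, y_train)

# score() scales X_test with the training statistics, then reports accuracy
test_accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")

# Individual fitted steps remain accessible by name
scaler_mean = pipeline.named_steps['scaler'].mean_
```

Because the scaler is fitted only on the training split, the test data does not leak into the preprocessing statistics.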

### Consistent API Design
- **Estimators**: All learning algorithms (classifiers, regressors, clusterers)
- **Transformers**: Data preprocessing and feature engineering
- **Meta-estimators**: Combine multiple estimators (ensembles, pipelines)
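
As an illustration of the meta-estimator role, the sketch below wraps two classifiers in a `VotingClassifier`. The choice of base estimators and the step names `"lr"` and `"tree"` are arbitrary; nested hyperparameters are addressed with the `<name>__<parameter>` convention.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A meta-estimator takes other estimators as constructor arguments
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ],
    voting="hard",
)

# Nested parameters use the <name>__<parameter> convention
voter.set_params(tree__max_depth=2)

# The wrapper exposes the same fit/predict interface as its parts
voter.fit(X, y)
predictions = voter.predict(X[:5])
print(predictions)
```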

## Core Capabilities

### Supervised Learning
```python
# Classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# Regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
```

[Supervised Learning](./supervised-learning.md)
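
A short sketch of how a classifier and a regressor from the imports above are used with the same calls. The synthetic data from `make_classification` and `make_regression`, and all hyperparameter values, are illustrative assumptions.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Classification on synthetic data
Xc, yc = make_classification(n_samples=200, n_features=8, random_state=0)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(Xc, yc, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(Xc_train, yc_train)
clf_accuracy = clf.score(Xc_test, yc_test)  # mean accuracy for classifiers

# Regression on synthetic data
Xr, yr = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=0)
reg = Ridge(alpha=1.0)
reg.fit(Xr_train, yr_train)
reg_r2 = reg.score(Xr_test, yr_test)  # R^2 for regressors

print(f"accuracy={clf_accuracy:.3f}, R^2={reg_r2:.3f}")
```

Note that `score` means different things for the two estimator types: accuracy for classifiers and the coefficient of determination for regressors.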

### Unsupervised Learning
```python
# Clustering
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.mixture import GaussianMixture

# Dimensionality Reduction
from sklearn.decomposition import PCA, FastICA, NMF
from sklearn.manifold import TSNE, Isomap
```

[Unsupervised Learning](./unsupervised-learning.md)
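
A minimal sketch of clustering and dimensionality reduction on unlabeled data. The blob dataset and the choice of three clusters and two components are assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic data: 150 points in 5 dimensions around 3 centers (labels discarded)
X, _ = make_blobs(n_samples=150, centers=3, n_features=5, random_state=0)

# Clustering: fit_predict() fits the model and returns one cluster index per sample
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Dimensionality reduction: project onto the first two principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)  # (150, 2)
print(pca.explained_variance_ratio_)
```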

### Data Preprocessing
```python
# Scaling and Normalization
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Encoding
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Feature Engineering
from sklearn.preprocessing import PolynomialFeatures
from sklearn.feature_selection import SelectKBest, RFE
```

[Data Preprocessing and Feature Engineering](./preprocessing.md)

### Model Selection and Evaluation
```python
# Cross-validation and hyperparameter search
from sklearn.model_selection import cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.model_selection import KFold, StratifiedKFold, train_test_split

# Metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import mean_squared_error, r2_score, roc_auc_score
```

[Model Selection and Evaluation](./model-selection.md)
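
A sketch of how cross-validation and grid search fit together, using iris and an `SVC`. The parameter grid and the five-fold splitter are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# A reusable splitter keeps the folds identical across experiments
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# cross_val_score returns one score per fold
scores = cross_val_score(SVC(), X, y, cv=cv)
print(f"mean CV accuracy: {scores.mean():.3f}")

# GridSearchCV cross-validates every combination in the grid,
# then refits the best one on the full data
search = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
    cv=cv,
)
search.fit(X, y)
print(search.best_params_, f"{search.best_score_:.3f}")
```

After fitting, `search` itself can be used for prediction, since it delegates to the refitted best estimator.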

### Built-in Datasets
```python
# Load toy datasets
from sklearn.datasets import load_iris, load_diabetes, load_wine, load_breast_cancer

# Generate synthetic data
from sklearn.datasets import make_classification, make_regression, make_blobs

# Fetch real-world datasets
from sklearn.datasets import fetch_20newsgroups, fetch_california_housing
```

[Datasets and Data Generation](./datasets.md)
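
A brief sketch of the two loader styles that work offline. The sample counts and feature counts passed to the generators are arbitrary. The `fetch_*` functions are left out here because they download data on first use.

```python
from sklearn.datasets import load_wine, make_blobs, make_classification

# Toy dataset as a (data, target) pair of arrays
X_wine, y_wine = load_wine(return_X_y=True)
print(X_wine.shape)  # (178, 13)

# The same loader without return_X_y gives a Bunch with metadata
wine = load_wine()
print(wine.feature_names[:3])

# Synthetic generators take the desired shape as parameters
X_syn, y_syn = make_classification(
    n_samples=100, n_features=10, n_informative=4, random_state=0
)
X_blobs, y_blobs = make_blobs(n_samples=60, centers=3, random_state=0)
print(X_syn.shape, X_blobs.shape)  # (100, 10) (60, 2)
```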

### Performance Metrics and Visualization
```python
# Classification metrics
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay

# Regression metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.metrics import PredictionErrorDisplay
```

[Metrics and Visualization](./metrics.md)
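
The metric functions take true values first and predictions second. The sketch below uses small hand-written label and target lists, chosen so the results can be checked by hand.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    mean_absolute_error,
    mean_squared_error,
)

# Classification: 6 samples, one positive sample misclassified as negative
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]
acc = accuracy_score(y_true, y_pred)   # 5 correct out of 6
cm = confusion_matrix(y_true, y_pred)  # rows are true classes, columns are predicted
print(f"{acc:.3f}")  # 0.833
print(cm)            # [[2 0]
                     #  [1 3]]

# Regression: errors are 0.5, -0.5, 0.0, -1.0
y_true_reg = [3.0, -0.5, 2.0, 7.0]
y_pred_reg = [2.5, 0.0, 2.0, 8.0]
mse = mean_squared_error(y_true_reg, y_pred_reg)   # 0.375
mae = mean_absolute_error(y_true_reg, y_pred_reg)  # 0.5
print(mse, mae)
```

The display classes such as `ConfusionMatrixDisplay` require matplotlib and are omitted from this sketch.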

### Feature Extraction and Text Processing
```python
# Text vectorization
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer

# Dictionary and hashing
from sklearn.feature_extraction import DictVectorizer, FeatureHasher

# Image processing
from sklearn.feature_extraction.image import img_to_graph, grid_to_graph
```

[Feature Extraction](./feature-extraction.md)

### Pipelines and Workflow Composition
```python
# Pipeline construction
from sklearn.pipeline import Pipeline, make_pipeline, FeatureUnion

# Column-wise transformations
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.compose import TransformedTargetRegressor
```

[Pipelines and Composition](./pipelines.md)

### Nearest Neighbors Algorithms
```python
# Classification and regression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.neighbors import RadiusNeighborsClassifier, RadiusNeighborsRegressor

# Outlier detection and density estimation
from sklearn.neighbors import LocalOutlierFactor, KernelDensity

# Unsupervised neighbor search and centroid-based classification
from sklearn.neighbors import NearestNeighbors, NearestCentroid
```

[Nearest Neighbors](./neighbors.md)
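
A minimal sketch of supervised and unsupervised neighbor queries on six hand-placed points. The coordinates are made up so that the two groups are far apart.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

# Two well-separated groups of three points each
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Supervised: majority vote among the 3 nearest training points
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
pred = knn.predict([[0.5, 0.5], [5.5, 5.5]])
print(pred)  # [0 1]

# Unsupervised: look up the nearest training points and their distances
nn = NearestNeighbors(n_neighbors=2).fit(X)
distances, indices = nn.kneighbors([[0.0, 0.2]])
print(indices)    # [[0 1]]
print(distances)  # approximately [[0.2 0.8]]
```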

### Utilities and Configuration
```python
# Core utilities
from sklearn.base import clone
from sklearn import get_config, set_config, config_context

# Version and system information
import sklearn
print(sklearn.__version__)
sklearn.show_versions()  # prints system, dependency, and build details
```

[Utilities and Core Functions](./utilities.md)
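
A sketch of two of the utilities above: `clone` copies an estimator's hyperparameters without its fitted state, and `config_context` changes global settings only inside a `with` block. The four-sample training set and the `assume_finite` option are chosen purely for illustration.

```python
from sklearn import config_context, get_config
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

# Fit an estimator on a hypothetical four-sample dataset
original = LogisticRegression(C=0.5)
original.fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])

# clone() keeps the hyperparameters but drops everything learned in fit()
fresh = clone(original)
print(fresh.get_params()["C"])  # 0.5
print(hasattr(fresh, "coef_"))  # False

# config_context() applies a setting temporarily and restores it on exit
before = get_config()["assume_finite"]
with config_context(assume_finite=True):
    inside = get_config()["assume_finite"]
after = get_config()["assume_finite"]
print(inside, after == before)  # True True
```

`set_config` accepts the same options but changes them globally until set again.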

## Version Information

```python
import sklearn
print(sklearn.__version__)  # prints 1.7.1 for the version documented here

# Get system information
sklearn.show_versions()
```

## Key Features

- **Consistent API**: All algorithms follow the same interface patterns
- **Comprehensive**: 300+ classes and 150+ functions covering most classical ML tasks (classification, regression, clustering, dimensionality reduction, preprocessing, and model selection)
- **Well-tested**: Extensive test suite ensuring reliability
- **Documentation**: Comprehensive user guide and API reference
- **Community**: Large, active community with regular releases
- **Integration**: Interoperates with NumPy, SciPy, pandas, and matplotlib
- **Performance**: Optimized implementations with optional parallelization (via the `n_jobs` parameter on many estimators)

scikit-learn covers the classical machine learning workflow end to end, from data preprocessing to model evaluation, which makes it a common default choice for machine learning in Python.