A suite of visual analysis and diagnostic tools for machine learning.
npx @tessl/cli install tessl/pypi-yellowbrick@1.5.00
# Yellowbrick

Yellowbrick is a machine learning visualization library that extends scikit-learn with publication-quality visualizations for model evaluation, selection, and interpretation. Its visual diagnostic tools, called "Visualizers", combine scikit-learn with matplotlib to streamline the machine learning workflow from data exploration through model interpretation.

## Package Information

- **Package Name**: yellowbrick
- **Language**: Python
- **Installation**: `pip install yellowbrick`
- **Scikit-learn Integration**: Compatible with scikit-learn 0.20+
- **Dependencies**: matplotlib, scipy, scikit-learn, numpy

## Core Imports

```python
import yellowbrick
```

Direct imports from yellowbrick:

```python
from yellowbrick import ROCAUC, ClassBalance, ClassificationScoreVisualizer
from yellowbrick import anscombe, datasaurus
from yellowbrick import set_aesthetic, set_style, set_palette, color_palette
```

Common pattern for visualizers:

```python
from yellowbrick.classifier import ROCAUC, ConfusionMatrix
from yellowbrick.regressor import ResidualsPlot
from yellowbrick.cluster import KElbow
```

Functional API imports:

```python
from yellowbrick.classifier import roc_auc, confusion_matrix
from yellowbrick.regressor import residuals_plot
```

## Basic Usage

```python
from yellowbrick.classifier import ROCAUC
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Generate sample data
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create the model; the visualizer fits it in visualizer.fit() below
model = LogisticRegression()

# Visualize ROC/AUC curves
visualizer = ROCAUC(model, classes=['Class 0', 'Class 1'])
visualizer.fit(X_train, y_train)   # fit the wrapped model on the training data
visualizer.score(X_test, y_test)   # evaluate on the test data and draw the curves
visualizer.show()                  # finalize and render the figure

# Using functional API
from yellowbrick.classifier import roc_auc
roc_auc(model, X_train, y_train, X_test, y_test, classes=['Class 0', 'Class 1'])
```

## Architecture

Yellowbrick follows the scikit-learn API design: Visualizers inherit from `sklearn.base.BaseEstimator`.

- **Base Classes**: `Visualizer`, `ModelVisualizer`, and `ScoreVisualizer` provide the foundation
- **Visualizer Pattern**: All visualizers implement `fit()` and `show()`; score visualizers also implement `score()`
- **Pipeline Integration**: Visualizers can be used in scikit-learn pipelines
- **Dual API**: Both class-based and functional APIs for flexibility
- **Matplotlib Integration**: Built on matplotlib with consistent styling and themes

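The fit/score/show pattern described above can be sketched without Yellowbrick itself. The class below, `MiniScoreVisualizer`, is a hypothetical stand-in written for illustration only; it is not a Yellowbrick class and does not reflect the library's internals. It assumes synthetic data, a `LogisticRegression` estimator, and the non-interactive `Agg` backend so that it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
from sklearn.base import BaseEstimator
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


class MiniScoreVisualizer(BaseEstimator):
    """Hypothetical stand-in for a score visualizer; not part of Yellowbrick."""

    def __init__(self, estimator, ax=None):
        self.estimator = estimator
        self.ax = ax

    def fit(self, X, y):
        # Delegate training to the wrapped scikit-learn estimator.
        self.estimator.fit(X, y)
        return self

    def score(self, X, y):
        # Compute the metric, store it, and draw it on the axes.
        self.score_ = self.estimator.score(X, y)
        self.draw()
        return self.score_

    def draw(self):
        # Create axes lazily if the caller did not supply any.
        if self.ax is None:
            _, self.ax = plt.subplots()
        self.ax.bar(["accuracy"], [self.score_])
        return self.ax

    def finalize(self):
        # Add the title and axis limits once all drawing is done.
        self.ax.set_ylim(0, 1)
        self.ax.set_title(f"{type(self.estimator).__name__} accuracy")

    def show(self):
        self.finalize()
        plt.show()


X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

viz = MiniScoreVisualizer(LogisticRegression(max_iter=1000))
viz.fit(X_train, y_train)
accuracy = viz.score(X_test, y_test)
viz.finalize()  # call viz.show() instead to display the figure interactively
```

Because the sketch inherits from `BaseEstimator`, `viz.get_params()` exposes the wrapped estimator in the usual scikit-learn way.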
## Capabilities

### Classification Analysis

Comprehensive visualizers for evaluating classification models including ROC curves, confusion matrices, classification reports, class prediction errors, precision-recall curves, and discrimination thresholds.

```python { .api }
class ROCAUC(ClassificationScoreVisualizer):
    def __init__(self, estimator, ax=None, micro=True, macro=True, per_class=True, binary=False, classes=None, encoder=None, is_fitted="auto", force_model=False, **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def score(self, X, y, **kwargs): ...

class ConfusionMatrix(ClassificationScoreVisualizer):
    def __init__(self, estimator, ax=None, sample_weight=None, percent=False, classes=None, encoder=None, cmap="YlOrRd", fontsize=None, is_fitted="auto", force_model=False, **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def score(self, X, y, **kwargs): ...

class ClassificationReport(ClassificationScoreVisualizer):
    def __init__(self, estimator, classes=None, **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def score(self, X, y, **kwargs): ...

# Functional APIs
def roc_auc(estimator, X_train, y_train, X_test=None, y_test=None, **kwargs): ...
def confusion_matrix(estimator, X_train, y_train, X_test=None, y_test=None, **kwargs): ...
def classification_report(estimator, X_train, y_train, X_test=None, y_test=None, **kwargs): ...
```

[Classification Analysis](./classification.md)

### Regression Analysis

Diagnostic visualizers for regression models including residuals plots, prediction error plots, alpha selection for regularized models, and Cook's distance for influence analysis.

```python { .api }
class ResidualsPlot(RegressionScoreVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def score(self, X, y, **kwargs): ...

class PredictionError(RegressionScoreVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def score(self, X, y, **kwargs): ...

class AlphaSelection(RegressionScoreVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def score(self, X, y, **kwargs): ...

# Functional APIs
def residuals_plot(estimator, X_train, y_train, X_test=None, y_test=None, **kwargs): ...
def prediction_error(estimator, X_train, y_train, X_test=None, y_test=None, **kwargs): ...
```

[Regression Analysis](./regression.md)

### Clustering Analysis

Visualizers for clustering evaluation including elbow method for optimal K selection, silhouette analysis, and intercluster distance mapping.

```python { .api }
class KElbow(ClusteringScoreVisualizer):
    def __init__(self, estimator, k=10, metric='distortion', **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

class SilhouetteVisualizer(ClusteringScoreVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

class InterclusterDistance(ClusteringScoreVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

# Functional APIs
def kelbow_visualizer(estimator, X, k=10, **kwargs): ...
def silhouette_visualizer(estimator, X, **kwargs): ...
```

[Clustering Analysis](./clustering.md)

### Feature Analysis

Tools for feature selection, analysis, and visualization including feature ranking, correlation analysis, PCA decomposition, manifold learning, and parallel coordinates.

```python { .api }
class Rank1D(Visualizer):
    def __init__(self, algorithm='shapiro', **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

class Rank2D(Visualizer):
    def __init__(self, algorithm='pearson', **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

class PCA(Visualizer):
    def __init__(self, scale=True, proj_features=True, **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

class ParallelCoordinates(Visualizer):
    def __init__(self, classes=None, **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

# Functional APIs
def rank1d(X, y=None, algorithm='shapiro', **kwargs): ...
def rank2d(X, y=None, algorithm='pearson', **kwargs): ...
def pca_decomposition(X, y=None, **kwargs): ...
```

[Feature Analysis](./features.md)

### Model Selection

Visualizers for model selection and hyperparameter tuning including learning curves, validation curves, cross-validation scores, and feature importance analysis.

```python { .api }
class LearningCurve(ModelVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y, **kwargs): ...

class ValidationCurve(ModelVisualizer):
    def __init__(self, estimator, param_name, param_range, **kwargs): ...
    def fit(self, X, y, **kwargs): ...

class FeatureImportances(ModelVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y, **kwargs): ...

class CVScores(ModelVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y, **kwargs): ...

# Functional APIs
def learning_curve(estimator, X, y, **kwargs): ...
def validation_curve(estimator, X, y, param_name, param_range, **kwargs): ...
def feature_importances(estimator, X, y, **kwargs): ...
```

[Model Selection](./model-selection.md)

### Text Analysis

Specialized visualizers for text analysis and natural language processing including t-SNE/UMAP embeddings, frequency distributions, part-of-speech analysis, and word correlation plots.

```python { .api }
class TSNEVisualizer(Visualizer):
    def __init__(self, **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

class FreqDistVisualizer(Visualizer):
    def __init__(self, **kwargs): ...
    def fit(self, corpus, **kwargs): ...

class DispersionPlot(Visualizer):
    def __init__(self, **kwargs): ...
    def fit(self, corpus, **kwargs): ...

# Functional APIs
def tsne(X, y=None, **kwargs): ...
def freqdist(corpus, **kwargs): ...
def dispersion(corpus, **kwargs): ...
```

[Text Analysis](./text.md)

### Data Loading and Utilities

Built-in datasets for learning and testing, plus utility functions for data management and visualization styling.

```python { .api }
# Dataset loaders
def load_concrete(): ...
def load_energy(): ...
def load_credit(): ...
def load_occupancy(): ...
def load_mushroom(): ...
def load_hobbies(): ...
def load_bikeshare(): ...

# Style management
def set_aesthetic(aesthetic='whitegrid'): ...
def set_palette(palette='flatui'): ...
def color_palette(palette=None): ...

# Demo functions
def anscombe(): ...
def datasaurus(): ...
```

[Data Loading and Utilities](./data-utilities.md)

## Types

```python { .api }
from enum import Enum

class TargetType(Enum):
    AUTO = "auto"
    SINGLE = "single"
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"
    UNKNOWN = "unknown"

# Base visualizer classes
class Visualizer:
    def __init__(self, ax=None, fig=None, size=None, color=None, title=None, **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...
    def transform(self, X): ...
    def show(self, outpath=None, **kwargs): ...
    def finalize(self, **kwargs): ...

class ModelVisualizer(Visualizer):
    def __init__(self, estimator, ax=None, fig=None, is_fitted="auto", **kwargs): ...

class ScoreVisualizer(ModelVisualizer):
    def score(self, X, y, **kwargs): ...
```