# Yellowbrick

Yellowbrick is a machine learning visualization library that extends scikit-learn with publication-quality visualizations for model evaluation, selection, and interpretation. Its visual diagnostic tools, called "Visualizers", combine scikit-learn with matplotlib to support the machine learning workflow from data exploration through model interpretation.

## Package Information

- **Package Name**: yellowbrick
- **Language**: Python
- **Installation**: `pip install yellowbrick`
- **Scikit-learn Integration**: Compatible with scikit-learn 0.20+
- **Dependencies**: matplotlib, scipy, scikit-learn, numpy

## Core Imports

```python
import yellowbrick
```

Direct imports from yellowbrick:

```python
from yellowbrick import ROCAUC, ClassBalance, ClassificationScoreVisualizer
from yellowbrick import anscombe, datasaurus
from yellowbrick import set_aesthetic, set_style, set_palette, color_palette
```

Common pattern for visualizers:

```python
from yellowbrick.classifier import ROCAUC, ConfusionMatrix
from yellowbrick.regressor import ResidualsPlot
from yellowbrick.cluster import KElbow
```

Functional API imports:

```python
from yellowbrick.classifier import roc_auc, confusion_matrix
from yellowbrick.regressor import residuals_plot
```

## Basic Usage

```python
from yellowbrick.classifier import ROCAUC
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Generate sample data
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create the model; the visualizer fits it in fit() below
model = LogisticRegression()

# Visualize ROC/AUC curves
visualizer = ROCAUC(model, classes=['Class 0', 'Class 1'])
visualizer.fit(X_train, y_train)    # fit the wrapped model on the training data
visualizer.score(X_test, y_test)    # evaluate on the test data and draw the curves
visualizer.show()                   # finalize and render the figure

# Using functional API
from yellowbrick.classifier import roc_auc
roc_auc(model, X_train, y_train, X_test, y_test, classes=['Class 0', 'Class 1'])
```

## Architecture

Yellowbrick follows the scikit-learn API design with Visualizers that inherit from `sklearn.base.BaseEstimator`:

- **Base Classes**: `Visualizer`, `ModelVisualizer`, and `ScoreVisualizer` provide the foundation
- **Visualizer Pattern**: All visualizers implement `fit()` and `show()`; score visualizers also implement `score()`
- **Pipeline Integration**: Visualizers can be used in scikit-learn pipelines
- **Dual API**: Both class-based and functional APIs are available for flexibility
- **Matplotlib Integration**: Built on matplotlib with consistent styling and themes

## Capabilities

### Classification Analysis

Comprehensive visualizers for evaluating classification models including ROC curves, confusion matrices, classification reports, class prediction errors, precision-recall curves, and discrimination thresholds.

```python { .api }
class ROCAUC(ClassificationScoreVisualizer):
    def __init__(self, estimator, ax=None, micro=True, macro=True, per_class=True, binary=False, classes=None, encoder=None, is_fitted="auto", force_model=False, **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def score(self, X, y, **kwargs): ...

class ConfusionMatrix(ClassificationScoreVisualizer):
    def __init__(self, estimator, ax=None, sample_weight=None, percent=False, classes=None, encoder=None, cmap="YlOrRd", fontsize=None, is_fitted="auto", force_model=False, **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def score(self, X, y, **kwargs): ...

class ClassificationReport(ClassificationScoreVisualizer):
    def __init__(self, estimator, classes=None, **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def score(self, X, y, **kwargs): ...

# Functional APIs
def roc_auc(estimator, X_train, y_train, X_test=None, y_test=None, **kwargs): ...
def confusion_matrix(estimator, X_train, y_train, X_test=None, y_test=None, **kwargs): ...
def classification_report(estimator, X_train, y_train, X_test=None, y_test=None, **kwargs): ...
```

[Classification Analysis](./classification.md)

### Regression Analysis

Diagnostic visualizers for regression models including residuals plots, prediction error plots, alpha selection for regularized models, and Cook's distance for influence analysis.

```python { .api }
class ResidualsPlot(RegressionScoreVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def score(self, X, y, **kwargs): ...

class PredictionError(RegressionScoreVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def score(self, X, y, **kwargs): ...

class AlphaSelection(RegressionScoreVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def score(self, X, y, **kwargs): ...

# Functional APIs
def residuals_plot(estimator, X_train, y_train, X_test=None, y_test=None, **kwargs): ...
def prediction_error(estimator, X_train, y_train, X_test=None, y_test=None, **kwargs): ...
```

[Regression Analysis](./regression.md)

### Clustering Analysis

Visualizers for clustering evaluation, including the elbow method for choosing K, silhouette analysis, and intercluster distance mapping.

```python { .api }
class KElbow(ClusteringScoreVisualizer):
    def __init__(self, estimator, k=10, metric='distortion', **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

class SilhouetteVisualizer(ClusteringScoreVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

class InterclusterDistance(ClusteringScoreVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

# Functional APIs
def kelbow_visualizer(estimator, X, k=10, **kwargs): ...
def silhouette_visualizer(estimator, X, **kwargs): ...
```

[Clustering Analysis](./clustering.md)

### Feature Analysis

Tools for feature selection, analysis, and visualization including feature ranking, correlation analysis, PCA decomposition, manifold learning, and parallel coordinates.

```python { .api }
class Rank1D(Visualizer):
    def __init__(self, algorithm='shapiro', **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

class Rank2D(Visualizer):
    def __init__(self, algorithm='pearson', **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

class PCA(Visualizer):
    def __init__(self, scale=True, proj_features=False, **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

class ParallelCoordinates(Visualizer):
    def __init__(self, classes=None, **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

# Functional APIs
def rank1d(X, y=None, algorithm='shapiro', **kwargs): ...
def rank2d(X, y=None, algorithm='pearson', **kwargs): ...
def pca_decomposition(X, y=None, **kwargs): ...
```

[Feature Analysis](./features.md)

### Model Selection

Visualizers for model selection and hyperparameter tuning including learning curves, validation curves, cross-validation scores, and feature importance analysis.

```python { .api }
class LearningCurve(ModelVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y, **kwargs): ...

class ValidationCurve(ModelVisualizer):
    def __init__(self, estimator, param_name, param_range, **kwargs): ...
    def fit(self, X, y, **kwargs): ...

class FeatureImportances(ModelVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y, **kwargs): ...

class CVScores(ModelVisualizer):
    def __init__(self, estimator, **kwargs): ...
    def fit(self, X, y, **kwargs): ...

# Functional APIs
def learning_curve(estimator, X, y, **kwargs): ...
def validation_curve(estimator, X, y, param_name, param_range, **kwargs): ...
def feature_importances(estimator, X, y, **kwargs): ...
```

[Model Selection](./model-selection.md)

### Text Analysis

Specialized visualizers for text analysis and natural language processing including t-SNE/UMAP embeddings, frequency distributions, part-of-speech analysis, and word correlation plots.

```python { .api }
class TSNEVisualizer(Visualizer):
    def __init__(self, **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...

class FreqDistVisualizer(Visualizer):
    def __init__(self, **kwargs): ...
    def fit(self, corpus, **kwargs): ...

class DispersionPlot(Visualizer):
    def __init__(self, **kwargs): ...
    def fit(self, corpus, **kwargs): ...

# Functional APIs
def tsne(X, y=None, **kwargs): ...
def freqdist(corpus, **kwargs): ...
def dispersion(corpus, **kwargs): ...
```

[Text Analysis](./text.md)

### Data Loading and Utilities

Built-in datasets for learning and testing, plus utility functions for data management and visualization styling.

```python { .api }
# Dataset loaders
def load_concrete(): ...
def load_energy(): ...
def load_credit(): ...
def load_occupancy(): ...
def load_mushroom(): ...
def load_hobbies(): ...
def load_bikeshare(): ...

# Style management
def set_aesthetic(palette="yellowbrick", font="sans-serif", font_scale=1, color_codes=True, rc=None): ...
def set_palette(palette, n_colors=None, color_codes=False): ...
def color_palette(palette=None, n_colors=None): ...

# Demo functions
def anscombe(): ...
def datasaurus(): ...
```

[Data Loading and Utilities](./data-utilities.md)

## Types

```python { .api }
from enum import Enum

class TargetType(Enum):
    AUTO = "auto"
    SINGLE = "single"
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"
    UNKNOWN = "unknown"

# Base visualizer classes
class Visualizer:
    def __init__(self, ax=None, fig=None, size=None, color=None, title=None, **kwargs): ...
    def fit(self, X, y=None, **kwargs): ...
    def transform(self, X): ...
    def show(self, outpath=None, **kwargs): ...
    def finalize(self, **kwargs): ...

class ModelVisualizer(Visualizer):
    def __init__(self, estimator, ax=None, fig=None, is_fitted="auto", **kwargs): ...

class ScoreVisualizer(ModelVisualizer):
    def score(self, X, y, **kwargs): ...
```