Machine Learning Library Extensions providing essential tools for day-to-day data science tasks
npx @tessl/cli install tessl/pypi-mlxtend@0.23.00
# MLxtend

MLxtend (Machine Learning Extensions) is a Python library of tools for day-to-day data science tasks that extends scikit-learn and other scientific computing libraries. It provides ensemble methods, frequent pattern mining algorithms, feature selection and extraction techniques, model evaluation utilities, and plotting functions for visualizing decision regions and model performance.

## Package Information

- **Package Name**: mlxtend
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install mlxtend`
- **Version**: 0.23.4
- **License**: BSD 3-Clause

## Core Imports

```python
import mlxtend
```

Common import patterns for specific modules:

```python
from mlxtend.classifier import EnsembleVoteClassifier, StackingClassifier
from mlxtend.feature_selection import SequentialFeatureSelector
from mlxtend.plotting import plot_decision_regions, plot_learning_curves
from mlxtend.evaluate import mcnemar, bootstrap_point632_score
from mlxtend.frequent_patterns import apriori, association_rules
```

## Basic Usage

```python
from mlxtend.classifier import EnsembleVoteClassifier
from mlxtend.plotting import plot_decision_regions
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt

# Create sample data
X, y = make_classification(n_samples=1000, n_features=2, n_redundant=0,
                           n_informative=2, random_state=42, n_clusters_per_class=1)

# Create ensemble classifier
clf1 = LogisticRegression(random_state=42)
clf2 = RandomForestClassifier(random_state=42)
clf3 = SVC(probability=True, random_state=42)

ensemble = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], voting='soft')
ensemble.fit(X, y)

# Visualize decision regions
plot_decision_regions(X, y, clf=ensemble, legend=2)
plt.title('Ensemble Classifier Decision Regions')
plt.show()
```

## Architecture

MLxtend is organized into 14 specialized modules, each focusing on specific aspects of machine learning:

- **Classifiers**: Nine advanced classification algorithms including ensemble methods and neural networks
- **Regressors**: Stacking ensemble methods for regression tasks
- **Feature Engineering**: Selection, extraction, and preprocessing tools
- **Evaluation**: Comprehensive model evaluation and statistical testing utilities
- **Visualization**: Specialized plotting functions for ML model analysis
- **Pattern Mining**: Association rule and frequent pattern mining algorithms
- **Utilities**: Mathematical functions, text processing, and file I/O tools

This modular design allows users to import only the functionality they need while maintaining compatibility with the broader Python scientific ecosystem, particularly scikit-learn.

## Capabilities

### Classification Algorithms

Advanced classification methods including ensemble voting, stacking, neural networks, and classic algorithms like perceptron and logistic regression.

```python { .api }
class EnsembleVoteClassifier:
    def __init__(self, clfs, voting='hard', weights=None): ...
    def fit(self, X, y): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class StackingClassifier:
    def __init__(self, classifiers, meta_classifier): ...
    def fit(self, X, y): ...
    def predict(self, X): ...

class MultiLayerPerceptron:
    def __init__(self, eta=0.5, epochs=50, hidden_layers=[50]): ...
    def fit(self, X, y): ...
    def predict(self, X): ...
```

[Classification Algorithms](./classification.md)

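The difference between the `voting='hard'` and `voting='soft'` options can be made concrete with a small plain-Python sketch. It is not MLxtend's implementation; `hard_vote` and `soft_vote` are hypothetical helper names, and the probabilities are made up for illustration. Hard voting takes the majority of predicted labels, while soft voting averages the predicted class probabilities (optionally weighted) and picks the largest.

```python
from collections import Counter


def hard_vote(labels):
    """Return the most frequent label among per-classifier predictions."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas, weights=None):
    """Return (winning class index, weighted mean probability per class)."""
    if weights is None:
        weights = [1.0] * len(probas)
    total = sum(weights)
    n_classes = len(probas[0])
    averaged = [
        sum(w * p[k] for w, p in zip(weights, probas)) / total
        for k in range(n_classes)
    ]
    winner = max(range(n_classes), key=averaged.__getitem__)
    return winner, averaged


# Three hypothetical classifiers scoring one sample over classes 0 and 1.
probas = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90]]
labels = [0, 0, 1]  # each classifier's most probable class

hard_result = hard_vote(labels)
soft_result, averaged = soft_vote(probas)
print(hard_result, soft_result, averaged)
```

Here two classifiers weakly prefer class 0 and one strongly prefers class 1, so the two voting rules disagree: the majority picks class 0, while the averaged probabilities favor class 1.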
### Feature Selection and Extraction

Tools for selecting optimal feature subsets and extracting new features through dimensionality reduction techniques.

```python { .api }
class SequentialFeatureSelector:
    def __init__(self, estimator, k_features=1, forward=True, scoring=None): ...
    def fit(self, X, y): ...
    def transform(self, X): ...

class PrincipalComponentAnalysis:
    def __init__(self, n_components=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class LinearDiscriminantAnalysis:
    def __init__(self, n_discriminants=None): ...
    def fit(self, X, y): ...
    def transform(self, X): ...
```

[Feature Engineering](./feature-engineering.md)

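As a rough illustration of what sequential forward selection does (the `forward=True` case), the greedy loop can be sketched in plain Python. This is a simplified sketch, not the library's code: `sequential_forward_selection` and `toy_score` are hypothetical names, and the scoring function stands in for something like a cross-validated estimator score.

```python
def sequential_forward_selection(n_features, k_features, score):
    """Greedily grow a feature subset until it holds k_features indices.

    `score` is a hypothetical callable that maps a tuple of feature
    indices to a number, where higher is better.
    """
    selected = []
    remaining = list(range(n_features))
    while len(selected) < k_features:
        # Try adding each remaining feature and keep the best candidate.
        best = max(remaining, key=lambda f: score(tuple(selected + [f])))
        selected.append(best)
        remaining.remove(best)
    return tuple(selected)


# Toy scoring function (an assumption for illustration): each feature has
# a fixed usefulness, and features 0 and 2 are largely redundant.
usefulness = {0: 0.30, 1: 0.10, 2: 0.28, 3: 0.20}


def toy_score(subset):
    total = sum(usefulness[f] for f in subset)
    if 0 in subset and 2 in subset:
        total -= 0.25  # penalty for picking both redundant features
    return total


chosen = sequential_forward_selection(4, 2, toy_score)
print(chosen)
```

Feature 2 has the second-highest individual usefulness, yet the greedy step prefers feature 3 because it adds more once feature 0 is already selected. Evaluating candidates in the context of the current subset is the point of a wrapper method.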
### Model Evaluation and Testing

Comprehensive model evaluation tools including statistical tests, bootstrap methods, and cross-validation utilities.

```python { .api }
def mcnemar(ary, corrected=True, exact=False):
    """McNemar test for classifier comparison"""

def bootstrap_point632_score(estimator, X, y, n_splits=200, method='.632'):
    """Bootstrap .632 and .632+ error estimation"""

def paired_ttest_5x2cv(estimator1, estimator2, X, y, scoring=None):
    """5x2cv paired t-test for comparing classifiers"""

class BootstrapOutOfBag:
    def __init__(self, n_splits=200, random_seed=None): ...
    def split(self, X, y=None): ...
```

[Model Evaluation](./evaluation.md)

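To show what the McNemar test measures, the chi-squared form of the statistic can be computed by hand. The sketch below assumes the textbook formula with an optional continuity correction and a 2x2 table laid out so that the off-diagonal cells count the samples on which exactly one classifier was correct. `mcnemar_sketch` is a hypothetical name, and the exact binomial variant is not covered.

```python
import math


def mcnemar_sketch(table, corrected=True):
    """Return (chi2, p) for a 2x2 contingency table given as nested lists.

    table[0][1] and table[1][0] are the discordant counts: samples that
    one classifier got right and the other got wrong.
    """
    b = table[0][1]
    c = table[1][0]
    if b + c == 0:
        raise ValueError("no discordant pairs, statistic is undefined")
    diff = abs(b - c)
    if corrected:
        diff -= 1  # continuity correction
    chi2 = diff ** 2 / (b + c)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


table = [[9945, 25],
         [15, 15]]
chi2, p = mcnemar_sketch(table)
print(f"chi2={chi2:.3f}, p={p:.4f}")
```

Only the discordant cells (25 and 15) enter the statistic. The large number of samples that both classifiers handle identically has no influence on the result.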
### Visualization Tools

Specialized plotting functions for machine learning model analysis including decision regions, learning curves, and confusion matrices.

```python { .api }
def plot_decision_regions(X, y, clf, feature_index=None, filler_feature_values=None):
    """Plot decision regions for 2D datasets"""

def plot_learning_curves(X_train, y_train, X_test, y_test, clf, scoring='misclassification error'):
    """Plot learning curves"""

def plot_confusion_matrix(conf_mat, hide_spines=False, hide_ticks=False, figsize=None):
    """Plot confusion matrix"""

def plot_sequential_feature_selection(metric_dict, kind='std_dev', color='blue'):
    """Plot sequential feature selection results"""
```

[Visualization Tools](./plotting.md)

### Frequent Pattern Mining

Association rule mining and frequent pattern discovery algorithms for transaction data analysis.

```python { .api }
def apriori(df, min_support=0.5, use_colnames=False, max_len=None):
    """Apriori algorithm for frequent itemset mining"""

def association_rules(df, metric="confidence", min_threshold=0.8):
    """Generate association rules from frequent itemsets"""

def fpgrowth(df, min_support=0.5, use_colnames=False, max_len=None):
    """FP-Growth algorithm for frequent itemset mining"""

def fpmax(df, min_support=0.5, use_colnames=False):
    """FPMax algorithm for maximal frequent itemsets"""
```

[Pattern Mining](./pattern-mining.md)

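The quantities behind `min_support` and the `"confidence"` metric can be illustrated on a tiny transaction list. The following is a brute-force sketch in plain Python, not the Apriori or FP-Growth algorithm as implemented in the library. `support` and `frequent_itemsets` are hypothetical helpers, and the transactions are invented.

```python
from itertools import combinations

transactions = [
    {"milk", "bread"},
    {"milk", "eggs"},
    {"milk", "bread", "eggs"},
    {"bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Enumerate itemsets level by level, keeping those above min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
                found = True
        if not found:
            # Downward closure: with no frequent itemset of this size,
            # no larger itemset can be frequent either.
            break
    return result


freq = frequent_itemsets(transactions, min_support=0.5)

# Confidence of the rule {bread} -> {milk}: support(both) / support(bread).
confidence = freq[frozenset({"bread", "milk"})] / freq[frozenset({"bread"})]
print(len(freq), round(confidence, 3))
```

A rule generator would keep `{bread} -> {milk}` only if this confidence met the chosen threshold. With the default `min_threshold=0.8` listed above, a confidence of about 0.667 would be filtered out.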
### Data Preprocessing

Data transformation utilities including scaling, encoding, and array manipulation functions.

```python { .api }
class MeanCenterer:
    def fit(self, X): ...
    def transform(self, X): ...

class TransactionEncoder:
    def fit(self, X): ...
    def transform(self, X): ...

def standardize(array, columns=None, ddof=0):
    """Standardize features by removing mean and scaling to unit variance"""

def minmax_scaling(array, columns=None, min_val=0, max_val=1):
    """Min-max feature scaling"""
```

[Data Preprocessing](./preprocessing.md)

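The two scaling operations reduce to short formulas, sketched here for a single column in plain Python. The helper names `minmax_scale_column` and `standardize_column` are hypothetical, and the sketch assumes a non-constant column, so it does not guard against division by zero.

```python
import math


def minmax_scale_column(values, min_val=0.0, max_val=1.0):
    """Linearly map values so the smallest becomes min_val, the largest max_val."""
    lo, hi = min(values), max(values)
    span = max_val - min_val
    return [min_val + (v - lo) / (hi - lo) * span for v in values]


def standardize_column(values, ddof=0):
    """Subtract the mean and divide by the standard deviation.

    ddof=0 gives the population standard deviation, ddof=1 the sample one.
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - ddof)
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]


column = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled = minmax_scale_column(column)
z = standardize_column(column)
print(scaled)
print([round(v, 3) for v in z])
```

Min-max scaling preserves the relative spacing of values inside a fixed range, while standardization yields zero mean and unit variance.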
### Clustering Algorithms

Unsupervised learning algorithms for data clustering and pattern discovery.

```python { .api }
class Kmeans:
    def __init__(self, k, max_iter=100, convergence_tolerance=1e-05): ...
    def fit(self, X): ...
    def predict(self, X): ...
```

[Clustering](./clustering.md)

### Dataset Loading

Utilities for loading common machine learning datasets and generating synthetic data.

```python { .api }
def iris_data():
    """Load the Iris dataset"""

def wine_data():
    """Load the Wine dataset"""

def mnist_data():
    """Load the MNIST dataset"""

def boston_housing_data():
    """Load the Boston Housing dataset"""
```

[Dataset Loading](./datasets.md)

### Regression Algorithms

Regression methods including linear regression and stacking ensembles for improved prediction performance.

```python { .api }
class LinearRegression:
    def __init__(self, eta=0.01, epochs=50): ...
    def fit(self, X, y): ...
    def predict(self, X): ...

class StackingRegressor:
    def __init__(self, regressors, meta_regressor): ...
    def fit(self, X, y): ...
    def predict(self, X): ...
```

[Regression Algorithms](./regression.md)

### Mathematical Utilities

Mathematical functions and utilities commonly used in machine learning computations.

```python { .api }
def num_combinations(n, r):
    """Calculate number of combinations"""

def num_permutations(n, r):
    """Calculate number of permutations"""

def factorial(n):
    """Calculate factorial"""

def vectorspace_orthonormalization(ary):
    """Orthonormalize vectors using Gram-Schmidt process"""
```

[Mathematical Utilities](./math-utils.md)

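For reference, the counting functions correspond to standard formulas available in Python's `math` module, and the Gram-Schmidt process can be sketched in a few lines. The `gram_schmidt` helper below is hypothetical and treats each inner list as one vector, which may differ from the array layout the library function expects.

```python
import math

# Counting: choose 8 of 20 items without replacement.
n_comb = math.comb(20, 8)   # order ignored: n! / (r! * (n - r)!)
n_perm = math.perm(20, 8)   # order matters: n! / (n - r)!


def gram_schmidt(vectors, eps=1e-13):
    """Return an orthonormal basis spanning the given vectors.

    Vectors that are (numerically) linear combinations of earlier ones
    are dropped instead of being normalized.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            # Remove the component of w that points along q.
            proj = sum(a * b for a, b in zip(w, q))
            w = [a - proj * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > eps:
            basis.append([a / norm for a in w])
    return basis


basis = gram_schmidt([[2.0, 0.0], [1.0, 3.0]])
print(n_comb, n_perm, basis)
```

Each resulting vector has unit length and is orthogonal to the ones before it. A dependent input such as `[[1.0, 0.0], [2.0, 0.0]]` yields a single basis vector.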
### Text Processing

Text processing utilities for natural language processing tasks.

```python { .api }
def generalize_names(name):
    """Generalize person names for consistency"""

def tokenizer_words_and_emoticons(text):
    """Tokenize text including emoticons"""

def tokenizer_emoticons(text):
    """Extract emoticons from text"""
```

[Text Processing](./text-processing.md)

### File I/O Utilities

File system utilities for finding and organizing files.

```python { .api }
def find_files(substring, path, recursive=True, check_ext=None, ignore_invisible=True):
    """Find files matching criteria"""

def find_filegroups(paths, substring='', extensions=None, ignore_invisible=True):
    """Group files by specified criteria"""
```

[File I/O](./file-io.md)

### General Utilities

General-purpose utilities for progress reporting, data validation, and testing.

```python { .api }
class Counter:
    """Progress counter for loop iterations"""
    def __init__(self, stderr=False, start_newline=True, precision=0, name=None): ...
    def update(self): ...

def check_Xy(X, y, y_int=True):
    """Validate input data format"""

def assert_raises(exception_type, message, func, *args, **kwargs):
    """Test utility for verifying exceptions"""
```

[General Utilities](./utilities.md)

## Types

```python { .api }
# Core types used across multiple modules
from typing import Union, Optional, List, Tuple, Dict, Any
from numpy import ndarray
from pandas import DataFrame

# Common type aliases
ArrayLike = Union[ndarray, List, Tuple]
DataFrameLike = Union[DataFrame, ndarray]
ClassifierLike = object  # sklearn-compatible classifier
RegressorLike = object  # sklearn-compatible regressor
```