# Clustering

Unsupervised learning algorithms for data clustering and pattern discovery.

## Capabilities

### K-means Clustering

Iterative clustering algorithm that partitions data into k clusters.

```python { .api }
class Kmeans:
    def __init__(self, k, max_iter=10, convergence_tolerance=1e-05, random_seed=None,
                 print_progress=0):
        """
        K-means clustering algorithm.

        Parameters:
        - k: int, number of clusters
        - max_iter: int, maximum number of iterations; fitting stops earlier on convergence
        - convergence_tolerance: float, tolerance for comparing centroids between
          consecutive iterations
        - random_seed: int, random seed for centroid initialization
        - print_progress: int, verbosity of progress output (0: none, 1: iterations
          elapsed, 2: plus time elapsed, 3: plus estimated time remaining)
        """

    def fit(self, X, init_params=True):
        """
        Fit K-means clustering to data.

        Parameters:
        - X: array-like, feature matrix (shape: [n_samples, n_features])
        - init_params: bool, re-initialize model parameters before fitting

        Returns:
        - self: fitted estimator
        """

    def predict(self, X):
        """
        Predict cluster labels for samples.

        Parameters:
        - X: array-like, feature matrix (shape: [n_samples, n_features])

        Returns:
        - labels: array, cluster labels for each sample
        """

    def fit_predict(self, X):
        """Fit clustering and return cluster labels"""

    # Attributes available after fitting:
    # centroids_: 2d-array (shape: [k, n_features]), cluster centroids
    # clusters_: dict mapping cluster indices to lists of sample indices
    # iterations_: int, number of iterations until convergence
```
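
The parameters above correspond to the usual Lloyd's iteration for k-means. The following is a minimal NumPy sketch of that loop, not mlxtend's actual implementation: the function name `kmeans_sketch`, the choice of k random samples as initial centroids, and the use of `np.allclose` as the convergence check are all assumptions made for illustration.

```python
import numpy as np


def kmeans_sketch(X, k, max_iter=10, convergence_tolerance=1e-05, random_seed=None):
    """Illustrative Lloyd's-algorithm loop (hypothetical helper, not mlxtend code)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.RandomState(random_seed)
    # Assumption: initial centroids are k distinct samples drawn at random.
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)]
    iterations = 0
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every sample.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: mean of each cluster; an empty cluster keeps its old centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        iterations += 1
        # Stop early once the centroids move less than the tolerance.
        converged = np.allclose(new_centroids, centroids,
                                rtol=0.0, atol=convergence_tolerance)
        centroids = new_centroids
        if converged:
            break
    # Final assignment, shaped like the documented attributes:
    # centroids, {cluster index: sample indices}, iteration count.
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    clusters = {j: np.flatnonzero(labels == j).tolist() for j in range(k)}
    return centroids, clusters, iterations
```

The three return values play the roles of `centroids_`, `clusters_`, and `iterations_`; a lower `convergence_tolerance` or a higher `max_iter` trades runtime for a more settled solution.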

## Usage Examples

### Basic K-means Clustering

```python
from mlxtend.cluster import Kmeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate sample data
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Fit K-means, then assign each sample to its nearest centroid
kmeans = Kmeans(k=4, random_seed=42)
kmeans.fit(X)
cluster_labels = kmeans.predict(X)

# Plot samples colored by cluster, with centroids marked as red crosses
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=cluster_labels, cmap='viridis', alpha=0.6)
plt.scatter(kmeans.centroids_[:, 0], kmeans.centroids_[:, 1],
            c='red', marker='x', s=200, linewidths=3)
plt.title('K-means Clustering Results')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

print(f"Converged after {kmeans.iterations_} iterations")
print(f"Cluster centroids:\n{kmeans.centroids_}")
```
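
The `clusters_` attribute is documented as a dictionary mapping cluster indices to sample indices. When a flat label array is more convenient, a small helper along these lines could convert it. This is a sketch: `clusters_to_labels` is a hypothetical name, and the dictionary below is a hand-written stand-in for `kmeans.clusters_`, not real output.

```python
import numpy as np


def clusters_to_labels(clusters, n_samples):
    """Hypothetical helper: flatten {cluster index: sample indices} into a label array."""
    # -1 marks any sample that does not appear in a cluster
    labels = np.full(n_samples, -1, dtype=int)
    for cluster_idx, sample_indices in clusters.items():
        labels[list(sample_indices)] = cluster_idx
    return labels


# Stand-in for kmeans.clusters_ after fitting on six samples
example_clusters = {0: [0, 3], 1: [1, 2, 5], 2: [4]}
labels = clusters_to_labels(example_clusters, n_samples=6)
print(labels)               # [0 1 1 0 2 1]
print(np.bincount(labels))  # cluster sizes: [2 3 1]
```

With a fitted model, the call would presumably be `clusters_to_labels(kmeans.clusters_, X.shape[0])`.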

### Clustering with Different K Values

```python
from mlxtend.cluster import Kmeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate data
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)

# Try different k values, one subplot per value
k_values = [2, 3, 4, 5]
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.ravel()

for i, k in enumerate(k_values):
    kmeans = Kmeans(k=k, random_seed=42)
    kmeans.fit(X)
    labels = kmeans.predict(X)

    axes[i].scatter(X[:, 0], X[:, 1], c=labels, cmap='tab10', alpha=0.6)
    axes[i].scatter(kmeans.centroids_[:, 0], kmeans.centroids_[:, 1],
                    c='red', marker='x', s=100, linewidths=2)
    axes[i].set_title(f'K-means with k={k}')
    axes[i].set_xlabel('Feature 1')
    axes[i].set_ylabel('Feature 2')

plt.tight_layout()
plt.show()
```
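
A visual comparison across k values can be complemented by a numeric score. One common choice is the within-cluster sum of squared distances, the quantity behind the "elbow" heuristic. The sketch below uses NumPy only; `within_cluster_sse` is a hypothetical helper, not part of mlxtend, and the toy labels and centroids are written by hand rather than produced by a fitted model.

```python
import numpy as np


def within_cluster_sse(X, labels, centroids):
    """Hypothetical helper: sum of squared distances from each sample to its centroid."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=int)
    centroids = np.asarray(centroids, dtype=float)
    # centroids[labels] picks the assigned centroid for every sample
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))


# Toy data with two tight groups
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

# One cluster: every sample assigned to the overall mean
sse_k1 = within_cluster_sse(X, [0, 0, 0, 0], [X.mean(axis=0)])
# Two clusters: each group assigned to its own mean
sse_k2 = within_cluster_sse(X, [0, 0, 1, 1], [[0.0, 1.0], [10.0, 1.0]])
print(sse_k1, sse_k2)  # 104.0 4.0
```

With a fitted model, the call would presumably look like `within_cluster_sse(X, kmeans.predict(X), kmeans.centroids_)`. The score always shrinks as k grows, so the usual heuristic is to look for the k where the decrease levels off.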