# Clustering

Unsupervised learning algorithms for data clustering and pattern discovery.

## Capabilities

### K-means Clustering

Iterative clustering algorithm that partitions data into k clusters.
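
The iterative procedure can be illustrated with a minimal, plain-Python sketch. This is not mlxtend's implementation: the function names `squared_distance` and `kmeans_sketch` are hypothetical, and the random-sample initialization, the handling of empty clusters, and the stopping rule are assumptions chosen for readability.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sketch(points, k, max_iter=10, tol=1e-5, seed=None):
    """Illustrative k-means loop; returns (centroids, labels, iterations)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct samples drawn at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    iterations = 0

    for _ in range(max_iter):
        iterations += 1
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]

        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    [sum(dim) / len(members) for dim in zip(*members)]
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])

        # Stop once no centroid coordinate moved by more than `tol`.
        shift = max(
            abs(new - old)
            for c_new, c_old in zip(new_centroids, centroids)
            for new, old in zip(c_new, c_old)
        )
        centroids = new_centroids
        if shift <= tol:
            break

    return centroids, labels, iterations


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, labels, iterations = kmeans_sketch(points, k=2, seed=0)
print(centroids, labels, iterations)
```

The two alternating steps (assign, then update) are what make the algorithm iterative; `tol` plays a role comparable to a convergence threshold, and `max_iter` bounds the loop when the centroids keep moving.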

```python { .api }
class Kmeans:
    def __init__(self, k, max_iter=10, convergence_tolerance=1e-05, random_seed=None,
                 print_progress=0):
        """
        K-means clustering algorithm.

        Parameters:
        - k: int, number of clusters
        - max_iter: int, maximum number of iterations; fitting stops earlier
          if the centroids converge
        - convergence_tolerance: float, tolerance used when comparing centroids
          between consecutive iterations to detect convergence
        - random_seed: int, random seed for centroid initialization
        - print_progress: int, verbosity of progress output
          (0: no output, 1: iterations elapsed, 2: plus time elapsed,
          3: plus estimated time until completion)
        """

    def fit(self, X, init_params=True):
        """
        Fit K-means clustering to data.

        Parameters:
        - X: array-like, feature matrix (shape: [n_samples, n_features])
        - init_params: bool, re-initialize model parameters before fitting

        Returns:
        - self: fitted estimator
        """

    def predict(self, X):
        """
        Predict cluster labels for samples.

        Parameters:
        - X: array-like, feature matrix (shape: [n_samples, n_features])

        Returns:
        - labels: array, cluster labels for each sample
        """

    def fit_predict(self, X):
        """Fit clustering and return cluster labels"""

    centroids_:   # Cluster centroids after fitting
    clusters_:    # Dictionary mapping cluster indices to sample indices
    iterations_:  # Number of iterations until convergence
```
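
To make the shape of `clusters_` concrete, the sketch below builds the same kind of mapping (cluster index to list of sample indices) from a plain list of labels. The helper name `group_indices_by_label` is hypothetical and is not part of the library; it only illustrates how such a dictionary relates to the labels returned by `predict`.

```python
from collections import defaultdict


def group_indices_by_label(labels):
    """Map each cluster index to the list of sample indices assigned to it."""
    clusters = defaultdict(list)
    for sample_index, label in enumerate(labels):
        # int() also accepts NumPy integer labels.
        clusters[int(label)].append(sample_index)
    return dict(clusters)


labels = [0, 2, 1, 0, 2]
print(group_indices_by_label(labels))
# prints {0: [0, 3], 2: [1, 4], 1: [2]}
```

A mapping like this is convenient when the members of one cluster need to be selected, for example with `X[clusters[0]]` on a NumPy feature matrix.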

## Usage Examples

### Basic K-means Clustering

```python
from mlxtend.cluster import Kmeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate sample data: 300 points scattered around 4 blob centers
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Fit K-means, then assign each sample to a cluster.
# fit() returns the fitted estimator, so predict() can be chained.
kmeans = Kmeans(k=4, random_seed=42)
cluster_labels = kmeans.fit(X).predict(X)

# Plot samples colored by cluster, with centroids marked as red crosses
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=cluster_labels, cmap='viridis', alpha=0.6)
plt.scatter(kmeans.centroids_[:, 0], kmeans.centroids_[:, 1],
            c='red', marker='x', s=200, linewidths=3)
plt.title('K-means Clustering Results')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

print(f"Converged after {kmeans.iterations_} iterations")
print(f"Cluster centroids:\n{kmeans.centroids_}")
```

### Clustering with Different K Values

```python
from mlxtend.cluster import Kmeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate data with 3 true blob centers
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)

# Try different k values, one subplot per value
k_values = [2, 3, 4, 5]
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.ravel()  # flatten the 2x2 grid so it can be indexed by i

for i, k in enumerate(k_values):
    # fit() returns the fitted estimator, so predict() can be chained
    kmeans = Kmeans(k=k, random_seed=42)
    labels = kmeans.fit(X).predict(X)

    axes[i].scatter(X[:, 0], X[:, 1], c=labels, cmap='tab10', alpha=0.6)
    axes[i].scatter(kmeans.centroids_[:, 0], kmeans.centroids_[:, 1],
                    c='red', marker='x', s=100, linewidths=2)
    axes[i].set_title(f'K-means with k={k}')
    axes[i].set_xlabel('Feature 1')
    axes[i].set_ylabel('Feature 2')

plt.tight_layout()
plt.show()
```
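
The plots above compare k values visually. A numeric complement, sketched here under the assumption that a lower within-cluster sum of squares indicates tighter clusters, is to compute that quantity for each k and look for the point where it stops dropping sharply. The helper `within_cluster_ss` is hypothetical and not part of the library; it accepts nested lists or NumPy arrays.

```python
def within_cluster_ss(X, labels, centroids):
    """Sum of squared distances from each sample to its assigned centroid."""
    total = 0.0
    for sample, label in zip(X, labels):
        centroid = centroids[label]
        total += sum((x - c) ** 2 for x, c in zip(sample, centroid))
    return total


# Tiny hand-checkable example: two samples around (0, 1), one at (10, 10)
X_demo = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
labels_demo = [0, 0, 1]
centroids_demo = [[0.0, 1.0], [10.0, 10.0]]
print(within_cluster_ss(X_demo, labels_demo, centroids_demo))
# prints 2.0
```

Inside the loop over `k_values`, the same helper could be called as `within_cluster_ss(X, labels, kmeans.centroids_)` to record one score per k.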