# Machine Learning

Classification algorithms and clustering methods for supervised and unsupervised learning tasks.

## Core Imports

```typescript
import {
  BayesianClassifier,
  PerceptronModel,
  ckmeans,
  kernelDensityEstimation
} from "simple-statistics";
```
## Classification Algorithms

### BayesianClassifier (alias: bayesian) { .api }

```typescript { .api }
class BayesianClassifier {
  totalCount: number;
  data: Record<string, any>;

  constructor();
  train(item: Record<string, any>, category: string): void;
  score(item: Record<string, any>): Record<string, number>;
}

const bayesian: typeof BayesianClassifier;
```

Naive Bayesian classifier for categorical data. Assumes independence between features.

**Properties:**
- `totalCount: number` - Total number of training examples
- `data: Record<string, any>` - Internal storage for training data

**Methods:**

#### train { .api }
```typescript { .api }
train(item: Record<string, any>, category: string): void;
```

Trains the classifier with a labeled example.

**Parameters:**
- `item: Record<string, any>` - Feature object with key-value pairs
- `category: string` - Category label for this item

#### score { .api }
```typescript { .api }
score(item: Record<string, any>): Record<string, number>;
```

Scores an item against each trained category; the highest-scoring category is the predicted label.

**Parameters:**
- `item: Record<string, any>` - Feature object to classify

**Returns:** `Record<string, number>` - Score for each category (higher means more likely)
```typescript
import { BayesianClassifier } from "simple-statistics";

// Email spam classification
const classifier = new BayesianClassifier();

// Train with examples
classifier.train({ word: "free", count: 1 }, "spam");
classifier.train({ word: "meeting", count: 1 }, "ham");
classifier.train({ word: "free", count: 2, urgent: true }, "spam");
classifier.train({ word: "project", count: 1 }, "ham");

// Classify new email
const scores = classifier.score({ word: "free", count: 1 });
console.log(scores); // per-category scores; higher = more likely

// Get the most likely category
const category = Object.keys(scores).reduce((a, b) =>
  scores[a] > scores[b] ? a : b
);
console.log(`Predicted category: ${category}`);
```
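Depending on the training data, the per-category values returned by `score` may not sum to 1. If probability-style numbers are wanted for display, one option is to rescale them. A minimal sketch, assuming simple sum-normalization is acceptable (the `normalizeScores` helper is illustrative, not part of simple-statistics):

```typescript
// Rescale raw category scores so they sum to 1.
// Illustrative helper only -- not part of simple-statistics.
function normalizeScores(scores: Record<string, number>): Record<string, number> {
  const total = Object.values(scores).reduce((sum, s) => sum + s, 0);
  const normalized: Record<string, number> = {};
  for (const [category, s] of Object.entries(scores)) {
    // Guard against an all-zero score object
    normalized[category] = total > 0 ? s / total : 0;
  }
  return normalized;
}

const probs = normalizeScores({ spam: 3, ham: 1 });
console.log(probs); // { spam: 0.75, ham: 0.25 }
```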
### PerceptronModel (alias: perceptron) { .api }

```typescript { .api }
class PerceptronModel {
  weights: number[];
  bias: number;

  constructor();
  predict(features: number[]): number;
  train(features: number[], label: number): PerceptronModel;
}

const perceptron: typeof PerceptronModel;
```

Linear perceptron for binary classification. Good for linearly separable data.

**Properties:**
- `weights: number[]` - Feature weights learned during training
- `bias: number` - Bias term

**Methods:**

#### predict { .api }
```typescript { .api }
predict(features: number[]): number;
```

Makes a prediction for given features.

**Parameters:**
- `features: number[]` - Feature vector

**Returns:** `number` - Predicted class (0 or 1)

#### train { .api }
```typescript { .api }
train(features: number[], label: number): PerceptronModel;
```

Updates the model with a training example.

**Parameters:**
- `features: number[]` - Feature vector
- `label: number` - True label (0 or 1)

**Returns:** `PerceptronModel` - Updated model (for chaining)

```typescript
import { PerceptronModel } from "simple-statistics";

// Binary classification: predict loan approval
const perceptron = new PerceptronModel();

// Training data: [income, credit_score, debt_ratio]
const trainingData = [
  { features: [50000, 700, 0.3], label: 1 }, // approved
  { features: [30000, 600, 0.8], label: 0 }, // denied
  { features: [80000, 750, 0.2], label: 1 }, // approved
  { features: [25000, 550, 0.9], label: 0 }, // denied
];

// Train the model
trainingData.forEach(({ features, label }) => {
  perceptron.train(features, label);
});

// Make predictions
const applicant1 = [60000, 720, 0.4];
const prediction1 = perceptron.predict(applicant1);
console.log(`Loan decision: ${prediction1 ? 'Approved' : 'Denied'}`);

const applicant2 = [20000, 500, 1.2];
const prediction2 = perceptron.predict(applicant2);
console.log(`Loan decision: ${prediction2 ? 'Approved' : 'Denied'}`);
```
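A perceptron update adds (a multiple of) the raw feature vector to the weights, so features on wildly different scales, such as an income in the tens of thousands next to a debt ratio below 1, can dominate training. A common remedy is to scale each feature into a comparable range before calling `train` and `predict`. A minimal min-max scaling sketch (the `minMaxScale` helper is illustrative, not part of simple-statistics):

```typescript
// Min-max scale each feature column into [0, 1].
// Illustrative helper only -- not part of simple-statistics.
function minMaxScale(rows: number[][]): number[][] {
  const dims = rows[0].length;
  const mins = Array(dims).fill(Infinity);
  const maxs = Array(dims).fill(-Infinity);
  for (const row of rows) {
    row.forEach((v, j) => {
      mins[j] = Math.min(mins[j], v);
      maxs[j] = Math.max(maxs[j], v);
    });
  }
  return rows.map(row =>
    // Constant columns map to 0 to avoid division by zero
    row.map((v, j) => (maxs[j] === mins[j] ? 0 : (v - mins[j]) / (maxs[j] - mins[j])))
  );
}

const scaled = minMaxScale([
  [50000, 700, 0.3],
  [30000, 600, 0.8],
  [80000, 750, 0.2],
]);
console.log(scaled); // every value now lies in [0, 1]
// scaled[i] can then be passed to perceptron.train(scaled[i], label)
```

Note that new inputs at prediction time must be scaled with the same mins/maxes recorded from the training data.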
## Clustering

### ckmeans { .api }

```typescript { .api }
function ckmeans<T>(data: T[], nClusters: number): T[][];
```

Optimal k-means clustering in 1D using dynamic programming. Also known as Jenks natural breaks.

**Parameters:**
- `data: T[]` - Array of values to cluster
- `nClusters: number` - Number of clusters to create

**Returns:** `T[][]` - Array of clusters, each containing grouped values

**Use Cases:**
- Data visualization (choropleth maps)
- Natural breakpoints in continuous data
- Optimal binning for histograms

```typescript
import { ckmeans } from "simple-statistics";

// Income distribution clustering
const incomes = [25000, 28000, 30000, 45000, 48000, 50000, 75000, 80000, 150000, 200000];
const clusters = ckmeans(incomes, 3);

console.log("Income clusters:");
clusters.forEach((cluster, i) => {
  const min = Math.min(...cluster);
  const max = Math.max(...cluster);
  console.log(`Cluster ${i + 1}: $${min.toLocaleString()} - $${max.toLocaleString()}`);
});

// Result might be:
// Cluster 1: $25,000 - $30,000 (Low income)
// Cluster 2: $45,000 - $50,000 (Middle income)
// Cluster 3: $75,000 - $200,000 (High income)
```
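For the histogram-binning and choropleth use cases above, what is usually needed is not the clusters themselves but the break values between them. One way to derive bin edges from 1-D cluster output like ckmeans' is to take the minimum of the first cluster plus the maximum of every cluster. A minimal sketch (the `clusterBreaks` helper is illustrative, not part of simple-statistics):

```typescript
// Derive break points (bin edges) from ordered 1-D clusters,
// such as the output of ckmeans. Illustrative helper only.
function clusterBreaks(clusters: number[][]): number[] {
  // Lower edge of the first bin, then the upper edge of each cluster
  const edges = [Math.min(...clusters[0])];
  for (const cluster of clusters) {
    edges.push(Math.max(...cluster));
  }
  return edges;
}

const edges = clusterBreaks([
  [25000, 28000, 30000],
  [45000, 48000, 50000],
  [75000, 80000, 150000, 200000],
]);
console.log(edges); // [25000, 30000, 50000, 200000]
```

These edges can then be fed to a charting or mapping library as class breaks.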
## Density Estimation

### kernelDensityEstimation (alias: kde) { .api }

```typescript { .api }
type Kernel = 'gaussian' | ((x: number) => number);
type BandwidthMethod = 'nrd' | number;

function kernelDensityEstimation(
  X: number[],
  kernel?: Kernel,
  bandwidthMethod?: BandwidthMethod
): (x: number) => number;
```

Estimates a probability density function from sample data using kernel density estimation.

**Parameters:**
- `X: number[]` - Sample data points
- `kernel?: Kernel` - Kernel function ('gaussian' or custom function)
- `bandwidthMethod?: BandwidthMethod` - Bandwidth selection ('nrd' or a numeric value)

**Returns:** `(x: number) => number` - Function that estimates the probability density at x

```typescript
import { kernelDensityEstimation } from "simple-statistics";

// Estimate probability density of test scores
const testScores = [65, 70, 75, 78, 80, 82, 85, 88, 90, 92];
const densityFunction = kernelDensityEstimation(testScores);

// Get density estimates
const density75 = densityFunction(75);
const density85 = densityFunction(85);
const density95 = densityFunction(95);

console.log(`Density at 75: ${density75.toFixed(4)}`);
console.log(`Density at 85: ${density85.toFixed(4)}`);
console.log(`Density at 95: ${density95.toFixed(4)}`);

// Plot density curve
const xRange = Array.from({ length: 50 }, (_, i) => 60 + i);
const densityCurve = xRange.map(x => ({ x, density: densityFunction(x) }));
console.log("Density curve:", densityCurve);
```
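The `bandwidthMethod` parameter controls smoothing: a small bandwidth hugs the individual sample points, while a large one blurs them into a single broad bump. To make that role concrete, here is a minimal hand-rolled Gaussian KDE implementing the textbook formula `f(x) = (1 / (n·h)) · Σᵢ K((x − xᵢ) / h)` with a standard-normal kernel K; it is for illustration only, since the library version above already does this and can also pick the bandwidth via `'nrd'`:

```typescript
// Minimal Gaussian kernel density estimate, for illustration only.
// f(x) = (1 / (n * h)) * sum_i K((x - x_i) / h), K = standard normal pdf
function gaussianKde(sample: number[], bandwidth: number): (x: number) => number {
  const n = sample.length;
  const norm = 1 / Math.sqrt(2 * Math.PI); // normalizing constant of K
  return (x: number) => {
    let sum = 0;
    for (const xi of sample) {
      const u = (x - xi) / bandwidth;
      sum += norm * Math.exp(-0.5 * u * u);
    }
    return sum / (n * bandwidth);
  };
}

const scores = [65, 70, 75, 78, 80, 82, 85, 88, 90, 92];
const narrow = gaussianKde(scores, 1);  // spiky: follows individual points
const wide = gaussianKde(scores, 10);   // smooth: one broad bump
console.log(narrow(80).toFixed(4), wide(80).toFixed(4));
```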
## Usage Examples

### Text Classification with Naive Bayes

```typescript
import { BayesianClassifier } from "simple-statistics";

// Sentiment analysis classifier
const sentimentClassifier = new BayesianClassifier();

// Training data
const trainingTexts = [
  { text: "love this product amazing quality", sentiment: "positive" },
  { text: "terrible service very disappointed", sentiment: "negative" },
  { text: "excellent fast shipping", sentiment: "positive" },
  { text: "defective item poor quality", sentiment: "negative" },
  { text: "great value highly recommend", sentiment: "positive" },
];

// Simple feature extraction (word presence)
function extractFeatures(text: string): Record<string, boolean> {
  const words = text.toLowerCase().split(' ');
  const features: Record<string, boolean> = {};
  words.forEach(word => {
    features[`word_${word}`] = true;
  });
  return features;
}

// Train classifier
trainingTexts.forEach(({ text, sentiment }) => {
  const features = extractFeatures(text);
  sentimentClassifier.train(features, sentiment);
});

// Classify new text
const newReview = "fast delivery great product";
const features = extractFeatures(newReview);
const scores = sentimentClassifier.score(features);

console.log("Review:", newReview);
console.log("Sentiment scores:", scores);
console.log("Predicted sentiment:",
  Object.keys(scores).reduce((a, b) => scores[a] > scores[b] ? a : b)
);
```
### Customer Segmentation with Clustering

```typescript
import { ckmeans, mean, standardDeviation } from "simple-statistics";

// Customer purchase amounts over past year
const customerSpending = [
  50, 75, 80, 120, 150, 180, 200, 250, 300, 350,
  400, 500, 600, 800, 1000, 1200, 1500, 2000, 2500, 3000
];

// Find natural customer segments
const segments = ckmeans(customerSpending, 4);

console.log("Customer Segments:");
segments.forEach((segment, i) => {
  const segmentMean = mean(segment);
  const segmentStd = standardDeviation(segment);
  const min = Math.min(...segment);
  const max = Math.max(...segment);

  console.log(`Segment ${i + 1}:`);
  console.log(`  Range: $${min} - $${max}`);
  console.log(`  Average: $${segmentMean.toFixed(0)}`);
  console.log(`  Std Dev: $${segmentStd.toFixed(0)}`);
  console.log(`  Count: ${segment.length} customers`);
});

// Use segments for targeted marketing strategies
const segmentNames = ["Low Spenders", "Regular Customers", "High Value", "VIP"];
segments.forEach((segment, i) => {
  console.log(`${segmentNames[i]}: ${segment.length} customers spending $${Math.min(...segment)}-$${Math.max(...segment)}`);
});
```
### Anomaly Detection with Density Estimation

```typescript
import { kernelDensityEstimation, mean, standardDeviation } from "simple-statistics";

// Network traffic data (requests per minute)
const normalTraffic = [
  45, 52, 48, 55, 50, 47, 53, 49, 51, 46,
  54, 48, 52, 50, 49, 47, 53, 51, 48, 50
];

// Build density model of normal traffic
const densityModel = kernelDensityEstimation(normalTraffic);
const meanTraffic = mean(normalTraffic);
const stdTraffic = standardDeviation(normalTraffic);

// Monitor new traffic values
const newTrafficValues = [52, 48, 95, 51, 150, 49]; // Some potential anomalies

console.log(`Normal traffic: ${meanTraffic.toFixed(1)} ± ${stdTraffic.toFixed(1)} requests/min`);
console.log("\nAnomaly Detection:");

newTrafficValues.forEach(traffic => {
  const density = densityModel(traffic);
  const zScore = Math.abs((traffic - meanTraffic) / stdTraffic);

  console.log(`Traffic: ${traffic} req/min`);
  console.log(`  Density: ${density.toFixed(6)}`);
  console.log(`  Z-score: ${zScore.toFixed(2)}`);

  if (density < 0.001 || zScore > 3) {
    console.log(`  🚨 ANOMALY DETECTED`);
  } else {
    console.log(`  ✓ Normal`);
  }
});
```