# Machine Learning

Classification algorithms and clustering methods for supervised and unsupervised learning tasks.

## Core Imports

```typescript
import {
  BayesianClassifier,
  PerceptronModel,
  ckmeans,
  kernelDensityEstimation
} from "simple-statistics";
```
## Classification Algorithms

### BayesianClassifier (alias: bayesian) { .api }

```typescript { .api }
class BayesianClassifier {
  totalCount: number;
  data: Record<string, any>;

  constructor();
  train(item: Record<string, any>, category: string): void;
  score(item: Record<string, any>): Record<string, number>;
}

const bayesian: typeof BayesianClassifier;
```

Naive Bayesian classifier for categorical data. Assumes independence between features.

**Properties:**
- `totalCount: number` - Total number of training examples
- `data: Record<string, any>` - Internal storage for training data

**Methods:**

#### train { .api }
```typescript { .api }
train(item: Record<string, any>, category: string): void;
```

Trains the classifier with a labeled example.

**Parameters:**
- `item: Record<string, any>` - Feature object with key-value pairs
- `category: string` - Category label for this item

#### score { .api }
```typescript { .api }
score(item: Record<string, any>): Record<string, number>;
```

Scores an item against each trained category; the highest-scoring category is the predicted label.

**Parameters:**
- `item: Record<string, any>` - Feature object to classify

**Returns:** `Record<string, number>` - Score for each category (higher means more likely)
```typescript
import { BayesianClassifier } from "simple-statistics";

// Email spam classification
const classifier = new BayesianClassifier();

// Train with examples
classifier.train({ word: "free", count: 1 }, "spam");
classifier.train({ word: "meeting", count: 1 }, "ham");
classifier.train({ word: "free", count: 2, urgent: true }, "spam");
classifier.train({ word: "project", count: 1 }, "ham");

// Classify new email
const scores = classifier.score({ word: "free", count: 1 });
console.log(scores); // per-category scores; higher = more likely

// Get the most likely category
const category = Object.keys(scores).reduce((a, b) =>
  scores[a] > scores[b] ? a : b
);
console.log(`Predicted category: ${category}`);
```
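Depending on the training data, the per-category values returned by `score` may not sum to 1. If probability-style numbers are wanted for display, one option is to rescale them. A minimal sketch, assuming simple sum-normalization is acceptable (the `normalizeScores` helper is illustrative, not part of simple-statistics):

```typescript
// Rescale raw category scores so they sum to 1.
// Illustrative helper only -- not part of simple-statistics.
function normalizeScores(scores: Record<string, number>): Record<string, number> {
  const total = Object.values(scores).reduce((sum, s) => sum + s, 0);
  const normalized: Record<string, number> = {};
  for (const [category, s] of Object.entries(scores)) {
    // Guard against an all-zero score object
    normalized[category] = total > 0 ? s / total : 0;
  }
  return normalized;
}

const probs = normalizeScores({ spam: 3, ham: 1 });
console.log(probs); // { spam: 0.75, ham: 0.25 }
```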
### PerceptronModel (alias: perceptron) { .api }

```typescript { .api }
class PerceptronModel {
  weights: number[];
  bias: number;

  constructor();
  predict(features: number[]): number;
  train(features: number[], label: number): PerceptronModel;
}

const perceptron: typeof PerceptronModel;
```

Linear perceptron for binary classification. Good for linearly separable data.

**Properties:**
- `weights: number[]` - Feature weights learned during training
- `bias: number` - Bias term

**Methods:**

#### predict { .api }
```typescript { .api }
predict(features: number[]): number;
```

Makes a prediction for given features.

**Parameters:**
- `features: number[]` - Feature vector

**Returns:** `number` - Predicted class (0 or 1)

#### train { .api }
```typescript { .api }
train(features: number[], label: number): PerceptronModel;
```

Updates the model with a training example.

**Parameters:**
- `features: number[]` - Feature vector
- `label: number` - True label (0 or 1)

**Returns:** `PerceptronModel` - Updated model (for chaining)

```typescript
import { PerceptronModel } from "simple-statistics";

// Binary classification: predict loan approval
const perceptron = new PerceptronModel();

// Training data: [income, credit_score, debt_ratio]
const trainingData = [
  { features: [50000, 700, 0.3], label: 1 }, // approved
  { features: [30000, 600, 0.8], label: 0 }, // denied
  { features: [80000, 750, 0.2], label: 1 }, // approved
  { features: [25000, 550, 0.9], label: 0 }, // denied
];

// Train the model
trainingData.forEach(({ features, label }) => {
  perceptron.train(features, label);
});

// Make predictions
const applicant1 = [60000, 720, 0.4];
const prediction1 = perceptron.predict(applicant1);
console.log(`Loan decision: ${prediction1 ? 'Approved' : 'Denied'}`);

const applicant2 = [20000, 500, 1.2];
const prediction2 = perceptron.predict(applicant2);
console.log(`Loan decision: ${prediction2 ? 'Approved' : 'Denied'}`);
```
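A perceptron update adds (a multiple of) the raw feature vector to the weights, so features on wildly different scales, such as an income in the tens of thousands next to a debt ratio below 1, can dominate training. A common remedy is to scale each feature into a comparable range before calling `train` and `predict`. A minimal min-max scaling sketch (the `minMaxScale` helper is illustrative, not part of simple-statistics):

```typescript
// Min-max scale each feature column into [0, 1].
// Illustrative helper only -- not part of simple-statistics.
function minMaxScale(rows: number[][]): number[][] {
  const dims = rows[0].length;
  const mins = Array(dims).fill(Infinity);
  const maxs = Array(dims).fill(-Infinity);
  for (const row of rows) {
    row.forEach((v, j) => {
      mins[j] = Math.min(mins[j], v);
      maxs[j] = Math.max(maxs[j], v);
    });
  }
  return rows.map(row =>
    // Constant columns map to 0 to avoid division by zero
    row.map((v, j) => (maxs[j] === mins[j] ? 0 : (v - mins[j]) / (maxs[j] - mins[j])))
  );
}

const scaled = minMaxScale([
  [50000, 700, 0.3],
  [30000, 600, 0.8],
  [80000, 750, 0.2],
]);
console.log(scaled); // every value now lies in [0, 1]
// scaled[i] can then be passed to perceptron.train(scaled[i], label)
```

Note that new inputs at prediction time must be scaled with the same mins/maxes recorded from the training data.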
## Clustering

### ckmeans { .api }

```typescript { .api }
function ckmeans<T>(data: T[], nClusters: number): T[][];
```

Optimal k-means clustering in 1D using dynamic programming. Also known as Jenks natural breaks.

**Parameters:**
- `data: T[]` - Array of values to cluster
- `nClusters: number` - Number of clusters to create

**Returns:** `T[][]` - Array of clusters, each containing grouped values

**Use Cases:**
- Data visualization (choropleth maps)
- Natural breakpoints in continuous data
- Optimal binning for histograms

```typescript
import { ckmeans } from "simple-statistics";

// Income distribution clustering
const incomes = [25000, 28000, 30000, 45000, 48000, 50000, 75000, 80000, 150000, 200000];
const clusters = ckmeans(incomes, 3);

console.log("Income clusters:");
clusters.forEach((cluster, i) => {
  const min = Math.min(...cluster);
  const max = Math.max(...cluster);
  console.log(`Cluster ${i + 1}: $${min.toLocaleString()} - $${max.toLocaleString()}`);
});

// Result might be:
// Cluster 1: $25,000 - $30,000 (Low income)
// Cluster 2: $45,000 - $50,000 (Middle income)
// Cluster 3: $75,000 - $200,000 (High income)
```
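For the histogram-binning and choropleth use cases above, what is usually needed is not the clusters themselves but the break values between them. One way to derive bin edges from 1-D cluster output like ckmeans' is to take the minimum of the first cluster plus the maximum of every cluster. A minimal sketch (the `clusterBreaks` helper is illustrative, not part of simple-statistics):

```typescript
// Derive break points (bin edges) from ordered 1-D clusters,
// such as the output of ckmeans. Illustrative helper only.
function clusterBreaks(clusters: number[][]): number[] {
  // Lower edge of the first bin, then the upper edge of each cluster
  const edges = [Math.min(...clusters[0])];
  for (const cluster of clusters) {
    edges.push(Math.max(...cluster));
  }
  return edges;
}

const edges = clusterBreaks([
  [25000, 28000, 30000],
  [45000, 48000, 50000],
  [75000, 80000, 150000, 200000],
]);
console.log(edges); // [25000, 30000, 50000, 200000]
```

These edges can then be fed to a charting or mapping library as class breaks.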
## Density Estimation

### kernelDensityEstimation (alias: kde) { .api }

```typescript { .api }
type Kernel = 'gaussian' | ((x: number) => number);
type BandwidthMethod = 'nrd' | number;

function kernelDensityEstimation(
  X: number[],
  kernel?: Kernel,
  bandwidthMethod?: BandwidthMethod
): (x: number) => number;
```

Estimates a probability density function from sample data using kernel density estimation.

**Parameters:**
- `X: number[]` - Sample data points
- `kernel?: Kernel` - Kernel function ('gaussian' or custom function)
- `bandwidthMethod?: BandwidthMethod` - Bandwidth selection ('nrd' or a numeric value)

**Returns:** `(x: number) => number` - Function that estimates the probability density at x

```typescript
import { kernelDensityEstimation } from "simple-statistics";

// Estimate probability density of test scores
const testScores = [65, 70, 75, 78, 80, 82, 85, 88, 90, 92];
const densityFunction = kernelDensityEstimation(testScores);

// Get density estimates
const density75 = densityFunction(75);
const density85 = densityFunction(85);
const density95 = densityFunction(95);

console.log(`Density at 75: ${density75.toFixed(4)}`);
console.log(`Density at 85: ${density85.toFixed(4)}`);
console.log(`Density at 95: ${density95.toFixed(4)}`);

// Plot density curve
const xRange = Array.from({ length: 50 }, (_, i) => 60 + i);
const densityCurve = xRange.map(x => ({ x, density: densityFunction(x) }));
console.log("Density curve:", densityCurve);
```
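The `bandwidthMethod` parameter controls smoothing: a small bandwidth hugs the individual sample points, while a large one blurs them into a single broad bump. To make that role concrete, here is a minimal hand-rolled Gaussian KDE implementing the textbook formula `f(x) = (1 / (n·h)) · Σᵢ K((x − xᵢ) / h)` with a standard-normal kernel K; it is for illustration only, since the library version above already does this and can also pick the bandwidth via `'nrd'`:

```typescript
// Minimal Gaussian kernel density estimate, for illustration only.
// f(x) = (1 / (n * h)) * sum_i K((x - x_i) / h), K = standard normal pdf
function gaussianKde(sample: number[], bandwidth: number): (x: number) => number {
  const n = sample.length;
  const norm = 1 / Math.sqrt(2 * Math.PI); // normalizing constant of K
  return (x: number) => {
    let sum = 0;
    for (const xi of sample) {
      const u = (x - xi) / bandwidth;
      sum += norm * Math.exp(-0.5 * u * u);
    }
    return sum / (n * bandwidth);
  };
}

const scores = [65, 70, 75, 78, 80, 82, 85, 88, 90, 92];
const narrow = gaussianKde(scores, 1);  // spiky: follows individual points
const wide = gaussianKde(scores, 10);   // smooth: one broad bump
console.log(narrow(80).toFixed(4), wide(80).toFixed(4));
```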
## Usage Examples

### Text Classification with Naive Bayes

```typescript
import { BayesianClassifier } from "simple-statistics";

// Sentiment analysis classifier
const sentimentClassifier = new BayesianClassifier();

// Training data
const trainingTexts = [
  { text: "love this product amazing quality", sentiment: "positive" },
  { text: "terrible service very disappointed", sentiment: "negative" },
  { text: "excellent fast shipping", sentiment: "positive" },
  { text: "defective item poor quality", sentiment: "negative" },
  { text: "great value highly recommend", sentiment: "positive" },
];

// Simple feature extraction (word presence)
function extractFeatures(text: string): Record<string, boolean> {
  const words = text.toLowerCase().split(' ');
  const features: Record<string, boolean> = {};
  words.forEach(word => {
    features[`word_${word}`] = true;
  });
  return features;
}

// Train classifier
trainingTexts.forEach(({ text, sentiment }) => {
  const features = extractFeatures(text);
  sentimentClassifier.train(features, sentiment);
});

// Classify new text
const newReview = "fast delivery great product";
const features = extractFeatures(newReview);
const scores = sentimentClassifier.score(features);

console.log("Review:", newReview);
console.log("Sentiment scores:", scores);
console.log("Predicted sentiment:",
  Object.keys(scores).reduce((a, b) => scores[a] > scores[b] ? a : b)
);
```
### Customer Segmentation with Clustering

```typescript
import { ckmeans, mean, standardDeviation } from "simple-statistics";

// Customer purchase amounts over past year
const customerSpending = [
  50, 75, 80, 120, 150, 180, 200, 250, 300, 350,
  400, 500, 600, 800, 1000, 1200, 1500, 2000, 2500, 3000
];

// Find natural customer segments
const segments = ckmeans(customerSpending, 4);

console.log("Customer Segments:");
segments.forEach((segment, i) => {
  const segmentMean = mean(segment);
  const segmentStd = standardDeviation(segment);
  const min = Math.min(...segment);
  const max = Math.max(...segment);

  console.log(`Segment ${i + 1}:`);
  console.log(`  Range: $${min} - $${max}`);
  console.log(`  Average: $${segmentMean.toFixed(0)}`);
  console.log(`  Std Dev: $${segmentStd.toFixed(0)}`);
  console.log(`  Count: ${segment.length} customers`);
});

// Use segments for targeted marketing strategies
const segmentNames = ["Low Spenders", "Regular Customers", "High Value", "VIP"];
segments.forEach((segment, i) => {
  console.log(`${segmentNames[i]}: ${segment.length} customers spending $${Math.min(...segment)}-$${Math.max(...segment)}`);
});
```
### Anomaly Detection with Density Estimation

```typescript
import { kernelDensityEstimation, mean, standardDeviation } from "simple-statistics";

// Network traffic data (requests per minute)
const normalTraffic = [
  45, 52, 48, 55, 50, 47, 53, 49, 51, 46,
  54, 48, 52, 50, 49, 47, 53, 51, 48, 50
];

// Build density model of normal traffic
const densityModel = kernelDensityEstimation(normalTraffic);
const meanTraffic = mean(normalTraffic);
const stdTraffic = standardDeviation(normalTraffic);

// Monitor new traffic values
const newTrafficValues = [52, 48, 95, 51, 150, 49]; // Some potential anomalies

console.log(`Normal traffic: ${meanTraffic.toFixed(1)} ± ${stdTraffic.toFixed(1)} requests/min`);
console.log("\nAnomaly Detection:");

newTrafficValues.forEach(traffic => {
  const density = densityModel(traffic);
  const zScore = Math.abs((traffic - meanTraffic) / stdTraffic);

  console.log(`Traffic: ${traffic} req/min`);
  console.log(`  Density: ${density.toFixed(6)}`);
  console.log(`  Z-score: ${zScore.toFixed(2)}`);

  if (density < 0.001 || zScore > 3) {
    console.log(`  🚨 ANOMALY DETECTED`);
  } else {
    console.log(`  ✓ Normal`);
  }
});
```