# Machine Learning

Classification algorithms, clustering, and density estimation for supervised and unsupervised learning tasks.

## Core Imports

```typescript
import {
  BayesianClassifier,
  PerceptronModel,
  ckmeans,
  kernelDensityEstimation
} from "simple-statistics";
```

## Classification Algorithms

### BayesianClassifier (alias: bayesian) { .api }

```typescript { .api }
class BayesianClassifier {
  totalCount: number;
  data: Record<string, any>;

  constructor();
  train(item: Record<string, any>, category: string): void;
  score(item: Record<string, any>): Record<string, number>;
}

const bayesian: typeof BayesianClassifier;
```

Naive Bayesian classifier for categorical data. Assumes independence between features.

**Properties:**
- `totalCount: number` - Total number of training examples
- `data: Record<string, any>` - Internal storage for training data

**Methods:**

#### train { .api }
```typescript { .api }
train(item: Record<string, any>, category: string): void;
```

Trains the classifier with a labeled example.

**Parameters:**
- `item: Record<string, any>` - Feature object with key-value pairs
- `category: string` - Category label for this item

#### score { .api }
```typescript { .api }
score(item: Record<string, any>): Record<string, number>;
```

Classifies an item and returns probability scores for each category.

**Parameters:**
- `item: Record<string, any>` - Feature object to classify

**Returns:** `Record<string, number>` - Probability scores for each category

```typescript
import { BayesianClassifier } from "simple-statistics";

// Email spam classification
const classifier = new BayesianClassifier();

// Train with examples
classifier.train({ word: "free", count: 1 }, "spam");
classifier.train({ word: "meeting", count: 1 }, "ham");
classifier.train({ word: "free", count: 2, urgent: true }, "spam");
classifier.train({ word: "project", count: 1 }, "ham");

// Classify new email
const scores = classifier.score({ word: "free", count: 1 });
console.log(scores); // object with one score per trained category ("spam", "ham")

// Get most likely category
const category = Object.keys(scores).reduce((a, b) =>
  scores[a] > scores[b] ? a : b
);
console.log(`Predicted category: ${category}`);
```
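
The description above does not say whether the values returned by `score` sum to 1. If normalized shares are needed for display or thresholding, the scores can be rescaled after the fact. The following is a minimal sketch under that assumption; `normalizeScores` is a hypothetical helper, not part of simple-statistics, and it works on any plain `Record<string, number>`.

```typescript
// Hypothetical helper (not part of simple-statistics): rescales raw
// category scores so they sum to 1. Returns all zeros if the total is 0.
function normalizeScores(scores: Record<string, number>): Record<string, number> {
  const total = Object.values(scores).reduce((sum, value) => sum + value, 0);
  const normalized: Record<string, number> = {};
  for (const [category, value] of Object.entries(scores)) {
    normalized[category] = total > 0 ? value / total : 0;
  }
  return normalized;
}

// Example with made-up raw scores
const shares = normalizeScores({ spam: 1, ham: 3 });
console.log(shares); // { spam: 0.25, ham: 0.75 }
```

Rescaling does not change which category ranks highest, so the `reduce` lookup for the most likely category works on either form.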

### PerceptronModel (alias: perceptron) { .api }

```typescript { .api }
class PerceptronModel {
  weights: number[];
  bias: number;

  constructor();
  predict(features: number[]): number;
  train(features: number[], label: number): PerceptronModel;
}

const perceptron: typeof PerceptronModel;
```

Linear perceptron for binary classification. Good for linearly separable data.

**Properties:**
- `weights: number[]` - Feature weights learned during training
- `bias: number` - Bias term

**Methods:**

#### predict { .api }
```typescript { .api }
predict(features: number[]): number;
```

Makes a prediction for given features.

**Parameters:**
- `features: number[]` - Feature vector

**Returns:** `number` - Predicted class (0 or 1)

#### train { .api }
```typescript { .api }
train(features: number[], label: number): PerceptronModel;
```

Updates the model with a training example.

**Parameters:**
- `features: number[]` - Feature vector
- `label: number` - True label (0 or 1)

**Returns:** `PerceptronModel` - Updated model (for chaining)

```typescript
import { PerceptronModel } from "simple-statistics";

// Binary classification: predict loan approval
const perceptron = new PerceptronModel();

// Training data: [income, credit_score, debt_ratio]
const trainingData = [
  { features: [50000, 700, 0.3], label: 1 }, // approved
  { features: [30000, 600, 0.8], label: 0 }, // denied
  { features: [80000, 750, 0.2], label: 1 }, // approved
  { features: [25000, 550, 0.9], label: 0 }, // denied
];

// Train the model (a single pass over the examples)
trainingData.forEach(({ features, label }) => {
  perceptron.train(features, label);
});

// Make predictions
const applicant1 = [60000, 720, 0.4];
const prediction1 = perceptron.predict(applicant1);
console.log(`Loan decision: ${prediction1 ? 'Approved' : 'Denied'}`);

const applicant2 = [20000, 500, 1.2];
const prediction2 = perceptron.predict(applicant2);
console.log(`Loan decision: ${prediction2 ? 'Approved' : 'Denied'}`);
```
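
The loan example makes a single pass over four examples whose features differ in scale by several orders of magnitude. A perceptron generally needs repeated passes (epochs) before its predictions settle, and it is only guaranteed to settle when the data is linearly separable. The sketch below illustrates the textbook perceptron learning rule with an epoch loop. It is written from scratch for illustration and is not the simple-statistics implementation; `trainPerceptron`, `Example`, and `maxEpochs` are hypothetical names.

```typescript
// Textbook perceptron learning rule, written from scratch for illustration.
// This is NOT the simple-statistics implementation.
type Example = { features: number[]; label: 0 | 1 };

function trainPerceptron(examples: Example[], maxEpochs = 20) {
  const weights: number[] = new Array(examples[0].features.length).fill(0);
  let bias = 0;

  const predict = (features: number[]): 0 | 1 => {
    const score = features.reduce((sum, x, i) => sum + x * weights[i], bias);
    return score > 0 ? 1 : 0;
  };

  for (let epoch = 0; epoch < maxEpochs; epoch++) {
    let errors = 0;
    for (const { features, label } of examples) {
      const error = label - predict(features); // -1, 0, or +1
      if (error !== 0) {
        // Move the decision boundary toward the misclassified example
        features.forEach((x, i) => { weights[i] += error * x; });
        bias += error;
        errors++;
      }
    }
    if (errors === 0) break; // every example classified correctly
  }
  return { weights, bias, predict };
}

// Logical AND is linearly separable, so repeated passes converge.
const andGate: Example[] = [
  { features: [0, 0], label: 0 },
  { features: [0, 1], label: 0 },
  { features: [1, 0], label: 0 },
  { features: [1, 1], label: 1 },
];
const andModel = trainPerceptron(andGate);
console.log(andGate.map(({ features }) => andModel.predict(features))); // [0, 0, 0, 1]
```

With `PerceptronModel`, the same idea amounts to calling `train` on every example inside an outer loop, ideally after rescaling the features to comparable ranges.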

## Clustering

### ckmeans { .api }

```typescript { .api }
function ckmeans<T>(data: T[], nClusters: number): T[][];
```

Optimal k-means clustering in 1D using dynamic programming. An alternative to Jenks natural breaks that targets the same goal: groups with minimal within-cluster variance.

**Parameters:**
- `data: T[]` - Array of values to cluster
- `nClusters: number` - Number of clusters to create

**Returns:** `T[][]` - Array of clusters, each containing grouped values

**Use Cases:**
- Data visualization (choropleth maps)
- Natural breakpoints in continuous data
- Optimal binning for histograms

```typescript
import { ckmeans } from "simple-statistics";

// Income distribution clustering
const incomes = [25000, 28000, 30000, 45000, 48000, 50000, 75000, 80000, 150000, 200000];
const clusters = ckmeans(incomes, 3);

console.log("Income clusters:");
clusters.forEach((cluster, i) => {
  const min = Math.min(...cluster);
  const max = Math.max(...cluster);
  console.log(`Cluster ${i + 1}: $${min.toLocaleString()} - $${max.toLocaleString()}`);
});

// The split that minimizes within-cluster variance for this input
// (number formatting depends on locale):
// Cluster 1: $25,000 - $50,000 (Low to middle income)
// Cluster 2: $75,000 - $80,000 (Upper-middle income)
// Cluster 3: $150,000 - $200,000 (High income)
```
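
For the choropleth and binning use cases, the clusters usually need to be turned into class breaks so that new values can be assigned to a class. The following is a minimal sketch of that step. `clusterBreaks` and `classIndex` are hypothetical helpers, not part of simple-statistics, and the sketch assumes the clusters are non-empty and ordered from lowest to highest values.

```typescript
// Hypothetical helpers (not part of simple-statistics).
// Assumes clusters are non-empty and ordered from lowest to highest values.
function clusterBreaks(clusters: number[][]): number[] {
  // The lower bound of every cluster after the first is a class break.
  return clusters.slice(1).map(cluster => Math.min(...cluster));
}

function classIndex(value: number, breaks: number[]): number {
  // Count how many breaks the value has reached; that count is its class index.
  return breaks.filter(b => value >= b).length;
}

// Clusters written out by hand here, shaped like the output of ckmeans
const exampleClusters = [[1, 2, 3], [10, 11], [50, 60]];
const breaks = clusterBreaks(exampleClusters);
console.log(breaks);                 // [10, 50]
console.log(classIndex(7, breaks));  // 0
console.log(classIndex(10, breaks)); // 1
console.log(classIndex(99, breaks)); // 2
```

Values that fall in the gap between two clusters (such as 7 above) are assigned to the lower class in this sketch; a midpoint rule would be an equally defensible choice.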

## Density Estimation

### kernelDensityEstimation (alias: kde) { .api }

```typescript { .api }
type Kernel = 'gaussian' | ((x: number) => number);
type BandwidthMethod = 'nrd' | number;

function kernelDensityEstimation(
  X: number[],
  kernel?: Kernel,
  bandwidthMethod?: BandwidthMethod
): (x: number) => number;
```

Estimates a probability density function from sample data using kernel density estimation.

**Parameters:**
- `X: number[]` - Sample data points
- `kernel?: Kernel` - Kernel function ('gaussian' or custom function)
- `bandwidthMethod?: BandwidthMethod` - Bandwidth selection ('nrd' or numeric value)

**Returns:** `(x: number) => number` - Density function that returns the estimated probability density at `x` (a density value, not a probability)

```typescript
import { kernelDensityEstimation } from "simple-statistics";

// Estimate probability density of test scores
const testScores = [65, 70, 75, 78, 80, 82, 85, 88, 90, 92];
const densityFunction = kernelDensityEstimation(testScores);

// Get density estimates
const density75 = densityFunction(75);
const density85 = densityFunction(85);
const density95 = densityFunction(95);

console.log(`Density at 75: ${density75.toFixed(4)}`);
console.log(`Density at 85: ${density85.toFixed(4)}`);
console.log(`Density at 95: ${density95.toFixed(4)}`);

// Sample the density on a grid of x values (e.g. as input for a plot)
const xRange = Array.from({ length: 50 }, (_, i) => 60 + i);
const densityCurve = xRange.map(x => ({ x, density: densityFunction(x) }));
console.log("Density curve:", densityCurve);
```
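
To make the idea behind kernel density estimation concrete, the sketch below implements the textbook estimator with a Gaussian kernel and a fixed bandwidth `h`: the density at `x` is the average of kernel bumps centered on each sample point, scaled by `1 / h`. It is written from scratch for illustration and is not the simple-statistics implementation; `sketchKde` and `gaussianKernel` are hypothetical names, and the bandwidth of 5 is an arbitrary choice.

```typescript
// Textbook Gaussian kernel density estimate with a fixed bandwidth h:
//   f(x) = (1 / (n * h)) * sum_i K((x - x_i) / h)
// Written from scratch for illustration; NOT the simple-statistics code.
function gaussianKernel(u: number): number {
  return Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
}

function sketchKde(sample: number[], bandwidth: number): (x: number) => number {
  if (sample.length === 0 || bandwidth <= 0) {
    throw new Error("sketchKde needs a non-empty sample and a positive bandwidth");
  }
  return (x: number) => {
    const total = sample.reduce(
      (sum, xi) => sum + gaussianKernel((x - xi) / bandwidth), 0);
    return total / (sample.length * bandwidth);
  };
}

const f = sketchKde([65, 70, 75, 78, 80, 82, 85, 88, 90, 92], 5);

// A density integrates to 1; approximate the integral with a Riemann sum.
let area = 0;
const step = 0.1;
for (let x = 20; x <= 140; x += step) area += f(x) * step;
console.log(area.toFixed(3)); // close to 1
```

The integral check is a useful sanity test for any custom kernel: individual density values can exceed 1 for tightly clustered data, but the area under the curve should still be approximately 1.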

## Usage Examples

### Text Classification with Naive Bayes

```typescript
import { BayesianClassifier } from "simple-statistics";

// Sentiment analysis classifier
const sentimentClassifier = new BayesianClassifier();

// Training data
const trainingTexts = [
  { text: "love this product amazing quality", sentiment: "positive" },
  { text: "terrible service very disappointed", sentiment: "negative" },
  { text: "excellent fast shipping", sentiment: "positive" },
  { text: "defective item poor quality", sentiment: "negative" },
  { text: "great value highly recommend", sentiment: "positive" },
];

// Simple feature extraction (word presence)
function extractFeatures(text: string): Record<string, boolean> {
  const words = text.toLowerCase().split(' ');
  const features: Record<string, boolean> = {};
  words.forEach(word => {
    features[`word_${word}`] = true;
  });
  return features;
}

// Train classifier
trainingTexts.forEach(({ text, sentiment }) => {
  const features = extractFeatures(text);
  sentimentClassifier.train(features, sentiment);
});

// Classify new text
const newReview = "fast delivery great product";
const features = extractFeatures(newReview);
const scores = sentimentClassifier.score(features);

console.log("Review:", newReview);
console.log("Sentiment scores:", scores);
console.log("Predicted sentiment:",
  Object.keys(scores).reduce((a, b) => scores[a] > scores[b] ? a : b)
);
```
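
The `extractFeatures` function above splits on a single space, so punctuation stays attached to words and consecutive spaces produce an empty token. For real review text, a slightly more defensive tokenizer may be preferable. The following is a minimal sketch; `extractWordFeatures` is a hypothetical variant, and its character filter assumes ASCII English text.

```typescript
// Hypothetical variant of extractFeatures: lowercases, strips punctuation,
// and splits on any run of whitespace. ASCII-oriented; adapt for other scripts.
function extractWordFeatures(text: string): Record<string, boolean> {
  const features: Record<string, boolean> = {};
  const words = text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ") // punctuation becomes a separator
    .split(/\s+/)
    .filter(word => word.length > 0);
  for (const word of words) {
    features[`word_${word}`] = true;
  }
  return features;
}

console.log(Object.keys(extractWordFeatures("Great product,  FAST delivery!")));
// ["word_great", "word_product", "word_fast", "word_delivery"]
```

The returned object has the same shape as before, so it can be passed to `train` and `score` unchanged.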

### Customer Segmentation with Clustering

```typescript
import { ckmeans, mean, standardDeviation } from "simple-statistics";

// Customer purchase amounts over past year
const customerSpending = [
  50, 75, 80, 120, 150, 180, 200, 250, 300, 350,
  400, 500, 600, 800, 1000, 1200, 1500, 2000, 2500, 3000
];

// Find natural customer segments
const segments = ckmeans(customerSpending, 4);

console.log("Customer Segments:");
segments.forEach((segment, i) => {
  const segmentMean = mean(segment);
  const segmentStd = standardDeviation(segment);
  const min = Math.min(...segment);
  const max = Math.max(...segment);

  console.log(`Segment ${i + 1}:`);
  console.log(` Range: $${min} - $${max}`);
  console.log(` Average: $${segmentMean.toFixed(0)}`);
  console.log(` Std Dev: $${segmentStd.toFixed(0)}`);
  console.log(` Count: ${segment.length} customers`);
});

// Use segments for targeted marketing strategies
const segmentNames = ["Low Spenders", "Regular Customers", "High Value", "VIP"];
segments.forEach((segment, i) => {
  console.log(`${segmentNames[i]}: ${segment.length} customers spending $${Math.min(...segment)}-$${Math.max(...segment)}`);
});
```

### Anomaly Detection with Density Estimation

```typescript
import { kernelDensityEstimation, mean, standardDeviation } from "simple-statistics";

// Network traffic data (requests per minute)
const normalTraffic = [
  45, 52, 48, 55, 50, 47, 53, 49, 51, 46,
  54, 48, 52, 50, 49, 47, 53, 51, 48, 50
];

// Build density model of normal traffic
const densityModel = kernelDensityEstimation(normalTraffic);
const meanTraffic = mean(normalTraffic);
const stdTraffic = standardDeviation(normalTraffic);

// Monitor new traffic values
const newTrafficValues = [52, 48, 95, 51, 150, 49]; // Some potential anomalies

console.log(`Normal traffic: ${meanTraffic.toFixed(1)} ± ${stdTraffic.toFixed(1)} requests/min`);
console.log("\nAnomaly Detection:");

newTrafficValues.forEach(traffic => {
  const density = densityModel(traffic);
  const zScore = Math.abs((traffic - meanTraffic) / stdTraffic);

  console.log(`Traffic: ${traffic} req/min`);
  console.log(` Density: ${density.toFixed(6)}`);
  console.log(` Z-score: ${zScore.toFixed(2)}`);

  if (density < 0.001 || zScore > 3) {
    console.log(` 🚨 ANOMALY DETECTED`);
  } else {
    console.log(` ✓ Normal`);
  }
});
```
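
The fixed density cutoff of `0.001` in the example above depends on the scale and spread of the data, so it may not transfer to other datasets. One alternative is to derive the cutoff from the training sample itself. The following is a minimal sketch of that idea; `densityCutoff` is a hypothetical helper, the 5% default is an arbitrary assumption, and a simple triangular function stands in for a fitted density so the block runs on its own.

```typescript
// Hypothetical helper: picks a density cutoff near the low end of the
// densities observed on the training sample, so that roughly `fraction`
// of training points score below it. `density` can be any function of x,
// such as the one returned by kernelDensityEstimation.
function densityCutoff(
  density: (x: number) => number,
  training: number[],
  fraction = 0.05
): number {
  if (training.length === 0) throw new Error("training sample is empty");
  const sorted = training.map(density).sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.floor(fraction * sorted.length));
  return sorted[index];
}

// Stand-in density for demonstration: a triangle peaked at 50, zero outside 40..60
const triangle = (x: number) => Math.max(0, 1 - Math.abs(x - 50) / 10) / 10;
const cutoff = densityCutoff(triangle, [45, 48, 50, 52, 55], 0.2);
console.log(cutoff);                // 0.05
console.log(triangle(95) < cutoff); // true, so 95 would be flagged
```

A new observation would then be flagged when its density falls below the returned cutoff, optionally combined with the z-score rule as in the example.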