Tessl Tile for npm/mnemonist@0.40.0

or run

npx @tessl/cli init

Version

Tile

Overview

Evals

Files

docs

caches.md heaps.md index.md linear.md maps.md sets.md specialized.md trees.md vectors.md

specialized.mddocs/

0
# Specialized Algorithms & Structures
1

2
Advanced data structures for specific algorithmic applications including bloom filters, suffix arrays, and search indexes.
3

4
## Capabilities
5

6
### BloomFilter
7

8
Probabilistic data structure for membership testing with configurable false positive rate.
9

10
```typescript { .api }
11
/**
12
 * Bloom filter for probabilistic membership testing
13
 * @param capacity Expected number of items
14
 */
15
function BloomFilter(capacity: number): BloomFilter;
16

17
/**
18
 * Bloom filter with options object
19
 * @param options Configuration object with capacity and optional errorRate
20
 */
21
function BloomFilter(options: BloomFilterOptions): BloomFilter;
22

23
type BloomFilterOptions = {
24
  capacity: number;
25
  errorRate?: number;
26
}
27

28
interface BloomFilter {
29
  /** Expected capacity */
30
  readonly capacity: number;
31
  
32
  /** Configured error rate */
33
  readonly errorRate: number;
34
  
35
  /** Number of hash functions used */
36
  readonly hashFunctions: number;
37
  
38
  /** Add string to filter */
39
  add(item: string): this;
40
  
41
  /** Test if string might be in filter */
42
  test(item: string): boolean;
43
  
44
  /** Remove all items */
45
  clear(): void;
46
  
47
  /** Export filter state as Uint8Array */
48
  toJSON(): Uint8Array;
49
  
50
  inspect(): any;
51
  
52
  /** Create bloom filter from iterable */
53
  static from(iterable: Iterable<string>, options?: number | BloomFilterOptions): BloomFilter;
54
}
55
```
56

57
### SuffixArray
58

59
Suffix array for efficient string matching and substring queries.
60

61
```typescript { .api }
62
/**
63
 * Suffix array for string pattern matching
64
 * @param text Input string or array of strings
65
 */
66
function SuffixArray(text: string | Array<string>): SuffixArray;
67

68
interface SuffixArray {
69
  /** The suffix array */
70
  readonly array: Uint32Array;
71
  
72
  /** Length of the suffix array */
73
  readonly length: number;
74
  
75
  /** Original string(s) */
76
  readonly string: string | Array<string>;
77
  
78
  /** Search for pattern occurrences */
79
  search(pattern: string): Array<number>;
80
  
81
  /** Find longest common substring */
82
  longestCommonSubstring(): {length: number, index: number} | null;
83
  
84
  inspect(): any;
85
}
86
```
87

88
### GeneralizedSuffixArray
89

90
Suffix array for multiple strings with cross-string pattern matching.
91

92
```typescript { .api }
93
/**
94
 * Generalized suffix array for multiple strings
95
 * @param strings Array of input strings
96
 */
97
function GeneralizedSuffixArray(strings: Array<string>): GeneralizedSuffixArray;
98

99
interface GeneralizedSuffixArray {
100
  readonly array: Uint32Array;
101
  readonly length: number;
102
  readonly strings: Array<string>;
103
  
104
  /** Search for pattern across all strings */
105
  search(pattern: string): Array<{string: number, index: number}>;
106
  
107
  /** Find longest common substring across strings */
108
  longestCommonSubstring(): {length: number, strings: Array<number>, index: number} | null;
109
  
110
  inspect(): any;
111
}
112
```
113

114
### InvertedIndex
115

116
Full-text search index with TF-IDF scoring and configurable tokenization.
117

118
```typescript { .api }
119
/**
120
 * Inverted index for full-text search
121
 * @param tokenizer Optional custom tokenizer function
122
 */
123
function InvertedIndex(tokenizer?: TokenizerFunction): InvertedIndex;
124

125
interface InvertedIndex {
126
  readonly size: number;
127
  
128
  /** Add document with its tokens */
129
  add(document: any, tokens: Array<string>): this;
130
  
131
  /** Search for documents matching query */
132
  search(query: Array<string>): Array<{document: any, score: number}>;
133
  
134
  /** Get all documents containing term */
135
  get(term: string): Array<any>;
136
  
137
  /** Remove all documents and terms */
138
  clear(): void;
139
  
140
  inspect(): any;
141
}
142

143
type TokenizerFunction = (text: string) => Array<string>;
144
```
145

146
### SymSpell
147

148
Spelling correction data structure using symmetric delete spelling correction algorithm.
149

150
```typescript { .api }
151
/**
152
 * SymSpell algorithm for spelling correction
153
 * @param options Configuration options
154
 */
155
function SymSpell(options?: SymSpellOptions): SymSpell;
156

157
interface SymSpellOptions {
158
  /** Maximum edit distance for corrections (default: 2) */
159
  maxEditDistance?: number;
160
  /** Minimum frequency threshold (default: 1) */
161
  countThreshold?: number;
162
  /** Maximum number of suggestions (default: 5) */
163
  maxSuggestions?: number;
164
}
165

166
interface SymSpell {
167
  readonly size: number;
168
  
169
  /** Add word with optional frequency */
170
  add(word: string, frequency?: number): this;
171
  
172
  /** Get spelling corrections for query */
173
  search(query: string, maxEditDistance?: number, suggestion?: SuggestionVerbosity): Array<Suggestion>;
174
  
175
  /** Remove all words */
176
  clear(): void;
177
  
178
  inspect(): any;
179
}
180

181
interface Suggestion {
182
  term: string;
183
  distance: number;
184
  frequency: number;
185
}
186

187
enum SuggestionVerbosity {
188
  Top,      // Only top suggestion
189
  Closest,  // All suggestions with minimum edit distance
190
  All       // All suggestions within edit distance
191
}
192
```
193

194
### PassjoinIndex
195

196
Index for efficient similarity joins using the Passjoin algorithm.
197

198
```typescript { .api }
199
/**
200
 * Passjoin index for similarity joins
201
 * @param options Configuration for similarity matching
202
 */
203
function PassjoinIndex(options: PassjoinOptions): PassjoinIndex;
204

205
interface PassjoinOptions {
206
  /** Similarity threshold (0-1) */
207
  threshold: number;
208
  /** Tokenizer function */
209
  tokenizer: TokenizerFunction;
210
  /** Optional distance function */
211
  distance?: DistanceFunction;
212
}
213

214
interface PassjoinIndex {
215
  readonly size: number;
216
  
217
  /** Add record to index */
218
  add(record: any): this;
219
  
220
  /** Find similar records above threshold */
221
  search(query: any, threshold?: number): Array<{record: any, similarity: number}>;
222
  
223
  /** Remove all records */
224
  clear(): void;
225
  
226
  inspect(): any;
227
}
228
```
229

230
### HashedArrayTree
231

232
Hash array mapped trie for efficient dynamic array operations.
233

234
```typescript { .api }
235
/**
236
 * Hashed Array Tree for dynamic arrays
237
 * @param options Optional configuration
238
 */
239
function HashedArrayTree<T>(options?: HATOptions): HashedArrayTree<T>;
240

241
interface HATOptions {
242
  initialCapacity?: number;
243
  factor?: number;
244
}
245

246
interface HashedArrayTree<T> {
247
  readonly length: number;
248
  readonly size: number;
249
  
250
  /** Add item to end */
251
  push(item: T): number;
252
  
253
  /** Get item at index */
254
  get(index: number): T | undefined;
255
  
256
  /** Set item at index */
257
  set(index: number, value: T): void;
258
  
259
  /** Remove and return last item */
260
  pop(): T | undefined;
261
  
262
  /** Remove all items */
263
  clear(): void;
264
  
265
  values(): IterableIterator<T>;
266
  [Symbol.iterator](): IterableIterator<T>;
267
  
268
  inspect(): any;
269
}
270
```
271

272
## Usage Examples
273

274
### BloomFilter Example
275

276
```typescript
277
import {BloomFilter} from "mnemonist";
278

279
// URL deduplication with 1% false positive rate
280
const seenUrls = new BloomFilter(1000000, 0.01);
281

282
function crawlUrl(url: string) {
283
  if (seenUrls.test(url)) {
284
    console.log("Probably already crawled:", url);
285
    return; // Might be false positive, but likely already seen
286
  }
287
  
288
  // Crawl the URL
289
  console.log("Crawling new URL:", url);
290
  seenUrls.add(url);
291
}
292

293
crawlUrl("https://example.com");
294
crawlUrl("https://example.org");
295
crawlUrl("https://example.com"); // "Probably already crawled"
296
```
297

298
### SuffixArray Example
299

300
```typescript
301
import {SuffixArray} from "mnemonist";
302

303
// Text search and analysis
304
const text = "banana";
305
const suffixArray = new SuffixArray(text);
306

307
// Find all occurrences of pattern
308
const matches = suffixArray.search("ana");
309
console.log(matches); // [1, 3] - positions where "ana" occurs
310

311
// Find longest repeated substring
312
const lcs = suffixArray.longestCommonSubstring();
313
console.log(lcs); // {length: 3, index: 1} - "ana" at position 1
314

315
// DNA sequence analysis
316
const dna = "ATCGATCGATCG";
317
const dnaArray = new SuffixArray(dna);
318
const repeats = dnaArray.search("ATCG");
319
console.log(repeats); // All positions of ATCG pattern
320
```
321

322
### InvertedIndex Example
323

324
```typescript
325
import {InvertedIndex} from "mnemonist";
326

327
// Simple search engine
328
const searchIndex = new InvertedIndex();
329

330
// Custom tokenizer
331
function tokenize(text: string): Array<string> {
332
  return text.toLowerCase()
333
    .replace(/[^\w\s]/g, "")
334
    .split(/\s+/)
335
    .filter(token => token.length > 2);
336
}
337

338
const index = new InvertedIndex(tokenize);
339

340
// Add documents
341
index.add({id: 1, title: "Machine Learning Basics"}, 
342
          tokenize("Introduction to machine learning algorithms"));
343
index.add({id: 2, title: "Deep Learning Guide"},
344
          tokenize("Deep neural networks and machine learning"));
345
index.add({id: 3, title: "Data Science Tools"},
346
          tokenize("Tools for data analysis and statistics"));
347

348
// Search for documents
349
const results = index.search(tokenize("machine learning"));
350
results.forEach(result => {
351
  console.log(`Document ${result.document.id}: ${result.score}`);
352
});
353
```
354

355
### SymSpell Example
356

357
```typescript
358
import {SymSpell} from "mnemonist";
359

360
// Spell checker
361
const spellChecker = new SymSpell({
362
  maxEditDistance: 2,
363
  maxSuggestions: 5
364
});
365

366
// Add dictionary words with frequencies
367
const dictionary = [
368
  {word: "javascript", freq: 1000},
369
  {word: "typescript", freq: 800},
370
  {word: "python", freq: 1200},
371
  {word: "programming", freq: 900}
372
];
373

374
dictionary.forEach(({word, freq}) => {
375
  spellChecker.add(word, freq);
376
});
377

378
// Get spelling suggestions
379
const suggestions = spellChecker.search("javascrip");
380
suggestions.forEach(suggestion => {
381
  console.log(`${suggestion.term} (distance: ${suggestion.distance}, freq: ${suggestion.frequency})`);
382
});
383
// Output: javascript (distance: 1, freq: 1000)
384
```
385

386
### PassjoinIndex Example
387

388
```typescript
389
import {PassjoinIndex} from "mnemonist";
390

391
// Duplicate detection in records
392
function jaccardSimilarity(tokensA: Set<string>, tokensB: Set<string>): number {
393
  const intersection = new Set([...tokensA].filter(x => tokensB.has(x)));
394
  const union = new Set([...tokensA, ...tokensB]);
395
  return intersection.size / union.size;
396
}
397

398
const duplicateDetector = new PassjoinIndex({
399
  threshold: 0.8,
400
  tokenizer: (text: string) => text.toLowerCase().split(/\s+/),
401
  distance: jaccardSimilarity
402
});
403

404
// Add product records
405
duplicateDetector.add({id: 1, name: "Apple iPhone 13 Pro Max"});
406
duplicateDetector.add({id: 2, name: "Samsung Galaxy S21 Ultra"});
407
duplicateDetector.add({id: 3, name: "iPhone 13 Pro Max Apple"});
408

409
// Find duplicates
410
const query = {name: "Apple iPhone 13 Pro Max 256GB"};
411
const similar = duplicateDetector.search(query, 0.7);
412
similar.forEach(match => {
413
  console.log(`Similar: ${match.record.name} (${match.similarity})`);
414
});
415
```
416

417
## Performance Characteristics
418

419
| Structure | Add/Insert | Search | Space | Notes |
420
|-----------|------------|--------|-------|--------|
421
| BloomFilter | O(k) | O(k) | O(m) | k=hash functions, m=bits |
422
| SuffixArray | O(n log n) | O(m log n) | O(n) | n=text length, m=pattern |
423
| InvertedIndex | O(d) | O(q + r) | O(V + D) | d=doc terms, q=query, r=results |
424
| SymSpell | O(1) | O(n) | O(n·k) | n=dictionary size, k=deletions |
425
| PassjoinIndex | O(s) | O(s·c) | O(n·s) | s=signature size, c=candidates |
426
| HashedArrayTree | O(1) amortized | O(1) | O(n) | Dynamic array operations |
427
428
## Use Cases
429

430
### BloomFilter
431
- Web crawling deduplication
432
- Database query optimization
433
- Distributed caching
434
- Network packet filtering
435

436
### SuffixArray
437
- Bioinformatics (DNA/protein analysis)
438
- Full-text search engines
439
- Data compression algorithms
440
- Plagiarism detection
441

442
### InvertedIndex
443
- Search engines
444
- Document retrieval systems
445
- Log analysis
446
- Content management systems
447

448
### SymSpell
449
- Spell checkers
450
- Query suggestion systems
451
- OCR error correction
452
- Fuzzy search applications
453

454
### PassjoinIndex
455
- Duplicate record detection
456
- Data cleaning pipelines
457
- Entity resolution
458
- Similarity search systems

Version

Tile

Files

specialized.md.css-3qkkll{font-size:var(--chakra-font-sizes-sm);font-weight:var(--chakra-font-weights-normal);color:var(--chakra-colors-gray-300);}docs/

specialized.mddocs/