Tessl Tile for npm/next-token-prediction@1.1.0

# Vector Operations

The Vector Operations system provides the internal fixed-dimension vector type used for token embeddings and similarity calculations. The `Vector` class extends JavaScript's native `Array` with a length check at construction time and a zero-vector factory, `Vector.fromNull()`. It is an internal component and is not exported from the main package.

## Capabilities

### Vector Class

A specialized `Array` subclass that enforces a fixed length of `DIMENSIONS` elements.

```javascript { .api }
/**
 * Vector class extending Array with fixed dimensions
 * @extends Array
 */
class Vector extends Array {
  /**
   * Create a new zero-filled vector with DIMENSIONS elements
   * @returns {Vector} New vector filled with zeros
   */
  static fromNull();

  /**
   * Create a vector, validating its length
   * @param {...*} args - Array constructor arguments
   * @throws {string} "RangeError: Invalid vector length." if the resulting length doesn't match DIMENSIONS
   *   (the thrown value is a string, not a RangeError instance)
   */
  constructor(...args);
}
```

**Usage Examples:**

```javascript
// Deep import of an internal module; Vector is not part of the public API
const Vector = require('next-token-prediction/components/Vector');

// Create a zero vector
const zeroVector = Vector.fromNull();
console.log(zeroVector.length); // 144 (default DIMENSIONS)
console.log(zeroVector[0]);     // 0

// Create a vector with specific values (the count must equal DIMENSIONS)
const customVector = new Vector(...new Array(144).fill(0.5));

// This throws because the length is wrong
try {
  const invalidVector = new Vector(1, 2, 3); // Only 3 elements, needs 144
} catch (error) {
  // The thrown value is a plain string, not a RangeError instance
  console.log(error); // "RangeError: Invalid vector length."
}
```
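To experiment without the package, the documented contract can be reproduced in a small stand-in class. This is a sketch written from the API above, not the package's source; reading `DIMENSIONS` from the environment with a fallback of 144 is an assumption. It also surfaces a side effect of subclassing `Array` with a constructor check: built-in methods such as `slice()` build their result through the subclass constructor (via `Symbol.species`), so slicing a `Vector` to any length other than `DIMENSIONS` throws.

```javascript
// Hypothetical stand-in for the internal Vector class, written from the
// documented contract; the real implementation may differ.
const DIMENSIONS = Number(process.env.DIMENSIONS) || 144; // assumed env parsing

class Vector extends Array {
  // Returns a new zero-filled vector on every call.
  static fromNull() {
    return new Vector(...new Array(DIMENSIONS).fill(0));
  }

  constructor(...args) {
    super(...args);
    if (this.length !== DIMENSIONS) {
      // Documented behavior: a plain string is thrown, not an Error object.
      throw 'RangeError: Invalid vector length.';
    }
  }
}

const v = Vector.fromNull();
v[0] = 0.25;

// slice() creates its result with `new Vector(66)`, which fails validation.
let sliceError = null;
try {
  v.slice(0, 66);
} catch (error) {
  sliceError = error;
}
console.log(sliceError); // RangeError: Invalid vector length. (with DIMENSIONS unset)

// Copying into a plain Array first avoids the species constructor.
const head = [...v].slice(0, 66);
console.log(head.length, head[0], head instanceof Vector); // 66 0.25 false
```

Copying into a plain array with spread syntax or `Array.from()` before slicing sidesteps the check.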

### Vector Utilities

Internal utility functions for vector operations used throughout the prediction system. The one documented here is `dotProduct`:

```javascript { .api }
/**
 * Calculate the dot product of two vectors
 * @param {Vector|number[]} vectorA - First vector (defaults to a zero vector from Vector.fromNull())
 * @param {Vector|number[]} vectorB - Second vector (defaults to a zero vector from Vector.fromNull())
 * @returns {number} Sum of the element-wise products
 */
function dotProduct(vectorA = Vector.fromNull(), vectorB = Vector.fromNull());
```

**Usage Examples:**

```javascript
// These examples assume Vector, dotProduct, and embeddingSearch are already in scope

// Calculate similarity between token embeddings
const embedding1 = Vector.fromNull();
const embedding2 = Vector.fromNull();

// Set some values for demonstration
embedding1[0] = 0.8;
embedding1[1] = 0.6;
embedding2[0] = 0.6;
embedding2[1] = 0.8;

const similarity = dotProduct(embedding1, embedding2);
console.log('Similarity score:', similarity); // ≈ 0.96 (0.8*0.6 + 0.6*0.8)

// Use in token similarity comparison
const tokenEmbedding = embeddingSearch('hello', 'world');
const candidateEmbedding = embeddingSearch('hello', 'there');
const score = dotProduct(tokenEmbedding, candidateEmbedding);
```
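A `dotProduct` with the documented defaults might look like the sketch below. Plain arrays stand in for `Vector.fromNull()` so it runs on its own, and stopping at the shorter input is an assumption (the package may behave differently when lengths differ).

```javascript
// Hypothetical sketch of dotProduct; DIMENSIONS and zeroVector() stand in
// for the package's configuration and Vector.fromNull().
const DIMENSIONS = Number(process.env.DIMENSIONS) || 144;
const zeroVector = () => new Array(DIMENSIONS).fill(0);

function dotProduct(vectorA = zeroVector(), vectorB = zeroVector()) {
  // Assumption: iterate over the shared length instead of producing NaN.
  const length = Math.min(vectorA.length, vectorB.length);
  let sum = 0;
  for (let i = 0; i < length; i++) {
    sum += vectorA[i] * vectorB[i];
  }
  return sum;
}

console.log(dotProduct([0.8, 0.6], [0.6, 0.8])); // approximately 0.96
console.log(dotProduct());                       // 0
console.log(dotProduct(undefined, [1, 2, 3]));   // 0 (first argument defaulted)
```

JavaScript applies default parameters only when an argument is `undefined`, so `dotProduct(null, v)` still fails with a `TypeError` in this sketch.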

## Vector Configuration

Vector length is configured through the `DIMENSIONS` environment variable:

```javascript { .api }
/**
 * Vector configuration environment variables
 */
interface VectorConfig {
  DIMENSIONS: number;  // Vector dimensionality (default: 144)
}
```

**Configuration Examples:**

```bash
# Standard configuration (default)
export DIMENSIONS=144

# Higher-dimensional embeddings for more complex models
export DIMENSIONS=256

# Lower-dimensional embeddings for faster processing and less memory
export DIMENSIONS=64
```

The dimension mapping described under Embedding Structure covers exactly the default 144 positions (0 to 143).
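Environment variables arrive as strings, so `DIMENSIONS` has to be parsed before use. A minimal sketch of that step, assuming only positive integers are valid and 144 is the fallback; `resolveDimensions` is a hypothetical helper, and the package's own parsing may be more lenient:

```javascript
// Hypothetical helper: resolve DIMENSIONS from an environment object,
// falling back to the documented default of 144.
const DEFAULT_DIMENSIONS = 144;

function resolveDimensions(env = process.env) {
  const raw = env.DIMENSIONS;
  if (raw === undefined || raw === '') return DEFAULT_DIMENSIONS;

  const value = Number(raw);
  // Assumption: only positive integers are accepted.
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`Invalid DIMENSIONS value: "${raw}"`);
  }
  return value;
}

console.log(resolveDimensions({}));                    // 144
console.log(resolveDimensions({ DIMENSIONS: '256' })); // 256
console.log(resolveDimensions({ DIMENSIONS: '64' }));  // 64
```

Failing on a malformed value, rather than silently falling back, keeps vector lengths from drifting away from what the rest of the system expects.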

## Embedding Structure

Token embeddings use the default 144 dimensions, with each range of dimensions encoding a specific feature:

### Dimension Mapping

```javascript { .api }
/**
 * Vector dimension allocation for token embeddings
 */
interface VectorDimensions {
  // Character composition (positions 0-65): 66 dimensions (26 + 26 + 10 + 4 characters)
  characterDistribution: {
    startIndex: 0;
    length: 66;
    description: "Distribution of alphanumeric and symbol characters (A-Z, a-z, 0-9, #$%&)";
  };

  // Parts of speech (positions 66-101): 36 dimensions
  partOfSpeech: {
    startIndex: 66;
    length: 36;
    description: "Grammatical role indicators (CC, CD, DT, EX, FW, IN, JJ, etc.)";
  };

  // Token prevalence (position 102): 1 dimension
  prevalence: {
    startIndex: 102;
    length: 1;
    description: "Token frequency in training dataset (normalized 0-1)";
  };

  // Word suffixes (positions 103-139): 37 dimensions
  suffixes: {
    startIndex: 103;
    length: 37;
    description: "Common rhyme/ending patterns (ack, ail, ain, ake, etc.)";
  };

  // Next-word frequency (position 140): 1 dimension
  nextWordFrequency: {
    startIndex: 140;
    length: 1;
    description: "Normalized co-occurrence frequency";
  };

  // Content filtering (position 141): 1 dimension
  vulgarity: {
    startIndex: 141;
    length: 1;
    description: "Profanity detection (currently a placeholder)";
  };

  // Style features (positions 142-143): 2 dimensions
  style: {
    startIndex: 142;
    length: 2;
    description: "Stylistic features [pirate, victorian]";
  };
}
```
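The dimension map can also be written as a plain lookup table, which makes it easy to verify that the segments tile all 144 positions and to extract one feature group by name. The table mirrors the interface above; `SEGMENTS`, `getSegment`, and `totalDimensions` are hypothetical names. `getSegment` builds a new plain array instead of calling `slice()` on its input, so it also works on a `Vector` whose constructor rejects other lengths.

```javascript
// Segment layout of a 144-dimension token embedding, mirroring the
// VectorDimensions interface above.
const SEGMENTS = {
  characterDistribution: { startIndex: 0, length: 66 },
  partOfSpeech: { startIndex: 66, length: 36 },
  prevalence: { startIndex: 102, length: 1 },
  suffixes: { startIndex: 103, length: 37 },
  nextWordFrequency: { startIndex: 140, length: 1 },
  vulgarity: { startIndex: 141, length: 1 },
  style: { startIndex: 142, length: 2 },
};

// Hypothetical helper: copy one named segment into a plain array.
function getSegment(vector, name) {
  const { startIndex, length } = SEGMENTS[name];
  return Array.from({ length }, (_, i) => vector[startIndex + i]);
}

// Sanity check: segments must be contiguous; returns the total length.
function totalDimensions(segments) {
  let expectedStart = 0;
  for (const [name, { startIndex, length }] of Object.entries(segments)) {
    if (startIndex !== expectedStart) {
      throw new Error(`Segment ${name} starts at ${startIndex}, expected ${expectedStart}`);
    }
    expectedStart += length;
  }
  return expectedStart;
}

console.log(totalDimensions(SEGMENTS)); // 144

const embedding = new Array(144).fill(0);
embedding[142] = 1; // "pirate" style feature
console.log(getSegment(embedding, 'style')); // [ 1, 0 ]
```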

### Vector Operations in Practice

```javascript { .api }
/**
 * Example vector operations for token similarity
 */
interface VectorOperations {
  // Get embedding for token pair
  embeddingSearch(prevToken: string, token: string): Vector;

  // Find similar tokens using vector similarity
  getSimilarToken(prevToken: string, token: string): {
    token: string;
    rankedTokenList: string[];
  };

  // Calculate token relationship strength
  dotProduct(vectorA: Vector, vectorB: Vector): number;
}
```
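This document does not specify how `getSimilarToken` ranks candidates. One plausible approach, sketched here under the assumption that candidates are scored by dot product against a query embedding, is a brute-force scan costing O(n * DIMENSIONS) for n candidates. `rankBySimilarity` and the candidate map are hypothetical; the return value only mimics the `{ token, rankedTokenList }` shape above.

```javascript
// Hypothetical ranking sketch: score each candidate embedding against a
// query embedding by dot product and sort best-first.
function dotProduct(a, b) {
  let sum = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) sum += a[i] * b[i];
  return sum;
}

function rankBySimilarity(queryEmbedding, candidates) {
  // candidates: Map<string, number[]> of token -> embedding
  const scored = [];
  for (const [token, embedding] of candidates) {
    scored.push({ token, score: dotProduct(queryEmbedding, embedding) });
  }
  // Array.prototype.sort is stable, so ties keep insertion order.
  scored.sort((a, b) => b.score - a.score);

  const rankedTokenList = scored.map((entry) => entry.token);
  return { token: rankedTokenList[0], rankedTokenList };
}

const query = [1, 0, 0.5];
const candidates = new Map([
  ['world', [0.9, 0.1, 0.4]],  // score ≈ 1.1
  ['there', [0.2, 0.8, 0.1]],  // score ≈ 0.25
  ['friend', [0.7, 0.0, 0.9]], // score ≈ 1.15
]);

console.log(rankBySimilarity(query, candidates));
// token: 'friend', rankedTokenList: ['friend', 'world', 'there']
```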

**Advanced Usage Examples:**

```javascript
// Analyze token relationships (assumes embeddingSearch and dotProduct are in scope)
function analyzeTokenRelationship(token1, token2, prevContext) {
  const embedding1 = embeddingSearch(prevContext, token1);
  const embedding2 = embeddingSearch(prevContext, token2);

  // Copy to plain arrays before slicing: slice() on a Vector builds its result
  // through the Vector constructor, whose length check rejects anything but DIMENSIONS
  const values1 = [...embedding1];
  const values2 = [...embedding2];

  const similarity = dotProduct(embedding1, embedding2);

  return {
    similarity,
    isHighlySimilar: similarity > 0.8, // illustrative threshold
    characterSimilarity: dotProduct(
      values1.slice(0, 66),   // character distribution
      values2.slice(0, 66)
    ),
    grammaticalSimilarity: dotProduct(
      values1.slice(66, 102), // parts of speech
      values2.slice(66, 102)
    )
  };
}

// Custom vector creation for analysis
function createCustomEmbedding() {
  const vector = Vector.fromNull();

  // Set character distribution (dimensions 0-65)
  vector[0] = 0.1;  // 'A' frequency
  vector[1] = 0.05; // 'B' frequency
  // ... continue for all characters

  // Set part-of-speech indicators (dimensions 66-101)
  vector[66] = 1.0; // CC (coordinating conjunction)
  vector[67] = 0.0; // CD (cardinal number)
  // ... continue for all POS tags

  return vector;
}
```
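A raw dot product grows with vector magnitude, so a fixed cutoff such as `0.8` only has a consistent meaning if the embeddings are normalized, which this document does not state. Cosine similarity, which divides the dot product by both vectors' Euclidean norms and always falls in [-1, 1], is a common alternative. It is a different measure from `dotProduct`, and `cosineSimilarity` below is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: cosine similarity over the positions both vectors share.
// Returns 0 when either vector is all zeros (the ratio is undefined there).
function cosineSimilarity(a, b) {
  const length = Math.min(a.length, b.length);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Same direction, different magnitude: the dot product is 28, cosine is ≈ 1.
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6]));
// Orthogonal vectors
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```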

## Memory and Performance

### Memory Usage

- Each Vector instance stores `DIMENSIONS * 8 bytes` of element data (64-bit floating-point numbers), plus engine overhead
- Default 144-dimension vector: 1,152 bytes (~1.15 KB) of element data per instance
- Large embedding sets: several MB for thousands of token pairs
- `Vector.fromNull()` allocates a new zero vector on every call; reuse one instance where a read-only zero vector is enough
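The per-instance figure above is the element count times 8 bytes. A small estimator that counts element storage only (JavaScript engines add object overhead and may store arrays differently); `estimateEmbeddingBytes` is a hypothetical helper:

```javascript
// Rough payload estimate for embedding storage: elements × 8 bytes.
// Ignores engine-specific object and array overhead.
const BYTES_PER_NUMBER = Float64Array.BYTES_PER_ELEMENT; // 8

function estimateEmbeddingBytes(vectorCount, dimensions = 144) {
  return vectorCount * dimensions * BYTES_PER_NUMBER;
}

console.log(estimateEmbeddingBytes(1));     // 1152 (≈ 1.15 KB)
console.log(estimateEmbeddingBytes(10000)); // 11520000 (≈ 11.5 MB)

// A single Float64Array holding 10,000 vectors stores exactly this payload.
const packed = new Float64Array(10000 * 144);
console.log(packed.byteLength);             // 11520000
```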

### Performance Characteristics

- Vector creation: O(DIMENSIONS) time
- Dot product: O(DIMENSIONS) time
- Element access: native Array indexing; the length is validated only when a vector is constructed
- Similarity search: O(n * DIMENSIONS), where n is the number of candidate tokens
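Because the length check runs only in the constructor, in-place operations that change an existing vector's length, such as `push()` or assigning to `length`, bypass it. The sketch below uses a stand-in class written from the documented contract (not the package's source) plus a hypothetical `assertDimensions` guard for code that needs to trust the length.

```javascript
// Hypothetical stand-in mirroring the documented constructor check.
const DIMENSIONS = 144;

class Vector extends Array {
  static fromNull() {
    return new Vector(...new Array(DIMENSIONS).fill(0));
  }

  constructor(...args) {
    super(...args);
    if (this.length !== DIMENSIONS) throw 'RangeError: Invalid vector length.';
  }
}

// Hypothetical guard: re-check the length before relying on it.
function assertDimensions(vector) {
  if (vector.length !== DIMENSIONS) {
    throw new RangeError(`Expected ${DIMENSIONS} dimensions, got ${vector.length}`);
  }
  return vector;
}

const v = Vector.fromNull();
v.push(1);             // no constructor runs, so nothing is validated
console.log(v.length); // 145

try {
  assertDimensions(v);
} catch (error) {
  console.log(error.message); // Expected 144 dimensions, got 145
}
```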

### Optimization Tips

```javascript
// Reuse one zero vector instead of creating new ones (treat it as read-only:
// writing to a shared instance affects every caller)
const sharedZero = Vector.fromNull();

// Cache frequently accessed embeddings (assumes embeddingSearch is in scope)
const embeddingCache = new Map();
function getCachedEmbedding(prev, token) {
  const key = `${prev}:${token}`;
  if (!embeddingCache.has(key)) {
    embeddingCache.set(key, embeddingSearch(prev, token));
  }
  return embeddingCache.get(key);
}

// Score many candidates against one query embedding
function batchSimilarity(queryEmbedding, candidateEmbeddings) {
  return candidateEmbeddings.map(candidate =>
    dotProduct(queryEmbedding, candidate)
  );
}
```
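The `Map` cache in the tips above never evicts entries, so it grows with every distinct token pair. A bounded alternative can rely on `Map` iterating in insertion order: re-inserting a key on every hit moves it to the end, leaving the least recently used key first. `createLruCache` and its `compute` callback are hypothetical; `compute` would typically wrap `embeddingSearch`. The key is built with `JSON.stringify` so tokens containing `:` cannot collide.

```javascript
// Hypothetical bounded cache with least-recently-used eviction.
function createLruCache(capacity, compute) {
  const cache = new Map();

  return function get(prev, token) {
    const key = JSON.stringify([prev, token]);
    if (cache.has(key)) {
      // Re-insert to mark the entry as most recently used.
      const value = cache.get(key);
      cache.delete(key);
      cache.set(key, value);
      return value;
    }

    const value = compute(prev, token);
    cache.set(key, value);
    if (cache.size > capacity) {
      // The first key in iteration order is the least recently used.
      cache.delete(cache.keys().next().value);
    }
    return value;
  };
}

// Usage with a stand-in for embeddingSearch that counts calls.
let calls = 0;
const getEmbedding = createLruCache(2, (prev, token) => {
  calls += 1;
  return [prev.length, token.length];
});

getEmbedding('hello', 'world'); // computed
getEmbedding('hello', 'there'); // computed
getEmbedding('hello', 'world'); // cache hit, now most recent
getEmbedding('good', 'night');  // computed; evicts the ('hello', 'there') entry
getEmbedding('hello', 'there'); // computed again
console.log(calls); // 4
```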

## Error Handling

The vector system handles common failure cases as follows:

```javascript
// Dimension validation error (a string is thrown, not an Error object)
try {
  const wrongSize = new Vector(1, 2, 3); // Wrong number of elements
} catch (error) {
  console.log(error); // "RangeError: Invalid vector length."
}

// Default parameters apply only when an argument is undefined (not null)
const anotherVector = Vector.fromNull();
const safeResult = dotProduct(
  undefined,  // Defaults to Vector.fromNull()
  anotherVector
);

// Null vector fallback, assuming embeddingSearch returns a falsy value for unknown pairs
const embedding = embeddingSearch('unknown', 'token') || Vector.fromNull();
```
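Because the constructor throws a string rather than an `Error`, `error instanceof RangeError` is false and `error.message` is `undefined`, so callers have to compare the thrown value itself. A sketch with hypothetical `isVectorLengthError` and `tryCreateVector` helpers, using a stand-in class written from the documented contract (not the package's source):

```javascript
// Hypothetical stand-in mirroring the documented constructor check.
const DIMENSIONS = 144;
const LENGTH_ERROR = 'RangeError: Invalid vector length.';

class Vector extends Array {
  constructor(...args) {
    super(...args);
    if (this.length !== DIMENSIONS) throw LENGTH_ERROR;
  }
}

// Hypothetical helper: recognize the documented length error.
function isVectorLengthError(error) {
  return error === LENGTH_ERROR;
}

// Hypothetical helper: build a Vector from numbers, or return null.
function tryCreateVector(values) {
  // Guard first: new Vector(n) with a single number follows Array(n)
  // semantics and creates n empty slots instead of a one-element vector.
  if (values.length !== DIMENSIONS) return null;
  return new Vector(...values);
}

console.log(tryCreateVector([1, 2, 3]));                               // null
console.log(tryCreateVector(new Array(DIMENSIONS).fill(0.5)).length); // 144

try {
  new Vector(1, 2, 3);
} catch (error) {
  console.log(isVectorLengthError(error), error instanceof RangeError); // true false
}
```

The up-front length check matters most for single-number input: with the default of 144, `new Vector(144)` passes validation while holding no values at all.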
