# Vector Operations

The Vector Operations system provides internal fixed-dimension vector functionality for embedding representations and similarity calculations. The Vector class extends JavaScript's native Array with validation and utility methods specifically designed for high-dimensional token embeddings. This is an internal component not directly exported from the main package.

## Capabilities

### Vector Class

A specialized Array extension that enforces fixed dimensions and provides vector-specific functionality.

```javascript { .api }
/**
 * Vector class extending Array with fixed dimensions
 * @extends Array
 */
class Vector extends Array {
  /**
   * Create zero-filled vector with default dimensions
   * @returns {Vector} New vector filled with zeros
   */
  static fromNull();

  /**
   * Create vector with length validation
   * @param {...*} args - Array constructor arguments
   * @throws {string} "RangeError: Invalid vector length." if length doesn't match DIMENSIONS
   */
  constructor(...args);
}
```

**Usage Examples:**

```javascript
const Vector = require('next-token-prediction/components/Vector');

// Create zero vector
const zeroVector = Vector.fromNull();
console.log(zeroVector.length); // 144 (default DIMENSIONS)
console.log(zeroVector[0]); // 0

// Create vector with specific values (must match DIMENSIONS)
const customVector = new Vector(...new Array(144).fill(0.5));

// This would throw an error (wrong length)
try {
  const invalidVector = new Vector(1, 2, 3); // Only 3 elements, needs 144
} catch (error) {
  console.log(error); // "RangeError: Invalid vector length."
}
```
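
The behaviour described above can be approximated with a small stand-in class. This is a sketch under assumptions, not the package source: `SketchVector` is a hypothetical name, and reading `DIMENSIONS` from `process.env` when the module loads is inferred from the configuration section rather than confirmed.

```javascript
// Hypothetical stand-in for the internal Vector class; not the package source.
// Assumption: DIMENSIONS is read from the environment once, defaulting to 144.
const DIMENSIONS = Number.parseInt(process.env.DIMENSIONS ?? '144', 10);

class SketchVector extends Array {
  // Build a zero-filled vector with exactly DIMENSIONS elements.
  static fromNull() {
    return new SketchVector(...new Array(DIMENSIONS).fill(0));
  }

  constructor(...args) {
    super(...args);
    // Mirror the documented behaviour: a string is thrown, not an Error object.
    if (this.length !== DIMENSIONS) {
      throw 'RangeError: Invalid vector length.';
    }
  }
}

const zero = SketchVector.fromNull();
console.log(zero.length === DIMENSIONS); // true
```

Because the check runs in the constructor, any Array method that builds its result through the subclass constructor with a different length (for example `slice(0, 66)` or `filter`) would also hit the length check in this sketch.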

### Vector Utilities

Internal utility functions for vector operations used throughout the prediction system.

```javascript { .api }
/**
 * Calculate dot product of two vectors
 * @param {Vector|number[]} vectorA - First vector (defaults to null vector)
 * @param {Vector|number[]} vectorB - Second vector (defaults to null vector)
 * @returns {number} Dot product result
 */
function dotProduct(vectorA = Vector.fromNull(), vectorB = Vector.fromNull());
```

**Usage Examples:**

```javascript
// Vector, dotProduct and embeddingSearch are internal helpers, assumed to be in scope here

// Calculate similarity between token embeddings
const embedding1 = Vector.fromNull();
const embedding2 = Vector.fromNull();

// Set some values for demonstration
embedding1[0] = 0.8;
embedding1[1] = 0.6;
embedding2[0] = 0.6;
embedding2[1] = 0.8;

const similarity = dotProduct(embedding1, embedding2);
console.log('Similarity score:', similarity); // ~0.96 (0.8*0.6 + 0.6*0.8)

// Use in token similarity comparison
const tokenEmbedding = embeddingSearch('hello', 'world');
const candidateEmbedding = embeddingSearch('hello', 'there');
const score = dotProduct(tokenEmbedding, candidateEmbedding);
```
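
For readers who want to see the arithmetic, the following is a minimal sketch of a dot product with zero-vector defaults. `dotProductSketch` and `zeros` are hypothetical names, plain arrays stand in for `Vector`, and iterating over the shorter input is an assumption; the internal helper may differ.

```javascript
// Hypothetical stand-in for the internal dotProduct helper.
const DIMENSIONS = 144; // assumed default dimensionality

// Plain zero-filled array used in place of Vector.fromNull().
const zeros = () => new Array(DIMENSIONS).fill(0);

function dotProductSketch(vectorA = zeros(), vectorB = zeros()) {
  // Sum the element-wise products over the shorter of the two inputs.
  const length = Math.min(vectorA.length, vectorB.length);
  let sum = 0;
  for (let i = 0; i < length; i++) {
    sum += vectorA[i] * vectorB[i];
  }
  return sum;
}

const a = zeros();
const b = zeros();
a[0] = 0.8; a[1] = 0.6;
b[0] = 0.6; b[1] = 0.8;

console.log(dotProductSketch(a, b).toFixed(2)); // "0.96"
console.log(dotProductSketch(undefined, b)); // 0
```

Note that a raw dot product is not normalized, so scores depend on vector magnitude as well as direction.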

## Vector Configuration

The vector system is configured through the `DIMENSIONS` environment variable:

```javascript { .api }
/**
 * Vector configuration environment variables
 */
interface VectorConfig {
  DIMENSIONS: number; // Vector dimensionality (default: 144)
}
```

**Configuration Examples:**

```bash
# Standard configuration (default)
export DIMENSIONS=144

# Higher dimensional embeddings for more complex models
export DIMENSIONS=256

# Lower dimensional for faster processing/less memory
export DIMENSIONS=64
```
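
Environment variables always arrive as strings, so the value has to be parsed before it can be compared against a vector's length. The sketch below shows one way that could be done; `readDimensions` is a hypothetical helper, and the validation rules are assumptions rather than the package's documented behaviour.

```javascript
// Hypothetical helper: the package's own parsing logic is not shown in this document.
function readDimensions(env = process.env) {
  const raw = env.DIMENSIONS;

  // Fall back to the documented default when the variable is unset or empty.
  if (raw === undefined || raw === '') return 144;

  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new RangeError(`DIMENSIONS must be a positive integer, got "${raw}"`);
  }
  return value;
}

console.log(readDimensions({})); // 144
console.log(readDimensions({ DIMENSIONS: '256' })); // 256
```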

## Embedding Structure

Vectors are used to represent token embeddings with specific meaning for each dimension:

### Dimension Mapping

```javascript { .api }
/**
 * Vector dimension allocation for token embeddings
 */
interface VectorDimensions {
  // Character composition (positions 0-65): 66 dimensions
  characterDistribution: {
    startIndex: 0;
    length: 66;
    description: "Distribution of alphanumeric characters (A-Z, a-z, 0-9, #$%&)";
  };

  // Parts of speech (positions 66-101): 36 dimensions
  partOfSpeech: {
    startIndex: 66;
    length: 36;
    description: "Grammatical role indicators (CC, CD, DT, EX, FW, IN, JJ, etc.)";
  };

  // Token prevalence (position 102): 1 dimension
  prevalence: {
    startIndex: 102;
    length: 1;
    description: "Token frequency in training dataset (normalized 0-1)";
  };

  // Word suffixes (positions 103-139): 37 dimensions
  suffixes: {
    startIndex: 103;
    length: 37;
    description: "Common rhyme/ending patterns (ack, ail, ain, ake, etc.)";
  };

  // Next-word frequency (position 140): 1 dimension
  nextWordFrequency: {
    startIndex: 140;
    length: 1;
    description: "Normalized co-occurrence frequency";
  };

  // Content filtering (position 141): 1 dimension
  vulgarity: {
    startIndex: 141;
    length: 1;
    description: "Profanity detection (currently placeholder)";
  };

  // Style features (positions 142-143): 2 dimensions
  style: {
    startIndex: 142;
    length: 2;
    description: "Stylistic features [pirate, victorian]";
  };
}
```
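
The segment boundaries above can be written as a lookup table, which makes it easy to check that the segments add up to the default 144 dimensions and to find which segment owns a given index. The table values are transcribed from the listing; `SEGMENTS` and `segmentOf` are hypothetical names introduced for illustration.

```javascript
// Segment table transcribed from the Dimension Mapping listing.
const SEGMENTS = [
  { name: 'characterDistribution', startIndex: 0, length: 66 },
  { name: 'partOfSpeech', startIndex: 66, length: 36 },
  { name: 'prevalence', startIndex: 102, length: 1 },
  { name: 'suffixes', startIndex: 103, length: 37 },
  { name: 'nextWordFrequency', startIndex: 140, length: 1 },
  { name: 'vulgarity', startIndex: 141, length: 1 },
  { name: 'style', startIndex: 142, length: 2 },
];

// Total width of all segments; matches the default DIMENSIONS of 144.
const total = SEGMENTS.reduce((sum, segment) => sum + segment.length, 0);

// Hypothetical helper: name of the segment that owns a given index, or null.
function segmentOf(index) {
  const hit = SEGMENTS.find(
    (s) => index >= s.startIndex && index < s.startIndex + s.length
  );
  return hit ? hit.name : null;
}

console.log(total); // 144
console.log(segmentOf(102)); // "prevalence"
console.log(segmentOf(143)); // "style"
```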

### Vector Operations in Practice

```javascript { .api }
/**
 * Example vector operations for token similarity
 */
interface VectorOperations {
  // Get embedding for token pair
  embeddingSearch(prevToken: string, token: string): Vector;

  // Find similar tokens using vector similarity
  getSimilarToken(prevToken: string, token: string): {
    token: string;
    rankedTokenList: string[];
  };

  // Calculate token relationship strength
  dotProduct(vectorA: Vector, vectorB: Vector): number;
}
```

**Advanced Usage Examples:**

```javascript
// Analyze token relationships
function analyzeTokenRelationship(token1, token2, prevContext) {
  const embedding1 = embeddingSearch(prevContext, token1);
  const embedding2 = embeddingSearch(prevContext, token2);

  const similarity = dotProduct(embedding1, embedding2);

  // Copy to plain arrays before slicing: slice() on an Array subclass builds its
  // result through the subclass constructor, which may reject a shorter length.
  const values1 = Array.from(embedding1);
  const values2 = Array.from(embedding2);

  return {
    similarity,
    isHighlySimilar: similarity > 0.8,
    characterSimilarity: dotProduct(
      values1.slice(0, 66),
      values2.slice(0, 66)
    ),
    grammaticalSimilarity: dotProduct(
      values1.slice(66, 102),
      values2.slice(66, 102)
    )
  };
}

// Custom vector creation for analysis
function createCustomEmbedding() {
  const vector = Vector.fromNull();

  // Set character distribution (first 66 dimensions)
  vector[0] = 0.1; // 'A' frequency
  vector[1] = 0.05; // 'B' frequency
  // ... continue for all characters

  // Set part-of-speech indicators (dimensions 66-101)
  vector[66] = 1.0; // CC (coordinating conjunction)
  vector[67] = 0.0; // CD (cardinal number)
  // ... continue for all POS tags

  return vector;
}
```
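
Per-segment similarity can also be computed by iterating over an index range directly, which avoids creating intermediate arrays. The sketch below uses plain arrays in place of real embeddings; `segmentDot` is a hypothetical helper, and the segment offsets come from the Dimension Mapping section.

```javascript
// Hypothetical helper: dot product restricted to indices [start, start + length).
function segmentDot(a, b, start, length) {
  let sum = 0;
  for (let i = start; i < start + length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Plain arrays stand in for two 144-dimension embeddings.
const e1 = new Array(144).fill(0);
const e2 = new Array(144).fill(0);
e1[0] = 1; e2[0] = 1;   // same character slot set in both
e1[66] = 1; e2[67] = 1; // different part-of-speech slots

console.log(segmentDot(e1, e2, 0, 66));  // 1 (character segment)
console.log(segmentDot(e1, e2, 66, 36)); // 0 (part-of-speech segment)
```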

## Memory and Performance

### Memory Usage

- Each Vector instance: roughly `DIMENSIONS * 8` bytes of element data (64-bit numbers)
- Default 144-dimension vector: ~1.15 KB per instance
- Large embedding sets: multiple MB for thousands of token pairs
- Zero vectors: `Vector.fromNull()` returns a new vector on each call, so keep a reference and reuse it where a shared, unmodified zero vector is enough
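
The figures above follow from simple arithmetic, sketched here. `estimateVectorBytes` is a hypothetical helper; it counts only 8 bytes per element and ignores any per-object overhead added by the JavaScript engine.

```javascript
// Rough estimate only: 8 bytes per element, engine overhead not included.
function estimateVectorBytes(dimensions = 144, count = 1) {
  return dimensions * 8 * count;
}

console.log(estimateVectorBytes()); // 1152 bytes, about 1.15 KB
console.log(estimateVectorBytes(144, 10000) / 1e6); // 11.52 (MB for 10,000 vectors)
```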

### Performance Characteristics

- Vector creation: O(DIMENSIONS) time complexity
- Dot product calculation: O(DIMENSIONS) time complexity
- Memory access: native Array performance with dimension validation
- Similarity search: O(n * DIMENSIONS) where n = number of candidate tokens
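
The O(n * DIMENSIONS) cost of similarity search comes from scoring every candidate against the query once. A minimal sketch of that loop follows; `dot` and `rankBySimilarity` are hypothetical names, short three-element arrays stand in for full embeddings, and the internal `getSimilarToken` may rank differently.

```javascript
// Dot product over two equal-length numeric arrays: O(DIMENSIONS).
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Hypothetical ranking helper: scores n candidates, then sorts by score descending.
function rankBySimilarity(query, candidates) {
  return candidates
    .map(({ token, embedding }) => ({ token, score: dot(query, embedding) }))
    .sort((x, y) => y.score - x.score);
}

const query = [1, 0, 1];
const ranked = rankBySimilarity(query, [
  { token: 'alpha', embedding: [0, 1, 0] },
  { token: 'beta', embedding: [1, 0, 1] },
  { token: 'gamma', embedding: [1, 0, 0] },
]);

console.log(ranked.map((r) => r.token)); // [ 'beta', 'gamma', 'alpha' ]
```

The sort adds an O(n log n) term, which is usually small next to the scoring pass when DIMENSIONS is in the hundreds.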

### Optimization Tips

```javascript
// Reuse zero vectors instead of creating new ones
const sharedZero = Vector.fromNull();

// Cache frequently accessed embeddings
const embeddingCache = new Map();
function getCachedEmbedding(prev, token) {
  const key = `${prev}:${token}`;
  if (!embeddingCache.has(key)) {
    embeddingCache.set(key, embeddingSearch(prev, token));
  }
  return embeddingCache.get(key);
}

// Batch similarity calculations for efficiency
function batchSimilarity(queryEmbedding, candidateEmbeddings) {
  return candidateEmbeddings.map(candidate =>
    dotProduct(queryEmbedding, candidate)
  );
}
```

## Error Handling

Invalid lengths throw at construction time, while missing arguments and missing embeddings fall back to zero vectors:

```javascript
// Dimension validation error
try {
  const wrongSize = new Vector(1, 2, 3); // Wrong number of elements
} catch (error) {
  console.log(error); // "RangeError: Invalid vector length."
}

// Safe vector operations with defaults
const undefinedVector = undefined; // e.g. a lookup that found nothing
const anotherVector = Vector.fromNull();
const safeResult = dotProduct(
  undefinedVector, // Will default to Vector.fromNull()
  anotherVector
);

// Null vector fallback in embeddings
const embedding = embeddingSearch('unknown', 'token') || Vector.fromNull();
```
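
Since the constructor is documented as throwing a string rather than an `Error` object, the caught value has no `.message` or stack trace. Callers that expect `Error` instances could normalize the value first, as in this sketch; `toError` is a hypothetical helper and not part of the package.

```javascript
// Hypothetical helper: wrap non-Error throwables so callers can rely on .message.
function toError(thrown) {
  if (thrown instanceof Error) return thrown;
  return new Error(String(thrown));
}

try {
  // Simulates the documented behaviour of throwing a plain string.
  throw 'RangeError: Invalid vector length.';
} catch (thrown) {
  const error = toError(thrown);
  console.log(error.message); // "RangeError: Invalid vector length."
}
```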