# Embeddings

Vector embedding support for semantic similarity and RAG applications. PyLLaMACpp can generate embeddings for an individual prompt or extract them from the current model context, which enables vector-based semantic search and retrieval-augmented generation workflows.

## Capabilities

### Context Embeddings

Extract embeddings from the current model context. This method returns the embeddings vector from the last processed input in the model's context.

```python { .api }
def get_embeddings(self) -> List[float]:
    """
    Get embeddings from the current model context.

    Returns the last embeddings vector from the context.
    The model must be initialized with embedding=True.

    Returns:
        List[float]: 1-dimensional embeddings vector with shape [n_embd]

    Raises:
        AssertionError: If model was not initialized with embedding=True
    """
```

Example usage:

```python
from pyllamacpp.model import Model

# Initialize model in embedding mode
model = Model(
    model_path="/path/to/model.ggml",
    embedding=True,
    n_ctx=512
)

# Process some text to set context
list(model.generate("The quick brown fox", n_predict=1))

# Extract embeddings from current context
embeddings = model.get_embeddings()
print(f"Embeddings shape: {len(embeddings)}")
print(f"First few values: {embeddings[:5]}")
```

### Prompt Embeddings

Generate embeddings for specific text prompts. This method resets the model context, processes the given prompt, returns its embeddings vector, and resets the context again afterward.

```python { .api }
def get_prompt_embeddings(
    self,
    prompt: str,
    n_threads: int = 4,
    n_batch: int = 512
) -> List[float]:
    """
    Generate embeddings for a specific prompt.

    This method resets the model context, processes the prompt,
    extracts embeddings, and resets the context again.

    Parameters:
    - prompt: str, text to generate embeddings for
    - n_threads: int, number of CPU threads to use (default: 4)
    - n_batch: int, batch size for processing (default: 512, must be >=32 for BLAS)

    Returns:
        List[float]: Embeddings vector for the prompt

    Raises:
        AssertionError: If model was not initialized with embedding=True
    """
```

Example usage:

```python
from pyllamacpp.model import Model

# Initialize model in embedding mode
model = Model(
    model_path="/path/to/model.ggml",
    embedding=True
)

# Generate embeddings for different texts
texts = [
    "Machine learning is a subset of artificial intelligence",
    "Deep learning uses neural networks with multiple layers",
    "Natural language processing handles human language",
    "Computer vision analyzes and interprets visual information"
]

embeddings_list = []
for text in texts:
    embedding = model.get_prompt_embeddings(
        prompt=text,
        n_threads=8,
        n_batch=512
    )
    embeddings_list.append(embedding)
    print(f"Generated embeddings for: {text[:30]}...")

print(f"Generated {len(embeddings_list)} embedding vectors")
```

### Similarity Computation

Compute semantic similarity between embeddings using cosine similarity or other distance metrics.

```python
import numpy as np
from scipy.spatial.distance import cosine

from pyllamacpp.model import Model

def cosine_similarity(embedding1, embedding2):
    """Compute cosine similarity between two embeddings."""
    # scipy's cosine() is a distance, so similarity is 1 - distance
    return 1 - cosine(embedding1, embedding2)

def euclidean_similarity(embedding1, embedding2):
    """Compute euclidean similarity between two embeddings."""
    # Maps distance in [0, inf) to a similarity in (0, 1]
    return 1 / (1 + np.linalg.norm(np.array(embedding1) - np.array(embedding2)))

# Example usage
model = Model(model_path="/path/to/model.ggml", embedding=True)

# Generate embeddings for comparison
text1 = "Artificial intelligence and machine learning"
text2 = "AI and ML technologies"
text3 = "Weather forecast for tomorrow"

embed1 = model.get_prompt_embeddings(text1)
embed2 = model.get_prompt_embeddings(text2)
embed3 = model.get_prompt_embeddings(text3)

# Compare similarities
sim_1_2 = cosine_similarity(embed1, embed2)
sim_1_3 = cosine_similarity(embed1, embed3)

print(f"Similarity between text1 and text2: {sim_1_2:.3f}")
print(f"Similarity between text1 and text3: {sim_1_3:.3f}")
```
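
If SciPy is not installed, cosine similarity can also be computed with the standard library alone. The following is a minimal sketch, not part of PyLLaMACpp: `cosine_similarity_py` is a hypothetical helper name, and returning `0.0` for a zero vector is a convention chosen here for illustration.

```python
import math
from typing import Sequence

def cosine_similarity_py(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity using only the standard library (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption made for this sketch: a zero vector is similar to nothing
        return 0.0
    return dot / (norm_a * norm_b)
```

It accepts the plain `List[float]` values returned by `get_prompt_embeddings`, so it can stand in for `cosine_similarity` wherever that function is used.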

### Document Retrieval System

Build a simple document retrieval system using embeddings for semantic search.

```python
from typing import List, Tuple

from pyllamacpp.model import Model

# Reuses cosine_similarity() from the Similarity Computation section.

class DocumentRetriever:
    def __init__(self, model_path: str):
        self.model = Model(model_path=model_path, embedding=True)
        self.documents = []
        self.embeddings = []

    def add_document(self, text: str):
        """Add a document to the retrieval system."""
        embedding = self.model.get_prompt_embeddings(text)
        self.documents.append(text)
        self.embeddings.append(embedding)

    def add_documents(self, texts: List[str]):
        """Add multiple documents to the retrieval system."""
        for text in texts:
            self.add_document(text)

    def search(self, query: str, top_k: int = 5) -> List[Tuple[str, float]]:
        """Search for most similar documents to the query."""
        query_embedding = self.model.get_prompt_embeddings(query)

        similarities = []
        for i, doc_embedding in enumerate(self.embeddings):
            similarity = cosine_similarity(query_embedding, doc_embedding)
            similarities.append((self.documents[i], similarity))

        # Sort by similarity (descending)
        similarities.sort(key=lambda x: x[1], reverse=True)
        return similarities[:top_k]

# Example usage
retriever = DocumentRetriever("/path/to/model.ggml")

# Add documents
documents = [
    "Python is a high-level programming language known for its simplicity",
    "Machine learning algorithms can learn patterns from data",
    "Neural networks are inspired by biological neural systems",
    "Deep learning is a subset of machine learning using deep neural networks",
    "Natural language processing enables computers to understand human language"
]

retriever.add_documents(documents)

# Search for similar documents
query = "What is deep neural network learning?"
results = retriever.search(query, top_k=3)

print(f"Query: {query}")
print("Most similar documents:")
for doc, similarity in results:
    print(f"  Similarity: {similarity:.3f} - {doc}")
```
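
The ranking step in `search` can be isolated from the model, which makes it easy to check with hand-written vectors. The sketch below is illustrative only: `top_k_documents` and `_cosine` are hypothetical names, and it uses `heapq.nlargest` instead of a full sort, a substitution that may help when `top_k` is much smaller than the number of documents.

```python
import heapq
import math
from typing import List, Sequence, Tuple

def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_documents(
    query_embedding: Sequence[float],
    documents: List[str],
    embeddings: List[Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the top_k (document, similarity) pairs, most similar first."""
    if len(documents) != len(embeddings):
        raise ValueError("documents and embeddings must have the same length")
    scored = (
        (doc, _cosine(query_embedding, emb))
        for doc, emb in zip(documents, embeddings)
    )
    # nlargest returns results sorted from highest to lowest score
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])

# Toy 2-dimensional vectors stand in for real model embeddings
results = top_k_documents(
    [1.0, 0.0],
    ["a", "b", "c"],
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    top_k=2,
)
print([doc for doc, _ in results])
```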

### RAG (Retrieval-Augmented Generation) Integration

Combine embeddings with text generation for retrieval-augmented generation workflows.

```python
from typing import List

from pyllamacpp.model import Model

# Reuses DocumentRetriever from the Document Retrieval System section.

class RAGSystem:
    def __init__(self, model_path: str):
        # Separate models for embeddings and generation:
        # the retriever owns the embedding-mode model
        self.retriever = DocumentRetriever(model_path)
        self.generation_model = Model(model_path=model_path, embedding=False)

    def add_knowledge_base(self, documents: List[str]):
        """Add documents to the knowledge base."""
        self.retriever.add_documents(documents)

    def generate_with_context(self, query: str, top_k: int = 3) -> str:
        """Generate response using retrieved context."""
        # Retrieve relevant documents
        relevant_docs = self.retriever.search(query, top_k=top_k)

        # Build context from retrieved documents
        context = "\n\n".join([doc for doc, _ in relevant_docs])

        # Create prompt with context
        prompt = f"""Context:
{context}

Question: {query}

Answer based on the context:"""

        # Generate response
        response = self.generation_model.cpp_generate(
            prompt=prompt,
            n_predict=200,
            temp=0.3
        )

        return response

# Example usage
rag = RAGSystem("/path/to/model.ggml")

# Add knowledge base
knowledge_base = [
    "Python supports multiple programming paradigms including procedural, object-oriented, and functional programming.",
    "Machine learning models require training data to learn patterns and make predictions on new data.",
    "Transformers are a type of neural network architecture that uses attention mechanisms.",
    "Large language models are trained on vast amounts of text data to understand and generate human-like text.",
]

rag.add_knowledge_base(knowledge_base)

# Ask questions with context
question = "How do machine learning models work?"
answer = rag.generate_with_context(question)
print(f"Question: {question}")
print(f"Answer: {answer}")
```
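
Retrieved documents have to fit in the model's context window together with the question, so it can help to cap how much context goes into the prompt. The sketch below is one possible approach, not part of PyLLaMACpp: `build_rag_prompt` and `max_context_chars` are hypothetical names, and a character budget is only a rough stand-in for a token budget.

```python
from typing import List

def build_rag_prompt(query: str, docs: List[str], max_context_chars: int = 1000) -> str:
    """Build the RAG prompt, keeping documents in rank order until the budget is used."""
    selected: List[str] = []
    used = 0
    for doc in docs:
        # Every document after the first also costs a 2-character "\n\n" separator
        cost = len(doc) + (2 if selected else 0)
        if used + cost > max_context_chars:
            break
        selected.append(doc)
        used += cost
    context = "\n\n".join(selected)
    return f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer based on the context:"

prompt = build_rag_prompt("q", ["aaaa", "bbbb", "cccc"], max_context_chars=10)
print(prompt)
```

Because documents are consumed in the order given, passing them in descending similarity order means the least relevant ones are dropped first.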

### Batch Processing

Process many texts with a single model instance for large-scale embedding generation, reporting progress along the way.

```python
import time
from typing import List

from pyllamacpp.model import Model

def generate_embeddings_batch(model_path: str, texts: List[str], n_threads: int = 4) -> List[List[float]]:
    """Generate embeddings for multiple texts efficiently."""
    # Load the model once and reuse it for every text
    model = Model(model_path=model_path, embedding=True)

    embeddings = []
    start_time = time.time()

    for i, text in enumerate(texts):
        embedding = model.get_prompt_embeddings(text, n_threads=n_threads)
        embeddings.append(embedding)

        # Report progress every 10 texts
        if (i + 1) % 10 == 0:
            elapsed = time.time() - start_time
            print(f"Processed {i + 1}/{len(texts)} texts in {elapsed:.2f}s")

    return embeddings

# Example usage
texts = [
    "First document about AI",
    "Second document about ML",
    "Third document about NLP",
    # ... more texts
]

embeddings = generate_embeddings_batch("/path/to/model.ggml", texts, n_threads=8)
print(f"Generated {len(embeddings)} embeddings")
```
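
When a large batch contains repeated texts, embedding each distinct text only once avoids redundant model calls. The following is a minimal sketch under that assumption: `embed_unique` is a hypothetical helper, and `embed_fn` stands in for a callable such as `model.get_prompt_embeddings`.

```python
from typing import Callable, Dict, List

def embed_unique(
    texts: List[str],
    embed_fn: Callable[[str], List[float]],
) -> List[List[float]]:
    """Embed texts in order, calling embed_fn only once per distinct text."""
    cache: Dict[str, List[float]] = {}
    results: List[List[float]] = []
    for text in texts:
        if text not in cache:
            cache[text] = embed_fn(text)
        # Duplicate texts share the same cached list object
        results.append(cache[text])
    return results

# A fake embedding function records which texts were actually embedded
calls: List[str] = []

def fake_embed(text: str) -> List[float]:
    calls.append(text)
    return [float(len(text))]

vectors = embed_unique(["a", "bb", "a"], fake_embed)
print(vectors, calls)
```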

## Best Practices

1. **Model Initialization**: Always set `embedding=True` when creating models for embedding tasks
2. **Context Management**: Use `get_prompt_embeddings` for clean embeddings, `get_embeddings` for context-aware embeddings
3. **Batch Processing**: Process multiple texts in batches for efficiency
4. **Similarity Metrics**: Choose appropriate similarity metrics (cosine, euclidean) based on your use case
5. **Memory Management**: Reset model context periodically for long-running embedding tasks
6. **Threading**: Increase `n_threads` for faster processing on multi-core systems
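
On the choice of similarity metric: for unit-length vectors, squared Euclidean distance equals `2 - 2 * cosine_similarity`, so both metrics rank neighbours identically once embeddings are L2-normalized. The sketch below illustrates that identity with toy vectors; `l2_normalize`, `dot`, and `squared_euclidean` are hypothetical helper names.

```python
import math
from typing import List, Sequence

def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

a = l2_normalize([3.0, 4.0])
b = l2_normalize([1.0, 0.0])

# For unit vectors the dot product is the cosine similarity
cos_ab = dot(a, b)
assert math.isclose(squared_euclidean(a, b), 2.0 - 2.0 * cos_ab)
```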