# Embeddings

Vector embedding support for semantic similarity and RAG applications. PyLLaMACpp can generate embeddings for individual prompts or extract them from the current model context, which enables vector-based semantic search and retrieval-augmented generation workflows.

## Capabilities

### Context Embeddings

Extract embeddings from the current model context. This method returns the embedding vector for the last input processed in the model's context.

```python { .api }
def get_embeddings(self) -> List[float]:
    """
    Get embeddings from the current model context.

    Returns the last embeddings vector from the context.
    The model must be initialized with embedding=True.

    Returns:
        List[float]: 1-dimensional embeddings vector with shape [n_embd]

    Raises:
        AssertionError: If model was not initialized with embedding=True
    """
```

Example usage:

```python
from pyllamacpp.model import Model

# Initialize model in embedding mode
model = Model(
    model_path="/path/to/model.ggml",
    embedding=True,
    n_ctx=512
)

# Process some text to set context
list(model.generate("The quick brown fox", n_predict=1))

# Extract embeddings from current context
embeddings = model.get_embeddings()
print(f"Embeddings shape: {len(embeddings)}")
print(f"First few values: {embeddings[:5]}")
```
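Before storing or comparing a vector returned by `get_embeddings`, it can help to sanity-check it. The sketch below is not part of PyLLaMACpp: `describe_embedding` is a hypothetical helper, and the hard-coded list stands in for a real model output so the snippet runs without a model file.

```python
import math
from typing import Dict, Sequence


def describe_embedding(vector: Sequence[float]) -> Dict[str, float]:
    """Return basic sanity statistics for one embedding vector.

    Hypothetical helper, not a PyLLaMACpp API.
    """
    if not vector:
        raise ValueError("embedding is empty")
    if not all(math.isfinite(x) for x in vector):
        raise ValueError("embedding contains NaN or infinite values")
    # L2 norm: square root of the sum of squared components
    norm = math.sqrt(sum(x * x for x in vector))
    return {
        "dim": len(vector),
        "l2_norm": norm,
        "min": min(vector),
        "max": max(vector),
    }


# Stand-in for the list returned by model.get_embeddings()
stats = describe_embedding([3.0, -4.0, 0.0])
print(stats)
```

With a real model, pass the result of `model.get_embeddings()` instead of the stand-in list.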

### Prompt Embeddings

Generate embeddings for specific text prompts. The method resets the model context, processes the given prompt, returns its embedding vector, and resets the context again, so the result does not depend on previously processed input.

```python { .api }
def get_prompt_embeddings(
    self,
    prompt: str,
    n_threads: int = 4,
    n_batch: int = 512
) -> List[float]:
    """
    Generate embeddings for a specific prompt.

    This method resets the model context, processes the prompt,
    extracts embeddings, and resets the context again.

    Parameters:
    - prompt: str, text to generate embeddings for
    - n_threads: int, number of CPU threads to use (default: 4)
    - n_batch: int, batch size for processing (default: 512, must be >=32 for BLAS)

    Returns:
        List[float]: Embeddings vector for the prompt

    Raises:
        AssertionError: If model was not initialized with embedding=True
    """
```

Example usage:

```python
from pyllamacpp.model import Model

# Initialize model in embedding mode
model = Model(
    model_path="/path/to/model.ggml",
    embedding=True
)

# Generate embeddings for different texts
texts = [
    "Machine learning is a subset of artificial intelligence",
    "Deep learning uses neural networks with multiple layers",
    "Natural language processing handles human language",
    "Computer vision analyzes and interprets visual information"
]

embeddings_list = []
for text in texts:
    embedding = model.get_prompt_embeddings(
        prompt=text,
        n_threads=8,
        n_batch=512
    )
    embeddings_list.append(embedding)
    print(f"Generated embeddings for: {text[:30]}...")

print(f"Generated {len(embeddings_list)} embedding vectors")
```
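When embedding many prompts, it is worth confirming that every vector has the same dimension before stacking or comparing them. The following is a minimal sketch under stated assumptions: `embed_texts` and `fake_embed` are hypothetical names, and `fake_embed` is a toy stand-in for `model.get_prompt_embeddings` so that the snippet runs without a model file.

```python
from typing import Callable, List, Optional, Sequence

# Any callable that maps a text to a vector, e.g. model.get_prompt_embeddings
EmbedFn = Callable[[str], Sequence[float]]


def embed_texts(texts: Sequence[str], embed_fn: EmbedFn) -> List[List[float]]:
    """Embed each text and verify that all vectors share one dimension."""
    vectors: List[List[float]] = []
    expected_dim: Optional[int] = None
    for text in texts:
        vector = list(embed_fn(text))
        if expected_dim is None:
            expected_dim = len(vector)
        elif len(vector) != expected_dim:
            raise ValueError(
                f"expected dimension {expected_dim}, got {len(vector)}"
            )
        vectors.append(vector)
    return vectors


def fake_embed(text: str) -> List[float]:
    """Toy stand-in for a real embedding call (assumption for this demo)."""
    return [float(len(text)), float(text.count(" ")), 1.0]


vectors = embed_texts(["hello world", "embeddings"], fake_embed)
print(len(vectors), len(vectors[0]))
```

With a real model, `embed_texts(texts, model.get_prompt_embeddings)` would use the method's default `n_threads` and `n_batch`.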

### Similarity Computation

Compute semantic similarity between embeddings using cosine similarity or other distance metrics.

```python
import numpy as np
from scipy.spatial.distance import cosine

from pyllamacpp.model import Model

def cosine_similarity(embedding1, embedding2):
    """Compute cosine similarity between two embeddings."""
    # scipy's cosine() returns a distance, so subtract it from 1
    return 1 - cosine(embedding1, embedding2)

def euclidean_similarity(embedding1, embedding2):
    """Map the Euclidean distance between two embeddings to a score in (0, 1]."""
    return 1 / (1 + np.linalg.norm(np.array(embedding1) - np.array(embedding2)))

# Example usage
model = Model(model_path="/path/to/model.ggml", embedding=True)

# Generate embeddings for comparison
text1 = "Artificial intelligence and machine learning"
text2 = "AI and ML technologies"
text3 = "Weather forecast for tomorrow"

embed1 = model.get_prompt_embeddings(text1)
embed2 = model.get_prompt_embeddings(text2)
embed3 = model.get_prompt_embeddings(text3)

# Compare similarities
sim_1_2 = cosine_similarity(embed1, embed2)
sim_1_3 = cosine_similarity(embed1, embed3)

print(f"Similarity between text1 and text2: {sim_1_2:.3f}")
print(f"Similarity between text1 and text3: {sim_1_3:.3f}")
```
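The choice between cosine and Euclidean metrics matters less once vectors are scaled to unit length, because the squared Euclidean distance between unit vectors equals `2 - 2 * cos`. The sketch below illustrates this with the standard library only; `unit`, `cosine_similarity_pure`, and `squared_euclidean` are hypothetical helper names, and the two short vectors are made-up stand-ins for real embeddings.

```python
import math
from typing import List, Sequence


def unit(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_similarity_pure(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity as the dot product of the two unit vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = [1.0, 2.0, 3.0]
b = [2.0, 0.5, 1.0]
cos_ab = cosine_similarity_pure(a, b)
dist_sq = squared_euclidean(unit(a), unit(b))

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b), up to rounding error
identity_holds = abs(dist_sq - (2 - 2 * cos_ab)) < 1e-9
print(identity_holds)
```

In practice this means that ranking normalized embeddings by ascending Euclidean distance gives the same order as ranking them by descending cosine similarity.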

### Document Retrieval System

Build a simple document retrieval system using embeddings for semantic search.

```python
from typing import List, Tuple

from pyllamacpp.model import Model

# cosine_similarity() is the helper defined in the Similarity Computation example

class DocumentRetriever:
    def __init__(self, model_path: str):
        self.model = Model(model_path=model_path, embedding=True)
        self.documents = []
        self.embeddings = []

    def add_document(self, text: str):
        """Add a document to the retrieval system."""
        embedding = self.model.get_prompt_embeddings(text)
        self.documents.append(text)
        self.embeddings.append(embedding)

    def add_documents(self, texts: List[str]):
        """Add multiple documents to the retrieval system."""
        for text in texts:
            self.add_document(text)

    def search(self, query: str, top_k: int = 5) -> List[Tuple[str, float]]:
        """Search for most similar documents to the query."""
        query_embedding = self.model.get_prompt_embeddings(query)

        similarities = []
        for i, doc_embedding in enumerate(self.embeddings):
            similarity = cosine_similarity(query_embedding, doc_embedding)
            similarities.append((self.documents[i], similarity))

        # Sort by similarity (descending)
        similarities.sort(key=lambda x: x[1], reverse=True)
        return similarities[:top_k]

# Example usage
retriever = DocumentRetriever("/path/to/model.ggml")

# Add documents
documents = [
    "Python is a high-level programming language known for its simplicity",
    "Machine learning algorithms can learn patterns from data",
    "Neural networks are inspired by biological neural systems",
    "Deep learning is a subset of machine learning using deep neural networks",
    "Natural language processing enables computers to understand human language"
]

retriever.add_documents(documents)

# Search for similar documents
query = "What is deep neural network learning?"
results = retriever.search(query, top_k=3)

print(f"Query: {query}")
print("Most similar documents:")
for doc, similarity in results:
    print(f"  Similarity: {similarity:.3f} - {doc}")
```
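The search step sorts every document on each query. One possible refinement, sketched here under stated assumptions, is to normalize document vectors once when they are added and then pick the best `k` with `heapq.nlargest` instead of a full sort. The names `unit` and `top_k_by_cosine` are hypothetical, and the two-dimensional vectors are made-up stand-ins for real embeddings.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def unit(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def top_k_by_cosine(
    query: Sequence[float],
    unit_docs: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (document index, cosine similarity) pairs, best first.

    unit_docs must already be unit-normalized, so a plain dot product
    with the normalized query equals the cosine similarity.
    """
    q = unit(query)
    scores = [sum(x * y for x, y in zip(q, doc)) for doc in unit_docs]
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


# Normalize once at indexing time, then reuse for every query
doc_vectors = [unit(v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
results = top_k_by_cosine([1.0, 0.1], doc_vectors, k=2)
print(results)
```

The returned indices can be mapped back to document texts in the same way `DocumentRetriever.search` pairs `self.documents[i]` with each score.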

### RAG (Retrieval-Augmented Generation) Integration

Combine embeddings with text generation for retrieval-augmented generation workflows.

```python
from typing import List

from pyllamacpp.model import Model

# DocumentRetriever is the class defined in the Document Retrieval System example

class RAGSystem:
    def __init__(self, model_path: str):
        # Separate models for embeddings and generation:
        # the retriever loads its own model with embedding=True
        self.retriever = DocumentRetriever(model_path)
        self.generation_model = Model(model_path=model_path, embedding=False)

    def add_knowledge_base(self, documents: List[str]):
        """Add documents to the knowledge base."""
        self.retriever.add_documents(documents)

    def generate_with_context(self, query: str, top_k: int = 3) -> str:
        """Generate response using retrieved context."""
        # Retrieve relevant documents
        relevant_docs = self.retriever.search(query, top_k=top_k)

        # Build context from retrieved documents
        context = "\n\n".join([doc for doc, _ in relevant_docs])

        # Create prompt with context
        prompt = f"""Context:
{context}

Question: {query}

Answer based on the context:"""

        # Generate response
        response = self.generation_model.cpp_generate(
            prompt=prompt,
            n_predict=200,
            temp=0.3
        )

        return response

# Example usage
rag = RAGSystem("/path/to/model.ggml")

# Add knowledge base
knowledge_base = [
    "Python supports multiple programming paradigms including procedural, object-oriented, and functional programming.",
    "Machine learning models require training data to learn patterns and make predictions on new data.",
    "Transformers are a type of neural network architecture that uses attention mechanisms.",
    "Large language models are trained on vast amounts of text data to understand and generate human-like text.",
]

rag.add_knowledge_base(knowledge_base)

# Ask questions with context
question = "How do machine learning models work?"
answer = rag.generate_with_context(question)
print(f"Question: {question}")
print(f"Answer: {answer}")
```
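Retrieved documents plus the question must fit in the model's context window (`n_ctx`, set to 512 in the earlier example), so it can be useful to cap how much context goes into the prompt. The sketch below uses a character budget as a crude proxy for tokens, which is an assumption: `build_rag_prompt` and `max_context_chars` are hypothetical names, and a real system would count tokens instead.

```python
from typing import List, Sequence


def build_rag_prompt(
    query: str,
    documents: Sequence[str],
    max_context_chars: int = 1500,
) -> str:
    """Build a RAG prompt from the best-ranked documents that fit the budget.

    documents is assumed to be ordered from most to least relevant.
    """
    separator = "\n\n"
    selected: List[str] = []
    used = 0
    for doc in documents:
        # Count the separator only when a document already precedes this one
        cost = len(doc) + (len(separator) if selected else 0)
        if used + cost > max_context_chars:
            break
        selected.append(doc)
        used += cost
    context = separator.join(selected)
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\n"
        "Answer based on the context:"
    )


prompt = build_rag_prompt("Q?", ["aaaa", "bbbb", "cccc"], max_context_chars=10)
print(prompt)
```

Stopping at the first document that does not fit keeps the highest-ranked documents intact rather than truncating one mid-sentence.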

### Batch Processing

Process multiple texts with a single loaded model for large-scale embedding generation, reporting progress along the way.

```python
import time
from typing import List

from pyllamacpp.model import Model

def generate_embeddings_batch(model_path: str, texts: List[str], n_threads: int = 4) -> List[List[float]]:
    """Generate embeddings for multiple texts, reusing one loaded model."""
    model = Model(model_path=model_path, embedding=True)

    embeddings = []
    start_time = time.time()

    for i, text in enumerate(texts):
        embedding = model.get_prompt_embeddings(text, n_threads=n_threads)
        embeddings.append(embedding)

        # Report progress every 10 texts
        if (i + 1) % 10 == 0:
            elapsed = time.time() - start_time
            print(f"Processed {i + 1}/{len(texts)} texts in {elapsed:.2f}s")

    return embeddings

# Example usage
texts = [
    "First document about AI",
    "Second document about ML",
    "Third document about NLP",
    # ... more texts
]

embeddings = generate_embeddings_batch("/path/to/model.ggml", texts, n_threads=8)
print(f"Generated {len(embeddings)} embeddings")
```
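For large collections, recomputing embeddings for texts that were already processed is wasted work. One possible approach, sketched here as an assumption rather than a PyLLaMACpp feature, is a small on-disk cache keyed by a hash of the text. `embed_with_cache`, the JSON file layout, and `fake_embed` are all hypothetical; `fake_embed` stands in for `model.get_prompt_embeddings` so the snippet runs without a model file.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Sequence


def embed_with_cache(
    texts: Sequence[str],
    embed_fn: Callable[[str], Sequence[float]],
    cache_path: Path,
) -> List[List[float]]:
    """Embed texts, calling embed_fn only for texts missing from the cache."""
    cache: Dict[str, List[float]] = {}
    if cache_path.exists():
        cache = json.loads(cache_path.read_text(encoding="utf-8"))
    results: List[List[float]] = []
    for text in texts:
        # The SHA-256 digest of the text serves as a stable cache key
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = list(embed_fn(text))
        results.append(cache[key])
    cache_path.write_text(json.dumps(cache), encoding="utf-8")
    return results


calls: List[str] = []


def fake_embed(text: str) -> List[float]:
    """Toy stand-in for a real embedding call; records each invocation."""
    calls.append(text)
    return [float(len(text)), 1.0]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "embeddings.json"
    first = embed_with_cache(["alpha", "beta", "alpha"], fake_embed, cache_file)
    second = embed_with_cache(["alpha", "beta"], fake_embed, cache_file)

print(f"embed_fn was called {len(calls)} times")
```

A cache like this is only valid for one model file, so a real implementation would likely include the model path or a model hash in the key.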

## Best Practices

1. **Model Initialization**: Always set `embedding=True` when creating models for embedding tasks; both embedding methods raise `AssertionError` otherwise
2. **Context Management**: Use `get_prompt_embeddings` for clean embeddings of a single text, `get_embeddings` for context-aware embeddings of whatever the model last processed
3. **Batch Processing**: Load the model once and process multiple texts with it rather than re-creating it per text
4. **Similarity Metrics**: Choose a similarity metric (cosine, Euclidean) based on your use case, and use the same metric for indexing and querying
5. **Memory Management**: Reset the model context periodically in long-running embedding tasks; `get_prompt_embeddings` already resets it on every call
6. **Threading**: Raise `n_threads` above the default of 4 for faster processing on systems with more cores