Tessl Tile for pypi/pyllamacpp@2.4.0

# Embeddings

Vector embeddings functionality for semantic similarity and RAG applications. PyLLaMACpp supports generating embeddings for individual prompts or extracting embeddings from the current model context, enabling vector-based semantic search and retrieval-augmented generation workflows.

## Capabilities

### Context Embeddings

Extract embeddings from the current model context. This method returns the embeddings vector from the last processed input in the model's context.

```python { .api }
def get_embeddings(self) -> List[float]:
    """
    Get embeddings from the current model context.

    Returns the last embeddings vector from the context.
    The model must be initialized with embedding=True.

    Returns:
    List[float]: 1-dimensional embeddings vector with shape [n_embd]

    Raises:
    AssertionError: If model was not initialized with embedding=True
    """
```

Example usage:

```python
from pyllamacpp.model import Model

# Initialize model in embedding mode
model = Model(
    model_path="/path/to/model.ggml",
    embedding=True,
    n_ctx=512
)

# Process some text to set context
list(model.generate("The quick brown fox", n_predict=1))

# Extract embeddings from current context
embeddings = model.get_embeddings()
print(f"Embedding dimension: {len(embeddings)}")
print(f"First few values: {embeddings[:5]}")
```

### Prompt Embeddings

Generate embeddings for specific text prompts. This method resets the model context, processes the given prompt, returns its embeddings vector, and resets the context again afterward.

```python { .api }
def get_prompt_embeddings(
    self,
    prompt: str,
    n_threads: int = 4,
    n_batch: int = 512
) -> List[float]:
    """
    Generate embeddings for a specific prompt.

    This method resets the model context, processes the prompt,
    extracts embeddings, and resets the context again.

    Parameters:
    - prompt: str, text to generate embeddings for
    - n_threads: int, number of CPU threads to use (default: 4)
    - n_batch: int, batch size for processing (default: 512, must be >=32 for BLAS)

    Returns:
    List[float]: Embeddings vector for the prompt

    Raises:
    AssertionError: If model was not initialized with embedding=True
    """
```

Example usage:

```python
from pyllamacpp.model import Model

# Initialize model in embedding mode
model = Model(
    model_path="/path/to/model.ggml",
    embedding=True
)

# Generate embeddings for different texts
texts = [
    "Machine learning is a subset of artificial intelligence",
    "Deep learning uses neural networks with multiple layers",
    "Natural language processing handles human language",
    "Computer vision analyzes and interprets visual information"
]

embeddings_list = []
for text in texts:
    embedding = model.get_prompt_embeddings(
        prompt=text,
        n_threads=8,
        n_batch=512
    )
    embeddings_list.append(embedding)
    print(f"Generated embeddings for: {text[:30]}...")

print(f"Generated {len(embeddings_list)} embedding vectors")
```
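The vectors returned above are documented as `List[float]`, so they can be post-processed with the standard library alone. As a minimal sketch, the hypothetical helper `l2_normalize` below (it is not part of PyLLaMACpp) scales a vector to unit length, so that a plain dot product between two normalized vectors equals their cosine similarity. The toy vector stands in for real model output.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # Nothing to scale; avoid dividing by zero.
        return list(vector)
    return [x / norm for x in vector]


# Toy vector standing in for the output of model.get_prompt_embeddings(...)
unit = l2_normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```

Normalizing once at indexing time is a design choice, not a requirement: it trades a small up-front cost for cheaper comparisons later.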

### Similarity Computation

Compute semantic similarity between embeddings using cosine similarity or other distance metrics.

```python
import numpy as np
from scipy.spatial.distance import cosine

from pyllamacpp.model import Model

def cosine_similarity(embedding1, embedding2):
    """Compute cosine similarity between two embeddings."""
    # scipy's cosine() is a distance, so subtract it from 1
    return 1 - cosine(embedding1, embedding2)

def euclidean_similarity(embedding1, embedding2):
    """Compute euclidean similarity between two embeddings."""
    # Maps a distance in [0, inf) to a similarity in (0, 1]
    return 1 / (1 + np.linalg.norm(np.array(embedding1) - np.array(embedding2)))

# Example usage
model = Model(model_path="/path/to/model.ggml", embedding=True)

# Generate embeddings for comparison
text1 = "Artificial intelligence and machine learning"
text2 = "AI and ML technologies"
text3 = "Weather forecast for tomorrow"

embed1 = model.get_prompt_embeddings(text1)
embed2 = model.get_prompt_embeddings(text2)
embed3 = model.get_prompt_embeddings(text3)

# Compare similarities
sim_1_2 = cosine_similarity(embed1, embed2)
sim_1_3 = cosine_similarity(embed1, embed3)

print(f"Similarity between text1 and text2: {sim_1_2:.3f}")
print(f"Similarity between text1 and text3: {sim_1_3:.3f}")
```
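Where SciPy is not available, cosine similarity can be written with the standard library. The sketch below is one possible formulation; the function name `cosine_similarity_pure` and the choice to return `0.0` for an all-zero vector are assumptions made for this example rather than anything PyLLaMACpp prescribes.

```python
import math
from typing import Sequence


def cosine_similarity_pure(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors, using only the stdlib."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings
print(cosine_similarity_pure([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity_pure([1.0, 0.0], [0.0, 1.0]))  # 0.0
```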

### Document Retrieval System

Build a simple document retrieval system using embeddings for semantic search.

```python
from typing import List, Tuple

from pyllamacpp.model import Model

# cosine_similarity is the helper defined in the Similarity Computation example

class DocumentRetriever:
    def __init__(self, model_path: str):
        self.model = Model(model_path=model_path, embedding=True)
        self.documents = []
        self.embeddings = []

    def add_document(self, text: str):
        """Add a document to the retrieval system."""
        embedding = self.model.get_prompt_embeddings(text)
        self.documents.append(text)
        self.embeddings.append(embedding)

    def add_documents(self, texts: List[str]):
        """Add multiple documents to the retrieval system."""
        for text in texts:
            self.add_document(text)

    def search(self, query: str, top_k: int = 5) -> List[Tuple[str, float]]:
        """Search for most similar documents to the query."""
        query_embedding = self.model.get_prompt_embeddings(query)

        similarities = []
        for i, doc_embedding in enumerate(self.embeddings):
            similarity = cosine_similarity(query_embedding, doc_embedding)
            similarities.append((self.documents[i], similarity))

        # Sort by similarity (descending)
        similarities.sort(key=lambda x: x[1], reverse=True)
        return similarities[:top_k]

# Example usage
retriever = DocumentRetriever("/path/to/model.ggml")

# Add documents
documents = [
    "Python is a high-level programming language known for its simplicity",
    "Machine learning algorithms can learn patterns from data",
    "Neural networks are inspired by biological neural systems",
    "Deep learning is a subset of machine learning using deep neural networks",
    "Natural language processing enables computers to understand human language"
]

retriever.add_documents(documents)

# Search for similar documents
query = "What is deep neural network learning?"
results = retriever.search(query, top_k=3)

print(f"Query: {query}")
print("Most similar documents:")
for doc, similarity in results:
    print(f"  Similarity: {similarity:.3f} - {doc}")
```
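The `search` method sorts every score before slicing. For a large collection and a small `top_k`, one alternative is to select the best matches with `heapq.nlargest`, which avoids a full sort. The sketch below works on precomputed vectors so that it runs without a model; `top_k_documents` and `_cosine` are hypothetical helper names, and the 2-D vectors are toy data rather than real embeddings.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_documents(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    documents: Sequence[str],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the top_k (document, score) pairs, highest score first."""
    scored = ((doc, _cosine(query_vec, vec)) for doc, vec in zip(documents, doc_vecs))
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


# Toy data standing in for stored document embeddings
docs = ["doc a", "doc b", "doc c"]
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for doc, score in top_k_documents([1.0, 0.0], vecs, docs, top_k=2):
    print(f"{score:.3f} - {doc}")
```

In a retriever like the one above, `query_vec` would come from `get_prompt_embeddings(query)` and `doc_vecs` from the stored embeddings.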

### RAG (Retrieval-Augmented Generation) Integration

Combine embeddings with text generation for retrieval-augmented generation workflows.

```python
from typing import List

from pyllamacpp.model import Model

# DocumentRetriever is the class defined in the Document Retrieval System example

class RAGSystem:
    def __init__(self, model_path: str):
        # The retriever owns the embedding-mode model;
        # a separate instance handles generation
        self.retriever = DocumentRetriever(model_path)
        self.generation_model = Model(model_path=model_path, embedding=False)

    def add_knowledge_base(self, documents: List[str]):
        """Add documents to the knowledge base."""
        self.retriever.add_documents(documents)

    def generate_with_context(self, query: str, top_k: int = 3) -> str:
        """Generate response using retrieved context."""
        # Retrieve relevant documents
        relevant_docs = self.retriever.search(query, top_k=top_k)

        # Build context from retrieved documents
        context = "\n\n".join([doc for doc, _ in relevant_docs])

        # Create prompt with context
        prompt = f"""Context:
{context}

Question: {query}

Answer based on the context:"""

        # Generate response
        response = self.generation_model.cpp_generate(
            prompt=prompt,
            n_predict=200,
            temp=0.3
        )

        return response

# Example usage
rag = RAGSystem("/path/to/model.ggml")

# Add knowledge base
knowledge_base = [
    "Python supports multiple programming paradigms including procedural, object-oriented, and functional programming.",
    "Machine learning models require training data to learn patterns and make predictions on new data.",
    "Transformers are a type of neural network architecture that uses attention mechanisms.",
    "Large language models are trained on vast amounts of text data to understand and generate human-like text.",
]

rag.add_knowledge_base(knowledge_base)

# Ask questions with context
question = "How do machine learning models work?"
answer = rag.generate_with_context(question)
print(f"Question: {question}")
print(f"Answer: {answer}")
```
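`generate_with_context` joins all retrieved documents into the prompt without checking their combined length, which can matter when the context window is small (the earlier example uses `n_ctx=512`). The sketch below shows one way to cap the context. `build_rag_prompt` is a hypothetical helper, and counting characters is only a crude stand-in for counting tokens; a real limit would depend on the model's tokenizer.

```python
from typing import List


def build_rag_prompt(query: str, documents: List[str], max_context_chars: int = 1000) -> str:
    """Build a RAG prompt, keeping documents in order until the budget is spent."""
    selected: List[str] = []
    used = 0
    for doc in documents:
        # Every document after the first also costs a "\n\n" separator.
        cost = len(doc) + (2 if selected else 0)
        if used + cost > max_context_chars:
            break
        selected.append(doc)
        used += cost
    context = "\n\n".join(selected)
    return f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer based on the context:"


# Toy documents; the third one does not fit in a 10-character budget
print(build_rag_prompt("q?", ["aaaa", "bbbb", "cccc"], max_context_chars=10))
```

Because the documents arrive sorted by similarity, stopping at the first one that does not fit keeps the most relevant material.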

### Batch Processing

Process multiple texts sequentially with a single model instance for large-scale embedding generation, reporting progress along the way.

```python
import time
from typing import List

from pyllamacpp.model import Model

def generate_embeddings_batch(model_path: str, texts: List[str], n_threads: int = 4) -> List[List[float]]:
    """Generate embeddings for multiple texts, reusing one model instance."""
    model = Model(model_path=model_path, embedding=True)

    embeddings = []
    start_time = time.time()

    for i, text in enumerate(texts):
        embedding = model.get_prompt_embeddings(text, n_threads=n_threads)
        embeddings.append(embedding)

        # Report progress every 10 texts
        if (i + 1) % 10 == 0:
            elapsed = time.time() - start_time
            print(f"Processed {i + 1}/{len(texts)} texts in {elapsed:.2f}s")

    return embeddings

# Example usage
texts = [
    "First document about AI",
    "Second document about ML",
    "Third document about NLP",
    # ... more texts
]

embeddings = generate_embeddings_batch("/path/to/model.ggml", texts, n_threads=8)
print(f"Generated {len(embeddings)} embeddings")
```
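When the same text appears more than once in a batch, recomputing its embedding is wasted work. As a minimal sketch, the hypothetical `EmbeddingCache` class below memoizes results by a hash of the text. It takes the embedding function as a parameter, so the example runs with a stand-in (`fake_embed`); with a real model one would pass `model.get_prompt_embeddings` instead.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """In-memory cache that computes each distinct text's embedding once."""

    def __init__(self, embed_fn: Callable[[str], List[float]]):
        self._embed_fn = embed_fn
        self._store: Dict[str, List[float]] = {}
        self.misses = 0  # number of times embed_fn was actually called

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str) -> List[float]:
        key = self._key(text)
        if key not in self._store:
            self._store[key] = self._embed_fn(text)
            self.misses += 1
        return self._store[key]


def fake_embed(text: str) -> List[float]:
    """Stand-in for a real embedding call; returns two toy features."""
    return [float(len(text)), float(text.count(" "))]


cache = EmbeddingCache(fake_embed)
for text in ["first doc", "second doc", "first doc"]:
    cache.get(text)
print(cache.misses)  # 2
```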

## Best Practices

1. **Model Initialization**: Always set `embedding=True` when creating models for embedding tasks
2. **Context Management**: Use `get_prompt_embeddings` for clean embeddings, `get_embeddings` for context-aware embeddings
3. **Batch Processing**: Load the model once and reuse it across all texts in a batch rather than re-initializing it per text
4. **Similarity Metrics**: Choose appropriate similarity metrics (cosine, euclidean) based on your use case
5. **Memory Management**: Reset model context periodically for long-running embedding tasks
6. **Threading**: Increase `n_threads` for faster processing on multi-core systems
