# Decoder-Only Embedders

Embedders designed for decoder-only transformer models (LLM-style architectures). These models leverage large language model capabilities for embedding generation, typically pooling the hidden state of the final token into the representation and supporting instruction-based query formatting.
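
To make the "last token" idea concrete, here is a minimal, hedged sketch of last-token pooling written directly against Hugging Face `transformers` (an illustration of the technique, not FlagEmbedding's internal implementation; `gpt2` stands in for any decoder-only model):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Any causal LM works the same way; gpt2 is used only because it is small.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

texts = ["What is machine learning?", "A shorter query"]
batch = tokenizer(texts, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)

# Locate the last non-padding token of each sequence and take its hidden state.
last_idx = batch["attention_mask"].sum(dim=1) - 1
embeddings = hidden[torch.arange(hidden.size(0)), last_idx]  # (batch, dim)

# L2-normalize so dot products are cosine similarities,
# mirroring normalize_embeddings=True below.
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
```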
## Capabilities

### FlagLLMModel (Base LLM Embedder)

Standard embedder for decoder-only models using last-token pooling. Designed for large language models that generate embeddings through their natural language understanding capabilities.
```python { .api }
from typing import List, Optional, Union

class FlagLLMModel(AbsEmbedder):
    def __init__(
        self,
        model_name_or_path: str,
        pooling_method: str = "last_token",
        normalize_embeddings: bool = True,
        use_fp16: bool = True,
        query_instruction_for_retrieval: Optional[str] = None,
        query_instruction_format: str = "Instruct: {}\nQuery: {}",
        devices: Optional[Union[str, List[str]]] = None,
        batch_size: int = 256,
        query_max_length: int = 512,
        passage_max_length: int = 512,
        convert_to_numpy: bool = True,
        **kwargs
    ):
        """
        Initialize decoder-only LLM embedder.

        Args:
            model_name_or_path: Path to model or HuggingFace model name
            pooling_method: Pooling strategy (only "last_token" is supported)
            normalize_embeddings: Whether to L2-normalize output embeddings
            use_fp16: Use half precision for inference
            query_instruction_for_retrieval: Instruction prepended to retrieval queries
            query_instruction_format: Format string with two placeholders (instruction, query)
            devices: Device or list of devices for single- or multi-GPU inference
            batch_size: Default batch size for encoding
            query_max_length: Maximum query token length
            passage_max_length: Maximum passage token length
            convert_to_numpy: Convert outputs to numpy arrays
            **kwargs: Additional model parameters
        """
```
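
When `query_instruction_for_retrieval` is set, each query is rendered through `query_instruction_format` before encoding. A hedged illustration of the expansion the default template implies (plain string formatting, not FlagEmbedding's internal code path):

```python
# First placeholder = instruction, second = query text.
instruction = "Given a question, retrieve relevant documents that answer the question"
query = "How do neural networks learn?"

formatted = "Instruct: {}\nQuery: {}".format(instruction, query)
print(formatted)
# Instruct: Given a question, retrieve relevant documents that answer the question
# Query: How do neural networks learn?
```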
### FlagICLModel (In-Context Learning Embedder)

Specialized embedder for in-context learning with large language models. It leverages few-shot examples placed in the input context to produce high-quality, task-adapted embeddings.
```python { .api }
from typing import List, Optional, Union

class FlagICLModel(AbsEmbedder):
    def __init__(
        self,
        model_name_or_path: str,
        pooling_method: str = "last_token",
        normalize_embeddings: bool = True,
        use_fp16: bool = True,
        query_instruction_for_retrieval: Optional[str] = None,
        query_instruction_format: str = "{}{}",
        devices: Optional[Union[str, List[str]]] = None,
        batch_size: int = 256,
        query_max_length: int = 512,
        passage_max_length: int = 512,
        convert_to_numpy: bool = True,
        **kwargs
    ):
        """
        Initialize in-context learning embedder.

        Args:
            model_name_or_path: Path to ICL-capable model
            pooling_method: Pooling strategy (only "last_token" is supported)
            normalize_embeddings: Whether to L2-normalize output embeddings
            use_fp16: Use half precision for inference
            query_instruction_for_retrieval: Instruction for retrieval queries
            query_instruction_format: Format string with two placeholders (instruction, query)
            devices: Device or list of devices for single- or multi-GPU inference
            batch_size: Default batch size for encoding
            query_max_length: Maximum query token length
            passage_max_length: Maximum passage token length
            convert_to_numpy: Convert outputs to numpy arrays
            **kwargs: Additional model parameters
        """
```
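
The signature mirrors `FlagLLMModel`; the in-context behavior comes from the demonstrations placed in the query text itself. A hedged sketch of assembling such a few-shot prefix by hand (the helper and template wording are illustrative assumptions, not part of the FlagEmbedding API):

```python
from typing import List, Tuple

def build_icl_query(examples: List[Tuple[str, str]], query: str) -> str:
    # Hypothetical helper: prepend (query, response) demonstrations to a new
    # query before encoding; the exact template wording is an assumption.
    demos = "\n".join(
        f"Example query: {q}\nExample response: {r}" for q, r in examples
    )
    return f"{demos}\nQuery: {query}"

examples = [("What is AI?", "AI studies systems that exhibit intelligent behavior.")]
print(build_icl_query(examples, "What is machine learning?"))
```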
## Usage Examples

### Basic LLM Embedder

```python
from FlagEmbedding import FlagLLMModel

# Initialize LLM embedder with last-token pooling
embedder = FlagLLMModel(
    'intfloat/e5-mistral-7b-instruct',
    pooling_method="last_token",
    use_fp16=True
)

# Encode queries and passages
queries = ["What are the applications of machine learning?"]
passages = ["Machine learning is applied in healthcare, finance, and autonomous systems"]

query_embeddings = embedder.encode_queries(queries)
passage_embeddings = embedder.encode_corpus(passages)

print(f"Query embedding shape: {query_embeddings.shape}")
print(f"Passage embedding shape: {passage_embeddings.shape}")
```
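
Because `normalize_embeddings` defaults to `True`, relevance scores fall out of a plain inner product. A short continuation of the example above:

```python
# Embeddings are L2-normalized, so the inner product is cosine similarity.
scores = query_embeddings @ passage_embeddings.T
print(scores)  # one row per query, one column per passage
```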
### Custom Instruction Formatting

```python
from FlagEmbedding import FlagLLMModel

# Use a custom instruction for queries
embedder = FlagLLMModel(
    'intfloat/e5-mistral-7b-instruct',
    query_instruction_for_retrieval="Given a question, retrieve relevant documents that answer the question",
    query_instruction_format="Instruct: {}\nQuery: {}",
    use_fp16=True
)

# Queries will be formatted with the custom instruction
queries = ["How do neural networks learn?"]
embeddings = embedder.encode_queries(queries)
```
### In-Context Learning Embedder

```python
from FlagEmbedding import FlagICLModel

# Initialize ICL embedder for few-shot use
embedder = FlagICLModel(
    'BAAI/bge-en-icl',
    use_fp16=True,
    batch_size=64  # smaller batch for memory efficiency
)

# ICL works best when demonstrations are included in the context
queries = [
    "Example: 'What is AI?' -> AI concepts. Query: 'What is machine learning?'"
]

embeddings = embedder.encode_queries(queries)
```
150
151
### Multi-GPU LLM Processing
152
153
```python
154
from FlagEmbedding import FlagLLMModel
155
156
# Use multiple GPUs for large LLM models
157
embedder = FlagLLMModel(
158
'e5-mistral-7b-instruct',
159
devices=['cuda:0', 'cuda:1'],
160
batch_size=32, # Smaller batch size for large models
161
use_fp16=True
162
)
163
164
# Process documents efficiently across GPUs
165
documents = [f"Document {i} content" for i in range(1000)]
166
embeddings = embedder.encode_corpus(documents)
167
```
### Custom Max Length Settings

```python
from FlagEmbedding import FlagLLMModel

# Configure different max lengths for queries vs. passages
embedder = FlagLLMModel(
    'intfloat/e5-mistral-7b-instruct',
    query_max_length=256,     # shorter for queries
    passage_max_length=1024,  # longer for passages
    use_fp16=True
)

# Long passage encoding (input beyond passage_max_length is truncated)
long_passage = "Very long document content..." * 100
passage_embedding = embedder.encode_corpus([long_passage])
```
186
187
### Retrieval-Specific Instructions
188
189
```python
190
from FlagEmbedding import FlagLLMModel
191
192
# Specialized instructions for different retrieval tasks
193
qa_embedder = FlagLLMModel(
194
'e5-mistral-7b-instruct',
195
query_instruction_for_retrieval="Represent this question for retrieving relevant answers",
196
query_instruction_format="Task: {}\\nInput: {}"
197
)
198
199
semantic_embedder = FlagLLMModel(
200
'e5-mistral-7b-instruct',
201
query_instruction_for_retrieval="Encode this text for semantic similarity search",
202
query_instruction_format="{}: {}"
203
)
204
205
# Different use cases
206
qa_queries = ["What causes climate change?"]
207
semantic_queries = ["renewable energy technologies"]
208
209
qa_embeddings = qa_embedder.encode_queries(qa_queries)
210
semantic_embeddings = semantic_embedder.encode_queries(semantic_queries)
211
```
### Comparing Decoder vs Encoder Models

```python
from FlagEmbedding import FlagLLMModel, FlagModel

# Decoder-only model
llm_embedder = FlagLLMModel('intfloat/e5-mistral-7b-instruct')

# Encoder-only model
encoder_embedder = FlagModel('BAAI/bge-large-en-v1.5')

text = ["Machine learning algorithms"]

# Both produce embeddings, but with different dimensionalities and
# characteristics; vectors from different models are not directly comparable.
llm_emb = llm_embedder.encode(text)
encoder_emb = encoder_embedder.encode(text)

print(f"LLM embedding shape: {llm_emb.shape}")
print(f"Encoder embedding shape: {encoder_emb.shape}")
```
233
234
### Memory-Efficient Processing
235
236
```python
237
from FlagEmbedding import FlagLLMModel
238
239
# Configure for memory-constrained environments
240
embedder = FlagLLMModel(
241
'e5-mistral-7b-instruct',
242
use_fp16=True,
243
batch_size=8, # Very small batch
244
devices=['cuda:0'], # Single GPU
245
convert_to_numpy=True # Free GPU memory faster
246
)
247
248
# Process in smaller chunks
249
large_corpus = [f"Document {i}" for i in range(10000)]
250
chunk_size = 100
251
252
all_embeddings = []
253
for i in range(0, len(large_corpus), chunk_size):
254
chunk = large_corpus[i:i+chunk_size]
255
chunk_embeddings = embedder.encode_corpus(chunk)
256
all_embeddings.append(chunk_embeddings)
257
258
# Combine results
259
import numpy as np
260
final_embeddings = np.vstack(all_embeddings)
261
```
## Supported Models

### E5 LLM Models
- e5-mistral-7b-instruct (instruction-tuned Mistral)

### BGE LLM Models
- bge-en-icl (in-context learning model)
- bge-multilingual-gemma2

### GTE LLM Models
- gte-Qwen2-7B-instruct
- gte-Qwen2-1.5B-instruct
- gte-Qwen1.5-7B-instruct
## Model Selection Guidelines
278
279
### When to Use FlagLLMModel
280
- Working with instruction-tuned language models
281
- Need natural language understanding in embeddings
282
- Have computational resources for larger models
283
- Want to leverage instruction following capabilities
284
285
### When to Use FlagICLModel
286
- Need few-shot learning capabilities
287
- Working with domain-specific tasks
288
- Want to provide examples in context
289
- Need adaptability without fine-tuning
290
291
## Types
292
293
```python { .api }
294
from typing import Optional, List, Union
295
import torch
296
import numpy as np
297
298
# Decoder-specific pooling (only last_token supported)
299
DecoderPoolingMethod = Literal["last_token"]
300
301
# Instruction format templates
302
InstructionTemplate = str # Format string with {} placeholders
303
```
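
A hedged illustration of these aliases in use; `make_embedder` is a hypothetical convenience wrapper shown only to demonstrate the annotations, not part of the FlagEmbedding API:

```python
from typing import Literal

from FlagEmbedding import FlagLLMModel

# Aliases restated from the block above so the example is self-contained.
DecoderPoolingMethod = Literal["last_token"]
InstructionTemplate = str

def make_embedder(
    model_name_or_path: str,
    pooling_method: DecoderPoolingMethod = "last_token",
    instruction_template: InstructionTemplate = "Instruct: {}\nQuery: {}",
) -> FlagLLMModel:
    # Static type checkers will reject any pooling_method other than "last_token".
    return FlagLLMModel(
        model_name_or_path,
        pooling_method=pooling_method,
        query_instruction_format=instruction_template,
    )
```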