
# Decoder-Only Embedders


Embedders designed for decoder-only transformer models (LLM-like architectures). These models leverage large language model capabilities for embedding generation, often using the last token for representation and supporting instruction-based formatting.
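
As a point of reference, last-token pooling takes the hidden state of the final non-padding token in each sequence as the embedding. A minimal PyTorch sketch of the idea (illustrative only, not the library's internal implementation):

```python
import torch

def last_token_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Select the hidden state of each sequence's last non-padding token."""
    # hidden_states: (batch, seq_len, hidden_dim); attention_mask: (batch, seq_len)
    # Assumes right-padded inputs; with left padding the last position is simply -1
    last_positions = attention_mask.sum(dim=1) - 1
    batch_indices = torch.arange(hidden_states.size(0))
    return hidden_states[batch_indices, last_positions]  # (batch, hidden_dim)
```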


## Capabilities


### FlagLLMModel (Base LLM Embedder)


Standard embedder for decoder-only models using last token pooling. Designed for large language models that generate embeddings through their natural language understanding capabilities.

```python { .api }
from typing import Optional, List, Union

class FlagLLMModel(AbsEmbedder):
    def __init__(
        self,
        model_name_or_path: str,
        pooling_method: str = "last_token",
        normalize_embeddings: bool = True,
        use_fp16: bool = True,
        query_instruction_for_retrieval: Optional[str] = None,
        query_instruction_format: str = "Instruct: {}\nQuery: {}",
        devices: Optional[Union[str, List[str]]] = None,
        batch_size: int = 256,
        query_max_length: int = 512,
        passage_max_length: int = 512,
        convert_to_numpy: bool = True,
        **kwargs
    ):
        """
        Initialize decoder-only LLM embedder.

        Args:
            model_name_or_path: Path to model or HuggingFace model name
            pooling_method: Pooling strategy ("last_token")
            normalize_embeddings: Whether to normalize output embeddings
            use_fp16: Use half precision for inference
            query_instruction_for_retrieval: Instruction for retrieval queries
            query_instruction_format: Format string for instructions
            devices: List of devices for multi-GPU inference
            batch_size: Default batch size for encoding
            query_max_length: Maximum query token length
            passage_max_length: Maximum passage token length
            convert_to_numpy: Convert outputs to numpy arrays
            **kwargs: Additional model parameters
        """
```
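
The default `query_instruction_format` has two placeholders, filled in order with the instruction and the raw query text. A sketch of the resulting model input (assuming the template is applied with `str.format`):

```python
template = "Instruct: {}\nQuery: {}"
instruction = "Given a question, retrieve relevant documents that answer the question"
query = "How do neural networks learn?"

# The instruction fills the first placeholder, the query the second
print(template.format(instruction, query))
# Instruct: Given a question, retrieve relevant documents that answer the question
# Query: How do neural networks learn?
```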


### FlagICLModel (In-Context Learning Embedder)


Specialized embedder for in-context learning approaches with large language models. Leverages few-shot examples and context to generate high-quality embeddings.

```python { .api }
from typing import Optional, List, Union

class FlagICLModel(AbsEmbedder):
    def __init__(
        self,
        model_name_or_path: str,
        pooling_method: str = "last_token",
        normalize_embeddings: bool = True,
        use_fp16: bool = True,
        query_instruction_for_retrieval: Optional[str] = None,
        query_instruction_format: str = "{}{}",
        devices: Optional[Union[str, List[str]]] = None,
        batch_size: int = 256,
        query_max_length: int = 512,
        passage_max_length: int = 512,
        convert_to_numpy: bool = True,
        **kwargs
    ):
        """
        Initialize in-context learning embedder.

        Args:
            model_name_or_path: Path to ICL-capable model
            pooling_method: Pooling strategy ("last_token")
            normalize_embeddings: Whether to normalize output embeddings
            use_fp16: Use half precision for inference
            query_instruction_for_retrieval: Instruction for retrieval queries
            query_instruction_format: Format string for instructions
            devices: List of devices for multi-GPU inference
            batch_size: Default batch size for encoding
            query_max_length: Maximum query token length
            passage_max_length: Maximum passage token length
            convert_to_numpy: Convert outputs to numpy arrays
            **kwargs: Additional model parameters
        """
```


## Usage Examples


### Basic LLM Embedder

```python
from FlagEmbedding import FlagLLMModel

# Initialize LLM embedder with last token pooling
embedder = FlagLLMModel(
    'e5-mistral-7b-instruct',
    pooling_method="last_token",
    use_fp16=True
)

# Encode queries and passages
queries = ["What are the applications of machine learning?"]
passages = ["Machine learning is applied in healthcare, finance, and autonomous systems"]

query_embeddings = embedder.encode_queries(queries)
passage_embeddings = embedder.encode_corpus(passages)

print(f"Query embedding shape: {query_embeddings.shape}")
print(f"Passage embedding shape: {passage_embeddings.shape}")
```
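
Because `normalize_embeddings` defaults to `True`, relevance scoring on the outputs above reduces to a dot product (equivalent to cosine similarity for unit-length vectors):

```python
# Similarity matrix of shape (num_queries, num_passages)
scores = query_embeddings @ passage_embeddings.T
print(f"Query-passage scores: {scores}")
```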


### Custom Instruction Formatting

```python
from FlagEmbedding import FlagLLMModel

# Use custom instruction format for queries
embedder = FlagLLMModel(
    'e5-mistral-7b-instruct',
    query_instruction_for_retrieval="Given a question, retrieve relevant documents that answer the question",
    query_instruction_format="Instruct: {}\nQuery: {}",
    use_fp16=True
)

# Queries will be formatted with custom instructions
queries = ["How do neural networks learn?"]
embeddings = embedder.encode_queries(queries)
```


### In-Context Learning Embedder

```python
from FlagEmbedding import FlagICLModel

# Initialize ICL embedder for few-shot learning
embedder = FlagICLModel(
    'bge-en-icl',
    use_fp16=True,
    batch_size=64  # Smaller batch for memory efficiency
)

# ICL works well with examples in context
queries = [
    "Example: 'What is AI?' -> AI concepts. Query: 'What is machine learning?'"
]

embeddings = embedder.encode_queries(queries)
```
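
Composing in-context strings by hand gets repetitive; a hypothetical helper (not part of the FlagEmbedding API) that produces the same format as the query above:

```python
def build_icl_query(examples, query):
    """Prefix a query with (input -> output) demonstration pairs. Illustrative only."""
    demos = " ".join(f"Example: '{inp}' -> {out}." for inp, out in examples)
    return f"{demos} Query: '{query}'"

icl_query = build_icl_query([("What is AI?", "AI concepts")], "What is machine learning?")
embeddings = embedder.encode_queries([icl_query])
```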


### Multi-GPU LLM Processing

```python
from FlagEmbedding import FlagLLMModel

# Use multiple GPUs for large LLM models
embedder = FlagLLMModel(
    'e5-mistral-7b-instruct',
    devices=['cuda:0', 'cuda:1'],
    batch_size=32,  # Smaller batch size for large models
    use_fp16=True
)

# Process documents efficiently across GPUs
documents = [f"Document {i} content" for i in range(1000)]
embeddings = embedder.encode_corpus(documents)
```


### Custom Max Length Settings

```python
from FlagEmbedding import FlagLLMModel

# Configure different max lengths for queries vs passages
embedder = FlagLLMModel(
    'e5-mistral-7b-instruct',
    query_max_length=256,     # Shorter for queries
    passage_max_length=1024,  # Longer for passages
    use_fp16=True
)

# Long passage encoding
long_passage = "Very long document content..." * 100
passage_embedding = embedder.encode_corpus([long_passage])
```


### Retrieval-Specific Instructions

```python
from FlagEmbedding import FlagLLMModel

# Specialized instructions for different retrieval tasks
qa_embedder = FlagLLMModel(
    'e5-mistral-7b-instruct',
    query_instruction_for_retrieval="Represent this question for retrieving relevant answers",
    query_instruction_format="Task: {}\nInput: {}"
)

semantic_embedder = FlagLLMModel(
    'e5-mistral-7b-instruct',
    query_instruction_for_retrieval="Encode this text for semantic similarity search",
    query_instruction_format="{}: {}"
)

# Different use cases
qa_queries = ["What causes climate change?"]
semantic_queries = ["renewable energy technologies"]

qa_embeddings = qa_embedder.encode_queries(qa_queries)
semantic_embeddings = semantic_embedder.encode_queries(semantic_queries)
```


### Comparing Decoder vs Encoder Models

```python
from FlagEmbedding import FlagLLMModel, FlagModel

# Decoder-only model
llm_embedder = FlagLLMModel('e5-mistral-7b-instruct')

# Encoder-only model
encoder_embedder = FlagModel('bge-large-en-v1.5')

text = ["Machine learning algorithms"]

# Both produce embeddings but with different characteristics
llm_emb = llm_embedder.encode(text)
encoder_emb = encoder_embedder.encode(text)

print(f"LLM embedding shape: {llm_emb.shape}")
print(f"Encoder embedding shape: {encoder_emb.shape}")
```
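
The two models generally produce embeddings of different dimensionality, so similarity scores are only meaningful within a single model's embedding space. Continuing the snippet above (and assuming both models return normalized numpy arrays, the documented default for FlagLLMModel):

```python
docs = ["Supervised learning methods", "Baking sourdough bread"]

# Compare within each model's own space; scores are not comparable across models
llm_scores = llm_emb @ llm_embedder.encode(docs).T
encoder_scores = encoder_emb @ encoder_embedder.encode(docs).T
```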


### Memory-Efficient Processing

```python
import numpy as np

from FlagEmbedding import FlagLLMModel

# Configure for memory-constrained environments
embedder = FlagLLMModel(
    'e5-mistral-7b-instruct',
    use_fp16=True,
    batch_size=8,          # Very small batch
    devices=['cuda:0'],    # Single GPU
    convert_to_numpy=True  # Free GPU memory faster
)

# Process in smaller chunks
large_corpus = [f"Document {i}" for i in range(10000)]
chunk_size = 100

all_embeddings = []
for i in range(0, len(large_corpus), chunk_size):
    chunk = large_corpus[i:i+chunk_size]
    chunk_embeddings = embedder.encode_corpus(chunk)
    all_embeddings.append(chunk_embeddings)

# Combine results
final_embeddings = np.vstack(all_embeddings)
```


## Supported Models

### E5 LLM Models

- e5-mistral-7b-instruct (instruction-tuned Mistral)

### BGE LLM Models

- bge-en-icl (in-context learning model)
- bge-multilingual-gemma2

### GTE LLM Models

- gte-Qwen2-7B-instruct
- gte-Qwen2-1.5B-instruct
- gte-Qwen1.5-7B-instruct

## Model Selection Guidelines

### When to Use FlagLLMModel

- Working with instruction-tuned language models
- Need natural language understanding in embeddings
- Have computational resources for larger models
- Want to leverage instruction following capabilities

### When to Use FlagICLModel

- Need few-shot learning capabilities
- Working with domain-specific tasks
- Want to provide examples in context
- Need adaptability without fine-tuning

## Types

```python { .api }
from typing import Optional, List, Union, Literal
import torch
import numpy as np

# Decoder-specific pooling (only last_token supported)
DecoderPoolingMethod = Literal["last_token"]

# Instruction format templates
InstructionTemplate = str  # Format string with {} placeholders
```