# Embeddings

Generate text embeddings for semantic search, clustering, similarity comparisons, and other natural language understanding tasks. Embeddings convert text into high-dimensional numerical vectors that capture semantic meaning, enabling mathematical operations on text.

## Capabilities

### Embed Content

Generate embeddings for text content. Supports single or multiple content inputs and various embedding tasks including retrieval, classification, and semantic similarity.

```python { .api }
def embed_content(
    *,
    model: str,
    contents: Union[str, list[Content], Content],
    config: Optional[EmbedContentConfig] = None
) -> EmbedContentResponse:
    """
    Generate embeddings for content.

    Parameters:
        model (str): Model identifier for embeddings (e.g., 'text-embedding-004',
            'text-multilingual-embedding-002'). Use models optimized for embeddings.
        contents (Union[str, list[Content], Content]): Content to embed. Can be:
            - str: Simple text to embed
            - Content: Content object with text parts
            - list[Content]: Multiple content objects to embed in batch
        config (EmbedContentConfig, optional): Embedding configuration including:
            - task_type: Type of embedding task (retrieval, classification, etc.)
            - title: Document title for RETRIEVAL_DOCUMENT task
            - output_dimensionality: Desired embedding dimension (model-dependent)

    Returns:
        EmbedContentResponse: Response containing embeddings and metadata.
            Each embedding is a list of float values representing the vector.

    Raises:
        ClientError: For client errors (4xx status codes)
        ServerError: For server errors (5xx status codes)
    """
    ...

async def embed_content(
    *,
    model: str,
    contents: Union[str, list[Content], Content],
    config: Optional[EmbedContentConfig] = None
) -> EmbedContentResponse:
    """Async version of embed_content."""
    ...
```
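
The `Raises` section distinguishes client errors (4xx) from server errors (5xx). Only the latter are usually worth retrying. The following is a minimal sketch of a retry wrapper with exponential backoff, not part of the SDK: `call_with_retry`, `retry_on`, `max_attempts`, `base_delay`, and `sleep` are hypothetical names, and the exception types to retry on are passed in by the caller.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff on the given exception types.

    Hypothetical helper for illustration; exceptions not listed in retry_on
    propagate immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            # Re-raise once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

With a real client, `fn` could be a lambda wrapping `client.models.embed_content(...)` and `retry_on` could be `(errors.ServerError,)`, assuming `from google.genai import errors`.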

**Usage Example - Simple Embedding:**

```python
from google.genai import Client

client = Client(api_key='YOUR_API_KEY')

# Generate embedding for a single text
response = client.models.embed_content(
    model='text-embedding-004',
    contents='What is the capital of France?'
)

# Access the embedding vector
embedding = response.embeddings[0].values
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
```

**Usage Example - Batch Embeddings:**

```python
from google.genai import Client
from google.genai.types import Content, Part

client = Client(api_key='YOUR_API_KEY')

# Embed multiple texts at once
texts = [
    'Machine learning is a subset of AI',
    'Deep learning uses neural networks',
    'Natural language processing handles text'
]

contents = [Content(parts=[Part(text=text)]) for text in texts]

response = client.models.embed_content(
    model='text-embedding-004',
    contents=contents
)

print(f"Generated {len(response.embeddings)} embeddings")
for i, embedding in enumerate(response.embeddings):
    print(f"Text {i+1}: dimension={len(embedding.values)}")
```
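
When the list of texts is long, it may need to be split across several requests. The sketch below shows only the chunking logic. It assumes a per-request limit (`batch_size` is a hypothetical parameter; check the model documentation for the real value) and takes the embedding call as an argument (`embed_batch`, also hypothetical) so it can be exercised without a live client.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of items, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 100,  # assumed limit, not taken from the SDK
) -> List[List[float]]:
    """Embed texts batch by batch, preserving the input order."""
    vectors: List[List[float]] = []
    for batch in chunked(texts, batch_size):
        batch_vectors = embed_batch(batch)
        # Each input is expected to yield exactly one vector.
        if len(batch_vectors) != len(batch):
            raise ValueError("embed_batch returned a different number of vectors than inputs")
        vectors.extend(batch_vectors)
    return vectors
```

With the client from the example above, `embed_batch` could wrap `client.models.embed_content` and return `[e.values for e in response.embeddings]`.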

**Usage Example - Semantic Similarity:**

```python
import numpy as np
from google.genai import Client

def cosine_similarity(vec1, vec2):
    """Calculate cosine similarity between two vectors."""
    dot_product = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    return dot_product / (norm1 * norm2)

client = Client(api_key='YOUR_API_KEY')

# Embed query and documents
query = "machine learning algorithms"
documents = [
    "Neural networks are a type of machine learning model",
    "The weather today is sunny and warm",
    "Supervised learning requires labeled training data"
]

# Get embeddings
query_response = client.models.embed_content(
    model='text-embedding-004',
    contents=query
)
query_embedding = query_response.embeddings[0].values

doc_embeddings = []
for doc in documents:
    response = client.models.embed_content(
        model='text-embedding-004',
        contents=doc
    )
    doc_embeddings.append(response.embeddings[0].values)

# Calculate similarities
for i, doc in enumerate(documents):
    similarity = cosine_similarity(query_embedding, doc_embeddings[i])
    print(f"Doc {i+1} similarity: {similarity:.4f}")
    print(f" {doc}\n")
```
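
Once similarities are available, ranking documents is a sort over the scores. The following dependency-free sketch (the helper names `cosine` and `top_k` are illustrative, not SDK functions) returns the best-matching document indices and treats zero-norm vectors, for which cosine similarity is undefined, as having similarity 0.0.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    docs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return up to k (document index, similarity) pairs, best match first."""
    scored = [(i, cosine(query, d)) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Here `query` would be `query_embedding` and `docs` would be `doc_embeddings` from the example above.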

**Usage Example - Task-Specific Embeddings:**

```python
from google.genai import Client
from google.genai.types import EmbedContentConfig

client = Client(api_key='YOUR_API_KEY')

# Embed for retrieval query.
# The task type is passed by name as a string; the names match the
# TaskType members listed under Types.
query_config = EmbedContentConfig(
    task_type='RETRIEVAL_QUERY'
)

query_response = client.models.embed_content(
    model='text-embedding-004',
    contents='How does photosynthesis work?',
    config=query_config
)

# Embed documents for retrieval
doc_config = EmbedContentConfig(
    task_type='RETRIEVAL_DOCUMENT',
    title='Biology Textbook Chapter 3'
)

doc_response = client.models.embed_content(
    model='text-embedding-004',
    contents='Photosynthesis is the process by which plants convert light energy into chemical energy...',
    config=doc_config
)

print("Query embedding generated")
print("Document embedding generated")
```
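
Because `title` is described as meant for the `RETRIEVAL_DOCUMENT` task, it can help to build configurations through one small function that enforces this pairing. The sketch below returns a plain dict shaped like the `EmbedContentConfigDict` listed under Types. `make_embed_config` and `KNOWN_TASK_TYPES` are hypothetical names, and passing the resulting dict as `config` assumes the SDK accepts the dict form.

```python
from typing import Any, Dict, Optional

# Task-type names as listed in this document.
KNOWN_TASK_TYPES = {
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
    "QUESTION_ANSWERING",
    "FACT_VERIFICATION",
    "CODE_RETRIEVAL_QUERY",
}


def make_embed_config(
    task_type: str,
    title: Optional[str] = None,
    output_dimensionality: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a config dict, rejecting combinations this document does not describe."""
    if task_type not in KNOWN_TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type!r}")
    if title is not None and task_type != "RETRIEVAL_DOCUMENT":
        raise ValueError("title is only meaningful with RETRIEVAL_DOCUMENT")
    if output_dimensionality is not None and output_dimensionality < 1:
        raise ValueError("output_dimensionality must be positive")

    # Include only the keys that were actually set.
    config: Dict[str, Any] = {"task_type": task_type}
    if title is not None:
        config["title"] = title
    if output_dimensionality is not None:
        config["output_dimensionality"] = output_dimensionality
    return config
```

Centralizing the check keeps a query from accidentally being embedded with a document title attached.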

**Usage Example - Reduced Dimensionality:**

```python
from google.genai import Client
from google.genai.types import EmbedContentConfig

client = Client(api_key='YOUR_API_KEY')

# Generate embeddings with reduced dimensionality for efficiency
config = EmbedContentConfig(
    output_dimensionality=256  # Reduce from default dimension
)

response = client.models.embed_content(
    model='text-embedding-004',
    contents='Text to embed with reduced dimensions',
    config=config
)

embedding = response.embeddings[0].values
print(f"Embedding dimension: {len(embedding)}")
```
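
Two practical consequences of a smaller `output_dimensionality` can be sketched without calling the API. First, storage shrinks proportionally: assuming 4-byte floats, a 768-value vector takes 3072 bytes and a 256-value vector takes 1024 bytes. Second, a reduced vector is not necessarily unit-length, so it may be worth re-normalizing before using dot products as a similarity score; whether a given model already normalizes its output should be confirmed in the model documentation. The helper names `storage_bytes` and `l2_normalize` are illustrative.

```python
import math
from typing import List, Sequence


def storage_bytes(num_vectors: int, dimension: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone, assuming fixed-width floats."""
    return num_vectors * dimension * bytes_per_value


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]
```

For one million documents, `storage_bytes(1_000_000, 768)` is 3,072,000,000 bytes, while `storage_bytes(1_000_000, 256)` is 1,024,000,000 bytes.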

## Types

```python { .api }
from typing import Optional, Union, List, TypedDict
from enum import Enum

# Configuration types
class EmbedContentConfig:
    """
    Configuration for embedding generation.

    Attributes:
        task_type (TaskType, optional): Type of embedding task. Different tasks may
            produce optimized embeddings for specific use cases:
            - RETRIEVAL_QUERY: Optimize for search queries
            - RETRIEVAL_DOCUMENT: Optimize for searchable documents
            - SEMANTIC_SIMILARITY: Optimize for similarity comparisons
            - CLASSIFICATION: Optimize for text classification
            - CLUSTERING: Optimize for clustering tasks
            - QUESTION_ANSWERING: Optimize for QA tasks
            - FACT_VERIFICATION: Optimize for fact checking
        title (str, optional): Document title, used with RETRIEVAL_DOCUMENT task
            to provide context for the embedding.
        output_dimensionality (int, optional): Desired output dimension for the embedding.
            Some models support dimensionality reduction. Smaller dimensions can reduce
            storage and computation costs. Check model documentation for supported values.
    """
    task_type: Optional[TaskType] = None
    title: Optional[str] = None
    output_dimensionality: Optional[int] = None

class TaskType(Enum):
    """Embedding task types for optimization."""
    TASK_TYPE_UNSPECIFIED = 'TASK_TYPE_UNSPECIFIED'
    RETRIEVAL_QUERY = 'RETRIEVAL_QUERY'
    RETRIEVAL_DOCUMENT = 'RETRIEVAL_DOCUMENT'
    SEMANTIC_SIMILARITY = 'SEMANTIC_SIMILARITY'
    CLASSIFICATION = 'CLASSIFICATION'
    CLUSTERING = 'CLUSTERING'
    QUESTION_ANSWERING = 'QUESTION_ANSWERING'
    FACT_VERIFICATION = 'FACT_VERIFICATION'
    CODE_RETRIEVAL_QUERY = 'CODE_RETRIEVAL_QUERY'

# Response types
class EmbedContentResponse:
    """
    Response from embedding generation.

    Attributes:
        embeddings (list[ContentEmbedding]): List of embeddings, one for each input content.
            Each embedding contains the vector values and optional statistics.
        metadata (EmbedContentMetadata, optional): Metadata about the embedding operation.
    """
    embeddings: list[ContentEmbedding]
    metadata: Optional[EmbedContentMetadata] = None

class ContentEmbedding:
    """
    Individual content embedding.

    Attributes:
        values (list[float]): Embedding vector as a list of float values. The length
            depends on the model and optional output_dimensionality configuration.
            Typical dimensions: 768, 1024, 1536, or custom reduced dimensions.
        statistics (ContentEmbeddingStatistics, optional): Statistics about the embedding.
    """
    values: list[float]
    statistics: Optional[ContentEmbeddingStatistics] = None

class ContentEmbeddingStatistics:
    """
    Statistics about an embedding.

    Attributes:
        token_count (int, optional): Number of tokens in the input content.
        truncated (bool, optional): Whether the input was truncated to fit model limits.
    """
    token_count: Optional[int] = None
    truncated: Optional[bool] = None

class EmbedContentMetadata:
    """
    Metadata about the embedding operation.

    Attributes:
        model_version (str, optional): Version of the model used for embedding.
    """
    model_version: Optional[str] = None

# Content types (shared with other capabilities)
class Content:
    """
    Container for content with role and parts.

    Attributes:
        parts (list[Part]): List of content parts
        role (str, optional): Role ('user' or 'model')
    """
    parts: list[Part]
    role: Optional[str] = None

class Part:
    """
    Individual content part.

    For embeddings, typically only text parts are used.

    Attributes:
        text (str, optional): Text content to embed
        inline_data (Blob, optional): Inline binary data (rarely used for embeddings)
        file_data (FileData, optional): Reference to file (rarely used for embeddings)
    """
    text: Optional[str] = None
    inline_data: Optional[Blob] = None
    file_data: Optional[FileData] = None

class Blob:
    """
    Binary data with MIME type.

    Attributes:
        mime_type (str): MIME type
        data (bytes): Binary data
    """
    mime_type: str
    data: bytes

class FileData:
    """
    Reference to uploaded file.

    Attributes:
        file_uri (str): URI of uploaded file
        mime_type (str): MIME type
    """
    file_uri: str
    mime_type: str

# TypedDict variants for flexible usage
class EmbedContentConfigDict(TypedDict, total=False):
    """TypedDict variant of EmbedContentConfig."""
    task_type: TaskType
    title: str
    output_dimensionality: int
```
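
The optional `statistics` field can be used to detect inputs that were cut off to fit model limits. A minimal sketch, assuming the response has the shape described above and that `statistics` or `truncated` may be `None` (the helper name `truncated_indices` is hypothetical):

```python
from typing import Any, List


def truncated_indices(response: Any) -> List[int]:
    """Return the positions of embeddings whose statistics report truncation."""
    indices: List[int] = []
    for i, embedding in enumerate(response.embeddings or []):
        stats = getattr(embedding, "statistics", None)
        # Missing statistics or a missing flag is treated as "not truncated".
        if stats is not None and getattr(stats, "truncated", None):
            indices.append(i)
    return indices
```

The returned indices line up with the order of the inputs, so they can be used to look up which texts should be shortened or split before re-embedding.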
