# Embeddings

Generate text embeddings for semantic search, clustering, similarity comparison, and other natural language understanding tasks. An embedding maps a piece of text to a high-dimensional vector of floating-point numbers that captures its meaning. Texts with similar meanings produce nearby vectors, so they can be compared with vector operations such as cosine similarity.

## Capabilities

### Embed Content

Generate embeddings for text content. A single call accepts one input or several. An optional task type tunes the embeddings for retrieval, classification, semantic similarity, and related tasks.
```python { .api }
def embed_content(
    *,
    model: str,
    contents: Union[str, list[Content], Content],
    config: Optional[EmbedContentConfig] = None
) -> EmbedContentResponse:
    """
    Generate embeddings for content.

    Parameters:
        model (str): Model identifier for embeddings (e.g., 'text-embedding-004',
            'text-multilingual-embedding-002'). Use a model built for embeddings.
        contents (Union[str, list[Content], Content]): Content to embed. Can be:
            - str: A single text to embed
            - Content: A Content object with text parts
            - list[Content]: Multiple Content objects to embed in one batch
        config (EmbedContentConfig, optional): Embedding configuration including:
            - task_type: Type of embedding task (retrieval, classification, etc.)
            - title: Document title for the RETRIEVAL_DOCUMENT task
            - output_dimensionality: Desired embedding dimension (model-dependent)

    Returns:
        EmbedContentResponse: Response containing one embedding per input, plus
            metadata. Each embedding's `values` field is a list of floats
            representing the vector.

    Raises:
        ClientError: For client errors (4xx status codes)
        ServerError: For server errors (5xx status codes)
    """
    ...

async def embed_content(
    *,
    model: str,
    contents: Union[str, list[Content], Content],
    config: Optional[EmbedContentConfig] = None
) -> EmbedContentResponse:
    """Async version of embed_content, called as client.aio.models.embed_content."""
    ...
```
**Usage Example - Simple Embedding:**

```python
from google.genai import Client

client = Client(api_key='YOUR_API_KEY')

# Generate embedding for a single text
response = client.models.embed_content(
    model='text-embedding-004',
    contents='What is the capital of France?'
)

# Access the embedding vector
embedding = response.embeddings[0].values
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
```
**Usage Example - Batch Embeddings:**

```python
from google.genai import Client
from google.genai.types import Content, Part

client = Client(api_key='YOUR_API_KEY')

# Embed multiple texts at once
texts = [
    'Machine learning is a subset of AI',
    'Deep learning uses neural networks',
    'Natural language processing handles text'
]

contents = [Content(parts=[Part(text=text)]) for text in texts]

response = client.models.embed_content(
    model='text-embedding-004',
    contents=contents
)

print(f"Generated {len(response.embeddings)} embeddings")
for i, embedding in enumerate(response.embeddings):
    print(f"Text {i+1}: dimension={len(embedding.values)}")
```
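Once a batch comes back, the vectors can be stacked into a matrix and compared all at once, which is a common starting point for deduplication or clustering. The sketch below assumes only NumPy and a list of vectors such as `[e.values for e in response.embeddings]`; `similarity_matrix` is a hypothetical helper, not part of the SDK. Entry `[i, j]` of the result is the cosine similarity between text `i` and text `j`.

```python
import numpy as np

def similarity_matrix(vectors: list[list[float]]) -> np.ndarray:
    """Hypothetical helper: pairwise cosine similarity for a batch of embeddings.

    `vectors` would typically be [e.values for e in response.embeddings].
    Assumes no vector is all zeros.
    """
    matrix = np.asarray(vectors, dtype=np.float64)          # shape (n, d)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)   # shape (n, 1)
    unit = matrix / norms                                   # scale each row to length 1
    return unit @ unit.T                                    # (n, n) cosine similarities

# Toy 3-dimensional vectors standing in for real embeddings
sims = similarity_matrix([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]])
print(np.round(sims, 3))
```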
**Usage Example - Semantic Similarity:**

```python
import numpy as np
from google.genai import Client

def cosine_similarity(vec1, vec2):
    """Calculate cosine similarity between two vectors."""
    dot_product = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    return dot_product / (norm1 * norm2)

client = Client(api_key='YOUR_API_KEY')

# Embed query and documents
query = "machine learning algorithms"
documents = [
    "Neural networks are a type of machine learning model",
    "The weather today is sunny and warm",
    "Supervised learning requires labeled training data"
]

# Get embeddings
query_response = client.models.embed_content(
    model='text-embedding-004',
    contents=query
)
query_embedding = query_response.embeddings[0].values

doc_embeddings = []
for doc in documents:
    response = client.models.embed_content(
        model='text-embedding-004',
        contents=doc
    )
    doc_embeddings.append(response.embeddings[0].values)

# Calculate similarities
for i, doc in enumerate(documents):
    similarity = cosine_similarity(query_embedding, doc_embeddings[i])
    print(f"Doc {i+1} similarity: {similarity:.4f}")
    print(f" {doc}\n")
```
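The loop above scores documents one at a time. With the document vectors stacked into a matrix, every score comes from a single matrix-vector product, and sorting the scores gives a ranked result list. A minimal NumPy sketch; `rank_documents` is a hypothetical helper, and the toy vectors stand in for `query_embedding` and `doc_embeddings` from the example.

```python
import numpy as np

def rank_documents(query_vec, doc_vecs, top_k=None):
    """Hypothetical helper: return (index, cosine similarity) pairs, best match first."""
    q = np.asarray(query_vec, dtype=np.float64)
    docs = np.asarray(doc_vecs, dtype=np.float64)            # shape (n_docs, d)
    # Cosine similarity of the query against every document in one product
    scores = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
    order = np.argsort(-scores)                               # indices, highest score first
    if top_k is not None:
        order = order[:top_k]
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors standing in for query_embedding and doc_embeddings
query = [1.0, 0.0]
docs = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7]]
for index, score in rank_documents(query, docs, top_k=2):
    print(f"Doc {index + 1}: {score:.4f}")
```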
**Usage Example - Task-Specific Embeddings:**

```python
from google.genai import Client
from google.genai.types import EmbedContentConfig, TaskType

client = Client(api_key='YOUR_API_KEY')

# Embed a search query with the RETRIEVAL_QUERY task type
query_config = EmbedContentConfig(
    task_type=TaskType.RETRIEVAL_QUERY
)

query_response = client.models.embed_content(
    model='text-embedding-004',
    contents='How does photosynthesis work?',
    config=query_config
)

# Embed a document with RETRIEVAL_DOCUMENT; the title adds context
doc_config = EmbedContentConfig(
    task_type=TaskType.RETRIEVAL_DOCUMENT,
    title='Biology Textbook Chapter 3'
)

doc_response = client.models.embed_content(
    model='text-embedding-004',
    contents='Photosynthesis is the process by which plants convert light energy into chemical energy...',
    config=doc_config
)

print(f"Query embedding dimension: {len(query_response.embeddings[0].values)}")
print(f"Document embedding dimension: {len(doc_response.embeddings[0].values)}")
```
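Retrieval is only one task type. Embeddings requested with `TaskType.CLUSTERING` are usually grouped afterwards with a standard algorithm. The sketch below is plain k-means in NumPy over unit-normalized vectors, so the squared Euclidean distance between two normalized points is a monotonic function of their cosine similarity. `kmeans` is a hypothetical helper. Its first-k initialization is a simplification chosen for determinism, not a recommendation. In practice a library implementation such as scikit-learn's `KMeans` would typically be used.

```python
import numpy as np

def kmeans(vectors, k, iterations=20):
    """Hypothetical helper: cluster embedding vectors with plain k-means.

    Returns (labels, centroids). Rows are scaled to unit length first.
    """
    data = np.asarray(vectors, dtype=np.float64)
    data = data / np.linalg.norm(data, axis=1, keepdims=True)
    centroids = data[:k].copy()                     # deterministic init: first k rows
    labels = np.zeros(len(data), dtype=int)
    for _ in range(iterations):
        # Squared distance from every point to every centroid, shape (n, k)
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)               # assign each point to nearest centroid
        for c in range(k):
            members = data[labels == c]
            if len(members):                        # keep old centroid if cluster is empty
                centroids[c] = members.mean(axis=0)
    return labels, centroids

# Toy 2-D vectors: two near the x-axis, two near the y-axis
labels, _ = kmeans([[1.0, 0.1], [0.1, 1.0], [0.9, 0.0], [0.0, 0.8]], k=2)
print(labels)
```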
**Usage Example - Reduced Dimensionality:**

```python
from google.genai import Client
from google.genai.types import EmbedContentConfig

client = Client(api_key='YOUR_API_KEY')

# Request smaller vectors to cut storage and computation costs
config = EmbedContentConfig(
    output_dimensionality=256  # Down from text-embedding-004's default of 768
)

response = client.models.embed_content(
    model='text-embedding-004',
    contents='Text to embed with reduced dimensions',
    config=config
)

embedding = response.embeddings[0].values
print(f"Embedding dimension: {len(embedding)}")
```
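Whether a shortened vector still has unit length depends on the model, so check its documentation. If vectors will be compared with a plain dot product, a common precaution is to L2-normalize them first. Cosine similarity, as in the semantic similarity example, is unaffected by vector length. For scale, at 4 bytes per float32 value a 256-dimensional vector takes 1,024 bytes, compared with 3,072 bytes at 768 dimensions. A minimal sketch assuming only NumPy; `l2_normalize` is a hypothetical helper.

```python
import numpy as np

def l2_normalize(vector, eps=1e-12):
    """Hypothetical helper: scale an embedding to unit length (L2 norm of 1)."""
    v = np.asarray(vector, dtype=np.float64)
    norm = np.linalg.norm(v)
    return v / max(norm, eps)   # eps guards against division by zero

# Toy vector standing in for response.embeddings[0].values
unit = l2_normalize([3.0, 4.0])
print(unit, np.linalg.norm(unit))
```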
## Types

```python { .api }
from __future__ import annotations

from enum import Enum
from typing import Optional, TypedDict

# Configuration types
class TaskType(Enum):
    """Embedding task types for optimization."""
    TASK_TYPE_UNSPECIFIED = 'TASK_TYPE_UNSPECIFIED'
    RETRIEVAL_QUERY = 'RETRIEVAL_QUERY'
    RETRIEVAL_DOCUMENT = 'RETRIEVAL_DOCUMENT'
    SEMANTIC_SIMILARITY = 'SEMANTIC_SIMILARITY'
    CLASSIFICATION = 'CLASSIFICATION'
    CLUSTERING = 'CLUSTERING'
    QUESTION_ANSWERING = 'QUESTION_ANSWERING'
    FACT_VERIFICATION = 'FACT_VERIFICATION'
    CODE_RETRIEVAL_QUERY = 'CODE_RETRIEVAL_QUERY'

class EmbedContentConfig:
    """
    Configuration for embedding generation.

    Attributes:
        task_type (TaskType, optional): Type of embedding task. Different task types
            may produce embeddings optimized for specific use cases:
            - RETRIEVAL_QUERY: Optimize for search queries
            - RETRIEVAL_DOCUMENT: Optimize for searchable documents
            - SEMANTIC_SIMILARITY: Optimize for similarity comparisons
            - CLASSIFICATION: Optimize for text classification
            - CLUSTERING: Optimize for clustering tasks
            - QUESTION_ANSWERING: Optimize for QA tasks
            - FACT_VERIFICATION: Optimize for fact checking
            - CODE_RETRIEVAL_QUERY: Optimize natural-language queries used to retrieve code
        title (str, optional): Document title, used with the RETRIEVAL_DOCUMENT task
            to provide context for the embedding.
        output_dimensionality (int, optional): Desired output dimension for the embedding.
            Some models support dimensionality reduction. Smaller dimensions reduce
            storage and computation costs. Check the model documentation for supported values.
    """
    task_type: Optional[TaskType] = None
    title: Optional[str] = None
    output_dimensionality: Optional[int] = None

# Response types
class EmbedContentResponse:
    """
    Response from embedding generation.

    Attributes:
        embeddings (list[ContentEmbedding]): List of embeddings, one for each input content.
            Each embedding contains the vector values and optional statistics.
        metadata (EmbedContentMetadata, optional): Metadata about the embedding operation.
    """
    embeddings: list[ContentEmbedding]
    metadata: Optional[EmbedContentMetadata] = None

class ContentEmbedding:
    """
    Individual content embedding.

    Attributes:
        values (list[float]): Embedding vector as a list of float values. The length
            depends on the model and the optional output_dimensionality setting
            (e.g., 768 by default for text-embedding-004).
        statistics (ContentEmbeddingStatistics, optional): Statistics about the embedding.
    """
    values: list[float]
    statistics: Optional[ContentEmbeddingStatistics] = None

class ContentEmbeddingStatistics:
    """
    Statistics about an embedding.

    Attributes:
        token_count (int, optional): Number of tokens in the input content.
        truncated (bool, optional): Whether the input was truncated to fit model limits.
    """
    token_count: Optional[int] = None
    truncated: Optional[bool] = None

class EmbedContentMetadata:
    """
    Metadata about the embedding operation.

    Attributes:
        model_version (str, optional): Version of the model used for embedding.
    """
    model_version: Optional[str] = None

# Content types (shared with other capabilities)
class Content:
    """
    Container for content with role and parts.

    Attributes:
        parts (list[Part]): List of content parts
        role (str, optional): Role ('user' or 'model')
    """
    parts: list[Part]
    role: Optional[str] = None

class Part:
    """
    Individual content part.

    For embeddings, typically only text parts are used.

    Attributes:
        text (str, optional): Text content to embed
        inline_data (Blob, optional): Inline binary data (rarely used for embeddings)
        file_data (FileData, optional): Reference to file (rarely used for embeddings)
    """
    text: Optional[str] = None
    inline_data: Optional[Blob] = None
    file_data: Optional[FileData] = None

class Blob:
    """
    Binary data with MIME type.

    Attributes:
        mime_type (str): MIME type
        data (bytes): Binary data
    """
    mime_type: str
    data: bytes

class FileData:
    """
    Reference to uploaded file.

    Attributes:
        file_uri (str): URI of uploaded file
        mime_type (str): MIME type
    """
    file_uri: str
    mime_type: str

# TypedDict variants for flexible usage
class EmbedContentConfigDict(TypedDict, total=False):
    """TypedDict variant of EmbedContentConfig."""
    task_type: TaskType
    title: str
    output_dimensionality: int
```
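Because `statistics` is declared optional, code that reads `truncated` or `token_count` should guard against a missing value. The following is a minimal sketch of a hypothetical helper that returns the indices of inputs reported as truncated. It relies only on the attribute names documented above. The `SimpleNamespace` objects stand in for a real `EmbedContentResponse`.

```python
from types import SimpleNamespace

def truncated_indices(response):
    """Hypothetical helper: indices of inputs reported as truncated.

    Reads only the documented path response.embeddings[i].statistics.truncated.
    Inputs with no statistics are treated as "not reported", not as truncated.
    """
    indices = []
    for i, embedding in enumerate(response.embeddings):
        stats = getattr(embedding, "statistics", None)
        if stats is not None and stats.truncated:
            indices.append(i)
    return indices

# Stand-in objects shaped like EmbedContentResponse / ContentEmbedding
fake_response = SimpleNamespace(embeddings=[
    SimpleNamespace(values=[0.1], statistics=SimpleNamespace(token_count=5, truncated=False)),
    SimpleNamespace(values=[0.2], statistics=SimpleNamespace(token_count=2048, truncated=True)),
    SimpleNamespace(values=[0.3], statistics=None),
])
print(truncated_indices(fake_response))
```

A truncated input's vector may not represent the full text. Splitting long inputs into smaller chunks before embedding is one way to avoid this.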