# Text Embeddings
Generate high-quality vector embeddings for text inputs, supporting both single strings and batch processing. The embeddings API converts text into dense vector representations that can be used for semantic search, clustering, and other machine learning tasks.
## Capabilities
### Create Embeddings
Generate embeddings for single text strings or batch process multiple texts simultaneously.
```python { .api }
def create(
    input: Union[str, List[str]],
    model: Union[str, Literal["nomic-embed-text-v1_5"]],
    encoding_format: Literal["float", "base64"] | NotGiven = NOT_GIVEN,
    user: Optional[str] | NotGiven = NOT_GIVEN,
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> CreateEmbeddingResponse:
    """
    Create embeddings for the given input text(s).

    Parameters:
    - input: Text string or list of text strings to embed
    - model: Model identifier to use for embeddings
    - encoding_format: Format for the embedding vectors ("float" or "base64")
    - user: Unique identifier representing your end-user

    Returns:
        CreateEmbeddingResponse containing embedding vectors and usage information
    """
```
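When `encoding_format="base64"` is requested, each embedding typically arrives as a base64 string of packed little-endian float32 values rather than a list of floats. The decoder below is a minimal sketch under that assumption (the byte layout follows the common OpenAI-compatible convention; verify it against the responses you actually receive):

```python
import base64
import struct

def decode_base64_embedding(b64: str) -> list[float]:
    """Decode a base64-encoded embedding into a list of floats.

    Assumes the payload is a packed array of little-endian float32
    values (4 bytes per component).
    """
    raw = base64.b64decode(b64)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))

# Round-trip check with synthetic data (no API call needed)
vec = [0.5, -1.25, 3.0]
encoded = base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode()
decoded = decode_base64_embedding(encoded)
```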
### Async Create Embeddings
Asynchronous version of embedding creation with identical parameters and functionality.
```python { .api }
async def create(
    input: Union[str, List[str]],
    model: Union[str, Literal["nomic-embed-text-v1_5"]],
    encoding_format: Literal["float", "base64"] | NotGiven = NOT_GIVEN,
    user: Optional[str] | NotGiven = NOT_GIVEN,
    **kwargs,
) -> CreateEmbeddingResponse:
    """Async version of create() with identical parameters."""
```
## Usage Examples
### Single Text Embedding
```python
from groq import Groq

client = Groq()

response = client.embeddings.create(
    input="The quick brown fox jumps over the lazy dog",
    model="nomic-embed-text-v1_5",
)

embedding = response.data[0].embedding
print(f"Embedding dimension: {len(embedding)}")
print(f"First few values: {embedding[:5]}")
```
### Batch Text Embeddings
```python
from groq import Groq

client = Groq()

texts = [
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning uses neural networks with multiple layers.",
    "Natural language processing helps computers understand text.",
    "Computer vision enables machines to interpret visual information.",
]

response = client.embeddings.create(
    input=texts,
    model="nomic-embed-text-v1_5",
)

for i, embedding_obj in enumerate(response.data):
    print(f"Text {i + 1} embedding dimension: {len(embedding_obj.embedding)}")
```
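Each item in `response.data` carries an `index` field identifying which input it belongs to, so batch results can be re-associated with their inputs by index rather than by list position. A runnable sketch using a stub dataclass in place of the real response objects (the stub is hypothetical; only the `index` field behavior is assumed):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class EmbeddingStub:
    """Stand-in mirroring the `embedding` and `index` fields of a response item."""
    embedding: List[float]
    index: int

def align_to_inputs(texts: List[str], data: List[EmbeddingStub]) -> List[Tuple[str, List[float]]]:
    """Pair each input text with its embedding using the `index` field."""
    by_index = sorted(data, key=lambda e: e.index)
    return list(zip(texts, (e.embedding for e in by_index)))

texts = ["first", "second"]
data = [EmbeddingStub([0.2], 1), EmbeddingStub([0.1], 0)]  # arrives out of order
pairs = align_to_inputs(texts, data)
```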
### Async Usage
```python
import asyncio

from groq import AsyncGroq

async def main():
    client = AsyncGroq()

    response = await client.embeddings.create(
        input="Async embedding generation example",
        model="nomic-embed-text-v1_5",
    )

    embedding = response.data[0].embedding
    print(f"Generated embedding with {len(embedding)} dimensions")

asyncio.run(main())
```
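The async client pays off when many batches must be embedded: requests can be issued concurrently with `asyncio.gather`. The sketch below uses a hypothetical `embed_batch` coroutine as a stand-in for `await client.embeddings.create(...)` so the pattern is runnable without credentials:

```python
import asyncio
from typing import List

def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

async def embed_batch(batch: List[str]) -> List[List[float]]:
    """Stand-in for a real embeddings call; returns one fake vector per text."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [[float(len(text))] for text in batch]

async def embed_all(texts: List[str], batch_size: int = 2) -> List[List[float]]:
    # Issue all batch requests concurrently, then flatten in input order.
    results = await asyncio.gather(
        *(embed_batch(b) for b in chunk(texts, batch_size))
    )
    return [vec for batch in results for vec in batch]

vectors = asyncio.run(embed_all(["a", "bb", "ccc"]))
```

In real use, `embed_batch` would wrap `client.embeddings.create`, and `batch_size` should respect the API's per-request input limits.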
### Semantic Search Example
```python
import numpy as np

from groq import Groq

client = Groq()

# Documents to search through
documents = [
    "Python is a high-level programming language.",
    "JavaScript is used for web development.",
    "Machine learning algorithms can predict outcomes.",
    "Databases store and organize information.",
    "APIs enable communication between applications.",
]

# Query to search for
query = "programming languages for software development"

# Embed the documents and the query in a single batch request
all_texts = documents + [query]
response = client.embeddings.create(
    input=all_texts,
    model="nomic-embed-text-v1_5",
)

# Extract embeddings; the last item corresponds to the query
doc_embeddings = [item.embedding for item in response.data[:-1]]
query_embedding = response.data[-1].embedding

# Cosine similarity between two vectors
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Find the most similar document
similarities = [cosine_similarity(query_embedding, doc_emb) for doc_emb in doc_embeddings]
best_match_idx = int(np.argmax(similarities))

print(f"Query: {query}")
print(f"Most similar document: {documents[best_match_idx]}")
print(f"Similarity score: {similarities[best_match_idx]:.4f}")
```
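For larger document sets, the pairwise loop above can be replaced with a single matrix operation: normalize every embedding once, then a dot product is cosine similarity. A sketch with synthetic 3-D vectors in place of real embeddings:

```python
import numpy as np

def rank_documents(doc_embeddings, query_embedding):
    """Return document indices sorted by cosine similarity (best first), plus scores."""
    docs = np.asarray(doc_embeddings, dtype=np.float64)
    query = np.asarray(query_embedding, dtype=np.float64)
    # Normalizing rows turns the dot product into cosine similarity.
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = docs @ query
    return np.argsort(scores)[::-1], scores

# Synthetic embeddings: doc 1 points the same direction as the query.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [0.0, 2.0, 0.0]
order, scores = rank_documents(docs, query)
```

Precomputing the normalized document matrix once lets every subsequent query be ranked with a single matrix-vector product.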
## Types
### Request Types
```python { .api }
class EmbeddingCreateParams:
    input: Union[str, List[str]]
    model: Union[str, Literal["nomic-embed-text-v1_5"]]
    encoding_format: Literal["float", "base64"] | NotGiven
    user: Optional[str] | NotGiven
```
### Response Types
```python { .api }
class CreateEmbeddingResponse:
    data: List[Embedding]
    model: str
    object: Literal["list"]
    usage: EmbeddingUsage

class Embedding:
    embedding: List[float]
    index: int
    object: Literal["embedding"]

class EmbeddingUsage:
    prompt_tokens: int
    total_tokens: int
```