
# Embeddings

Generate vector embeddings for text input with support for different models and output formats. Embeddings are dense vector representations of text that capture semantic meaning for use in search, clustering, and similarity tasks.

## Capabilities

### Create Embeddings

Generate vector embeddings for single or multiple text inputs with customizable output dimensions and formats.

```python { .api }
def create(
    model: str,
    inputs: Union[str, List[str]],
    output_dimension: Optional[int] = None,
    output_dtype: Optional[EmbeddingDtype] = None,
    encoding_format: Optional[EncodingFormat] = None,
    **kwargs
) -> EmbeddingResponse:
    """
    Create embeddings for text inputs.

    Parameters:
    - model: ID of the model to use
    - inputs: Text to embed (a string or list of strings)
    - output_dimension: The dimension of the output embeddings
    - output_dtype: Output data type ("float", "int8", "uint8", "binary", "ubinary")
    - encoding_format: Encoding format ("float", "base64")

    Returns:
    EmbeddingResponse with vector embeddings
    """
```

## Usage Examples

### Single Text Embedding

```python
from mistralai import Mistral

client = Mistral(api_key="your-api-key")

# Generate an embedding for a single text
response = client.embeddings.create(
    model="mistral-embed",
    inputs="The quick brown fox jumps over the lazy dog."
)

# Access the embedding vector
embedding = response.data[0].embedding
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
```

### Batch Text Embeddings

```python
# Generate embeddings for multiple texts in one request
texts = [
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning uses neural networks with multiple layers.",
    "Natural language processing helps computers understand text.",
    "Computer vision enables machines to interpret visual information."
]

response = client.embeddings.create(
    model="mistral-embed",
    inputs=texts
)

# Process the embeddings
for i, embedding_data in enumerate(response.data):
    print(f"Text {i + 1}: {len(embedding_data.embedding)} dimensions")
    print(f"Text: {texts[i][:50]}...")
```

### Custom Output Format

```python
# Generate embeddings with a specific output format
response = client.embeddings.create(
    model="mistral-embed",
    inputs="Semantic search with embeddings",
    encoding_format="base64",
    output_dtype="float"
)

embedding_data = response.data[0]
print(f"Encoding format: {response.encoding_format}")
print(f"Data type: {response.output_dtype}")
```
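When `encoding_format="base64"` is used, the embedding arrives as a base64 string rather than a list of floats. A minimal decoding sketch, assuming the payload is a packed array of little-endian float32 values (verify the exact byte layout against the API reference before relying on this):

```python
import base64
import struct

def decode_base64_embedding(b64: str) -> list:
    """Decode a base64-encoded embedding into a list of floats.

    Assumes packed little-endian float32 values.
    """
    raw = base64.b64decode(b64)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))

# Round-trip demo with synthetic data (no API call)
vector = [0.25, -1.5, 3.0]
encoded = base64.b64encode(struct.pack(f"<{len(vector)}f", *vector)).decode()
decoded = decode_base64_embedding(encoded)
print(decoded)  # [0.25, -1.5, 3.0]
```

The values above are exactly representable in float32, so the round trip is lossless; arbitrary floats will round to float32 precision.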

### Similarity Search

```python
import numpy as np
from scipy.spatial.distance import cosine

# Embed a query and documents
query = "What is machine learning?"
documents = [
    "Machine learning is a method of data analysis that automates analytical model building.",
    "Artificial intelligence is the simulation of human intelligence in machines.",
    "Data science combines domain expertise, programming skills, and statistical knowledge.",
    "Natural language processing is a branch of AI focused on language understanding."
]

# Get embeddings
query_response = client.embeddings.create(
    model="mistral-embed",
    inputs=query
)

doc_response = client.embeddings.create(
    model="mistral-embed",
    inputs=documents
)

query_embedding = np.array(query_response.data[0].embedding)
doc_embeddings = [np.array(data.embedding) for data in doc_response.data]

# Calculate cosine similarities
similarities = []
for doc_embedding in doc_embeddings:
    similarity = 1 - cosine(query_embedding, doc_embedding)
    similarities.append(similarity)

# Find the most similar document
best_match_idx = np.argmax(similarities)
print(f"Query: {query}")
print(f"Most similar document: {documents[best_match_idx]}")
print(f"Similarity score: {similarities[best_match_idx]:.4f}")
```
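For larger document sets, the per-document loop above can be replaced with a single matrix operation. A NumPy-only sketch (no scipy dependency) that normalizes the vectors and ranks documents by dot product; the 4-dimensional vectors here are synthetic stand-ins for real embeddings:

```python
import numpy as np

def rank_by_cosine(query_vec, doc_matrix):
    """Return document indices sorted by cosine similarity, best first."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_matrix, dtype=float)
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    sims = d @ q  # cosine similarity of each row against the query
    return np.argsort(-sims), sims

# Synthetic "embeddings" for illustration only
query_vec = [1.0, 0.0, 0.0, 0.0]
docs = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]
order, sims = rank_by_cosine(query_vec, docs)
print(order[0])  # index of the best match: 0
```

Normalizing once and using a dot product avoids recomputing norms per pair, which matters when ranking thousands of documents.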

## Types

### Request Types

```python { .api }
class EmbeddingRequest:
    model: str
    inputs: Union[str, List[str]]
    output_dimension: Optional[int]
    output_dtype: Optional[str]  # "float", "int8", "uint8", "binary", "ubinary"
    encoding_format: Optional[str]  # "float", "base64"
```

### Response Types

```python { .api }
class EmbeddingResponse:
    id: str
    object: str
    data: List[EmbeddingResponseData]
    model: str
    usage: Optional[UsageInfo]

class EmbeddingResponseData:
    object: str
    embedding: List[float]
    index: int
```

### Configuration Types

```python { .api }
class EmbeddingDtype:
    FLOAT = "float"
    INT8 = "int8"
    UINT8 = "uint8"
    BINARY = "binary"
    UBINARY = "ubinary"

class EncodingFormat:
    FLOAT = "float"
    BASE64 = "base64"
```

## Model Information

### Available Models

- **mistral-embed**: General-purpose embedding model for semantic search and similarity tasks
- Output dimensions and capabilities vary by model

### Output Formats

- **float**: Standard floating-point vectors (default)
- **base64**: Base64-encoded binary representation for reduced storage
- **ubinary**: Unsigned binary format for efficient storage and computation

### Usage Considerations

- Batch processing is more efficient than one request per text
- Choose the output format based on downstream storage and computation requirements
- Embedding dimensions are model-dependent and cannot be arbitrarily changed
- Use the same model when comparing embeddings across requests; vectors from different models are not comparable
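The batching point above can be sketched as a simple chunking helper: rather than one request per text, group texts and send each group in a single call. The chunk size of 64 is an arbitrary illustration, not a documented API limit:

```python
from typing import Iterator, List

def chunked(texts: List[str], size: int = 64) -> Iterator[List[str]]:
    """Yield successive fixed-size batches from a list of texts."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]

# Each batch would go out as one embeddings request, e.g.:
# for batch in chunked(all_texts):
#     response = client.embeddings.create(model="mistral-embed", inputs=batch)

# Demo with synthetic texts
texts = [f"document {i}" for i in range(10)]
batches = list(chunked(texts, size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

Check the provider's documented per-request input limits before picking a real batch size.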