# Audio Transcription

Transcribe audio files to text with support for various audio formats and streaming. The audio API provides accurate speech-to-text conversion with language detection and formatting options.

## Capabilities

### Audio Transcription

Convert audio files to text with customizable options.

```python { .api }
def transcribe(
    file: Union[str, BinaryIO],
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: Optional[str] = None,
    temperature: Optional[float] = None,
    timestamp_granularities: Optional[List[str]] = None,
    **kwargs
) -> TranscriptionResponse:
    """
    Transcribe audio to text.

    Parameters:
    - file: Audio file path (string) or file-like object (BinaryIO)
    - model: Transcription model identifier
    - language: Optional language code (e.g., "en", "fr", "es")
    - prompt: Optional prompt to guide transcription
    - response_format: Output format ("json", "text", "srt", "vtt")
    - temperature: Sampling temperature for transcription
    - timestamp_granularities: Timestamp precision levels

    Returns:
        TranscriptionResponse with transcribed text and metadata
    """
```

### Streaming Transcription

Transcribe audio in real-time from streaming input.

```python { .api }
def transcribe_stream(
    stream: Iterator[bytes],
    model: str,
    language: Optional[str] = None,
    **kwargs
) -> Iterator[TranscriptionStreamEvents]:
    """
    Transcribe streaming audio.

    Parameters:
    - stream: Iterator of audio bytes
    - model: Transcription model identifier
    - language: Optional language code

    Returns:
        Iterator of transcription events with partial and final results
    """
```

## Usage Examples

### Basic Audio Transcription

```python
from mistralai import Mistral

client = Mistral(api_key="your-api-key")

# Transcribe an audio file
with open("recording.mp3", "rb") as audio_file:
    response = client.audio.transcribe(
        file=audio_file,
        model="whisper-1",
        language="en",
        response_format="json"
    )

print("Transcription:")
print(response.text)
print(f"Language detected: {response.language}")
print(f"Duration: {response.duration} seconds")
```

### Transcription with Timestamps

```python
# Get detailed transcription with timestamps
response = client.audio.transcribe(
    file="meeting_recording.wav",
    model="whisper-1",
    response_format="json",
    timestamp_granularities=["word", "segment"]
)

print("Detailed transcription:")
for segment in response.segments:
    start_time = segment.start
    end_time = segment.end
    text = segment.text
    print(f"[{start_time:.2f}s - {end_time:.2f}s]: {text}")

# Word-level timestamps (words is None unless "word" granularity was requested)
if response.words:
    print("\nWord-level timing:")
    for word in response.words[:10]:  # First 10 words
        print(f"'{word.word}' at {word.start:.2f}s")
```

### Multiple Format Output

```python
# Get transcription in different formats
formats = ["json", "text", "srt", "vtt"]

for fmt in formats:
    response = client.audio.transcribe(
        file="presentation.m4a",
        model="whisper-1",
        response_format=fmt
    )

    # Save to file; the JSON response exposes .text, while the
    # text/srt/vtt formats are returned as plain strings
    extension = "txt" if fmt == "text" else fmt
    with open(f"transcription.{extension}", "w") as f:
        if fmt == "json":
            f.write(response.text)
        else:
            f.write(response)

    print(f"Saved transcription in {fmt} format")
```

### Streaming Transcription

```python
import pyaudio

# Yield raw microphone audio in fixed-size chunks
def audio_stream_generator():
    audio = pyaudio.PyAudio()
    stream = audio.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=16000,
        input=True,
        frames_per_buffer=1024
    )

    try:
        while True:
            data = stream.read(1024)
            yield data
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()

# Transcribe streaming audio
print("Starting real-time transcription...")
events = client.audio.transcribe_stream(
    stream=audio_stream_generator(),
    model="whisper-1",
    language="en"
)

for event in events:
    if event.type == "transcription.partial":
        print(f"Partial: {event.text}", end="\r")
    elif event.type == "transcription.completed":
        print(f"\nFinal: {event.text}")
```

### Batch Audio Processing

```python
import json
import os

# Process multiple audio files
audio_files = ["interview1.mp3", "interview2.wav", "lecture.m4a"]
transcriptions = {}

for audio_file in audio_files:
    if os.path.exists(audio_file):
        print(f"Processing {audio_file}...")

        # Omitting the language parameter lets the API auto-detect it
        response = client.audio.transcribe(
            file=audio_file,
            model="whisper-1",
            response_format="json"
        )

        transcriptions[audio_file] = {
            "text": response.text,
            "language": response.language,
            "duration": response.duration
        }

        print(f"  Completed: {len(response.text)} characters")

# Save all transcriptions
with open("all_transcriptions.json", "w") as f:
    json.dump(transcriptions, f, indent=2)
```
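
For larger batches, the per-file calls can also run concurrently, since each transcription is network-bound. A minimal sketch with a thread pool; the `transcribe_one` helper is illustrative, standing in for the `client.audio.transcribe` call above:

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe_one(path):
    # Placeholder for a real client.audio.transcribe call;
    # returns the path paired with its transcription text
    return path, f"transcript of {path}"

audio_files = ["interview1.mp3", "interview2.wav", "lecture.m4a"]

# Network-bound calls overlap well in threads; cap concurrency to stay
# within API rate limits
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(transcribe_one, audio_files))

for path, text in results.items():
    print(f"{path}: {text}")
```

Threads (rather than processes) are sufficient here because the work is dominated by waiting on the network, not CPU.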

## Types

### Request Types

```python { .api }
class AudioTranscriptionRequest:
    file: Union[str, BinaryIO]
    model: str
    language: Optional[str]
    prompt: Optional[str]
    response_format: Optional[str]
    temperature: Optional[float]
    timestamp_granularities: Optional[List[str]]

class AudioTranscriptionRequestStream:
    stream: Iterator[bytes]
    model: str
    language: Optional[str]
```

### Response Types

```python { .api }
class TranscriptionResponse:
    text: str
    language: Optional[str]
    duration: Optional[float]
    segments: Optional[List[TranscriptionSegment]]
    words: Optional[List[TranscriptionWord]]

class TranscriptionSegment:
    id: int
    start: float
    end: float
    text: str
    temperature: Optional[float]
    avg_logprob: Optional[float]
    compression_ratio: Optional[float]
    no_speech_prob: Optional[float]

class TranscriptionWord:
    word: str
    start: float
    end: float

class TranscriptionStreamEvents:
    type: str  # "transcription.partial", "transcription.completed", "error"
    text: Optional[str]
    language: Optional[str]
    timestamp: Optional[float]
```

### Stream Event Types

```python { .api }
class TranscriptionStreamEventTypes:
    PARTIAL = "transcription.partial"
    COMPLETED = "transcription.completed"
    ERROR = "error"
    DONE = "done"
```
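
One way to consume these events is a small loop that accumulates partials and keeps the final transcript. A sketch using plain dicts in place of the event objects (the helper name is illustrative):

```python
def handle_events(events):
    # Collect partial texts and return (partials, final_transcript)
    partials, final = [], None
    for event in events:
        if event["type"] == "transcription.partial":
            partials.append(event["text"])
        elif event["type"] == "transcription.completed":
            final = event["text"]
        elif event["type"] == "error":
            # Surface stream errors to the caller rather than swallowing them
            raise RuntimeError(event.get("text") or "transcription error")
    return partials, final
```

Keeping the partials around is useful for live display, while the completed event is the authoritative result to persist.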

## Supported Formats

### Audio Formats

- **MP3**: MPEG Audio Layer III
- **WAV**: Waveform Audio File Format
- **M4A**: MPEG-4 Audio
- **FLAC**: Free Lossless Audio Codec
- **OGG**: Ogg Vorbis
- **WEBM**: WebM Audio
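
When processing files in bulk, a cheap client-side extension check against this set can skip obviously unsupported files before uploading. A small helper (names are illustrative; the check is no substitute for the API's own validation):

```python
SUPPORTED_EXTENSIONS = {"mp3", "wav", "m4a", "flac", "ogg", "webm"}

def is_supported_audio(filename: str) -> bool:
    # Compare the lowercased file extension against the supported set
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return ext in SUPPORTED_EXTENSIONS
```

For example, `is_supported_audio("Lecture.M4A")` returns `True`, while `is_supported_audio("notes.txt")` returns `False`.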

### Response Formats

- **json**: Structured JSON with metadata
- **text**: Plain text transcription only
- **srt**: SubRip subtitle format with timestamps
- **vtt**: WebVTT subtitle format
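
If you request `json` but later need subtitles, segment timestamps can be rendered as SRT locally. A sketch assuming segments shaped like `TranscriptionSegment` above (dicts are used here to keep the example self-contained):

```python
def to_srt(segments):
    # Format seconds as the SRT timestamp "HH:MM:SS,mmm"
    def stamp(t):
        ms = int(round(t * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    # Each SRT cue is: index, time range, text, separated by blank lines
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(f"{i}\n{stamp(seg['start'])} --> {stamp(seg['end'])}\n{seg['text']}")
    return "\n\n".join(blocks) + "\n"

print(to_srt([{"start": 0.0, "end": 2.5, "text": "Hello there."}]))
# 1
# 00:00:00,000 --> 00:00:02,500
# Hello there.
```

The same timestamps can be reformatted as WebVTT by swapping the comma for a period and prepending a `WEBVTT` header.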

### Language Support

Supports many languages including:

- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- And many more...

## Best Practices

### Audio Quality

- Use clear, high-quality audio recordings
- Minimize background noise and echo
- Ensure consistent volume levels
- Use appropriate sample rates (16kHz or higher)

### Performance Optimization

- Use appropriate models for your use case
- Consider batch processing for multiple files
- Implement proper error handling for network issues
- Cache results for repeated transcriptions
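
The error-handling and caching points combine naturally into one small wrapper. A sketch with a hypothetical `do_transcribe` callable standing in for the real API call, and `OSError` standing in for whatever transient network exception your client raises:

```python
import time

_cache = {}

def transcribe_with_retry(do_transcribe, path, retries=3, backoff=1.0):
    # Serve repeated requests for the same file from the cache
    if path in _cache:
        return _cache[path]

    for attempt in range(retries):
        try:
            result = do_transcribe(path)
            _cache[path] = result
            return result
        except OSError:
            # Treat the error as transient: back off exponentially, then retry;
            # re-raise once the retry budget is exhausted
            if attempt == retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)
```

In production you would likely bound the cache and catch the SDK's specific exception types rather than `OSError`.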

### Accuracy Improvement

- Provide context through prompts when helpful
- Specify the language when known for better accuracy
- Use temperature settings to control consistency
- Review and correct transcriptions for critical applications