
# Audio Transcription

Transcribe audio files to text with support for various audio formats and streaming. The audio API provides accurate speech-to-text conversion with language detection and formatting options.

## Capabilities

### Audio Transcription

Convert audio files to text with customizable options.

```python { .api }
def transcribe(
    file: Union[str, BinaryIO],
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: Optional[str] = None,
    temperature: Optional[float] = None,
    timestamp_granularities: Optional[List[str]] = None,
    **kwargs
) -> TranscriptionResponse:
    """
    Transcribe audio to text.

    Parameters:
    - file: Audio file path (string) or file-like object (BinaryIO)
    - model: Transcription model identifier
    - language: Optional language code (e.g., "en", "fr", "es")
    - prompt: Optional prompt to guide transcription
    - response_format: Output format ("json", "text", "srt", "vtt")
    - temperature: Sampling temperature for transcription
    - timestamp_granularities: Timestamp precision levels

    Returns:
    TranscriptionResponse with transcribed text and metadata
    """
```

### Streaming Transcription

Transcribe audio in real-time from streaming input.

```python { .api }
def transcribe_stream(
    stream: Iterator[bytes],
    model: str,
    language: Optional[str] = None,
    **kwargs
) -> Iterator[TranscriptionStreamEvents]:
    """
    Transcribe streaming audio.

    Parameters:
    - stream: Iterator of audio bytes
    - model: Transcription model identifier
    - language: Optional language code

    Returns:
    Iterator of transcription events with partial and final results
    """
```

## Usage Examples

### Basic Audio Transcription

```python
from mistralai import Mistral

client = Mistral(api_key="your-api-key")

# Transcribe an audio file
with open("recording.mp3", "rb") as audio_file:
    response = client.audio.transcribe(
        file=audio_file,
        model="whisper-1",
        language="en",
        response_format="json"
    )

print("Transcription:")
print(response.text)
print(f"Language detected: {response.language}")
print(f"Duration: {response.duration} seconds")
```

### Transcription with Timestamps

```python
# Get detailed transcription with timestamps
response = client.audio.transcribe(
    file="meeting_recording.wav",
    model="whisper-1",
    response_format="json",
    timestamp_granularities=["word", "segment"]
)

print("Detailed transcription:")
for segment in response.segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s]: {segment.text}")

# Word-level timestamps (words is Optional; check it is populated)
if response.words:
    print("\nWord-level timing:")
    for word in response.words[:10]:  # First 10 words
        print(f"'{word.word}' at {word.start:.2f}s")
```

### Multiple Format Output

```python
# Get transcription in different formats
formats = ["json", "text", "srt", "vtt"]

for fmt in formats:  # `fmt`, to avoid shadowing the built-in `format`
    response = client.audio.transcribe(
        file="presentation.m4a",
        model="whisper-1",
        response_format=fmt
    )

    # Save to file
    extension = "txt" if fmt == "text" else fmt
    with open(f"transcription.{extension}", "w") as f:
        if fmt == "json":
            f.write(response.text)
        else:
            f.write(response)

    print(f"Saved transcription in {fmt} format")
```

### Streaming Transcription

```python
import pyaudio

# Yield microphone audio in small chunks
def audio_stream_generator():
    audio = pyaudio.PyAudio()
    stream = audio.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=16000,
        input=True,
        frames_per_buffer=1024
    )

    try:
        while True:
            yield stream.read(1024)
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()

# Transcribe streaming audio
print("Starting real-time transcription...")
stream = client.audio.transcribe_stream(
    stream=audio_stream_generator(),
    model="whisper-1",
    language="en"
)

for event in stream:
    if event.type == "transcription.partial":
        print(f"Partial: {event.text}", end="\r")
    elif event.type == "transcription.completed":
        print(f"\nFinal: {event.text}")
```

### Batch Audio Processing

```python
import json
import os

# Process multiple audio files
audio_files = ["interview1.mp3", "interview2.wav", "lecture.m4a"]
transcriptions = {}

for audio_file in audio_files:
    if os.path.exists(audio_file):
        print(f"Processing {audio_file}...")

        response = client.audio.transcribe(
            file=audio_file,
            model="whisper-1",  # language omitted, so it is auto-detected
            response_format="json"
        )

        transcriptions[audio_file] = {
            "text": response.text,
            "language": response.language,
            "duration": response.duration
        }

        print(f"  Completed: {len(response.text)} characters")

# Save all transcriptions
with open("all_transcriptions.json", "w") as f:
    json.dump(transcriptions, f, indent=2)
```

## Types

### Request Types

```python { .api }
class AudioTranscriptionRequest:
    file: Union[str, BinaryIO]
    model: str
    language: Optional[str]
    prompt: Optional[str]
    response_format: Optional[str]
    temperature: Optional[float]
    timestamp_granularities: Optional[List[str]]

class AudioTranscriptionRequestStream:
    stream: Iterator[bytes]
    model: str
    language: Optional[str]
```

### Response Types

```python { .api }
class TranscriptionResponse:
    text: str
    language: Optional[str]
    duration: Optional[float]
    segments: Optional[List[TranscriptionSegment]]
    words: Optional[List[TranscriptionWord]]

class TranscriptionSegment:
    id: int
    start: float
    end: float
    text: str
    temperature: Optional[float]
    avg_logprob: Optional[float]
    compression_ratio: Optional[float]
    no_speech_prob: Optional[float]

class TranscriptionWord:
    word: str
    start: float
    end: float

class TranscriptionStreamEvents:
    type: str  # "transcription.partial", "transcription.completed", "error"
    text: Optional[str]
    language: Optional[str]
    timestamp: Optional[float]
```

### Stream Event Types

```python { .api }
class TranscriptionStreamEventTypes:
    PARTIAL = "transcription.partial"
    COMPLETED = "transcription.completed"
    ERROR = "error"
    DONE = "done"
```
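These event types can be routed with a small dispatcher. A minimal sketch: the `Event` dataclass below is a stand-in for the SDK's stream event objects (assumed to carry `type` and `text` fields as declared above), and `handle_event` is a helper invented here, not part of the API.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in for the SDK's stream event object (illustrative only)
@dataclass
class Event:
    type: str
    text: Optional[str] = None

def handle_event(event: Event) -> Optional[str]:
    """Route one stream event; returns a display string or None when done."""
    if event.type == "transcription.partial":
        return f"partial: {event.text}"    # interim hypothesis, may be revised
    if event.type == "transcription.completed":
        return f"final: {event.text}"      # stable text for this utterance
    if event.type == "error":
        raise RuntimeError(f"transcription failed: {event.text}")
    return None                            # "done": stream finished cleanly

print(handle_event(Event("transcription.partial", "hello wor")))
print(handle_event(Event("transcription.completed", "hello world")))
```

Raising on `"error"` events keeps failures from being silently dropped mid-stream; callers can catch `RuntimeError` and reconnect if appropriate.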

## Supported Formats

### Audio Formats

- **MP3**: MPEG Audio Layer III
- **WAV**: Waveform Audio File Format
- **M4A**: MPEG-4 Audio
- **FLAC**: Free Lossless Audio Codec
- **OGG**: Ogg Vorbis
- **WEBM**: WebM Audio

### Response Formats

- **json**: Structured JSON with metadata
- **text**: Plain text transcription only
- **srt**: SubRip subtitle format with timestamps
- **vtt**: WebVTT subtitle format

### Language Support

Supports many languages, including:

- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- And many more

## Best Practices

### Audio Quality

- Use clear, high-quality audio recordings
- Minimize background noise and echo
- Ensure consistent volume levels
- Use appropriate sample rates (16 kHz or higher)
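For WAV input, the sample-rate guideline can be checked locally with the standard-library `wave` module before uploading. A minimal sketch: `audio_report` is a helper invented here (not part of the SDK), and it only handles uncompressed WAV files.

```python
import wave

def audio_report(path: str) -> dict:
    """Summarize a WAV file's properties against the guidelines above."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return {
            "sample_rate": rate,
            "channels": w.getnchannels(),
            "duration_s": w.getnframes() / rate,
            "meets_16khz": rate >= 16000,
        }

# Demo: write one second of 16 kHz mono silence, then inspect it
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)          # 16-bit samples
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

print(audio_report("tone.wav"))
```

For compressed formats (MP3, M4A, OGG), a third-party probe such as ffprobe would be needed instead.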

### Performance Optimization

- Use appropriate models for your use case
- Consider batch processing for multiple files
- Implement proper error handling for network issues
- Cache results for repeated transcriptions
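Caching by content hash avoids paying for the same audio twice, even when a file is renamed. A sketch: `cached_transcribe` is a helper invented here, not part of the SDK, and `fake_transcribe` stands in for the real API call.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict

def cached_transcribe(path: str, transcribe_fn: Callable[[str], str],
                      cache: Dict[str, str]) -> str:
    """Return cached text if this exact audio content was transcribed before."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    if digest not in cache:
        cache[digest] = transcribe_fn(path)  # only hit the API on a cache miss
    return cache[digest]

# Demo with a stand-in for the real API call
calls = []
def fake_transcribe(path: str) -> str:
    calls.append(path)
    return "transcribed text"

tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
tmp.write(b"fake audio bytes")
tmp.close()

cache: Dict[str, str] = {}
first = cached_transcribe(tmp.name, fake_transcribe, cache)
second = cached_transcribe(tmp.name, fake_transcribe, cache)  # served from cache
print(len(calls))  # the expensive call ran only once
os.unlink(tmp.name)
```

In production, the in-memory dict could be swapped for a persistent store keyed by the same digest.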

318

319

### Accuracy Improvement

320

321

- Provide context through prompts when helpful

322

- Specify language when known for better accuracy

323

- Use temperature settings to control consistency

324

- Review and correct transcriptions for critical applications
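One way to apply these tips together is to pass only the options you actually set, leaving the rest to server-side defaults. A sketch: `build_transcription_params` is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict, Optional

def build_transcription_params(model: str,
                               language: Optional[str] = None,
                               prompt: Optional[str] = None,
                               temperature: Optional[float] = None) -> Dict[str, Any]:
    """Collect only the options that were set, so unset ones use server defaults."""
    params: Dict[str, Any] = {"model": model}
    if language is not None:
        params["language"] = language        # omit to let the service auto-detect
    if prompt is not None:
        params["prompt"] = prompt            # domain terms, names, spellings
    if temperature is not None:
        params["temperature"] = temperature  # lower values for more consistent output
    return params

params = build_transcription_params(
    "whisper-1",
    language="en",
    prompt="Kubernetes, etcd, kubelet",
    temperature=0.0,
)
# The result can then be splatted into a call:
#   client.audio.transcribe(file=audio_file, **params)
print(params)
```

Keeping unset options out of the request avoids accidentally overriding service defaults with `None`.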