# Voice Pipeline

The Voice Pipeline provides a framework for building voice processing workflows with speech-to-text (STT) and text-to-speech (TTS) capabilities. It enables creating custom voice assistants with pluggable audio models.

## Overview

The Voice Pipeline provides:

- Modular STT and TTS components
- Custom model integration
- Audio processing pipelines
- Voice-based agent interactions
## Capabilities

### Voice Pipeline

The main pipeline class for voice processing.

```python { .api }
class VoicePipeline:
    """
    Pipeline for voice processing.

    Coordinates STT, agent processing, and TTS
    for complete voice interaction workflows.
    """
```

Usage example:

```python
from agents.voice import VoicePipeline, STTModel, TTSModel

# Create a pipeline with STT and TTS models
pipeline = VoicePipeline(
    stt_model=my_stt_model,
    tts_model=my_tts_model,
    agent=my_agent
)

# Process voice input
audio_output = await pipeline.process(audio_input)
```

### STT Model Interface

Interface for speech-to-text models.

```python { .api }
class STTModel:
    """
    Speech-to-text model interface.

    Implement this to integrate custom STT models.
    """

    async def transcribe(
        self,
        audio_data: bytes,
        **kwargs
    ) -> str:
        """
        Transcribe audio to text.

        Parameters:
        - audio_data: Raw audio bytes
        - **kwargs: Additional parameters

        Returns:
        - str: Transcribed text
        """
```

Implementation example:

```python
from agents.voice import STTModel

class MySTTModel(STTModel):
    """Custom STT implementation."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        # Initialize your STT model here

    async def transcribe(self, audio_data: bytes, **kwargs) -> str:
        # Call your STT API or model
        result = await my_stt_api.transcribe(audio_data)
        return result.text

# Use the custom STT model
stt_model = MySTTModel("my-stt-v1")
```

### TTS Model Interface

Interface for text-to-speech models.

```python { .api }
class TTSModel:
    """
    Text-to-speech model interface.

    Implement this to integrate custom TTS models.
    """

    async def synthesize(
        self,
        text: str,
        **kwargs
    ) -> bytes:
        """
        Synthesize text to audio.

        Parameters:
        - text: Text to synthesize
        - **kwargs: Additional parameters (voice, rate, etc.)

        Returns:
        - bytes: Audio data
        """
```

Implementation example:

```python
from agents.voice import TTSModel

class MyTTSModel(TTSModel):
    """Custom TTS implementation."""

    def __init__(self, voice_id: str):
        self.voice_id = voice_id
        # Initialize your TTS model here

    async def synthesize(self, text: str, **kwargs) -> bytes:
        # Call your TTS API or model
        audio = await my_tts_api.synthesize(
            text=text,
            voice=self.voice_id,
            **kwargs
        )
        return audio.data

# Use the custom TTS model
tts_model = MyTTSModel("voice-001")
```

## Complete Voice Workflow

Building a complete voice assistant:

```python
from agents import Agent, function_tool
from agents.voice import VoicePipeline

# Define tools
@function_tool
def get_weather(city: str) -> str:
    """Get weather for a city."""
    return f"Weather in {city}: Sunny, 72°F"

# Create the agent
agent = Agent(
    name="Voice Assistant",
    instructions="You are a voice assistant. Keep responses concise.",
    tools=[get_weather]
)

# Create the voice pipeline
pipeline = VoicePipeline(
    stt_model=MySTTModel("stt-model"),
    tts_model=MyTTSModel("voice-001"),
    agent=agent
)

# Process voice input
async def handle_voice_input(audio_input: bytes):
    """Process voice input and return voice output."""
    audio_output = await pipeline.process(audio_input)
    return audio_output
```
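
A workflow like this should not let transient STT/TTS failures crash the handler. As a hedged sketch, the helper below retries a process-style coroutine with exponential backoff; `process_with_retry` and its parameters are illustrative names, not part of the library API, and work with any coroutine that takes audio bytes.

```python
import asyncio

async def process_with_retry(process, audio_input: bytes,
                             retries: int = 2, base_delay: float = 0.5):
    """Await a process-style coroutine, retrying transient failures.

    `process` is any coroutine function taking audio bytes, e.g. a
    pipeline's process method. This is a sketch, not a library API.
    """
    last_exc = None
    for attempt in range(retries + 1):
        try:
            return await process(audio_input)
        except Exception as exc:  # narrow this to your transport's error types
            last_exc = exc
            if attempt < retries:
                # Exponential backoff between attempts: 0.5s, 1s, ...
                await asyncio.sleep(base_delay * (2 ** attempt))
    raise last_exc
```

In production, catch only the exceptions your STT/TTS transport actually raises, and consider returning a spoken fallback message instead of re-raising.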

## OpenAI STT/TTS Integration

Using OpenAI's speech APIs:

```python
from openai import AsyncOpenAI
from agents.voice import STTModel, TTSModel, VoicePipeline

class OpenAISTT(STTModel):
    """OpenAI Whisper STT."""

    def __init__(self):
        self.client = AsyncOpenAI()

    async def transcribe(self, audio_data: bytes, **kwargs) -> str:
        # Use OpenAI Whisper; the API expects a named file,
        # so pass a (filename, bytes) tuple rather than raw bytes.
        response = await self.client.audio.transcriptions.create(
            model="whisper-1",
            file=("audio.wav", audio_data)
        )
        return response.text

class OpenAITTS(TTSModel):
    """OpenAI TTS."""

    def __init__(self, voice: str = "alloy"):
        self.client = AsyncOpenAI()
        self.voice = voice

    async def synthesize(self, text: str, **kwargs) -> bytes:
        # Use OpenAI TTS
        response = await self.client.audio.speech.create(
            model="tts-1",
            voice=kwargs.get("voice", self.voice),
            input=text
        )
        return response.content

# Use the OpenAI-backed models
pipeline = VoicePipeline(
    stt_model=OpenAISTT(),
    tts_model=OpenAITTS(voice="nova"),
    agent=agent
)
```

## Audio Processing

Working with audio data:

```python
import io

from pydub import AudioSegment

async def process_audio_file(file_path: str):
    """Process an audio file through the voice pipeline."""
    # Load the audio file
    audio = AudioSegment.from_file(file_path)

    # Convert to the required format (e.g., WAV)
    wav_buffer = io.BytesIO()
    audio.export(wav_buffer, format="wav")
    audio_data = wav_buffer.getvalue()

    # Process through the pipeline
    output_audio = await pipeline.process(audio_data)

    # Save the output
    output = AudioSegment.from_file(io.BytesIO(output_audio), format="wav")
    output.export("output.mp3", format="mp3")
```
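
If you would rather avoid the pydub dependency, raw PCM can be wrapped in a WAV container with the standard library alone. The helper below is an illustrative sketch (not part of the library); it assumes 16-bit little-endian PCM, which is a common input format for STT models.

```python
import io
import wave

def pcm_to_wav(pcm: bytes, sample_rate: int = 16000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container using only the stdlib."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)       # mono by default
        wav.setsampwidth(sample_width)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()

# 10 ms of 16 kHz mono silence -> a small but valid WAV payload
wav_bytes = pcm_to_wav(b"\x00\x00" * 160)
```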

## Streaming Audio

For real-time streaming, consider using the Realtime API instead:

```python
# For streaming audio, use the Realtime API
from agents.realtime import RealtimeAgent, RealtimeRunner

# The Realtime API provides better streaming support
```

## Voice Configuration

Configuring voice pipeline options:

```python
pipeline = VoicePipeline(
    stt_model=stt_model,
    tts_model=tts_model,
    agent=agent,
    stt_params={
        "language": "en",
        "temperature": 0.0
    },
    tts_params={
        "voice": "nova",
        "speed": 1.0
    }
)
```

## Best Practices

1. **Audio Format**: Use consistent audio formats (sample rate, channels, bit depth)
2. **Model Selection**: Choose STT/TTS models appropriate for your use case
3. **Latency**: Minimize end-to-end latency for a responsive user experience
4. **Error Handling**: Handle audio processing errors gracefully
5. **Voice Selection**: Choose natural-sounding voices for TTS
6. **Concise Responses**: Keep agent responses brief; long replies are tedious to listen to
7. **Testing**: Test with varied audio inputs, accents, and noise conditions
8. **Quality**: Monitor STT/TTS quality and adjust models or parameters as needed
9. **Caching**: Cache TTS output for frequently repeated phrases
10. **Streaming**: Use the Realtime API for streaming scenarios

## Installation

Voice features require additional dependencies:

```bash
pip install 'openai-agents[voice]'
```

## Examples Location

Complete voice pipeline examples are available in the repository:

- `examples/voice/` - Voice pipeline examples

Refer to these examples for complete implementation details.

## Note

The Voice Pipeline is for batch/file-based voice processing. For real-time voice interactions (phone calls, live conversations), use the Realtime API instead.

Choose the Voice Pipeline when you need:

- File-based audio processing
- Custom STT/TTS integration
- Batch voice processing
- Full control over the audio pipeline

Choose the Realtime API when you need:

- Real-time streaming audio
- Low-latency voice interactions
- Phone system integration
- Live conversational AI

For complete API reference and implementation details, refer to the source code and examples in the repository.