Lightweight framework for building multi-agent workflows with LLMs, supporting handoffs, guardrails, tools, and 100+ LLM providers
The Voice Pipeline provides a framework for building voice processing workflows with speech-to-text (STT) and text-to-speech (TTS) capabilities. It enables creating custom voice assistants with pluggable audio models.
Main pipeline class for voice processing:

```python
class VoicePipeline:
    """
    Pipeline for voice processing.

    Coordinates STT, agent processing, and TTS
    for complete voice interaction workflows.
    """
```

Usage example:

```python
from agents.voice import VoicePipeline, STTModel, TTSModel

# Create pipeline with STT and TTS
pipeline = VoicePipeline(
    stt_model=my_stt_model,
    tts_model=my_tts_model,
    agent=my_agent,
)

# Process voice input
audio_output = await pipeline.process(audio_input)
```

Interface for speech-to-text models:
```python
class STTModel:
    """
    Speech-to-text model interface.

    Implement this to integrate custom STT models.
    """

    async def transcribe(
        self,
        audio_data: bytes,
        **kwargs,
    ) -> str:
        """
        Transcribe audio to text.

        Parameters:
        - audio_data: Raw audio bytes
        - **kwargs: Additional parameters

        Returns:
        - str: Transcribed text
        """
        ...
```

Implementation example:
```python
from agents.voice import STTModel

class MySTTModel(STTModel):
    """Custom STT implementation."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        # Initialize your STT model here

    async def transcribe(self, audio_data: bytes, **kwargs) -> str:
        # Call your STT API or model
        result = await my_stt_api.transcribe(audio_data)
        return result.text

# Use the custom STT model
stt_model = MySTTModel("my-stt-v1")
```

Interface for text-to-speech models:
```python
class TTSModel:
    """
    Text-to-speech model interface.

    Implement this to integrate custom TTS models.
    """

    async def synthesize(
        self,
        text: str,
        **kwargs,
    ) -> bytes:
        """
        Synthesize text to audio.

        Parameters:
        - text: Text to synthesize
        - **kwargs: Additional parameters (voice, rate, etc.)

        Returns:
        - bytes: Audio data
        """
        ...
```

Implementation example:
```python
from agents.voice import TTSModel

class MyTTSModel(TTSModel):
    """Custom TTS implementation."""

    def __init__(self, voice_id: str):
        self.voice_id = voice_id
        # Initialize your TTS model here

    async def synthesize(self, text: str, **kwargs) -> bytes:
        # Call your TTS API or model
        audio = await my_tts_api.synthesize(
            text=text,
            voice=self.voice_id,
            **kwargs,
        )
        return audio.data

# Use the custom TTS model
tts_model = MyTTSModel("voice-001")
```

Building a complete voice assistant:
```python
from agents import Agent, function_tool
from agents.voice import VoicePipeline

# Define tools
@function_tool
def get_weather(city: str) -> str:
    """Get weather for a city."""
    return f"Weather in {city}: Sunny, 72°F"

# Create the agent
agent = Agent(
    name="Voice Assistant",
    instructions="You are a voice assistant. Keep responses concise.",
    tools=[get_weather],
)

# Create the voice pipeline
pipeline = VoicePipeline(
    stt_model=MySTTModel("stt-model"),
    tts_model=MyTTSModel("voice-001"),
    agent=agent,
)

# Process voice input
async def handle_voice_input(audio_input: bytes) -> bytes:
    """Process voice input and return voice output."""
    audio_output = await pipeline.process(audio_input)
    return audio_output
```

Using OpenAI's speech APIs:
```python
from openai import AsyncOpenAI
from agents.voice import STTModel, TTSModel

class OpenAISTT(STTModel):
    """OpenAI Whisper STT."""

    def __init__(self):
        self.client = AsyncOpenAI()

    async def transcribe(self, audio_data: bytes, **kwargs) -> str:
        # Use OpenAI Whisper; the API expects a named file,
        # so wrap the raw bytes in a (filename, data) tuple
        response = await self.client.audio.transcriptions.create(
            model="whisper-1",
            file=("audio.wav", audio_data),
        )
        return response.text

class OpenAITTS(TTSModel):
    """OpenAI TTS."""

    def __init__(self, voice: str = "alloy"):
        self.client = AsyncOpenAI()
        self.voice = voice

    async def synthesize(self, text: str, **kwargs) -> bytes:
        # Use OpenAI TTS
        response = await self.client.audio.speech.create(
            model="tts-1",
            voice=kwargs.get("voice", self.voice),
            input=text,
        )
        return response.content

# Use the OpenAI-backed models
pipeline = VoicePipeline(
    stt_model=OpenAISTT(),
    tts_model=OpenAITTS(voice="nova"),
    agent=agent,
)
```

Working with audio data:
```python
import io

from pydub import AudioSegment

async def process_audio_file(file_path: str):
    """Process an audio file through the voice pipeline."""
    # Load the audio file
    audio = AudioSegment.from_file(file_path)

    # Convert to the required format (e.g., WAV)
    wav_buffer = io.BytesIO()
    audio.export(wav_buffer, format="wav")
    audio_data = wav_buffer.getvalue()

    # Process through the pipeline
    output_audio = await pipeline.process(audio_data)

    # Save the output
    output = AudioSegment.from_file(io.BytesIO(output_audio), format="wav")
    output.export("output.mp3", format="mp3")
```

For real-time streaming, consider using the Realtime API instead:
```python
# For streaming audio, use the Realtime API
from agents.realtime import RealtimeAgent, RealtimeRunner

# The Realtime API provides better streaming support
```

Configuring voice pipeline options:
```python
pipeline = VoicePipeline(
    stt_model=stt_model,
    tts_model=tts_model,
    agent=agent,
    stt_params={
        "language": "en",
        "temperature": 0.0,
    },
    tts_params={
        "voice": "nova",
        "speed": 1.0,
    },
)
```

Voice features require additional dependencies:
```shell
pip install 'openai-agents[voice]'
```

Complete voice pipeline examples are available in the repository:
- `examples/voice/` - Voice pipeline examples

Refer to these examples for complete implementation details.
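To make the STT → agent → TTS flow concrete, here is a self-contained toy sketch of the pipeline contract described above. `FakeSTT`, `FakeAgent`, `FakeTTS`, and `MiniPipeline` are illustrative stand-ins, not SDK classes:

```python
import asyncio

class FakeSTT:
    async def transcribe(self, audio_data: bytes, **kwargs) -> str:
        # Pretend the audio bytes are UTF-8 text
        return audio_data.decode("utf-8")

class FakeAgent:
    async def run(self, text: str) -> str:
        # A real agent would call an LLM here
        return f"You said: {text}"

class FakeTTS:
    async def synthesize(self, text: str, **kwargs) -> bytes:
        # "Synthesis" here is just encoding the reply back to bytes
        return text.encode("utf-8")

class MiniPipeline:
    """STT -> agent -> TTS, mirroring the documented flow."""

    def __init__(self, stt, tts, agent, stt_params=None, tts_params=None):
        self.stt, self.tts, self.agent = stt, tts, agent
        self.stt_params = stt_params or {}
        self.tts_params = tts_params or {}

    async def process(self, audio_input: bytes) -> bytes:
        # Each stage's params dict is forwarded as **kwargs
        text = await self.stt.transcribe(audio_input, **self.stt_params)
        reply = await self.agent.run(text)
        return await self.tts.synthesize(reply, **self.tts_params)

pipeline = MiniPipeline(FakeSTT(), FakeTTS(), FakeAgent())
out = asyncio.run(pipeline.process(b"hello"))
print(out)  # b'You said: hello'
```

The sketch also shows how `stt_params` and `tts_params` can reach the model methods: the pipeline simply forwards them as keyword arguments.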
The Voice Pipeline is for batch/file-based voice processing. For real-time voice interactions (phone calls, live conversations), use the Realtime API instead.
Choose Voice Pipeline when you need:

- Batch or file-based processing of recorded audio
- Pluggable, custom STT and TTS models
- Simple request/response voice interactions

Choose Realtime API when you need:

- Live, low-latency conversations (e.g., phone calls)
- Streaming audio input and output
For complete API reference and implementation details, refer to the source code and examples in the repository.
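As a standard-library alternative to the pydub example above, Python's `wave` module can prepare the raw WAV bytes a pipeline typically consumes. This sketch builds one second of 16 kHz mono silence in memory (the sample rate and duration are illustrative choices, not SDK requirements):

```python
import io
import wave

def make_wav_bytes(sample_rate: int = 16000, seconds: float = 1.0) -> bytes:
    """Build an in-memory mono 16-bit PCM WAV of silence."""
    n_frames = int(sample_rate * seconds)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)         # mono
        wf.setsampwidth(2)         # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()

audio_data = make_wav_bytes()
print(len(audio_data))  # 44-byte header + 32000 bytes of frames
```

The resulting bytes can stand in for `audio_input` when exercising a pipeline or a custom `STTModel` without real recordings.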
Install with Tessl CLI:

```shell
npx tessl i tessl/pypi-openai-agents
```