# Realtime API

The Realtime API supports real-time audio and voice agent interactions through an event-driven architecture. It is intended for building voice assistants, phone systems, and other real-time conversational AI applications.

## Overview

The Realtime API provides:
- Real-time audio streaming
- Voice activity detection
- Turn-based conversation management
- Function calling in real-time
- Integration with telephony systems (Twilio, etc.)

## Capabilities

### Realtime Agent

An agent configured for real-time audio interactions.

```python { .api }
class RealtimeAgent:
    """
    Agent for real-time audio interactions.

    Extends AgentBase with real-time specific configuration
    for voice conversations.
    """
```

Usage example:

```python
from agents.realtime import RealtimeAgent

agent = RealtimeAgent(
    name="Voice Assistant",
    instructions="You are a helpful voice assistant."
)
```

### Realtime Runner

Runner for executing realtime agents.

```python { .api }
class RealtimeRunner:
    """Runner for realtime agents."""

    @classmethod
    async def run(
        cls,
        starting_agent: RealtimeAgent,
        *,
        session: RealtimeSession,
        context = None,
        hooks = None,
        config = None
    ):
        """
        Run realtime agent.

        Parameters:
        - starting_agent: Realtime agent to run
        - session: Realtime session for audio
        - context: Optional context object
        - hooks: Lifecycle hooks
        - config: Configuration options

        Returns:
        - Realtime result
        """
```

Usage example:

```python
import asyncio

from agents.realtime import RealtimeAgent, RealtimeRunner, RealtimeSession


async def main():
    agent = RealtimeAgent(
        name="Voice Assistant",
        instructions="Help users via voice."
    )

    session = RealtimeSession(...)  # Configure session

    # `await` is only valid inside a coroutine
    result = await RealtimeRunner.run(
        agent,
        session=session
    )


asyncio.run(main())
```

### Realtime Session

Session for managing real-time audio interactions.

```python { .api }
class RealtimeSession:
    """
    Session for realtime interactions.

    Manages audio streaming, turn detection,
    and real-time events.
    """
```

## Realtime Events

A session emits events during real-time execution. The specific event types depend on the implementation.
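
Because the concrete event types are implementation-specific, the following is only a minimal sketch of event-driven handling in general. The `Event` and `EventDispatcher` classes and the `"audio"` and `"transcript"` type names are hypothetical and are not part of `agents.realtime`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    """Hypothetical event: a type tag plus a free-form payload."""
    type: str
    data: dict = field(default_factory=dict)


class EventDispatcher:
    """Routes events to the handlers registered for their type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Event], None]]] = {}

    def on(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, event: Event) -> int:
        """Call every handler for the event's type; return how many ran."""
        handlers = self._handlers.get(event.type, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


transcript: List[str] = []
dispatcher = EventDispatcher()
dispatcher.on("transcript", lambda e: transcript.append(e.data["text"]))

# Illustrative event stream; a real session would produce these events.
for ev in [Event("audio", {"bytes": b"\x00\x00"}), Event("transcript", {"text": "hello"})]:
    dispatcher.dispatch(ev)
```

Events without a registered handler are ignored here, which keeps the loop tolerant of event types added later.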

## Integration Examples

### CLI Voice Assistant

Basic command-line voice assistant:

```python
# See examples/realtime/cli/demo.py in the repository
from agents.realtime import RealtimeAgent, RealtimeRunner

agent = RealtimeAgent(
    name="CLI Assistant",
    instructions="You are a voice assistant."
)

# Run with audio I/O (call this from inside an async function).
# `cli_session` is assumed to be a RealtimeSession set up for local audio.
result = await RealtimeRunner.run(agent, session=cli_session)
```

### Twilio Phone Integration

Integration with Twilio for phone-based agents:

```python
# See examples/realtime/twilio/server.py in the repository
from agents.realtime import RealtimeAgent

phone_agent = RealtimeAgent(
    name="Phone Assistant",
    instructions="You are a phone assistant. Be concise."
)

# Integrate with Twilio websocket
# See repository examples for complete implementation
```
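
On the Twilio side, Media Streams deliver call audio over the WebSocket as JSON messages, where `media` events carry base64-encoded audio in `media.payload`. The sketch below shows only that decoding step. The `extract_audio` helper and the sample `streamSid` value are made up for illustration, and forwarding the bytes to a realtime session is not shown.

```python
import base64
import json
from typing import Optional


def extract_audio(message: str) -> Optional[bytes]:
    """Return raw audio bytes from a Twilio "media" message, else None."""
    msg = json.loads(message)
    if msg.get("event") != "media":
        return None  # other events, e.g. "connected", "start", "stop"
    return base64.b64decode(msg["media"]["payload"])


# Hand-built sample message; real messages arrive over the WebSocket.
sample = json.dumps({
    "event": "media",
    "streamSid": "MZ-example",
    "media": {"payload": base64.b64encode(b"\xff\x7f\xff").decode("ascii")},
})
audio = extract_audio(sample)
```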

### Web Application

Real-time voice in web applications:

```python
# See examples/realtime/app/server.py in the repository
from agents.realtime import RealtimeAgent

web_agent = RealtimeAgent(
    name="Web Assistant",
    instructions="You are a web-based voice assistant."
)

# Connect to WebSocket for browser audio
# See repository examples for complete implementation
```

## Features

### Voice Activity Detection

Automatically detects when the user is speaking.
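
To make the idea concrete, here is a deliberately simple energy-threshold detector over 16-bit PCM frames. It is a stand-in technique for illustration, not the detector the Realtime API uses, and the `rms` and `detect_speech` helpers and the threshold of 500 are assumptions.

```python
import math
import struct
from typing import List


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def detect_speech(frames: List[bytes], threshold: float = 500.0) -> List[bool]:
    """Flag a frame as speech when its RMS energy exceeds the threshold."""
    return [rms(f) > threshold for f in frames]


# Synthetic frames: pure silence and a loud alternating signal.
silence = struct.pack("<4h", 0, 0, 0, 0)
speech = struct.pack("<4h", 4000, -4000, 4000, -4000)
flags = detect_speech([silence, speech, silence])
```

Raising the threshold makes the detector less sensitive to background noise, at the cost of missing quiet speech.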

### Turn Management

Automatic turn-taking between the user and the agent.
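
Turn-taking can be pictured as a small state machine. The sketch below is an assumption about how such logic might look, with hypothetical signal names (`user_stopped_speaking`, `agent_finished`, `user_interrupted`). It does not describe the session's internal implementation.

```python
from enum import Enum
from typing import List, Tuple


class Turn(Enum):
    USER = "user"
    AGENT = "agent"


class TurnManager:
    """Tracks whose turn it is, driven by hypothetical signals."""

    def __init__(self) -> None:
        self.current = Turn.USER
        self.history: List[Tuple[Turn, Turn]] = []

    def user_stopped_speaking(self) -> None:
        if self.current is Turn.USER:
            self._switch(Turn.AGENT)

    def agent_finished(self) -> None:
        if self.current is Turn.AGENT:
            self._switch(Turn.USER)

    def user_interrupted(self) -> None:
        # The user spoke during the agent's turn: hand the turn back.
        if self.current is Turn.AGENT:
            self._switch(Turn.USER)

    def _switch(self, new_turn: Turn) -> None:
        self.history.append((self.current, new_turn))
        self.current = new_turn


manager = TurnManager()
manager.user_stopped_speaking()  # user -> agent
manager.user_interrupted()       # agent -> user
```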

### Function Calling

Real-time function and tool calling during a conversation.
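
Since a slow tool stalls the conversation, one option is to bound each call with a timeout and fall back to a short spoken reply. This is a sketch using only `asyncio`. The `TOOLS` registry, `call_tool`, and both example tools are hypothetical and unrelated to how `agents.realtime` registers tools.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

ToolFn = Callable[..., Awaitable[Any]]


async def get_weather(city: str) -> str:
    """Hypothetical fast tool; a real one would call an external service."""
    await asyncio.sleep(0.01)
    return f"Sunny in {city}"


async def slow_lookup(query: str) -> str:
    """Hypothetical tool that is too slow for a live conversation."""
    await asyncio.sleep(1.0)
    return "done"


TOOLS: Dict[str, ToolFn] = {"get_weather": get_weather, "slow_lookup": slow_lookup}


async def call_tool(name: str, args: Dict[str, Any], timeout: float = 0.2) -> str:
    """Run a registered tool, returning a spoken fallback on timeout."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"Unknown tool: {name}"
    try:
        return str(await asyncio.wait_for(tool(**args), timeout=timeout))
    except asyncio.TimeoutError:
        return "Sorry, that is taking too long."


fast = asyncio.run(call_tool("get_weather", {"city": "Paris"}))
slow = asyncio.run(call_tool("slow_lookup", {"query": "x"}))
```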

### Audio Streaming

Bidirectional audio streaming for low-latency interactions.
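
Streaming usually means sending audio in small fixed-duration frames rather than whole utterances. The sketch below splits a buffer into 20 ms frames and computes the raw bitrate. The format (24 kHz, 16-bit, mono PCM) and the frame length are assumptions chosen for the arithmetic, not values confirmed for this API.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 24_000  # assumed sample rate
BYTES_PER_SAMPLE = 2     # 16-bit PCM, mono
FRAME_MS = 20            # assumed frame duration


def frame_size_bytes(
    sample_rate: int = SAMPLE_RATE_HZ,
    bytes_per_sample: int = BYTES_PER_SAMPLE,
    frame_ms: int = FRAME_MS,
) -> int:
    """Number of bytes in one frame of the given duration."""
    return sample_rate * bytes_per_sample * frame_ms // 1000


def iter_frames(audio: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive chunks of `size` bytes (the last may be shorter)."""
    for start in range(0, len(audio), size):
        yield audio[start:start + size]


size = frame_size_bytes()
one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # one second of silence
frames = list(iter_frames(one_second, size))
kbps = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * 8 / 1000  # uncompressed bitrate
```

Under these assumptions each frame is 960 bytes, one second of audio yields 50 frames, and the uncompressed stream needs 384 kbit/s in each direction.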

## Best Practices

1. **Concise Responses**: Keep voice responses shorter than their text equivalents
2. **Turn Detection**: Configure VAD sensitivity to suit the audio environment
3. **Error Handling**: Handle dropped audio connections gracefully
4. **Latency**: Keep tool execution time short to preserve the real-time feel
5. **Instructions**: Give agents voice-specific instructions
6. **Testing**: Test with real audio I/O, not only simulated input
7. **Bandwidth**: Account for network bandwidth when choosing audio quality
8. **Interruptions**: Handle user interruptions appropriately
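
As one way to approach the error-handling item, a dropped connection can be retried with exponential backoff. This is a generic sketch: `run_with_reconnect` and `flaky_connect` are hypothetical, and the exception a real session raises on a drop may differ from `ConnectionError`.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_with_reconnect(
    connect: Callable[[], Awaitable[str]],
    max_attempts: int = 4,
    base_delay: float = 0.01,
) -> str:
    """Retry `connect` with exponential backoff on ConnectionError."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


attempts: List[int] = []


async def flaky_connect() -> str:
    """Simulated connection that fails twice before succeeding."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("dropped")
    return "connected"


status = asyncio.run(run_with_reconnect(flaky_connect))
```

In a voice application the delays should stay short, since callers notice silence quickly.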

## Examples Location

Complete working examples are available in the repository:
- `examples/realtime/cli/demo.py` - CLI voice assistant
- `examples/realtime/app/server.py` - Web application integration
- `examples/realtime/twilio/server.py` - Twilio phone integration
- `examples/realtime/twilio_sip/server.py` - Twilio SIP integration

Refer to these examples for complete implementation details.

## Note

The Realtime API is a specialized feature for voice applications. Most use cases should use the standard Agent and Runner classes. Use the Realtime API when you specifically need:
- Real-time audio interaction
- Voice-based assistants
- Phone system integration
- Low-latency conversational AI

For the complete API reference and implementation details, refer to the source code and examples in the repository.