```
tessl install tessl/pypi-pipecat-ai@0.0.0
```

An open source framework for building real-time voice and multimodal conversational AI agents, with support for speech-to-text, text-to-speech, LLMs, and multiple transport protocols.
Pipecat is a Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video processing, speech recognition (STT), text-to-speech (TTS), large language models (LLMs), and transport protocols into composable pipelines.
Key Features:

- Frame-based pipeline architecture with composable processors
- 116+ frame types covering audio, text, images, LLM context, and control
- Integrations for 20+ LLM, 15+ TTS, and 10+ STT providers
- WebRTC, WebSocket, local, and telephony transports
- Voice activity detection (VAD) and smart turn detection
- Built-in observability: observers, metrics, and serializers
Use Cases: Voice assistants, AI companions, multimodal interfaces, interactive storytelling, business agents, dialog systems.
```bash
# Core framework
pip install pipecat-ai

# With service dependencies
pip install "pipecat-ai[openai,anthropic,deepgram,daily]"
```

Requirements: Python >= 3.10 (3.12 recommended). License: BSD-2-Clause.
```python { .api }
# Essential imports
import pipecat
from pipecat.frames.frames import Frame, TextFrame, AudioRawFrame, EndFrame
from pipecat.processors.frame_processor import FrameProcessor
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.base_transport import TransportParams
```
```python { .api }
import asyncio

from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameProcessor


class PrintProcessor(FrameProcessor):
    async def process_frame(self, frame, direction):
        # Let the base class handle the frame first (required for lifecycle frames)
        await super().process_frame(frame, direction)
        if isinstance(frame, TextFrame):
            print(f"Text: {frame.text}")
        await self.push_frame(frame, direction)


async def main():
    pipeline = Pipeline([PrintProcessor()])
    task = PipelineTask(pipeline)
    await task.queue_frame(TextFrame("Hello!"))
    await task.queue_frame(EndFrame())
    await task.run()


asyncio.run(main())
```
```python { .api }
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.llm_context import (
    LLMContext,
    LLMContextAggregatorPair,
)
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.openai import OpenAILLMService, OpenAITTSService
from pipecat.transports.daily import DailyTransport, DailyParams

# Configure services
transport = DailyTransport(
    room_url="...",
    token="...",
    params=DailyParams(audio_in_enabled=True, audio_out_enabled=True),
)
stt = DeepgramSTTService(api_key="...")
llm = OpenAILLMService(api_key="...", model="gpt-4")
tts = OpenAITTSService(api_key="...", voice="alloy")

# Set up the shared conversation context and its aggregator pair
context = LLMContext(messages=[{"role": "system", "content": "You are helpful."}])
aggregators = LLMContextAggregatorPair(context=context)

# Build pipeline
pipeline = Pipeline([
    transport.input(),        # Audio input
    stt,                      # Speech to text
    aggregators.user(),       # Aggregate user messages into the context
    llm,                      # Generate response
    tts,                      # Text to speech
    transport.output(),       # Audio output
    aggregators.assistant(),  # Aggregate assistant replies into the context
])
```

See: Quick Start Guide →
All data flows as Frame objects, which fall into three priority levels: system frames (processed immediately, bypassing queues), control frames (ordered signals such as start and end), and data frames (ordered content such as audio and text).
116+ frame types are available for audio, text, images, LLM context, and control.
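A minimal sketch of constructing frames of each kind; field names follow the dataclasses in pipecat.frames.frames, and the audio payload here is dummy silence:

```python
from pipecat.frames.frames import AudioRawFrame, EndFrame, TextFrame

# Data frame carrying text
text = TextFrame(text="Hello!")

# Data frame carrying raw PCM audio: payload bytes, sample rate, channel count
audio = AudioRawFrame(audio=b"\x00\x00" * 160, sample_rate=16000, num_channels=1)

# Control frame signaling that the pipeline should finish processing and stop
end = EndFrame()
```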
Processors chain together, handling frames sequentially or in parallel:
```python { .api }
# Sequential
pipeline = Pipeline([proc1, proc2, proc3])

# Parallel
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
parallel = ParallelPipeline([proc_a, proc_b])
```
```python { .api }
@service.event_handler("on_function_call_start")
async def handler(service, function_name: str):
    print(f"Calling: {function_name}")
```

See: Core Concepts →
| Category | Key Types | Documentation |
|---|---|---|
| Audio | AudioRawFrame, InputAudioRawFrame, TTSAudioRawFrame | Audio Frames |
| Text | TextFrame, LLMTextFrame, TranscriptionFrame | Text Frames |
| LLM | LLMContextFrame, LLMMessagesAppendFrame, LLMRunFrame | LLM Context |
| Control | StartFrame, EndFrame, ErrorFrame, FatalErrorFrame | Control Frames |
| Interaction | UserStartedSpeakingFrame, BotStartedSpeakingFrame | Interaction |
| Image/Video | ImageRawFrame, VideoFrame | Image/Video |
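Frames are typically injected into a running pipeline through its task. A sketch, assuming a task created as in the hello-world example above and that LLMMessagesAppendFrame takes a messages list:

```python
from pipecat.frames.frames import LLMMessagesAppendFrame, TextFrame

# Queue several frames at once; they flow downstream in order
await task.queue_frames([
    TextFrame("Welcome!"),
    LLMMessagesAppendFrame(messages=[{"role": "user", "content": "Hi"}]),
])
```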
| Type | Purpose | Documentation |
|---|---|---|
| Aggregators | LLM context, sentence/word aggregation | Core Processors |
| LLM Processors | LLMUserAggregator, LLMAssistantAggregator | LLM Processors |
| Text Processors | Transform, filter text frames | Core Processors |
| Integrations | Langchain, RTVI, Strands Agents | Integrations |
| Transcripts | Handle transcription data | Transcripts |
| Category | Providers | Documentation |
|---|---|---|
| LLM (20+) | OpenAI, Anthropic, Google, Azure, AWS, Groq, Mistral, Ollama | LLM Services |
| TTS (15+) | ElevenLabs, PlayHT, Cartesia, Deepgram, OpenAI, Azure, Google | TTS Services |
| STT (10+) | Deepgram, AssemblyAI, Gladia, Azure, Google, AWS, Whisper | STT Services |
| Realtime | OpenAI Realtime, Gemini Live, AWS Nova Sonic | Realtime Services |
| Vision | Moondream, LLM vision APIs | Vision Services |
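Because every service consumes and produces the same frame types, providers can be swapped without touching the rest of the pipeline. A sketch, assuming the flat import paths used elsewhere on this page and an example Anthropic model id:

```python
# Swap OpenAI for Anthropic; the surrounding pipeline is unchanged
from pipecat.services.anthropic import AnthropicLLMService

llm = AnthropicLLMService(api_key="...", model="claude-3-5-sonnet-20240620")
```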
| Type | Options | Documentation |
|---|---|---|
| WebRTC | Daily.co, LiveKit, SmallWebRTC | Transports |
| WebSocket | Server, Client, FastAPI | Transports |
| Local | Local audio I/O | Transports |
| Telephony | WhatsApp voice | Transports |
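For development without a WebRTC room, a local transport runs the same pipeline against the machine's microphone and speakers. A sketch; the module path and parameter class are assumptions:

```python
# Hypothetical local setup: mic in, speakers out, same pipeline as production
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams

transport = LocalAudioTransport(
    LocalAudioTransportParams(audio_in_enabled=True, audio_out_enabled=True)
)
```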
| Feature | Components | Documentation |
|---|---|---|
| VAD | SileroVADAnalyzer, VADParams | VAD |
| Turn Detection | LocalSmartTurnAnalyzerV3, SmartTurnParams | Turn Detection |
| Filters/Mixers | KrispFilter, SoundfileMixer, SoxrResampler | Filters/Mixers |
| DTMF | KeypadEntry, IVRNavigator | DTMF |
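A sketch of configuring Silero VAD and attaching it to a transport; the vad_analyzer parameter is an assumption (other snippets on this page use a vad_enabled flag instead):

```python
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.transports.daily import DailyParams

# End of speech is detected after 0.5 s of silence
vad = SileroVADAnalyzer(params=VADParams(stop_secs=0.5))

# Assumed wiring: hand the analyzer to the transport's input params
params = DailyParams(audio_in_enabled=True, vad_analyzer=vad)
```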
| Category | Components | Documentation |
|---|---|---|
| Observers | TaskObserver, MetricsLogObserver, LLMLogObserver | Observers |
| Extensions | IVRNavigator, VoicemailDetector | Extensions |
| Metrics | TTFBMetricsData, LLMUsageMetricsData | Metrics |
| Serializers | ProtobufFrameSerializer | Serializers |
| Helpers | TaskManager, BaseNotifier, Clock utilities | Helpers |
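Observers hook into a running task for logging and metrics. A sketch, assuming the observers keyword on PipelineTask and the loggers module path:

```python
from pipecat.observers.loggers.llm_log_observer import LLMLogObserver

# Attach observers when creating the task; each observer sees every frame
task = PipelineTask(pipeline, observers=[LLMLogObserver()])
```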
See: Voice Assistant Example →
See: Multimodal Example →
See: Telephony Example →
See: Function Calling Example →
```python { .api }
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask, PipelineParams

pipeline = Pipeline([processor1, processor2, processor3])
task = PipelineTask(pipeline, params=PipelineParams(
    allow_interruptions=True,
    enable_metrics=True
))
await task.run()
```
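In application code the task is usually driven by a runner, which also handles shutdown signals; a sketch using PipelineRunner:

```python
from pipecat.pipeline.runner import PipelineRunner

# The runner manages the task's lifecycle (including SIGINT/SIGTERM handling)
runner = PipelineRunner()
await runner.run(task)
```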
```python { .api }
from pipecat.processors.aggregators.llm_context import (
    LLMContext, LLMContextAggregatorPair
)

context = LLMContext(
    messages=[{"role": "system", "content": "You are helpful."}],
    tools=[...],
    settings={"temperature": 0.7}
)
aggregators = LLMContextAggregatorPair(context=context)
```
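The pair yields the two processors referenced in the pipeline layouts below; a sketch, assuming user() and assistant() accessors on the pair:

```python
# Place user_aggregator before the LLM and assistant_aggregator after output
user_aggregator = aggregators.user()
assistant_aggregator = aggregators.assistant()
```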
```python { .api }
# LLM
from pipecat.services.openai import OpenAILLMService
llm = OpenAILLMService(api_key="...", model="gpt-4")

# TTS
from pipecat.services.elevenlabs import ElevenLabsTTSService
tts = ElevenLabsTTSService(api_key="...", voice_id="...")

# STT
from pipecat.services.deepgram import DeepgramSTTService
stt = DeepgramSTTService(api_key="...", model="nova-2")
```
```python { .api }
from pipecat.transports.daily import DailyTransport, DailyParams

transport = DailyTransport(
    room_url="https://daily.co/room",
    token="...",
    params=DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_enabled=True
    )
)
```
```python { .api }
pipeline = Pipeline([
    transport.input(),      # Audio/video input
    stt,                    # Speech to text
    user_aggregator,        # Aggregate user messages
    llm,                    # Generate response
    tts,                    # Text to speech
    transport.output(),     # Audio/video output
    assistant_aggregator,   # Aggregate bot messages as actually spoken
])
```
```python { .api }
from pipecat.turns.user_turn_processor import UserTurnProcessor

pipeline = Pipeline([
    transport.input(),
    UserTurnProcessor(),  # Manage turn-taking
    stt,
    user_aggregator,
    llm,
    tts,
    transport.output()
])
```
```python { .api }
async def get_weather(location: str) -> dict:
    return {"temp": 72, "condition": "sunny"}

llm.register_function(
    name="get_weather",
    handler=get_weather,
    description="Get current weather",
    properties={"location": {"type": "string"}}
)
```

See: Common Patterns →
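For providers that expect the tool schema in the context, pipecat also ships schema helpers. A sketch, assuming FunctionSchema and ToolsSchema under pipecat.adapters.schemas and that LLMContext accepts a ToolsSchema for tools:

```python
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema

# Declare the same tool in the LLM context so the model can request it
weather_fn = FunctionSchema(
    name="get_weather",
    description="Get current weather",
    properties={"location": {"type": "string"}},
    required=["location"],
)
context = LLMContext(messages=[...], tools=ToolsSchema(standard_tools=[weather_fn]))
```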