tessl/pypi-pipecat-ai

tessl install tessl/pypi-pipecat-ai@0.0.0

An open source framework for building real-time voice and multimodal conversational AI agents with support for speech-to-text, text-to-speech, LLMs, and multiple transport protocols


Pipecat-AI Framework

Overview

Pipecat is a Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video processing, speech recognition (STT), text-to-speech (TTS), large language models (LLMs), and transport protocols into composable pipelines.

Key Features:

  • Frame-based architecture for data flow
  • Real-time streaming with ultra-low latency
  • 65+ AI service provider integrations
  • Composable processor pipeline design
  • Event-driven async processing

Use Cases: Voice assistants, AI companions, multimodal interfaces, interactive storytelling, business agents, dialog systems.

Installation

# Core framework
pip install pipecat-ai

# With service dependencies
pip install "pipecat-ai[openai,anthropic,deepgram,daily]"

Requirements: Python >= 3.10 (3.12 recommended). License: BSD-2-Clause.

Core Imports

{ .api }
# Essential imports
import pipecat
from pipecat.frames.frames import Frame, TextFrame, AudioRawFrame, EndFrame
from pipecat.processors.frame_processor import FrameProcessor
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.base_transport import TransportParams

Quick Start

Minimal Pipeline

{ .api }
import asyncio
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameProcessor
from pipecat.frames.frames import TextFrame, EndFrame

class PrintProcessor(FrameProcessor):
    async def process_frame(self, frame, direction):
        await super().process_frame(frame, direction)  # required by FrameProcessor
        if isinstance(frame, TextFrame):
            print(f"Text: {frame.text}")
        await self.push_frame(frame, direction)

async def main():
    pipeline = Pipeline([PrintProcessor()])
    task = PipelineTask(pipeline)
    await task.queue_frame(TextFrame("Hello!"))
    await task.queue_frame(EndFrame())
    await task.run()

asyncio.run(main())

Voice Agent

{ .api }
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.llm_context import (
    LLMContext, LLMContextAggregatorPair
)
from pipecat.services.openai import OpenAILLMService, OpenAITTSService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.transports.daily import DailyTransport, DailyParams

# Configure services
transport = DailyTransport(room_url="...", token="...",
    params=DailyParams(audio_in_enabled=True, audio_out_enabled=True))
stt = DeepgramSTTService(api_key="...")
llm = OpenAILLMService(api_key="...", model="gpt-4")
tts = OpenAITTSService(api_key="...", voice="alloy")

# Set context and create the user/assistant aggregator pair
context = LLMContext(messages=[{"role": "system", "content": "You are helpful."}])
aggregators = LLMContextAggregatorPair(context=context)

# Build pipeline
pipeline = Pipeline([
    transport.input(), stt, aggregators.user(),
    llm, tts, transport.output(), aggregators.assistant()
])

See: Quick Start Guide →

Core Concepts

Frame System

All data flows as Frame objects with three priority levels:

  • SystemFrame: High-priority, immediate processing
  • DataFrame: Normal priority, can be interrupted
  • ControlFrame: Control signals, can be interrupted

116+ frame types available for audio, text, images, LLM context, and control.
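Concrete frames subclass one of these three bases, so a processor can branch on category rather than on individual types. A quick sketch (class hierarchy assumed from current pipecat releases; verify against your installed version):

from pipecat.frames.frames import (
    ControlFrame, DataFrame, EndFrame, SystemFrame,
    TextFrame, UserStartedSpeakingFrame,
)

# Each concrete frame belongs to exactly one priority category
assert issubclass(TextFrame, DataFrame)                   # normal-priority data
assert issubclass(EndFrame, ControlFrame)                 # in-band control signal
assert issubclass(UserStartedSpeakingFrame, SystemFrame)  # immediate processing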

Pipeline Architecture

Processors chain together and handle frames sequentially or in parallel:

{ .api }
# Sequential
pipeline = Pipeline([proc1, proc2, proc3])

# Parallel
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
parallel = ParallelPipeline([proc_a, proc_b])

Event-Driven Model

{ .api }
@service.event_handler("on_function_call_start")
async def handler(service, function_name: str):  # emitter is passed first
    print(f"Calling: {function_name}")

See: Core Concepts →

Component Quick Reference

Frames

| Category    | Key Types                                            | Documentation  |
|-------------|------------------------------------------------------|----------------|
| Audio       | AudioRawFrame, InputAudioRawFrame, TTSAudioRawFrame  | Audio Frames   |
| Text        | TextFrame, LLMTextFrame, TranscriptionFrame          | Text Frames    |
| LLM         | LLMContextFrame, LLMMessagesAppendFrame, LLMRunFrame | LLM Context    |
| Control     | StartFrame, EndFrame, ErrorFrame, FatalErrorFrame    | Control Frames |
| Interaction | UserStartedSpeakingFrame, BotStartedSpeakingFrame    | Interaction    |
| Image/Video | ImageRawFrame, VideoFrame                            | Image/Video    |

Processors

| Type            | Purpose                                   | Documentation   |
|-----------------|-------------------------------------------|-----------------|
| Aggregators     | LLM context, sentence/word aggregation    | Core Processors |
| LLM Processors  | LLMUserAggregator, LLMAssistantAggregator | LLM Processors  |
| Text Processors | Transform, filter text frames             | Core Processors |
| Integrations    | Langchain, RTVI, Strands Agents           | Integrations    |
| Transcripts     | Handle transcription data                 | Transcripts     |

Services (40+ Providers)

| Category  | Providers                                                     | Documentation     |
|-----------|---------------------------------------------------------------|-------------------|
| LLM (20+) | OpenAI, Anthropic, Google, Azure, AWS, Groq, Mistral, Ollama  | LLM Services      |
| TTS (15+) | ElevenLabs, PlayHT, Cartesia, Deepgram, OpenAI, Azure, Google | TTS Services      |
| STT (10+) | Deepgram, AssemblyAI, Gladia, Azure, Google, AWS, Whisper     | STT Services      |
| Realtime  | OpenAI Realtime, Gemini Live, AWS Nova Sonic                  | Realtime Services |
| Vision    | Moondream, LLM vision APIs                                    | Vision Services   |

Transports

| Type      | Options                        | Documentation |
|-----------|--------------------------------|---------------|
| WebRTC    | Daily.co, LiveKit, SmallWebRTC | Transports    |
| WebSocket | Server, Client, FastAPI        | Transports    |
| Local     | Local audio I/O                | Transports    |
| Telephony | WhatsApp voice                 | WhatsApp      |

Audio Processing

| Feature        | Components                                 | Documentation  |
|----------------|--------------------------------------------|----------------|
| VAD            | SileroVADAnalyzer, VADParams               | VAD            |
| Turn Detection | LocalSmartTurnAnalyzerV3, SmartTurnParams  | Turn Detection |
| Filters/Mixers | KrispFilter, SoundfileMixer, SoxrResampler | Filters/Mixers |
| DTMF           | KeypadEntry, IVRNavigator                  | DTMF           |
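For example, a Silero VAD analyzer is typically attached to the input transport. A sketch; the import paths and the vad_analyzer parameter are assumed from recent pipecat releases:

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.transports.daily import DailyTransport, DailyParams

# Attach VAD to the transport; stop_secs tunes end-of-speech detection
transport = DailyTransport(
    room_url="...", token="...",
    params=DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.8)),
    ),
)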

Utilities

| Category    | Components                                       | Documentation |
|-------------|--------------------------------------------------|---------------|
| Observers   | TaskObserver, MetricsLogObserver, LLMLogObserver | Observers     |
| Extensions  | IVRNavigator, VoicemailDetector                  | Extensions    |
| Metrics     | TTFBMetricsData, LLMUsageMetricsData             | Metrics       |
| Serializers | ProtobufFrameSerializer                          | Serializers   |
| Helpers     | TaskManager, BaseNotifier, Clock utilities       | Helpers       |
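Observers attach to a PipelineTask and watch frames without joining the pipeline itself. A minimal sketch, assuming the observers parameter and logger module path of recent pipecat releases:

from pipecat.observers.loggers.llm_log_observer import LLMLogObserver
from pipecat.pipeline.task import PipelineTask

# Log LLM input/output frames as they flow through the pipeline
task = PipelineTask(pipeline, observers=[LLMLogObserver()])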

Documentation Structure

  • Getting Started
  • Examples and Patterns
  • Reference Documentation
  • Specialized Topics

By Component

  • Frames - All frame types (audio, text, LLM, control, interaction, image)
  • Processors - Frame processors (core, LLM, transcript, integrations)
  • Services - AI services (LLM, TTS, STT, vision, realtime, adapters)
  • Utilities - Helper components (observers, extensions, metrics, serializers)

Common Use Cases

Voice Assistant

  1. Configure transport (Daily, LiveKit, or WebSocket)
  2. Set up STT service (Deepgram, AssemblyAI)
  3. Configure LLM (OpenAI, Anthropic, Google)
  4. Add TTS service (ElevenLabs, OpenAI)
  5. Build pipeline with aggregators
  6. Run with PipelineRunner

See: Voice Assistant Example →
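Step 6 hands the task to PipelineRunner, which manages its lifecycle (startup, signals, shutdown). A minimal sketch, reusing the pipeline built in the Voice Agent example above:

from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask, PipelineParams

task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
runner = PipelineRunner()
await runner.run(task)  # returns when the pipeline ends or is cancelled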

Multimodal Agent

  1. Enable video in transport
  2. Add vision service
  3. Configure multimodal LLM
  4. Handle image/video frames
  5. Process visual context

See: Multimodal Example →
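Steps 4-5 come down to handling ImageRawFrame in a custom processor. ImageLogger below is illustrative, not part of the framework:

from pipecat.frames.frames import ImageRawFrame
from pipecat.processors.frame_processor import FrameProcessor

class ImageLogger(FrameProcessor):
    # Hypothetical processor: inspect image frames, pass everything through
    async def process_frame(self, frame, direction):
        await super().process_frame(frame, direction)
        if isinstance(frame, ImageRawFrame):
            print(f"Image frame: size={frame.size}, format={frame.format}")
        await self.push_frame(frame, direction)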

Telephony Bot

  1. Configure FastAPI WebSocket transport
  2. Detect telephony provider (Twilio, Telnyx, Plivo)
  3. Add IVR navigation (optional)
  4. Implement voicemail detection
  5. Handle DTMF input

See: Telephony Example →
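For step 1, a Twilio-flavored sketch; the FastAPI WebSocket transport and Twilio serializer names are assumed from recent pipecat releases and examples, so verify them against your installed version:

from fastapi import FastAPI, WebSocket
from pipecat.serializers.twilio import TwilioFrameSerializer
from pipecat.transports.network.fastapi_websocket import (
    FastAPIWebsocketParams, FastAPIWebsocketTransport,
)

app = FastAPI()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    await websocket.receive_json()              # Twilio "connected" event
    start_msg = await websocket.receive_json()  # "start" event with stream SID
    stream_sid = start_msg["start"]["streamSid"]

    transport = FastAPIWebsocketTransport(
        websocket=websocket,
        params=FastAPIWebsocketParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            serializer=TwilioFrameSerializer(stream_sid),
        ),
    )
    # ...build the STT → LLM → TTS pipeline around transport as shown above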

Function Calling Agent

  1. Define async function handlers
  2. Register functions with LLM
  3. Configure LLMContext with tools
  4. Handle FunctionCallResultFrame
  5. Monitor with event handlers

See: Function Calling Example →
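For steps 2-3, one way to declare the tool schema alongside the context; the FunctionSchema/ToolsSchema import paths are assumed from recent pipecat releases:

from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.processors.aggregators.llm_context import LLMContext

# Provider-neutral tool declaration; pipecat adapts it per LLM service
weather_fn = FunctionSchema(
    name="get_weather",
    description="Get current weather",
    properties={"location": {"type": "string"}},
    required=["location"],
)

context = LLMContext(
    messages=[{"role": "system", "content": "You are helpful."}],
    tools=ToolsSchema(standard_tools=[weather_fn]),
)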

Key APIs

Pipeline Creation

{ .api }
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask, PipelineParams

pipeline = Pipeline([processor1, processor2, processor3])
task = PipelineTask(pipeline, params=PipelineParams(
    allow_interruptions=True,
    enable_metrics=True
))
await task.run()

LLM Context

{ .api }
from pipecat.processors.aggregators.llm_context import (
    LLMContext, LLMContextAggregatorPair
)

context = LLMContext(
    messages=[{"role": "system", "content": "You are helpful."}],
    tools=[...],
    settings={"temperature": 0.7}
)
aggregators = LLMContextAggregatorPair(context=context)

Service Configuration

{ .api }
# LLM
from pipecat.services.openai import OpenAILLMService
llm = OpenAILLMService(api_key="...", model="gpt-4")

# TTS
from pipecat.services.elevenlabs import ElevenLabsTTSService
tts = ElevenLabsTTSService(api_key="...", voice_id="...")

# STT
from pipecat.services.deepgram import DeepgramSTTService
stt = DeepgramSTTService(api_key="...", model="nova-2")

Transport Setup

{ .api }
from pipecat.transports.daily import DailyTransport, DailyParams

transport = DailyTransport(
    room_url="https://daily.co/room",
    token="...",
    params=DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_enabled=True
    )
)

Architecture Patterns

Basic Pattern: Input → Process → Output

{ .api }
pipeline = Pipeline([
    transport.input(),    # Audio/video input
    stt,                  # Speech to text
    user_aggregator,      # Aggregate user messages
    llm,                  # Generate response
    assistant_aggregator, # Aggregate bot messages
    tts,                  # Text to speech
    transport.output()    # Audio/video output
])

With Turn Management

{ .api }
from pipecat.turns.user_turn_processor import UserTurnProcessor

pipeline = Pipeline([
    transport.input(),
    UserTurnProcessor(),  # Manage turn-taking
    stt,
    user_aggregator,
    llm,
    tts,
    transport.output()
])

With Function Calling

{ .api }
# Handler signature varies by pipecat version; recent releases pass a
# FunctionCallParams object and results go back through a callback.
from pipecat.services.llm_service import FunctionCallParams

async def get_weather(params: FunctionCallParams):
    location = params.arguments["location"]
    # Look up weather for `location` here, then return the result
    await params.result_callback({"temp": 72, "condition": "sunny"})

# Register the handler; the function's name, description, and parameters
# are declared in the LLMContext tools schema (see Function Calling Agent)
llm.register_function("get_weather", get_weather)

See: Common Patterns →

Framework Statistics

  • 442 Python files in framework
  • 800+ classes and components
  • 116+ frame types for data flow
  • 40+ service providers (LLM, TTS, STT, vision, realtime)
  • Modular dependencies via pip extras

Version & License

  • Package: pipecat-ai
  • Version: 0.0.100
  • Python: >= 3.10 (3.12 recommended)
  • License: BSD-2-Clause

Resources

  • GitHub: pipecat-ai/pipecat
  • Documentation: docs.pipecat.ai
  • Examples: pipecat-ai/pipecat/examples
  • Discord: pipecat.ai/discord

Need Help?