```
tessl install tessl/pypi-pipecat-ai@0.0.0
```

An open source framework for building real-time voice and multimodal conversational AI agents, with support for speech-to-text, text-to-speech, LLMs, and multiple transport protocols.
Pipecat is a Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video processing, speech recognition (STT), text-to-speech (TTS), large language models (LLMs), and transport protocols into composable pipelines.
Key Features:

- Frame-based pipeline architecture with composable processors
- 116+ frame types covering audio, text, images, LLM context, and control
- Integrations for 20+ LLM, 15+ TTS, and 10+ STT providers
- WebRTC, WebSocket, local, and telephony transports
- Voice activity detection (VAD) and smart turn detection
- Built-in observability: observers, metrics, and serializers
Use Cases: Voice assistants, AI companions, multimodal interfaces, interactive storytelling, business agents, dialog systems.
```bash
# Core framework
pip install pipecat-ai

# With service dependencies
pip install "pipecat-ai[openai,anthropic,deepgram,daily]"
```

Requirements: Python >= 3.10 (3.12 recommended). License: BSD-2-Clause.
```python { .api }
# Essential imports
import pipecat
from pipecat.frames.frames import Frame, TextFrame, AudioRawFrame, EndFrame
from pipecat.processors.frame_processor import FrameProcessor
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.base_transport import TransportParams
```
```python { .api }
import asyncio

from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameProcessor


class PrintProcessor(FrameProcessor):
    async def process_frame(self, frame, direction):
        # Let the base class handle the frame first (required for lifecycle frames)
        await super().process_frame(frame, direction)
        if isinstance(frame, TextFrame):
            print(f"Text: {frame.text}")
        await self.push_frame(frame, direction)


async def main():
    pipeline = Pipeline([PrintProcessor()])
    task = PipelineTask(pipeline)
    await task.queue_frame(TextFrame("Hello!"))
    await task.queue_frame(EndFrame())
    await task.run()


asyncio.run(main())
```
```python { .api }
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.llm_context import (
    LLMContext,
    LLMContextAggregatorPair,
)
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.openai import OpenAILLMService, OpenAITTSService
from pipecat.transports.daily import DailyTransport, DailyParams

# Configure services
transport = DailyTransport(
    room_url="...",
    token="...",
    params=DailyParams(audio_in_enabled=True, audio_out_enabled=True),
)
stt = DeepgramSTTService(api_key="...")
llm = OpenAILLMService(api_key="...", model="gpt-4")
tts = OpenAITTSService(api_key="...", voice="alloy")

# Set up the shared conversation context and its aggregator pair
context = LLMContext(messages=[{"role": "system", "content": "You are helpful."}])
aggregators = LLMContextAggregatorPair(context=context)

# Build pipeline
pipeline = Pipeline([
    transport.input(),        # Audio input
    stt,                      # Speech to text
    aggregators.user(),       # Aggregate user messages into the context
    llm,                      # Generate response
    tts,                      # Text to speech
    transport.output(),       # Audio output
    aggregators.assistant(),  # Aggregate assistant replies into the context
])
```

See: Quick Start Guide →
All data flows as Frame objects, which fall into three priority levels: system frames (processed immediately, bypassing queues), control frames (ordered signals such as start and end), and data frames (ordered content such as audio and text).
116+ frame types are available for audio, text, images, LLM context, and control.
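A minimal sketch of constructing frames of each kind; field names follow the dataclasses in pipecat.frames.frames, and the audio payload here is dummy silence:

```python
from pipecat.frames.frames import AudioRawFrame, EndFrame, TextFrame

# Data frame carrying text
text = TextFrame(text="Hello!")

# Data frame carrying raw PCM audio: payload bytes, sample rate, channel count
audio = AudioRawFrame(audio=b"\x00\x00" * 160, sample_rate=16000, num_channels=1)

# Control frame signaling that the pipeline should finish processing and stop
end = EndFrame()
```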
Processors chain together, handling frames sequentially or in parallel:
```python { .api }
# Sequential
pipeline = Pipeline([proc1, proc2, proc3])

# Parallel
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
parallel = ParallelPipeline([proc_a, proc_b])
```
```python { .api }
@service.event_handler("on_function_call_start")
async def handler(service, function_name: str):
    print(f"Calling: {function_name}")
```

See: Core Concepts →
| Category | Key Types | Documentation |
|---|---|---|
| Audio | AudioRawFrame, InputAudioRawFrame, TTSAudioRawFrame | Audio Frames |
| Text | TextFrame, LLMTextFrame, TranscriptionFrame | Text Frames |
| LLM | LLMContextFrame, LLMMessagesAppendFrame, LLMRunFrame | LLM Context |
| Control | StartFrame, EndFrame, ErrorFrame, FatalErrorFrame | Control Frames |
| Interaction | UserStartedSpeakingFrame, BotStartedSpeakingFrame | Interaction |
| Image/Video | ImageRawFrame, VideoFrame | Image/Video |
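Frames are typically injected into a running pipeline through its task. A sketch, assuming a task created as in the hello-world example above and that LLMMessagesAppendFrame takes a messages list:

```python
from pipecat.frames.frames import LLMMessagesAppendFrame, TextFrame

# Queue several frames at once; they flow downstream in order
await task.queue_frames([
    TextFrame("Welcome!"),
    LLMMessagesAppendFrame(messages=[{"role": "user", "content": "Hi"}]),
])
```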
| Type | Purpose | Documentation |
|---|---|---|
| Aggregators | LLM context, sentence/word aggregation | Core Processors |
| LLM Processors | LLMUserAggregator, LLMAssistantAggregator | LLM Processors |
| Text Processors | Transform, filter text frames | Core Processors |
| Integrations | Langchain, RTVI, Strands Agents | Integrations |
| Transcripts | Handle transcription data | Transcripts |
| Category | Providers | Documentation |
|---|---|---|
| LLM (20+) | OpenAI, Anthropic, Google, Azure, AWS, Groq, Mistral, Ollama | LLM Services |
| TTS (15+) | ElevenLabs, PlayHT, Cartesia, Deepgram, OpenAI, Azure, Google | TTS Services |
| STT (10+) | Deepgram, AssemblyAI, Gladia, Azure, Google, AWS, Whisper | STT Services |
| Realtime | OpenAI Realtime, Gemini Live, AWS Nova Sonic | Realtime Services |
| Vision | Moondream, LLM vision APIs | Vision Services |
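Because every service consumes and produces the same frame types, providers can be swapped without touching the rest of the pipeline. A sketch, assuming the flat import paths used elsewhere on this page and an example Anthropic model id:

```python
# Swap OpenAI for Anthropic; the surrounding pipeline is unchanged
from pipecat.services.anthropic import AnthropicLLMService

llm = AnthropicLLMService(api_key="...", model="claude-3-5-sonnet-20240620")
```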
| Type | Options | Documentation |
|---|---|---|
| WebRTC | Daily.co, LiveKit, SmallWebRTC | Transports |
| WebSocket | Server, Client, FastAPI | Transports |
| Local | Local audio I/O | Transports |
| Telephony | WhatsApp voice | Transports |
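For development without a WebRTC room, a local transport runs the same pipeline against the machine's microphone and speakers. A sketch; the module path and parameter class are assumptions:

```python
# Hypothetical local setup: mic in, speakers out, same pipeline as production
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams

transport = LocalAudioTransport(
    LocalAudioTransportParams(audio_in_enabled=True, audio_out_enabled=True)
)
```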
| Feature | Components | Documentation |
|---|---|---|
| VAD | SileroVADAnalyzer, VADParams | VAD |
| Turn Detection | LocalSmartTurnAnalyzerV3, SmartTurnParams | Turn Detection |
| Filters/Mixers | KrispFilter, SoundfileMixer, SoxrResampler | Filters/Mixers |
| DTMF | KeypadEntry, IVRNavigator | DTMF |
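A sketch of configuring Silero VAD and attaching it to a transport; the vad_analyzer parameter is an assumption (other snippets on this page use a vad_enabled flag instead):

```python
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.transports.daily import DailyParams

# End of speech is detected after 0.5 s of silence
vad = SileroVADAnalyzer(params=VADParams(stop_secs=0.5))

# Assumed wiring: hand the analyzer to the transport's input params
params = DailyParams(audio_in_enabled=True, vad_analyzer=vad)
```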
| Category | Components | Documentation |
|---|---|---|
| Observers | TaskObserver, MetricsLogObserver, LLMLogObserver | Observers |
| Extensions | IVRNavigator, VoicemailDetector | Extensions |
| Metrics | TTFBMetricsData, LLMUsageMetricsData | Metrics |
| Serializers | ProtobufFrameSerializer | Serializers |
| Helpers | TaskManager, BaseNotifier, Clock utilities | Helpers |
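Observers hook into a running task for logging and metrics. A sketch, assuming the observers keyword on PipelineTask and the loggers module path:

```python
from pipecat.observers.loggers.llm_log_observer import LLMLogObserver

# Attach observers when creating the task; each observer sees every frame
task = PipelineTask(pipeline, observers=[LLMLogObserver()])
```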
See: Voice Assistant Example →
See: Multimodal Example →
See: Telephony Example →
See: Function Calling Example →
```python { .api }
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask, PipelineParams

pipeline = Pipeline([processor1, processor2, processor3])
task = PipelineTask(pipeline, params=PipelineParams(
    allow_interruptions=True,
    enable_metrics=True
))
await task.run()
```
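In application code the task is usually driven by a runner, which also handles shutdown signals; a sketch using PipelineRunner:

```python
from pipecat.pipeline.runner import PipelineRunner

# The runner manages the task's lifecycle (including SIGINT/SIGTERM handling)
runner = PipelineRunner()
await runner.run(task)
```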
```python { .api }
from pipecat.processors.aggregators.llm_context import (
    LLMContext, LLMContextAggregatorPair
)

context = LLMContext(
    messages=[{"role": "system", "content": "You are helpful."}],
    tools=[...],
    settings={"temperature": 0.7}
)
aggregators = LLMContextAggregatorPair(context=context)
```
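The pair yields the two processors referenced in the pipeline layouts below; a sketch, assuming user() and assistant() accessors on the pair:

```python
# Place user_aggregator before the LLM and assistant_aggregator after output
user_aggregator = aggregators.user()
assistant_aggregator = aggregators.assistant()
```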
```python { .api }
# LLM
from pipecat.services.openai import OpenAILLMService
llm = OpenAILLMService(api_key="...", model="gpt-4")

# TTS
from pipecat.services.elevenlabs import ElevenLabsTTSService
tts = ElevenLabsTTSService(api_key="...", voice_id="...")

# STT
from pipecat.services.deepgram import DeepgramSTTService
stt = DeepgramSTTService(api_key="...", model="nova-2")
```
```python { .api }
from pipecat.transports.daily import DailyTransport, DailyParams

transport = DailyTransport(
    room_url="https://daily.co/room",
    token="...",
    params=DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_enabled=True
    )
)
```
```python { .api }
pipeline = Pipeline([
    transport.input(),      # Audio/video input
    stt,                    # Speech to text
    user_aggregator,        # Aggregate user messages
    llm,                    # Generate response
    tts,                    # Text to speech
    transport.output(),     # Audio/video output
    assistant_aggregator,   # Aggregate bot messages as actually spoken
])
```
```python { .api }
from pipecat.turns.user_turn_processor import UserTurnProcessor

pipeline = Pipeline([
    transport.input(),
    UserTurnProcessor(),  # Manage turn-taking
    stt,
    user_aggregator,
    llm,
    tts,
    transport.output()
])
```
```python { .api }
async def get_weather(location: str) -> dict:
    return {"temp": 72, "condition": "sunny"}

llm.register_function(
    name="get_weather",
    handler=get_weather,
    description="Get current weather",
    properties={"location": {"type": "string"}}
)
```

See: Common Patterns →
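For providers that expect the tool schema in the context, pipecat also ships schema helpers. A sketch, assuming FunctionSchema and ToolsSchema under pipecat.adapters.schemas and that LLMContext accepts a ToolsSchema for tools:

```python
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema

# Declare the same tool in the LLM context so the model can request it
weather_fn = FunctionSchema(
    name="get_weather",
    description="Get current weather",
    properties={"location": {"type": "string"}},
    required=["location"],
)
context = LLMContext(messages=[...], tools=ToolsSchema(standard_tools=[weather_fn]))
```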