ElevenLabs SDK

The official Node.js SDK for the ElevenLabs API, providing comprehensive access to text-to-speech, voice management, music generation, real-time transcription, conversational AI, and more. Built with TypeScript for full type safety, it supports multiple JavaScript runtimes.

Package Information

  • Package Name: @elevenlabs/elevenlabs-js
  • Package Type: npm
  • Language: TypeScript/JavaScript
  • Installation: npm install @elevenlabs/elevenlabs-js
  • Supported Runtimes: Node.js 18+, Vercel, Cloudflare Workers, Deno v1.25+, Bun 1.0+
  • GitHub: https://github.com/elevenlabs/elevenlabs-js

Core Imports

Main Client and Utilities

import {
  ElevenLabsClient,

  // Enhanced wrapper classes
  Music,        // Advanced music generation with metadata parsing
  SpeechToText, // Speech-to-text with realtime WebSocket support

  // Error classes
  ElevenLabsError,
  ElevenLabsTimeoutError,

  // Environment configuration
  ElevenLabsEnvironment,

  // Utility functions (Node.js only)
  play,
  stream
} from "@elevenlabs/elevenlabs-js";

Note: The WebhooksClient wrapper is automatically used when accessing client.webhooks and provides enhanced functionality including HMAC-SHA256 signature verification via constructEvent().
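
A minimal verification sketch, assuming an Express app with raw-body middleware; the elevenlabs-signature header name and the WEBHOOK_SECRET variable are assumptions, while constructEvent() itself is documented in the API reference below:

import express from "express";
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

const client = new ElevenLabsClient();
const app = express();

// constructEvent() needs the raw request body for HMAC-SHA256 verification
app.post("/webhook", express.raw({ type: "*/*" }), async (req, res) => {
  try {
    const event = await client.webhooks.constructEvent(
      req.body.toString("utf8"),                      // raw payload
      req.headers["elevenlabs-signature"] as string,  // assumed header name
      process.env.WEBHOOK_SECRET!                     // webhook secret (assumed env var)
    );
    console.log("Verified webhook event:", event);
    res.sendStatus(200);
  } catch {
    res.sendStatus(400); // signature verification failed
  }
});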

Wrapper-Specific Types

The SDK exports enhanced types and classes for specific functionality:

import {
  // Music generation wrapper and types
  Music,                // Enhanced music generation client class
  type SongMetadata,    // Music composition metadata interface
  type MultipartResponse, // Multipart music response with metadata

  // Speech-to-text wrapper and real-time transcription
  SpeechToText,         // Enhanced STT client with WebSocket support
  RealtimeConnection,   // WebSocket connection manager for real-time STT
  RealtimeEvents,       // Real-time transcription event types
  AudioFormat,          // Audio format enum (PCM_16000, PCM_22050, etc.)
  CommitStrategy,       // Commit strategy enum (VAD, MANUAL)
  type AudioOptions,    // Audio configuration options for real-time STT
  type UrlOptions,      // URL configuration options for real-time STT
} from "@elevenlabs/elevenlabs-js";

Note: Most API request/response types are available under the ElevenLabs namespace (see below). The types listed above are wrapper-specific enhancements.

API Types via ElevenLabs Namespace

Most API request/response types are available under the ElevenLabs namespace:

import { ElevenLabs } from "@elevenlabs/elevenlabs-js";

// Use types from the namespace
type Voice = ElevenLabs.Voice;
type Model = ElevenLabs.Model;
type MusicPrompt = ElevenLabs.MusicPrompt;
type TextRequest = ElevenLabs.BodyTextToSpeechFull;

Quick Start

import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

// Initialize client
const client = new ElevenLabsClient({
  apiKey: "your-api-key", // or use ELEVENLABS_API_KEY env var
});

// Convert text to speech
const audio = await client.textToSpeech.convert("voice-id", {
  text: "Hello world!",
});

// Stream the audio
for await (const chunk of audio) {
  // Process audio chunks
}
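
On Node.js, the play() utility listed under Core Imports can route generated audio to the system player. A minimal sketch (play() is Node.js only, and playback depends on a local audio player being available):

import { ElevenLabsClient, play } from "@elevenlabs/elevenlabs-js";

const client = new ElevenLabsClient();

// Generate speech and play it through the default system audio player
const speech = await client.textToSpeech.convert("voice-id", {
  text: "Hello from the speakers!",
});
await play(speech);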

Core Documentation

Client & Configuration

  • Client Setup - Initialize client, environments, configuration options
  • Error Handling - Error classes, handling patterns, troubleshooting
  • Request Options - Timeouts, retries, headers, abort signals
  • Response Handling - HttpResponsePromise, raw responses, streaming
  • Common Types - Shared types, interfaces, enums used across the SDK

Audio Capabilities

Voice Management

Audio Generation

Conversational AI

Content Production

Management & Analytics

Integration & Security

  • Webhooks - Webhook management with signature verification
  • Tokens - Single-use tokens for frontend
  • Service Accounts - Manage service accounts and API keys

Utilities

API Quick Reference

Note: This quick reference shows top-level API methods. Many clients have nested sub-resources (e.g., client.voices.samples, client.conversationalAi.analytics.liveCount) which are fully documented in their respective capability sections below.

Client Initialization

new ElevenLabsClient(options?: {
  apiKey?: string;
  environment?: string; // Production, ProductionUs, ProductionEu, ProductionIndia
  baseUrl?: string;
  headers?: Record<string, string | Supplier<string>>;
  timeoutInSeconds?: number; // default: 240
  maxRetries?: number; // default: 2
  fetch?: typeof fetch;
  logging?: LogConfig | ILogger;
});
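
For example, a client pinned to the EU environment with a longer timeout and more retries (a sketch using the ElevenLabsEnvironment export and the enum members named in the signature comment above):

import { ElevenLabsClient, ElevenLabsEnvironment } from "@elevenlabs/elevenlabs-js";

const client = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
  environment: ElevenLabsEnvironment.ProductionEu, // EU region
  timeoutInSeconds: 300, // raise the 240s default
  maxRetries: 3,         // one retry beyond the default of 2
});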

Primary API Endpoints

// Text-to-Speech
client.textToSpeech.convert(voice_id, request) → ReadableStream<Uint8Array>
client.textToSpeech.stream(voice_id, request) → ReadableStream<Uint8Array>
client.textToSpeech.convertWithTimestamps(voice_id, request) → AudioWithTimestampsResponse
client.textToSpeech.streamWithTimestamps(voice_id, request) → Stream<StreamingAudioChunkWithTimestampsResponse>

// Speech-to-Text
client.speechToText.convert(request) → SpeechToTextConvertResponse
client.speechToText.transcripts.get(transcript_id) → TranscriptResponse
client.speechToText.transcripts.delete(transcription_id) → unknown

// Real-time Transcription (Node.js only)
client.speechToText.realtime.connect(options) → Promise<RealtimeConnection>

// Speech-to-Speech
client.speechToSpeech.convert(voice_id, request) → ReadableStream<Uint8Array>

// Voice Management
client.voices.getAll(request?) → GetVoicesResponse
client.voices.search(request?) → GetVoicesV2Response
client.voices.get(voice_id, request?) → Voice
client.voices.update(voice_id, request) → EditVoiceResponseModel
client.voices.delete(voice_id) → DeleteVoiceResponseModel
client.voices.share(public_user_id, voice_id, request) → AddVoiceResponseModel
client.voices.getShared(request?) → GetLibraryVoicesResponse
client.voices.findSimilarVoices(request) → GetLibraryVoicesResponse

// Voice Settings
client.voices.settings.getDefault() → VoiceSettings
client.voices.settings.get(voice_id) → VoiceSettings
client.voices.settings.update(voice_id, request) → EditVoiceSettingsResponseModel

// Voice Cloning
client.voices.ivc.create(request) → AddVoiceIvcResponseModel
client.voices.pvc.create(request) → AddVoiceResponseModel
client.voices.pvc.update(voice_id, request?) → AddVoiceResponseModel
client.voices.pvc.train(voice_id, request?) → StartPvcVoiceTrainingResponseModel
client.voices.pvc.samples.create(voice_id, request) → VoiceSample[]
client.voices.pvc.samples.update(voice_id, sample_id, request?) → AddVoiceResponseModel
client.voices.pvc.samples.delete(voice_id, sample_id) → DeleteVoiceSampleResponseModel

// Voice Design
client.textToVoice.design(request) → VoiceDesignPreviewResponse
client.saveAVoicePreview() → void

// Sample Management
client.samples.delete(voice_id, sample_id) → DeleteSampleResponse

// Music Generation
client.music.compose(request?) → ReadableStream<Uint8Array>
client.music.composeDetailed(request?) → MultipartResponse
client.music.stream(request?) → ReadableStream<Uint8Array>
client.music.separateStems(request) → ReadableStream<Uint8Array>
client.music.compositionPlan.create(request) → MusicPrompt

// Sound Effects
client.textToSoundEffects.convert(request) → ReadableStream<Uint8Array>

// Text-to-Dialogue
client.textToDialogue.convert(request) → ReadableStream<Uint8Array>

// Audio Processing
client.audioIsolation.convert(request) → ReadableStream<Uint8Array>

// Conversational AI - Agents
client.conversationalAi.agents.create(request) → CreateAgentResponseModel
client.conversationalAi.agents.list(request?) → GetAgentsPageResponseModel
client.conversationalAi.agents.get(agent_id) → GetAgentResponseModel
client.conversationalAi.agents.update(agent_id, request) → GetAgentResponseModel
client.conversationalAi.agents.delete(agent_id) → void
client.conversationalAi.agents.duplicate(agent_id, request) → CreateAgentResponseModel
client.conversationalAi.agents.widget.get(agent_id, request?) → GetAgentEmbedResponseModel
client.conversationalAi.agents.widget.avatar.create(agent_id, request) → PostAgentAvatarResponseModel
client.conversationalAi.agents.knowledgeBase.size(agent_id) → GetAgentKnowledgebaseSizeResponseModel
client.conversationalAi.agents.link.get(agent_id) → GetAgentLinkResponseModel
client.conversationalAi.agents.llmUsage.calculate(agent_id, request?) → GetAgentLlmUsageCalculationResponseModel

// Conversational AI - Knowledge Base
client.conversationalAi.addToKnowledgeBase(request) → AddKnowledgeBaseResponseModel
client.conversationalAi.knowledgeBase.list(request?) → GetKnowledgeBaseListResponseModel
client.conversationalAi.knowledgeBase.documents.get(documentation_id, request?) → DocumentsGetResponse
client.conversationalAi.knowledgeBase.documents.delete(documentation_id, request?) → unknown

// Conversational AI - Tools
client.conversationalAi.tools.create(request) → CreateToolResponseModel
client.conversationalAi.tools.list() → GetToolsResponseModel
client.conversationalAi.tools.get(tool_id) → ToolResponseModel
client.conversationalAi.tools.update(tool_id, request) → void
client.conversationalAi.tools.delete(tool_id) → void

// Conversational AI - Conversations
client.conversationalAi.conversations.list(request?) → GetConversationsPageResponseModel
client.conversationalAi.conversations.get(conversation_id) → GetConversationResponseModel
client.conversationalAi.conversations.delete(conversation_id) → unknown
client.conversationalAi.conversations.audio.get(conversation_id) → ReadableStream<Uint8Array>

// Conversational AI - Phone
client.conversationalAi.phoneNumbers.list() → PhoneNumbersListResponseItem[]
client.conversationalAi.batchCalls.create(request) → SubmitBatchCallResponseModel
client.conversationalAi.batchCalls.list(request?) → WorkspaceBatchCallsResponse
client.conversationalAi.batchCalls.cancel(batch_id) → BatchCallResponse
client.conversationalAi.twilio.outboundCall(request) → TwilioOutboundCallResponse
client.conversationalAi.sipTrunk.outboundCall(request) → SipTrunkOutboundCallResponse

// Conversational AI - WhatsApp
client.conversationalAi.whatsappAccounts.list() → GetWhatsappAccountsResponseModel
client.conversationalAi.whatsappAccounts.import(request) → ImportWhatsAppAccountResponse

// Conversational AI - MCP Servers
client.conversationalAi.mcpServers.list() → McpServersResponseModel
client.conversationalAi.mcpServers.create(request) → McpServerResponseModel
client.conversationalAi.mcpServers.update(mcp_server_id, request?) → McpServerResponseModel
client.conversationalAi.mcpServers.delete(mcp_server_id) → unknown

// Dubbing
client.dubbing.create(request) → DoDubbingResponse
client.dubbing.get(dubbing_id) → DubbingMetadataResponse
client.dubbing.delete(dubbing_id) → void

// Studio
client.studio.createPodcast(request) → PodcastProjectResponseModel

// Audio Native
client.audioNative.create(request) → AudioNativeCreateProjectResponseModel

// Forced Alignment
client.forcedAlignment.create(request) → ForcedAlignmentResponseModel

// History
client.history.list(request?) → GetSpeechHistoryResponse
client.history.get(history_item_id) → SpeechHistoryItemResponse
client.history.delete(history_item_id) → void
client.history.download(history_item_id) → ReadableStream<Uint8Array>

// Usage
client.usage.get(request) → UsageCharactersResponseModel

// User
client.user.get() → User
client.user.subscription.get() → GetSubscriptionResponseModel

// Workspace
client.workspace.members.update(request) → UpdateWorkspaceMemberResponseModel
client.workspace.invites.create(request) → AddWorkspaceInviteResponseModel
client.workspace.invites.createBatch(request) → AddWorkspaceInviteResponseModel
client.workspace.invites.delete(invite_id) → string
client.workspace.groups.search(request) → WorkspaceGroupByNameResponseModel[]
client.workspace.groups.members.add(group_id, request) → AddWorkspaceGroupMemberResponseModel
client.workspace.groups.members.remove(group_id, request) → DeleteWorkspaceGroupMemberResponseModel
client.workspace.resources.get(resource_id, request) → ResourceMetadataResponseModel
client.workspace.resources.share(resource_id, request) → unknown
client.workspace.resources.unshare(resource_id, request) → unknown
client.workspace.resources.copyToWorkspace(resource_id, request) → unknown

// Webhooks
client.webhooks.create(request) → WorkspaceCreateWebhookResponseModel
client.webhooks.list() → WorkspaceWebhooksListResponseModel
client.webhooks.update(webhook_id, request) → WorkspaceUpdateWebhookResponseModel
client.webhooks.delete(webhook_id) → void
client.webhooks.constructEvent(rawBody, sigHeader, secret) → Promise<any>

// Pronunciation Dictionaries
client.pronunciationDictionaries.createFromRules(request) → AddPronunciationDictionaryResponseModel
client.pronunciationDictionaries.createFromFile(request) → AddPronunciationDictionaryResponseModel
client.pronunciationDictionaries.list(request?) → GetPronunciationDictionariesMetadataResponseModel

// Models
client.models.list() → Model[]

// Tokens
client.tokens.singleUse.create(token_type) → SingleUseTokenResponseModel

// Service Accounts
client.serviceAccounts.list() → WorkspaceServiceAccountListResponseModel
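
These endpoints compose naturally. A sketch that synthesizes speech with the first voice in the account (the voices array and voiceId property casing are assumptions about the shape of GetVoicesResponse):

// List available voices, then use the first one for TTS
const { voices } = await client.voices.getAll(); // field name assumed
const audio = await client.textToSpeech.convert(voices[0].voiceId, {
  text: "Testing the first voice in my account.",
});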

Common Patterns

Streaming Audio

All audio generation methods return a ReadableStream<Uint8Array> that can be iterated asynchronously:

const audio = await client.textToSpeech.convert(voiceId, { text: "Hello" });

for await (const chunk of audio) {
  // Process audio chunk (Uint8Array)
}
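
On Node.js, the same stream can instead be buffered and written to disk (a minimal sketch; Buffer and node:fs/promises are Node-only):

import { writeFile } from "node:fs/promises";

const audioStream = await client.textToSpeech.convert(voiceId, { text: "Hello" });

const chunks: Uint8Array[] = [];
for await (const chunk of audioStream) {
  chunks.push(chunk);
}

// Concatenate the chunks and write the audio to a file
await writeFile("output.mp3", Buffer.concat(chunks));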

Error Handling

import { ElevenLabs, ElevenLabsError, ElevenLabsTimeoutError } from "@elevenlabs/elevenlabs-js";

try {
  const audio = await client.textToSpeech.convert(voiceId, { text: "Hello" });
} catch (error) {
  if (error instanceof ElevenLabsTimeoutError) {
    console.error('Request timed out');
  } else if (error instanceof ElevenLabs.UnauthorizedError) {
    console.error('Invalid API key');
  } else if (error instanceof ElevenLabs.UnprocessableEntityError) {
    console.error('Validation failed:', error.body);
  } else if (error instanceof ElevenLabsError) {
    console.error(`API Error ${error.statusCode}:`, error.body);
  }
}

Request Options

All methods accept optional request-specific configuration:

const controller = new AbortController();

const audio = await client.textToSpeech.convert(
  voiceId,
  { text: "Hello" },
  {
    timeoutInSeconds: 300,
    maxRetries: 3,
    abortSignal: controller.signal,
    apiKey: "override-key",
  }
);

Raw Responses

Access raw HTTP responses with withRawResponse():

const { data, rawResponse } = await client.textToSpeech
  .convert(voiceId, request)
  .withRawResponse();

console.log('Status:', rawResponse.status);
console.log('Headers:', rawResponse.headers);

WebSocket Real-time (Node.js only)

import { AudioFormat, CommitStrategy } from "@elevenlabs/elevenlabs-js";

const connection = await client.speechToText.realtime.connect({
  apiKey: "your-api-key",
  format: AudioFormat.PCM_16000,
  strategy: CommitStrategy.VAD,
});

connection.on("transcript", (transcript) => {
  console.log(transcript.text);
});

// Send audio data
connection.send(audioBuffer);

// Close connection
connection.close();

SDK Architecture

The SDK uses a hybrid architecture combining auto-generated API clients with enhanced wrapper classes:

  • Base API Clients: Auto-generated clients for all API endpoints providing type-safe method signatures
  • Enhanced Wrappers: Some clients have additional capabilities beyond the base API:
    • client.music - Automatic multipart response parsing for detailed metadata
    • client.speechToText - Real-time WebSocket transcription support
    • client.webhooks - HMAC-SHA256 signature verification

Wrapper classes are transparent - they provide the same methods as base clients plus additional features. See individual capability documentation for wrapper-specific features.
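
For instance, the Music wrapper's composeDetailed() returns a MultipartResponse that bundles the audio with parsed SongMetadata (a sketch; the request shape and response field names are assumptions):

const result = await client.music.composeDetailed({
  prompt: "A calm lo-fi track for studying", // request field assumed
});

// MultipartResponse pairs the generated audio with composition metadata
console.log(result.metadata); // field name assumed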

Type System

All types are available under the ElevenLabs namespace:

import { ElevenLabs } from "@elevenlabs/elevenlabs-js";

type TextRequest = ElevenLabs.BodyTextToSpeechFull;
type Voice = ElevenLabs.Voice;
type Model = ElevenLabs.Model;

For detailed type definitions, see Common Types.

Environment-Specific Features

Node.js Only

  • Real-time WebSocket transcription (client.speechToText.realtime)
  • Audio playback utilities (play(), stream())
  • Beta Conversational AI SDK (Conversation, ClientTools, AudioInterface) - See Beta SDK

All Runtimes

  • All other SDK features work across Node.js, Vercel, Cloudflare Workers, Deno, and Bun

Advanced Features

  • Context Continuity: Use previousText, nextText, previousRequestIds, nextRequestIds for better prosody (see the sketch after this list)
  • Pronunciation Dictionaries: Apply custom pronunciation rules to TTS (max 3 per request)
  • Voice Settings: Fine-tune stability, similarity boost, style, speaker boost, and speed
  • Latency Optimization: Use optimizeStreamingLatency (0-4) for TTS
  • Output Formats: Multiple audio formats for TTS and STT
  • Multi-channel Audio: STT supports multi-channel transcription with per-channel timestamps
  • Webhooks: Async notifications with HMAC-SHA256 signature verification
  • RAG: Retrieval Augmented Generation for conversational AI agents
  • Custom LLMs: Use custom LLM endpoints with conversational AI
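
A sketch of request stitching for context continuity (the parameter names come from the bullet above; the rest of the request shape is assumed):

// Generate two consecutive sentences with consistent prosody
const first = await client.textToSpeech.convert(voiceId, {
  text: "Welcome back to the show.",
  nextText: "Today we have a special guest.", // hint at what follows
});

const second = await client.textToSpeech.convert(voiceId, {
  text: "Today we have a special guest.",
  previousText: "Welcome back to the show.", // hint at what preceded
});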

Next Steps

  • See Client Setup for detailed initialization options
  • Explore capability-specific documentation in the sections above
  • Review Common Types for shared type definitions
  • Check Error Handling for comprehensive error management