tessl/npm-elevenlabs--elevenlabs-js

Official Node.js SDK for ElevenLabs text-to-speech API with voice synthesis, real-time transcription, music generation, and conversational AI

ElevenLabs SDK

The official Node.js SDK for the ElevenLabs API, providing comprehensive access to text-to-speech, voice management, music generation, real-time transcription, conversational AI, and more. Built with TypeScript for full type safety and supporting multiple JavaScript runtimes.

Package Information

  • Package Name: @elevenlabs/elevenlabs-js
  • Package Type: npm
  • Documented Version: 2.30.x
  • Language: TypeScript/JavaScript
  • Installation: npm install @elevenlabs/elevenlabs-js
  • Supported Runtimes: Node.js 18+, Vercel, Cloudflare Workers, Deno v1.25+, Bun 1.0+
  • GitHub: https://github.com/elevenlabs/elevenlabs-js

Core Imports

Main Client and Utilities

import {
  ElevenLabsClient,

  // Enhanced wrapper classes
  Music,        // Advanced music generation with metadata parsing
  SpeechToText, // Speech-to-text with realtime WebSocket support

  // Error classes
  ElevenLabsError,
  ElevenLabsTimeoutError,

  // Environment configuration
  ElevenLabsEnvironment,

  // Utility functions (Node.js only)
  play,
  stream
} from "@elevenlabs/elevenlabs-js";

Note: The WebhooksClient wrapper is automatically used when accessing client.webhooks and provides enhanced functionality including HMAC-SHA256 signature verification via constructEvent().
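In practice you should rely on client.webhooks.constructEvent() for verification. As a rough illustration of what HMAC-SHA256 signature verification involves, here is a standalone sketch; the header shape (t=&lt;timestamp&gt;,v0=&lt;hex-hmac&gt;) and the signed-payload format are assumptions modeled on common webhook schemes, not confirmed SDK internals:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical verifier: recompute the HMAC over "timestamp.body" and
// compare it, in constant time, to the signature carried in the header.
function verifySignature(rawBody: string, sigHeader: string, secret: string): boolean {
  // Assumed header shape: "t=<unix-ts>,v0=<hex-hmac>"
  const parts = Object.fromEntries(
    sigHeader.split(",").map((kv) => kv.split("=") as [string, string])
  );
  const expected = createHmac("sha256", secret)
    .update(`${parts.t}.${rawBody}`)
    .digest("hex");
  const a = Buffer.from(expected, "hex");
  const b = Buffer.from(parts.v0 ?? "", "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The constant-time comparison (timingSafeEqual) matters: a plain string comparison can leak how many leading bytes matched.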

Wrapper-Specific Types

The SDK exports enhanced types and classes for specific functionality:

import {
  // Music generation wrapper and types
  Music,                // Enhanced music generation client class
  type SongMetadata,    // Music composition metadata interface
  type MultipartResponse, // Multipart music response with metadata

  // Speech-to-text wrapper and real-time transcription
  SpeechToText,         // Enhanced STT client with WebSocket support
  RealtimeConnection,   // WebSocket connection manager for real-time STT
  RealtimeEvents,       // Real-time transcription event types
  AudioFormat,          // Audio format enum (PCM_16000, PCM_22050, etc.)
  CommitStrategy,       // Commit strategy enum (VAD, MANUAL)
  type AudioOptions,    // Audio configuration options for real-time STT
  type UrlOptions,      // URL configuration options for real-time STT
} from "@elevenlabs/elevenlabs-js";

Note: Most API request/response types are available under the ElevenLabs namespace (see below). The types listed above are wrapper-specific enhancements.

API Types via ElevenLabs Namespace

Most API request/response types are available under the ElevenLabs namespace:

import { ElevenLabs } from "@elevenlabs/elevenlabs-js";

// Use types from the namespace
type Voice = ElevenLabs.Voice;
type Model = ElevenLabs.Model;
type MusicPrompt = ElevenLabs.MusicPrompt;
type TextRequest = ElevenLabs.BodyTextToSpeechFull;

Quick Start

import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

// Initialize client
const client = new ElevenLabsClient({
  apiKey: "your-api-key", // or use ELEVENLABS_API_KEY env var
});

// Convert text to speech
const audio = await client.textToSpeech.convert("voice-id", {
  text: "Hello world!",
});

// Stream the audio
for await (const chunk of audio) {
  // Process audio chunks
}

Core Documentation

Client & Configuration

  • Client Setup - Initialize client, environments, configuration options
  • Error Handling - Error classes, handling patterns, troubleshooting
  • Request Options - Timeouts, retries, headers, abort signals
  • Response Handling - HttpResponsePromise, raw responses, streaming
  • Common Types - Shared types, interfaces, enums used across the SDK

Audio Capabilities

Voice Management

Audio Generation

Conversational AI

Content Production

Management & Analytics

Integration & Security

  • Webhooks - Webhook management with signature verification
  • Tokens - Single-use tokens for frontend
  • Service Accounts - Manage service accounts and API keys

Utilities

API Quick Reference

Note: This quick reference shows top-level API methods. Many clients have nested sub-resources (e.g., client.voices.samples, client.conversationalAi.analytics.liveCount) which are fully documented in their respective capability sections below.

Client Initialization

new ElevenLabsClient(options?: {
  apiKey?: string;
  environment?: string; // Production, ProductionUs, ProductionEu, ProductionIndia
  baseUrl?: string;
  headers?: Record<string, string | Supplier<string>>;
  timeoutInSeconds?: number; // default: 240
  maxRetries?: number; // default: 2
  fetch?: typeof fetch;
  logging?: LogConfig | ILogger;
});

Primary API Endpoints

// Text-to-Speech
client.textToSpeech.convert(voice_id, request) → ReadableStream<Uint8Array>
client.textToSpeech.stream(voice_id, request) → ReadableStream<Uint8Array>
client.textToSpeech.convertWithTimestamps(voice_id, request) → AudioWithTimestampsResponse
client.textToSpeech.streamWithTimestamps(voice_id, request) → Stream<StreamingAudioChunkWithTimestampsResponse>

// Speech-to-Text
client.speechToText.convert(request) → SpeechToTextConvertResponse
client.speechToText.transcripts.get(transcript_id) → TranscriptResponse
client.speechToText.transcripts.delete(transcription_id) → unknown

// Real-time Transcription (Node.js only)
client.speechToText.realtime.connect(options) → Promise<RealtimeConnection>

// Speech-to-Speech
client.speechToSpeech.convert(voice_id, request) → ReadableStream<Uint8Array>

// Voice Management
client.voices.getAll(request?) → GetVoicesResponse
client.voices.search(request?) → GetVoicesV2Response
client.voices.get(voice_id, request?) → Voice
client.voices.update(voice_id, request) → EditVoiceResponseModel
client.voices.delete(voice_id) → DeleteVoiceResponseModel
client.voices.share(public_user_id, voice_id, request) → AddVoiceResponseModel
client.voices.getShared(request?) → GetLibraryVoicesResponse
client.voices.findSimilarVoices(request) → GetLibraryVoicesResponse

// Voice Settings
client.voices.settings.getDefault() → VoiceSettings
client.voices.settings.get(voice_id) → VoiceSettings
client.voices.settings.update(voice_id, request) → EditVoiceSettingsResponseModel

// Voice Cloning
client.voices.ivc.create(request) → AddVoiceIvcResponseModel
client.voices.pvc.create(request) → AddVoiceResponseModel
client.voices.pvc.update(voice_id, request?) → AddVoiceResponseModel
client.voices.pvc.train(voice_id, request?) → StartPvcVoiceTrainingResponseModel
client.voices.pvc.samples.create(voice_id, request) → VoiceSample[]
client.voices.pvc.samples.update(voice_id, sample_id, request?) → AddVoiceResponseModel
client.voices.pvc.samples.delete(voice_id, sample_id) → DeleteVoiceSampleResponseModel

// Voice Design
client.textToVoice.design(request) → VoiceDesignPreviewResponse
client.saveAVoicePreview() → void

// Sample Management
client.samples.delete(voice_id, sample_id) → DeleteSampleResponse

// Music Generation
client.music.compose(request?) → ReadableStream<Uint8Array>
client.music.composeDetailed(request?) → MultipartResponse
client.music.stream(request?) → ReadableStream<Uint8Array>
client.music.separateStems(request) → ReadableStream<Uint8Array>
client.music.compositionPlan.create(request) → MusicPrompt

// Sound Effects
client.textToSoundEffects.convert(request) → ReadableStream<Uint8Array>

// Text-to-Dialogue
client.textToDialogue.convert(request) → ReadableStream<Uint8Array>

// Audio Processing
client.audioIsolation.convert(request) → ReadableStream<Uint8Array>

// Conversational AI - Agents
client.conversationalAi.agents.create(request) → CreateAgentResponseModel
client.conversationalAi.agents.list(request?) → GetAgentsPageResponseModel
client.conversationalAi.agents.get(agent_id) → GetAgentResponseModel
client.conversationalAi.agents.update(agent_id, request) → GetAgentResponseModel
client.conversationalAi.agents.delete(agent_id) → void
client.conversationalAi.agents.duplicate(agent_id, request) → CreateAgentResponseModel
client.conversationalAi.agents.widget.get(agent_id, request?) → GetAgentEmbedResponseModel
client.conversationalAi.agents.widget.avatar.create(agent_id, request) → PostAgentAvatarResponseModel
client.conversationalAi.agents.knowledgeBase.size(agent_id) → GetAgentKnowledgebaseSizeResponseModel
client.conversationalAi.agents.link.get(agent_id) → GetAgentLinkResponseModel
client.conversationalAi.agents.llmUsage.calculate(agent_id, request?) → GetAgentLlmUsageCalculationResponseModel

// Conversational AI - Knowledge Base
client.conversationalAi.addToKnowledgeBase(request) → AddKnowledgeBaseResponseModel
client.conversationalAi.knowledgeBase.list(request?) → GetKnowledgeBaseListResponseModel
client.conversationalAi.knowledgeBase.documents.get(documentation_id, request?) → DocumentsGetResponse
client.conversationalAi.knowledgeBase.documents.delete(documentation_id, request?) → unknown

// Conversational AI - Tools
client.conversationalAi.tools.create(request) → CreateToolResponseModel
client.conversationalAi.tools.list() → GetToolsResponseModel
client.conversationalAi.tools.get(tool_id) → ToolResponseModel
client.conversationalAi.tools.update(tool_id, request) → void
client.conversationalAi.tools.delete(tool_id) → void

// Conversational AI - Conversations
client.conversationalAi.conversations.list(request?) → GetConversationsPageResponseModel
client.conversationalAi.conversations.get(conversation_id) → GetConversationResponseModel
client.conversationalAi.conversations.delete(conversation_id) → unknown
client.conversationalAi.conversations.audio.get(conversation_id) → ReadableStream<Uint8Array>

// Conversational AI - Phone
client.conversationalAi.phoneNumbers.list() → PhoneNumbersListResponseItem[]
client.conversationalAi.batchCalls.create(request) → SubmitBatchCallResponseModel
client.conversationalAi.batchCalls.list(request?) → WorkspaceBatchCallsResponse
client.conversationalAi.batchCalls.cancel(batch_id) → BatchCallResponse
client.conversationalAi.twilio.outboundCall(request) → TwilioOutboundCallResponse
client.conversationalAi.sipTrunk.outboundCall(request) → SipTrunkOutboundCallResponse

// Conversational AI - WhatsApp
client.conversationalAi.whatsappAccounts.list() → GetWhatsappAccountsResponseModel
client.conversationalAi.whatsappAccounts.import(request) → ImportWhatsAppAccountResponse

// Conversational AI - MCP Servers
client.conversationalAi.mcpServers.list() → McpServersResponseModel
client.conversationalAi.mcpServers.create(request) → McpServerResponseModel
client.conversationalAi.mcpServers.update(mcp_server_id, request?) → McpServerResponseModel
client.conversationalAi.mcpServers.delete(mcp_server_id) → unknown

// Dubbing
client.dubbing.create(request) → DoDubbingResponse
client.dubbing.get(dubbing_id) → DubbingMetadataResponse
client.dubbing.delete(dubbing_id) → void

// Studio
client.studio.createPodcast(request) → PodcastProjectResponseModel

// Audio Native
client.audioNative.create(request) → AudioNativeCreateProjectResponseModel

// Forced Alignment
client.forcedAlignment.create(request) → ForcedAlignmentResponseModel

// History
client.history.list(request?) → GetSpeechHistoryResponse
client.history.get(history_item_id) → SpeechHistoryItemResponse
client.history.delete(history_item_id) → void
client.history.download(history_item_id) → ReadableStream<Uint8Array>

// Usage
client.usage.get(request) → UsageCharactersResponseModel

// User
client.user.get() → User
client.user.subscription.get() → GetSubscriptionResponseModel

// Workspace
client.workspace.members.update(request) → UpdateWorkspaceMemberResponseModel
client.workspace.invites.create(request) → AddWorkspaceInviteResponseModel
client.workspace.invites.createBatch(request) → AddWorkspaceInviteResponseModel
client.workspace.invites.delete(invite_id) → string
client.workspace.groups.search(request) → WorkspaceGroupByNameResponseModel[]
client.workspace.groups.members.add(group_id, request) → AddWorkspaceGroupMemberResponseModel
client.workspace.groups.members.remove(group_id, request) → DeleteWorkspaceGroupMemberResponseModel
client.workspace.resources.get(resource_id, request) → ResourceMetadataResponseModel
client.workspace.resources.share(resource_id, request) → unknown
client.workspace.resources.unshare(resource_id, request) → unknown
client.workspace.resources.copyToWorkspace(resource_id, request) → unknown

// Webhooks
client.webhooks.create(request) → WorkspaceCreateWebhookResponseModel
client.webhooks.list() → WorkspaceWebhooksListResponseModel
client.webhooks.update(webhook_id, request) → WorkspaceUpdateWebhookResponseModel
client.webhooks.delete(webhook_id) → void
client.webhooks.constructEvent(rawBody, sigHeader, secret) → Promise<any>

// Pronunciation Dictionaries
client.pronunciationDictionaries.createFromRules(request) → AddPronunciationDictionaryResponseModel
client.pronunciationDictionaries.createFromFile(request) → AddPronunciationDictionaryResponseModel
client.pronunciationDictionaries.list(request?) → GetPronunciationDictionariesMetadataResponseModel

// Models
client.models.list() → Model[]

// Tokens
client.tokens.singleUse.create(token_type) → SingleUseTokenResponseModel

// Service Accounts
client.serviceAccounts.list() → WorkspaceServiceAccountListResponseModel

Common Patterns

Streaming Audio

All audio generation methods return a ReadableStream&lt;Uint8Array&gt; that can be iterated:

const audio = await client.textToSpeech.convert(voiceId, { text: "Hello" });

for await (const chunk of audio) {
  // Process audio chunk (Uint8Array)
}
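If you need the full audio in memory (for example, to write it to a file), the chunks can be collected into a single Buffer. A minimal sketch, using a stand-in async generator in place of a real SDK stream:

```typescript
// Collect an async-iterable audio stream into a single Buffer.
async function collectAudio(stream: AsyncIterable<Uint8Array>): Promise<Buffer> {
  const chunks: Uint8Array[] = [];
  for await (const chunk of stream) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks);
}

// Stand-in for a real stream, purely for illustration.
async function* fakeStream(): AsyncIterable<Uint8Array> {
  yield new Uint8Array([1, 2]);
  yield new Uint8Array([3]);
}
```

The same helper works with the stream returned by convert(), since ReadableStream is async-iterable in Node.js 18+.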

Error Handling

import { ElevenLabs, ElevenLabsError, ElevenLabsTimeoutError } from "@elevenlabs/elevenlabs-js";

try {
  const audio = await client.textToSpeech.convert(voiceId, { text: "Hello" });
} catch (error) {
  if (error instanceof ElevenLabsTimeoutError) {
    console.error('Request timed out');
  } else if (error instanceof ElevenLabs.UnauthorizedError) {
    console.error('Invalid API key');
  } else if (error instanceof ElevenLabs.UnprocessableEntityError) {
    console.error('Validation failed:', error.body);
  } else if (error instanceof ElevenLabsError) {
    console.error(`API Error ${error.statusCode}:`, error.body);
  }
}

Request Options

All methods accept optional request-specific configuration:

const audio = await client.textToSpeech.convert(
  voiceId,
  { text: "Hello" },
  {
    timeoutInSeconds: 300,
    maxRetries: 3,
    abortSignal: controller.signal,
    apiKey: "override-key",
  }
);

Raw Responses

Access raw HTTP responses with withRawResponse():

const { data, rawResponse } = await client.textToSpeech
  .convert(voiceId, request)
  .withRawResponse();

console.log('Status:', rawResponse.status);
console.log('Headers:', rawResponse.headers);

WebSocket Real-time (Node.js only)

const connection = await client.speechToText.realtime.connect({
  apiKey: "your-api-key",
  format: AudioFormat.PCM_16000,
  strategy: CommitStrategy.VAD,
});

connection.on("transcript", (transcript) => {
  console.log(transcript.text);
});

// Send audio data
connection.send(audioBuffer);

// Close connection
connection.close();

SDK Architecture

The SDK uses a hybrid architecture combining auto-generated API clients with enhanced wrapper classes:

  • Base API Clients: Auto-generated clients for all API endpoints providing type-safe method signatures
  • Enhanced Wrappers: Some clients have additional capabilities beyond the base API:
    • client.music - Automatic multipart response parsing for detailed metadata
    • client.speechToText - Real-time WebSocket transcription support
    • client.webhooks - HMAC-SHA256 signature verification

Wrapper classes are transparent: they provide the same methods as the base clients plus additional features. See the individual capability documentation for wrapper-specific features.
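The wrapper pattern can be sketched generically. The class names and method bodies below are hypothetical stand-ins, not the SDK's internals; they only show how a wrapper keeps the base surface intact while layering on extras:

```typescript
// Generic sketch of the wrapper pattern: the wrapper exposes everything the
// base client does, plus extra capabilities layered on top.
class BaseMusicClient {
  compose(prompt: string): string {
    return `audio:${prompt}`; // stand-in for a real API call
  }
}

class MusicWrapper extends BaseMusicClient {
  // Added capability: return the audio plus parsed metadata.
  composeDetailed(prompt: string): { audio: string; metadata: { prompt: string } } {
    const audio = this.compose(prompt); // delegates to the base method
    return { audio, metadata: { prompt } };
  }
}
```

Because the wrapper extends the base client, code written against the base surface keeps working unchanged when it receives the wrapper.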

Type System

All types are available under the ElevenLabs namespace:

import { ElevenLabs } from "@elevenlabs/elevenlabs-js";

type TextRequest = ElevenLabs.BodyTextToSpeechFull;
type Voice = ElevenLabs.Voice;
type Model = ElevenLabs.Model;

For detailed type definitions, see Common Types.

Environment-Specific Features

Node.js Only

  • Real-time WebSocket transcription (client.speechToText.realtime)
  • Audio playback utilities (play(), stream())
  • Beta Conversational AI SDK (Conversation, ClientTools, AudioInterface) - See Beta SDK

All Runtimes

  • All other SDK features work across Node.js, Vercel, Cloudflare Workers, Deno, and Bun

Advanced Features

  • Context Continuity: Use previousText, nextText, previousRequestIds, nextRequestIds for better prosody
  • Pronunciation Dictionaries: Apply custom pronunciation rules to TTS (max 3 per request)
  • Voice Settings: Fine-tune stability, similarity boost, style, speaker boost, and speed
  • Latency Optimization: Use optimizeStreamingLatency (0-4) for TTS
  • Output Formats: Multiple audio formats for TTS and STT
  • Multi-channel Audio: STT supports multi-channel transcription with per-channel timestamps
  • Webhooks: Async notifications with HMAC-SHA256 signature verification
  • RAG: Retrieval Augmented Generation for conversational AI agents
  • Custom LLMs: Use custom LLM endpoints with conversational AI
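For example, context continuity amounts to linking consecutive TTS requests through the previousText / nextText fields listed above. A sketch of building such linked request bodies (the surrounding call shape is assumed; each object would be passed to client.textToSpeech.convert(voiceId, request)):

```typescript
// Sketch: request bodies for consecutive sentences, linked via
// previousText / nextText so the model can match prosody across them.
const sentences = ["First sentence.", "Second sentence."];

const requests = sentences.map((text, i) => ({
  text,
  previousText: i > 0 ? sentences[i - 1] : undefined,
  nextText: i < sentences.length - 1 ? sentences[i + 1] : undefined,
}));
```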

Next Steps

  • See Client Setup for detailed initialization options
  • Explore capability-specific documentation in the sections above
  • Review Common Types for shared type definitions
  • Check Error Handling for comprehensive error management