deepgram-js-speech-to-text

Use when writing or reviewing JavaScript/TypeScript in this repo that calls Deepgram Speech-to-Text v1 (`/v1/listen`) for prerecorded or live audio transcription. Covers `client.listen.v1.media.transcribeUrl` / `transcribeFile` (REST) plus `client.listen.v1.createConnection()` / `connect()` (WebSocket). Use `deepgram-js-audio-intelligence` for summarize/sentiment/topics/diarize overlays, `deepgram-js-conversational-stt` for Flux turn-taking on `/v2/listen`, and `deepgram-js-voice-agent` for full-duplex assistants. Triggers include "transcribe", "speech to text", "STT", "listen.v1", "nova-3", "live transcription", and "websocket transcription".

Using Deepgram Speech-to-Text (JavaScript / TypeScript SDK)

Name: deepgram-js-speech-to-text
Rating: 75.2 (1 review)
Author: deepgram

Basic transcription for prerecorded audio (REST) or live audio (WebSocket) via /v1/listen.

When to use this product

REST (client.listen.v1.media.transcribeUrl / transcribeFile) — one-shot transcription of a finished URL or file. Good for batch jobs, caption generation, offline processing.
WebSocket (client.listen.v1.createConnection() / connect()) — continuous streaming transcription. Good for live captions, microphone audio, telephony streams, browser or Node realtime apps.

Use a different skill when:

You also want summaries, topics, intents, sentiment, language detection, or redaction guidance on the same /v1/listen call → deepgram-js-audio-intelligence.
You need Flux turn-taking and end-of-turn events on /v2/listen → deepgram-js-conversational-stt.
You need a full interactive assistant with STT + LLM + TTS over one socket → deepgram-js-voice-agent.

Authentication

require("dotenv").config();

const { DeepgramClient } = require("@deepgram/sdk");

const deepgramClient = new DeepgramClient({
  apiKey: process.env.DEEPGRAM_API_KEY,
});

Use the exported DeepgramClient from src/CustomClient.ts, not DefaultDeepgramClient. The wrapper adds the required Token auth prefix, session headers, and patched WebSocket behavior.
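A missing key would typically only show up later as a failed request, so it can help to check for it before constructing the client. A minimal sketch; `requireApiKey` is a hypothetical helper and not part of the SDK:

```javascript
// Hypothetical helper (not part of @deepgram/sdk): read the API key from an
// environment-like object and fail fast with a clear message if it is missing.
function requireApiKey(env = process.env) {
  const apiKey = (env.DEEPGRAM_API_KEY || "").trim();
  if (!apiKey) {
    throw new Error(
      "DEEPGRAM_API_KEY is not set; add it to .env or the environment",
    );
  }
  return apiKey;
}

// Usage with the client shown above:
//   const deepgramClient = new DeepgramClient({ apiKey: requireApiKey() });
```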

Quick start — REST (prerecorded URL)

From examples/04-transcription-prerecorded-url.ts:

// `await` needs an async function or an ES module (top-level await).
const data = await deepgramClient.listen.v1.media.transcribeUrl({
  url: "https://dpgr.am/spacewalk.wav",
  model: "nova-3",
  language: "en",
  punctuate: true,
  paragraphs: true,
  utterances: true,
});

console.log(
  "Transcription:",
  data.results?.channels?.[0]?.alternatives?.[0]?.transcript,
);
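The optional-chaining access above can be wrapped in small helpers so that callers never hit an undefined level. The sketch below is illustrative only: `firstTranscript` and `formatUtterances` are hypothetical names, the `sample` object is stand-in data rather than real API output, and the utterance fields (`start`, `end`, `transcript`) are assumed from the usual `/v1/listen` response when `utterances: true` is set.

```javascript
// Hypothetical helper: pull the first-channel transcript out of a
// prerecorded response without throwing when a level is missing.
function firstTranscript(response) {
  return response?.results?.channels?.[0]?.alternatives?.[0]?.transcript ?? "";
}

// Hypothetical helper: format utterances as "[start-end] text" lines.
// Assumes each utterance carries `start`, `end` (seconds) and `transcript`.
function formatUtterances(response) {
  const utterances = response?.results?.utterances ?? [];
  return utterances.map(
    (u) => `[${u.start.toFixed(2)}-${u.end.toFixed(2)}] ${u.transcript}`,
  );
}

// Stand-in data shaped like a response, for illustration only.
const sample = {
  results: {
    channels: [{ alternatives: [{ transcript: "hello world", confidence: 0.99 }] }],
    utterances: [{ start: 0, end: 1.5, transcript: "hello world" }],
  },
};

console.log(firstTranscript(sample));
console.log(formatUtterances(sample).join("\n"));
```

In real code, pass the `data` value returned by `transcribeUrl` in place of `sample`.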

Quick start — REST (prerecorded file)

From examples/05-transcription-prerecorded-file.ts:

const { createReadStream } = require("fs");

const data = await deepgramClient.listen.v1.media.transcribeFile(
  createReadStream("./examples/spacewalk.wav"),
  {
    model: "nova-3",
    language: "en",
    punctuate: true,
    paragraphs: true,
    utterances: true,
    smart_format: true,
  }
);

transcribeFile(...) accepts multiple upload shapes in this SDK: fs.ReadStream, Buffer, ReadableStream, Blob, File, ArrayBuffer, and Uint8Array (see examples/23-file-upload-types.ts).
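As a rough illustration of a few of those shapes, the sketch below derives a Uint8Array, an ArrayBuffer, and a Blob from the same Buffer. It assumes Node 18+ (for the global `Blob`), and the four bytes are placeholder data, not real audio.

```javascript
// Build several in-memory upload shapes from the same bytes (Node 18+).
// The bytes here are placeholder data, not real audio.
const bytes = Buffer.from([0x52, 0x49, 0x46, 0x46]); // ASCII "RIFF"

// Copies the bytes into a plain typed array.
const asUint8Array = new Uint8Array(bytes);

// Buffers can be views into a larger shared pool, so slice out exactly the
// bytes that belong to this buffer instead of passing `bytes.buffer` as-is.
const asArrayBuffer = bytes.buffer.slice(
  bytes.byteOffset,
  bytes.byteOffset + bytes.byteLength,
);

// The MIME type is informational here; "audio/wav" is an assumed label.
const asBlob = new Blob([bytes], { type: "audio/wav" });

const uploadShapes = { bytes, asUint8Array, asArrayBuffer, asBlob };
```

Per the list above, any of these values could be passed as the first argument to `transcribeFile(...)`; the slice step matters because `bytes.buffer` on a small Buffer may contain unrelated data.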

Quick start — WebSocket (live streaming)

From examples/07-transcription-live-websocket.ts:

const deepgramConnection = await deepgramClient.listen.v1.createConnection({
  model: "nova-3",
  language: "en",
  punctuate: "true",
  interim_results: "true",
});

deepgramConnection.on("message", (data) => {
  if (data.type === "Results") {
    console.log("Transcript:", data);
  }
});

deepgramConnection.connect();
await deepgramConnection.waitForOpen();

// Swap this for a mic capture (e.g. `node-microphone` / `MediaRecorder`)
// in real apps; the repo examples use `createReadStream` over a sample WAV.
const { createReadStream } = require("node:fs");
const audioStream = createReadStream("samples/spacewalk.wav");

audioStream.on("data", (chunk) => {
  deepgramConnection.sendMedia(chunk);
});

audioStream.on("end", () => {
  deepgramConnection.sendFinalize({ type: "Finalize" });
});

The repo examples use the two-step socket flow: createConnection() → register handlers → connect() → waitForOpen().
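With `interim_results` enabled, the same stretch of audio can arrive several times as the transcript is refined, so the "message" handler usually needs to tell interim text from final text. The sketch below is one way to do that; `createTranscriptAccumulator` is a hypothetical helper, and the `is_final` and `channel.alternatives[0].transcript` fields are assumed from the streaming `Results` payload rather than taken from this repo.

```javascript
// Hypothetical accumulator for live "Results" messages.
// Assumes `is_final` marks a stable segment and the text lives at
// `channel.alternatives[0].transcript`.
function createTranscriptAccumulator() {
  const finalSegments = [];
  let interim = "";

  // Feed every message from the "message" handler into this function.
  function push(message) {
    if (message?.type !== "Results") return;
    const text = message.channel?.alternatives?.[0]?.transcript ?? "";
    if (message.is_final) {
      if (text) finalSegments.push(text);
      interim = ""; // the interim text has been superseded
    } else {
      interim = text;
    }
  }

  // Finalized text followed by the current interim guess, if any.
  function current() {
    return [...finalSegments, interim].filter(Boolean).join(" ");
  }

  // Only the segments the server has marked as final.
  function finalText() {
    return finalSegments.join(" ");
  }

  return { push, current, finalText };
}
```

A handler could then be as short as `deepgramConnection.on("message", (data) => acc.push(data))`, with `acc.current()` driving a live caption display and `acc.finalText()` used for storage.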

Key parameters / API surface

REST: model, language, punctuate, smart_format, paragraphs, utterances, multichannel, numerals, search, keyterm, keywords, encoding, sample_rate, callback, tag.
WSS connect args (src/api/resources/listen/resources/v1/client/Client.ts): model is required; common realtime flags include language, interim_results, endpointing, utterance_end_ms, vad_events, encoding, sample_rate, multichannel, punctuate, smart_format.
WSS client messages (src/api/resources/listen/resources/v1/client/Socket.ts): sendMedia(...), sendFinalize(...), sendCloseStream(...), sendKeepAlive(...).
WSS server events: Results, Metadata, UtteranceEnd, SpeechStarted.
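The four server event types listed above can be handled with a small dispatcher instead of a growing if/else chain. This is a sketch under assumptions: `routeServerEvent` and the `unknown` fallback handler are hypothetical names, and messages are assumed to arrive as parsed objects with a `type` field, as in the quick start.

```javascript
// Hypothetical router: dispatch a parsed server message to a handler keyed
// by its `type`. The four names come from the server events listed above.
const KNOWN_EVENTS = ["Results", "Metadata", "UtteranceEnd", "SpeechStarted"];

function routeServerEvent(message, handlers) {
  const type = message?.type;
  if (!KNOWN_EVENTS.includes(type)) {
    // Anything unexpected goes to an optional fallback handler.
    handlers.unknown?.(message);
    return false;
  }
  // A missing handler for a known event is silently skipped.
  handlers[type]?.(message);
  return true;
}

// Usage with the connection from the quick start:
//   deepgramConnection.on("message", (data) =>
//     routeServerEvent(data, {
//       Results: (m) => console.log("results", m),
//       UtteranceEnd: () => console.log("utterance ended"),
//     }),
//   );
```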

API reference (layered)

In-repo reference: reference.md → Listen V1 Media for REST; WSS behavior lives in src/CustomClient.ts and src/api/resources/listen/resources/v1/client/{Client,Socket}.ts.
Canonical OpenAPI (REST): https://developers.deepgram.com/openapi.yaml
Canonical AsyncAPI (WSS): https://developers.deepgram.com/asyncapi.yaml
Context7: library ID /llmstxt/developers_deepgram_llms_txt
Product docs:
- https://developers.deepgram.com/reference/speech-to-text/listen-pre-recorded
- https://developers.deepgram.com/reference/speech-to-text/listen-streaming

Gotchas

Use DeepgramClient, not DefaultDeepgramClient. The custom wrapper adds Token auth, session IDs, browser WS auth protocols, and patched sockets.
WSS setup in the repo examples is two-step. createConnection() does not open the socket; call connect() and, in most cases, await waitForOpen() before sending audio.
Finalize before closing v1 streams. sendFinalize({ type: "Finalize" }) flushes the final partial.
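Keep idle streams alive. During long pauses, keep sending audio or send sendKeepAlive({ type: "KeepAlive" }) periodically; Deepgram documents that a stream receiving neither for about 10 seconds is closed.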
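Raw audio metadata must match reality. If you send raw (headerless) PCM, set encoding and sample_rate to the values the bytes were actually recorded with; a mismatch tends to produce garbled or empty transcripts rather than a clear error.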
Browser auth differs from Node auth. In browsers, the wrapper moves auth/session info into WebSocket subprotocols because custom headers are unavailable.
Use /v2/listen only for Flux. If you need turn-aware conversational STT, switch skills instead of forcing v1.
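The "Finalize before closing" and "Keep idle streams alive" gotchas can be bundled into one small wrapper. The sketch below is hypothetical and not part of the SDK: `createStreamLifecycle` is an invented name, the 5000 ms default is an assumed interval, the `{ type: "CloseStream" }` payload is assumed to mirror the Finalize and KeepAlive shapes, and the timer functions are injectable so the logic can be exercised without a real socket.

```javascript
// Hypothetical helper around a v1 live connection. `connection` only needs
// sendKeepAlive, sendFinalize and sendCloseStream, so a stub works in tests.
function createStreamLifecycle(connection, options = {}) {
  const {
    keepAliveMs = 5000, // assumed interval; tune for your stream
    setIntervalFn = setInterval,
    clearIntervalFn = clearInterval,
  } = options;
  let timer = null;

  // Start sending KeepAlive messages while no audio is flowing.
  function startKeepAlive() {
    if (timer !== null) return; // already running
    timer = setIntervalFn(() => {
      connection.sendKeepAlive({ type: "KeepAlive" });
    }, keepAliveMs);
  }

  // Stop KeepAlive messages, e.g. when audio resumes.
  function stopKeepAlive() {
    if (timer === null) return;
    clearIntervalFn(timer);
    timer = null;
  }

  // Flush pending results first, then ask the server to close the stream.
  // The CloseStream payload shape is an assumption (see lead-in).
  function shutdown() {
    stopKeepAlive();
    connection.sendFinalize({ type: "Finalize" });
    connection.sendCloseStream({ type: "CloseStream" });
  }

  return { startKeepAlive, stopKeepAlive, shutdown };
}
```

A caller would typically run `startKeepAlive()` when the audio source pauses, `stopKeepAlive()` when it resumes, and `shutdown()` from the source's "end" handler.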

Example files in this repo

examples/04-transcription-prerecorded-url.ts
examples/05-transcription-prerecorded-file.ts
examples/06-transcription-prerecorded-callback.ts
examples/07-transcription-live-websocket.ts
examples/08-transcription-captions.ts
examples/23-file-upload-types.ts
examples/27-deepgram-session-header.ts

Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

npx skills add deepgram/skills

This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).

Repository: deepgram/deepgram-js-sdk
Commit: bcffba7

Last updated: 11 days ago
Created: 11 days ago
