deepgram-core-workflow-a

Implement production pre-recorded speech-to-text with Deepgram. Use when building audio transcription or batch processing, or when implementing diarization and intelligence features. Trigger: "deepgram transcription", "speech to text", "transcribe audio", "batch transcription", "deepgram nova", "diarize audio".

Deepgram Core Workflow A: Pre-recorded Transcription

Overview

Production pre-recorded transcription service using Deepgram's REST API. Covers transcribeUrl and transcribeFile, speaker diarization, audio intelligence (summarization, topic detection, sentiment, intent), batch processing with concurrency control, and callback-based async transcription for large files.

Prerequisites

  • @deepgram/sdk installed, DEEPGRAM_API_KEY configured
  • Audio files: WAV, MP3, FLAC, OGG, M4A, or WebM
  • For batch: p-limit package (npm install p-limit)
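
A minimal environment check before wiring up the steps below (a sketch; the install command assumes npm and the variable name matches the code in this skill):

// Install dependencies (shell):
//   npm install @deepgram/sdk p-limit
// Fail fast if the API key is missing before constructing any client.
if (!process.env.DEEPGRAM_API_KEY) {
  throw new Error('DEEPGRAM_API_KEY is not set');
}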

Instructions

Step 1: Transcription Service Class

import { createClient, DeepgramClient } from '@deepgram/sdk';
import { readFileSync } from 'fs';

interface TranscribeOptions {
  model?: 'nova-3' | 'nova-2' | 'nova-2-meeting' | 'nova-2-phonecall' | 'base';
  language?: string;
  diarize?: boolean;
  utterances?: boolean;
  paragraphs?: boolean;
  smart_format?: boolean;
  summarize?: boolean;      // Audio intelligence
  detect_topics?: boolean;  // Topic detection
  sentiment?: boolean;      // Sentiment analysis
  intents?: boolean;        // Intent recognition
  keywords?: string[];      // Keyword boosting: ["term:weight"]
  callback?: string;        // Async callback URL
}

class DeepgramTranscriber {
  private client: DeepgramClient;

  constructor(apiKey: string) {
    this.client = createClient(apiKey);
  }

  async transcribeUrl(url: string, opts: TranscribeOptions = {}) {
    const { result, error } = await this.client.listen.prerecorded.transcribeUrl(
      { url },
      {
        model: opts.model ?? 'nova-3',
        language: opts.language ?? 'en',
        smart_format: opts.smart_format ?? true,
        diarize: opts.diarize ?? false,
        utterances: opts.utterances ?? false,
        paragraphs: opts.paragraphs ?? false,
        summarize: opts.summarize ? 'v2' : undefined,
        detect_topics: opts.detect_topics ?? false,
        sentiment: opts.sentiment ?? false,
        intents: opts.intents ?? false,
        keywords: opts.keywords,
        callback: opts.callback,
      }
    );
    if (error) throw new Error(`Transcription failed: ${error.message}`);
    return result;
  }

  async transcribeFile(filePath: string, opts: TranscribeOptions = {}) {
    const audio = readFileSync(filePath);
    const mimetype = this.detectMimetype(filePath);

    const { result, error } = await this.client.listen.prerecorded.transcribeFile(
      audio,
      {
        model: opts.model ?? 'nova-3',
        smart_format: opts.smart_format ?? true,
        mimetype,
        diarize: opts.diarize ?? false,
        utterances: opts.utterances ?? false,
        summarize: opts.summarize ? 'v2' : undefined,
        detect_topics: opts.detect_topics ?? false,
        sentiment: opts.sentiment ?? false,
      }
    );
    if (error) throw new Error(`File transcription failed: ${error.message}`);
    return result;
  }

  private detectMimetype(path: string): string {
    const ext = path.split('.').pop()?.toLowerCase();
    const map: Record<string, string> = {
      wav: 'audio/wav', mp3: 'audio/mpeg', flac: 'audio/flac',
      ogg: 'audio/ogg', m4a: 'audio/mp4', webm: 'audio/webm',
    };
    return map[ext ?? ''] ?? 'audio/wav';
  }
}
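
A short usage sketch for the class above; the audio URL and file path are placeholders, not part of the skill:

// Example usage (hypothetical URL and path).
const transcriber = new DeepgramTranscriber(process.env.DEEPGRAM_API_KEY!);

const urlResult = await transcriber.transcribeUrl('https://example.com/meeting.mp3', {
  diarize: true,
  utterances: true,
  summarize: true,
});

const fileResult = await transcriber.transcribeFile('./recordings/call.wav', {
  diarize: true,
});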

Step 2: Extract Structured Results

function formatResult(result: any) {
  const channel = result.results.channels[0];
  const alt = channel.alternatives[0];

  return {
    transcript: alt.transcript,
    confidence: alt.confidence,
    words: alt.words?.map((w: any) => ({
      word: w.word,
      start: w.start,
      end: w.end,
      confidence: w.confidence,
      speaker: w.speaker,       // Only if diarize: true
      punctuated_word: w.punctuated_word,
    })),
    // Speaker segments (requires utterances: true + diarize: true)
    utterances: result.results.utterances?.map((u: any) => ({
      speaker: u.speaker,
      text: u.transcript,
      start: u.start,
      end: u.end,
      confidence: u.confidence,
    })),
    // Audio intelligence results
    summary: result.results.summary?.short,
    topics: result.results.topics?.segments,
    sentiments: result.results.sentiments?.segments,
    intents: result.results.intents?.segments,
    metadata: {
      duration: result.metadata.duration,
      channels: result.metadata.channels,
      model: result.metadata.model_info,
      request_id: result.metadata.request_id,
    },
  };
}
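
If diarization and utterances were requested, the extracted utterances can be rendered as a speaker-labeled transcript. A small helper sketch (not part of the original skill):

// Render diarized utterances as "Speaker N: text" lines.
// Requires diarize: true and utterances: true in the request.
function toSpeakerTranscript(formatted: ReturnType<typeof formatResult>): string {
  if (!formatted.utterances) return formatted.transcript;
  return formatted.utterances
    .map((u: any) => `[${u.start.toFixed(1)}s] Speaker ${u.speaker}: ${u.text}`)
    .join('\n');
}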

Step 3: Batch Processing

import pLimit from 'p-limit';

async function batchTranscribe(
  files: string[],
  opts: TranscribeOptions = {},
  concurrency = 5
) {
  const transcriber = new DeepgramTranscriber(process.env.DEEPGRAM_API_KEY!);
  const limit = pLimit(concurrency);

  const results = await Promise.allSettled(
    files.map(file =>
      limit(async () => {
        const result = await transcriber.transcribeFile(file, opts);
        console.log(`Done: ${file} (${result.metadata.duration}s)`);
        return { file, result: formatResult(result) };
      })
    )
  );

  const succeeded = results.filter(r => r.status === 'fulfilled');
  const failed = results.filter(r => r.status === 'rejected');
  console.log(`Batch complete: ${succeeded.length} ok, ${failed.length} failed`);
  return results;
}
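
A usage sketch for the batch helper; the file list and concurrency value are illustrative:

// Example: transcribe local files with a concurrency of 3 and keep successes.
const files = ['./audio/ep1.mp3', './audio/ep2.mp3', './audio/ep3.wav'];
const outcomes = await batchTranscribe(files, { diarize: true }, 3);

const transcripts = outcomes.flatMap(o => (o.status === 'fulfilled' ? [o.value] : []));
console.log(`Kept ${transcripts.length} transcripts`);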

Step 4: Async Callback Transcription (Large Files)

// For files >2 hours or when you don't want to hold a connection open,
// use Deepgram's callback feature. Deepgram POSTs results to your URL.
async function submitAsync(audioUrl: string, callbackUrl: string) {
  const transcriber = new DeepgramTranscriber(process.env.DEEPGRAM_API_KEY!);

  // Deepgram returns a request_id immediately, processes in background
  const result = await transcriber.transcribeUrl(audioUrl, {
    model: 'nova-3',
    diarize: true,
    callback: callbackUrl,  // Your HTTPS endpoint
  });

  // When a callback is supplied, Deepgram acknowledges immediately; the request id
  // may sit at the top level of the response rather than under metadata.
  const requestId = (result as any).request_id ?? result.metadata?.request_id;
  console.log('Submitted. Request ID:', requestId);
  // Deepgram will POST results to callbackUrl when done
  // Retries up to 10 times with 30s delay on failure
}
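
A minimal sketch of the receiving side, assuming an Express server; the route path is an assumption, and the payload is read as if it mirrors the synchronous response shape:

import express from 'express';

const app = express();
app.use(express.json({ limit: '50mb' })); // full transcription payloads can be large

// Deepgram POSTs the completed transcription to this endpoint (route path is illustrative).
app.post('/deepgram/callback', (req, res) => {
  res.status(200).end(); // acknowledge quickly; process out of band

  const payload = req.body;
  const requestId = payload?.metadata?.request_id;
  const transcript = payload?.results?.channels?.[0]?.alternatives?.[0]?.transcript;
  console.log(`Callback received for ${requestId}:`, transcript?.slice(0, 80));
});

app.listen(3000);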

Step 5: Keyword Boosting

// Boost domain-specific terms for higher accuracy.
// Note: the keywords parameter is supported on Nova-2 and earlier models;
// Nova-3 replaces it with keyterm prompting (the keyterm parameter).
const result = await transcriber.transcribeUrl(audioUrl, {
  model: 'nova-2',
  keywords: [
    'Kubernetes:1.5',    // Boost intensifier (roughly 1.0-2.0) after the colon
    'PostgreSQL:1.5',
    'microservices:1.3',
  ],
});

Output

  • DeepgramTranscriber class with URL and file transcription
  • Structured result extraction with word-level timing, speakers, and intelligence
  • Batch processing with configurable concurrency via p-limit
  • Async callback pattern for large files
  • Keyword boosting for domain vocabulary

Error Handling

| Error | Cause | Solution |
| --- | --- | --- |
| 400 Bad Request | Invalid audio format | Verify file header bytes (WAV: RIFF, MP3: 0xFFF3/0xFFFB) |
| 413 Payload Too Large | File exceeds limit | Use a callback URL for async processing |
| Empty transcript | No speech in audio | Check audio volume; try alternatives: 3 for confidence |
| 408 Timeout | Long file in sync mode | Switch to callback-based async |
| Low confidence | Background noise | Preprocess: ffmpeg -i input.wav -af "highpass=f=200,lowpass=f=3000" clean.wav |
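
The ffmpeg cleanup in the last row can be scripted ahead of transcription; a sketch assuming ffmpeg is available on PATH (the filter values are copied from the table):

import { execFileSync } from 'child_process';

// Apply the high-pass/low-pass cleanup from the table before transcribing noisy audio.
function preprocessAudio(input: string, output: string): string {
  execFileSync('ffmpeg', [
    '-y',                 // overwrite output if it exists
    '-i', input,
    '-af', 'highpass=f=200,lowpass=f=3000',
    output,
  ]);
  return output;
}

// const cleaned = preprocessAudio('noisy.wav', 'clean.wav');
// await transcriber.transcribeFile(cleaned, { model: 'nova-3' });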

Resources

  • Pre-recorded API
  • Speaker Diarization
  • Audio Intelligence
  • Summarization
  • Callback Feature

Next Steps

Proceed to deepgram-core-workflow-b for real-time streaming transcription.
