
sharaf/speech-recognition-architect

Use when the user wants to design, size, audit, or choose a self-hosted speech recognition or streaming ASR stack, including Whisper, Parakeet, Canary, Riva, NIM, Triton ASR, faster-whisper, sherpa-onnx, voice-agent transcription, Romanian or Moldovan ASR, contact-center transcription, GPU sizing, latency budgets, multilingual routing, VAD, diarization, or production evaluation.

Quality: 100% (Does it follow best practices?)
Impact: 100%, 2.00x (average score across 3 eval scenarios)
Security by Snyk: Passed, no known issues


Speech Recognition Architect

Architect, size, and audit self-hosted speech recognition and streaming ASR systems for production workloads.

What this skill does

This Tessl skill creates architecture blueprints for local or on-prem ASR systems. It covers language and model routing, streaming architecture, serving topology, multi-GPU scaling, hardware sizing, runtime and quantization choices, layered processing, audio I/O, evaluation plans, and rollout risk.

It is especially detailed for Romanian, Moldovan, and Romanian-English streaming ASR, where model language coverage, licensing, and latency tradeoffs are easy to get wrong.

When to use

Use this skill for prompts such as:

  • "Design a streaming speech-to-text stack"
  • "Architect a local ASR platform"
  • "Pick Whisper vs Parakeet vs Canary"
  • "Size GPUs for contact-center transcription"
  • "Build a Romanian or Moldovan transcription system"
  • "Audit our Triton, Riva, NIM, or faster-whisper ASR deployment"
  • "Plan VAD, diarization, language ID, hotwords, or ASR evaluation"

Install

tessl install sharaf/speech-recognition-architect

How it's organized

| Layer | Where | Purpose |
| --- | --- | --- |
| Entry point | SKILL.md | Required inputs, decision triage, output contract, guardrails |
| Model selection | references/model-selection.md | Inputs, model routing, licenses, RO/EN handling |
| Streaming architecture | references/streaming-architecture.md | Latency bands, architecture families, chunking, commit policy |
| Serving and scaling | references/serving-and-scaling.md | Triton, Riva, NIM, OSS stacks, stream affinity, autoscaling |
| Hardware sizing | references/hardware-sizing.md | GPU math, per-stream memory, cost and capacity formulas |
| Runtime and evaluation | references/runtime-processing-evaluation.md | Quantization, layered processing, audio I/O, metrics |
| Output and guardrails | references/blueprint-output-and-guardrails.md | Required blueprint structure and must-not-do rules |
| Full migrated source | references/streaming-asr-architect-source.md | Complete legacy skill text and source research index |

The entry point (SKILL.md) is intentionally short. Open the focused references bundled with it only when the task needs deeper tables or decision detail.

Output format

For full designs, the skill produces:

# ASR Stack Blueprint - <project name>
## Requirements Recap
## Model Selection
## Streaming Architecture
## Serving Topology
## Hardware Sizing
## Quantization & Runtime
## Layered Processing
## Audio I/O & Protocols
## Evaluation Plan
## Rollout & Risks
## Open Questions
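
One way to confirm that a generated blueprint follows this structure is to check it for the section headings listed above. The snippet below is a minimal sketch, not part of the skill itself: the `missing_sections` helper and the `REQUIRED_SECTIONS` list are hypothetical names introduced here, and the check assumes every section appears as a level-2 Markdown heading exactly as written in the template.

```python
import re

# Section headings copied from the blueprint template above.
REQUIRED_SECTIONS = [
    "Requirements Recap",
    "Model Selection",
    "Streaming Architecture",
    "Serving Topology",
    "Hardware Sizing",
    "Quantization & Runtime",
    "Layered Processing",
    "Audio I/O & Protocols",
    "Evaluation Plan",
    "Rollout & Risks",
    "Open Questions",
]


def missing_sections(blueprint_md: str) -> list[str]:
    """Return the required sections that do not appear as level-2 headings.

    Hypothetical helper: it only checks that the headings are present,
    not their order or the content underneath them.
    """
    found = {
        match.group(1).strip()
        for match in re.finditer(r"^##\s+(.+)$", blueprint_md, re.MULTILINE)
    }
    return [section for section in REQUIRED_SECTIONS if section not in found]


if __name__ == "__main__":
    draft = "# ASR Stack Blueprint - demo\n## Requirements Recap\n## Model Selection\n"
    # Lists the nine sections the draft still lacks.
    print(missing_sections(draft))
```

A check like this catches only missing headings; whether each section is adequately filled in still needs human or eval-based review.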

Eval results

Initial Sonnet eval on 3 generated ASR architecture scenarios:

| Metric | Result |
| --- | --- |
| Activation | 3/3 scenarios |
| Baseline average | 51% |
| With-context average | 100% |
| Uplift | 1.96x |

Scenario coverage:

| Scenario | Baseline | With context |
| --- | --- | --- |
| Romanian contact-center real-time ASR blueprint | 32% | 100% |
| High-concurrency Whisper serving architecture | 63% | 100% |
| RO+EN code-switching with English translation | 58% | 100% |
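
The summary figures can be reproduced from the per-scenario scores. The sketch below assumes the averages are unweighted means across the three scenarios and that uplift is the with-context average divided by the baseline average; the variable names are illustrative only.

```python
# Per-scenario scores in percent (baseline, with context), copied from the table above.
scores = {
    "Romanian contact-center real-time ASR blueprint": (32, 100),
    "High-concurrency Whisper serving architecture": (63, 100),
    "RO+EN code-switching with English translation": (58, 100),
}

# Assumption: unweighted mean across scenarios.
baseline_avg = sum(baseline for baseline, _ in scores.values()) / len(scores)
context_avg = sum(context for _, context in scores.values()) / len(scores)

# Assumption: uplift is the ratio of the two averages.
uplift = context_avg / baseline_avg

print(f"baseline {baseline_avg:.0f}%, with context {context_avg:.0f}%, uplift {uplift:.2f}x")
# prints: baseline 51%, with context 100%, uplift 1.96x
```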