sharaf/speech-recognition-architect

Use when the user wants to design, size, audit, or choose a self-hosted speech recognition or streaming ASR stack, including Whisper, Parakeet, Canary, Riva, NIM, Triton ASR, faster-whisper, sherpa-onnx, voice-agent transcription, Romanian or Moldovan ASR, contact-center transcription, GPU sizing, latency budgets, multilingual routing, VAD, diarization, or production evaluation.

Quality: 100% (Does it follow best practices?)
Impact: 100%, 2.00x (average score across 3 eval scenarios)
Security: Passed, no known issues

Speech Recognition Architect

Name: sharaf/speech-recognition-architect
Rating: 100 (1 review)
Author: sharaf

Architect, size, and audit self-hosted speech recognition and streaming ASR systems for production workloads.

What this skill does

This Tessl skill creates architecture blueprints for local or on-prem ASR systems. It covers language and model routing, streaming architecture, serving topology, multi-GPU scaling, hardware sizing, runtime and quantization choices, layered processing, audio I/O, evaluation plans, and rollout risk.

It is especially detailed for Romanian, Moldovan, and Romanian-English streaming ASR, where model language coverage, licensing, and latency tradeoffs are easy to get wrong.

When to use

Use this skill for prompts such as:

"Design a streaming speech-to-text stack"
"Architect a local ASR platform"
"Pick Whisper vs Parakeet vs Canary"
"Size GPUs for contact-center transcription"
"Build a Romanian or Moldovan transcription system"
"Audit our Triton, Riva, NIM, or faster-whisper ASR deployment"
"Plan VAD, diarization, language ID, hotwords, or ASR evaluation"

Install

tessl install sharaf/speech-recognition-architect

How it's organized

| Layer | Where | Purpose |
| --- | --- | --- |
| Entry point | `SKILL.md` | Required inputs, decision triage, output contract, guardrails |
| Model selection | `references/model-selection.md` | Inputs, model routing, licenses, RO/EN handling |
| Streaming architecture | `references/streaming-architecture.md` | Latency bands, architecture families, chunking, commit policy |
| Serving and scaling | `references/serving-and-scaling.md` | Triton, Riva, NIM, OSS stacks, stream affinity, autoscaling |
| Hardware sizing | `references/hardware-sizing.md` | GPU math, per-stream memory, cost and capacity formulas |
| Runtime and evaluation | `references/runtime-processing-evaluation.md` | Quantization, layered processing, audio I/O, metrics |
| Output and guardrails | `references/blueprint-output-and-guardrails.md` | Required blueprint structure and must-not-do rules |
| Full migrated source | `references/streaming-asr-architect-source.md` | Complete legacy skill text and source research index |

The entry point (`SKILL.md`) is intentionally short. Open the focused references bundled under `references/` only when the task needs deeper tables or decision detail.

Output format

For full designs, the skill produces:

# ASR Stack Blueprint - <project name>
## Requirements Recap
## Model Selection
## Streaming Architecture
## Serving Topology
## Hardware Sizing
## Quantization & Runtime
## Layered Processing
## Audio I/O & Protocols
## Evaluation Plan
## Rollout & Risks
## Open Questions
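
The section list above can double as a checklist for a generated blueprint. The following is a minimal sketch, not part of the skill itself: the function names `missing_sections` and `has_title` are hypothetical, and it assumes the blueprint is plain Markdown text whose sections use the exact `## ` headings shown above.

```python
import re

# Section headings copied from the output format above.
REQUIRED_SECTIONS = [
    "Requirements Recap",
    "Model Selection",
    "Streaming Architecture",
    "Serving Topology",
    "Hardware Sizing",
    "Quantization & Runtime",
    "Layered Processing",
    "Audio I/O & Protocols",
    "Evaluation Plan",
    "Rollout & Risks",
    "Open Questions",
]


def has_title(blueprint_text: str) -> bool:
    """True if a '# ASR Stack Blueprint - <project name>' title line is present."""
    return re.search(r"^# ASR Stack Blueprint - \S", blueprint_text, re.MULTILINE) is not None


def missing_sections(blueprint_text: str) -> list[str]:
    """Return the required section names that have no matching '## ' heading."""
    found = {
        match.group(1).strip()
        for match in re.finditer(r"^##\s+(.+)$", blueprint_text, re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name not in found]


# Hypothetical draft that only contains two of the required sections.
draft = "# ASR Stack Blueprint - demo\n## Requirements Recap\n...\n## Model Selection\n...\n"
print(has_title(draft))
print(missing_sections(draft))
```

A check like this only confirms that the headings exist; it says nothing about whether each section's content is adequate.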

Eval results

Initial Sonnet eval on 3 generated ASR architecture scenarios:

| Metric | Result |
| --- | --- |
| Activation | 3/3 scenarios |
| Baseline average | 51% |
| With-context average | 100% |
| Uplift | 1.96x |

Scenario coverage:

| Scenario | Baseline | With context |
| --- | --- | --- |
| Romanian contact-center real-time ASR blueprint | 32% | 100% |
| High-concurrency Whisper serving architecture | 63% | 100% |
| RO+EN code-switching with English translation | 58% | 100% |
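
The averages and uplift reported under Eval results can be reproduced from the per-scenario scores. This is a small worked sketch; it assumes that uplift is the with-context average divided by the baseline average, which is consistent with the reported figures but is not stated explicitly on the page.

```python
from statistics import mean

# (baseline %, with-context %) per scenario, copied from the scenario coverage table.
scores = {
    "Romanian contact-center real-time ASR blueprint": (32, 100),
    "High-concurrency Whisper serving architecture": (63, 100),
    "RO+EN code-switching with English translation": (58, 100),
}

baseline_avg = mean([baseline for baseline, _ in scores.values()])
with_context_avg = mean([with_ctx for _, with_ctx in scores.values()])

# Assumed definition: ratio of the two averages.
uplift = with_context_avg / baseline_avg

print(f"Baseline average: {baseline_avg:.0f}%")          # 51%
print(f"With-context average: {with_context_avg:.0f}%")  # 100%
print(f"Uplift: {uplift:.2f}x")                          # 1.96x
```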

Workspace: sharaf
Visibility: Public
Created: about 2 months ago
Last updated: about 1 month ago
Publish Source: CLI