Name: sharaf/speech-recognition-architect
Rating: 100 (1 review)
Author: sharaf

Use when the user wants to design, size, audit, or choose a self-hosted speech recognition or streaming ASR stack, including Whisper, Parakeet, Canary, Riva, NIM, Triton ASR, faster-whisper, sherpa-onnx, voice-agent transcription, Romanian or Moldovan ASR, contact-center transcription, GPU sizing, latency budgets, multilingual routing, VAD, diarization, or production evaluation.

Quality: 100% (Does it follow best practices?)
Impact: 100%, 2.00x (average score across 3 eval scenarios)
Security: Passed (no known issues)

{
  "context": "Tests whether the agent handles bilingual RO+EN audio correctly: avoiding forced language=\"ro\" on mixed audio, recommending Canary-1B-v2 for translation to English (not Whisper-turbo), excluding non-commercial models like MMS or SeamlessM4T-v2 for commercial deployment, treating Moldovan dialect as a research gap requiring an evaluation set, and using per-segment LID for code-switching routing.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "No forced language=ro",
      "description": "Does NOT recommend forcing language=\"ro\" on the mixed RO+EN audio; instead uses auto-detect, per-segment LID, diarize-then-detect, or dual-token prompt strategies",
      "max_score": 10
    },
    {
      "name": "Canary-1B-v2 for translation",
      "description": "Recommends Canary-1B-v2 (with target_lang=\"en\" or equivalent) as the model for translating non-English segments to English",
      "max_score": 10
    },
    {
      "name": "Whisper-turbo translation rejected",
      "description": "Explicitly states that Whisper-large-v3-turbo cannot translate (decoder pruning removed translation capability) and is rejected for the translation requirement",
      "max_score": 9
    },
    {
      "name": "MMS or SeamlessM4T-v2 excluded",
      "description": "Does NOT recommend MMS or SeamlessM4T-v2 for this commercial deployment, or explicitly names them as rejected due to CC-BY-NC non-commercial license",
      "max_score": 9
    },
    {
      "name": "Per-segment LID for code-switching",
      "description": "Includes per-segment Language ID (LID) re-detection as a component for handling code-switching between Romanian and English",
      "max_score": 8
    },
    {
      "name": "Canary license stated",
      "description": "States Canary-1B-v2's license (CC-BY-4.0 or equivalent) confirming commercial use is permitted",
      "max_score": 7
    },
    {
      "name": "Moldovan dialect as research gap",
      "description": "Marks Moldovan dialect coverage as a research gap requiring a representative evaluation set (not just noted as a minor variation)",
      "max_score": 8
    },
    {
      "name": "Romanian eval benchmarks",
      "description": "Includes at least two of CV-21 RO, FLEURS-RO, RSC as named benchmark datasets in the evaluation plan for Romanian WER",
      "max_score": 7
    },
    {
      "name": "CER for Romanian",
      "description": "Includes CER (Character Error Rate) as a sanity check metric in the evaluation plan, noting Romanian's inflected morphology or diacritics",
      "max_score": 7
    },
    {
      "name": "Blueprint sections present",
      "description": "The blueprint file contains the required top-level section coverage: Requirements Recap, Model Selection, Serving Topology, Hardware Sizing, Quantization & Runtime, Layered Processing, Audio I/O & Protocols, Evaluation Plan, Rollout & Risks, and Open Questions. For this explicitly batch/offline request, a top-level Batch Processing Pipeline or Offline Processing Architecture section counts for the Streaming Architecture slot.",
      "max_score": 7
    },
    {
      "name": "Rejected alternatives with reasons",
      "description": "Names at least two rejected model or technology alternatives with explicit reasons for rejection",
      "max_score": 6
    },
    {
      "name": "Runtime appropriate for L4",
      "description": "Specifies a runtime appropriate for the NVIDIA L4 GPU (e.g., TensorRT-LLM or NeMo native/Riva/NIM), not a CPU or Apple Silicon runtime",
      "max_score": 6
    },
    {
      "name": "No implementation artifacts",
      "description": "Does NOT contain Dockerfile, Helm chart, Triton config.pbtxt, NeMo YAML, or actual Python/bash code blocks implementing the stack",
      "max_score": 6
    }
  ]
}
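
The page does not define how a `weighted_checklist` is scored. One plausible reading, offered as an assumption and not as the registry's actual grading logic, is that a grader awards each item between 0 and its `max_score`, and the scenario score is the earned total as a percentage of the available total. In the rubric above the thirteen `max_score` values sum to 100, so under this reading the earned points equal the percentage. The function names, the `awarded` mapping, and the two-item sample rubric below are hypothetical.

```python
from typing import Dict


def total_weight(criteria: dict) -> int:
    """Sum of max_score over every checklist item."""
    return sum(item["max_score"] for item in criteria["checklist"])


def weighted_score(criteria: dict, awarded: Dict[str, float]) -> float:
    """Percentage score for one scenario (assumed scoring rule).

    `awarded` maps checklist item names to the points a grader gave.
    Missing items count as 0, and each award is clamped to the range
    [0, max_score] so one item cannot outweigh its stated weight.
    """
    total = total_weight(criteria)
    if total == 0:
        raise ValueError("checklist has no weight to score against")
    earned = 0.0
    for item in criteria["checklist"]:
        points = awarded.get(item["name"], 0)
        earned += max(0, min(points, item["max_score"]))
    return 100.0 * earned / total


# Hypothetical two-item rubric in the same shape as criteria.json.
sample_criteria = {
    "type": "weighted_checklist",
    "checklist": [
        {"name": "No forced language=ro", "max_score": 10},
        {"name": "Rejected alternatives with reasons", "max_score": 6},
    ],
}

# Full marks on the first item, half marks on the second.
sample_awards = {
    "No forced language=ro": 10,
    "Rejected alternatives with reasons": 3,
}

sample_result = weighted_score(sample_criteria, sample_awards)
```

To apply the same sketch to the real file, load it with `json.load` and pass the resulting dict as `criteria`.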

File: evals/scenario-3/criteria.json