Use when the user wants to design, size, audit, or choose a self-hosted speech recognition or streaming ASR stack, including Whisper, Parakeet, Canary, Riva, NIM, Triton ASR, faster-whisper, sherpa-onnx, voice-agent transcription, Romanian or Moldovan ASR, contact-center transcription, GPU sizing, latency budgets, multilingual routing, VAD, diarization, or production evaluation.
100
100%
Does it follow best practices?
Impact
100%
2.00xAverage score across 3 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent handles bilingual RO+EN audio correctly: avoiding forced language=\"ro\" on mixed audio, recommending Canary-1B-v2 for translation to English (not Whisper-turbo), excluding non-commercial models like MMS or SeamlessM4T-v2 for commercial deployment, treating Moldovan dialect as a research gap requiring an evaluation set, and using per-segment LID for code-switching routing.",
"type": "weighted_checklist",
"checklist": [
{
"name": "No forced language=ro",
"description": "Does NOT recommend forcing language=\"ro\" on the mixed RO+EN audio; instead uses auto-detect, per-segment LID, diarize-then-detect, or dual-token prompt strategies",
"max_score": 10
},
{
"name": "Canary-1B-v2 for translation",
"description": "Recommends Canary-1B-v2 (with target_lang=\"en\" or equivalent) as the model for translating non-English segments to English",
"max_score": 10
},
{
"name": "Whisper-turbo translation rejected",
"description": "Explicitly states that Whisper-large-v3-turbo cannot translate (decoder pruning removed translation capability) and is rejected for the translation requirement",
"max_score": 9
},
{
"name": "MMS or SeamlessM4T-v2 excluded",
"description": "Does NOT recommend MMS or SeamlessM4T-v2 for this commercial deployment, or explicitly names them as rejected due to CC-BY-NC non-commercial license",
"max_score": 9
},
{
"name": "Per-segment LID for code-switching",
"description": "Includes per-segment Language ID (LID) re-detection as a component for handling code-switching between Romanian and English",
"max_score": 8
},
{
"name": "Canary license stated",
"description": "States Canary-1B-v2's license (CC-BY-4.0 or equivalent) confirming commercial use is permitted",
"max_score": 7
},
{
"name": "Moldovan dialect as research gap",
"description": "Marks Moldovan dialect coverage as a research gap requiring a representative evaluation set (not just noted as a minor variation)",
"max_score": 8
},
{
"name": "Romanian eval benchmarks",
"description": "Includes at least two of CV-21 RO, FLEURS-RO, RSC as named benchmark datasets in the evaluation plan for Romanian WER",
"max_score": 7
},
{
"name": "CER for Romanian",
"description": "Includes CER (Character Error Rate) as a sanity check metric in the evaluation plan, noting Romanian's inflected morphology or diacritics",
"max_score": 7
},
{
"name": "Blueprint sections present",
"description": "The blueprint file contains the required top-level section coverage: Requirements Recap, Model Selection, Serving Topology, Hardware Sizing, Quantization & Runtime, Layered Processing, Audio I/O & Protocols, Evaluation Plan, Rollout & Risks, and Open Questions. For this explicitly batch/offline request, a top-level Batch Processing Pipeline or Offline Processing Architecture section counts for the Streaming Architecture slot.",
"max_score": 7
},
{
"name": "Rejected alternatives with reasons",
"description": "Names at least two rejected model or technology alternatives with explicit reasons for rejection",
"max_score": 6
},
{
"name": "Runtime appropriate for L4",
"description": "Specifies a runtime appropriate for the NVIDIA L4 GPU (e.g., TensorRT-LLM or NeMo native/Riva/NIM), not a CPU or Apple Silicon runtime",
"max_score": 6
},
{
"name": "No implementation artifacts",
"description": "Does NOT contain Dockerfile, Helm chart, Triton config.pbtxt, NeMo YAML, or actual Python/bash code blocks implementing the stack",
"max_score": 6
}
]
}