ElevenLabs text-to-speech with mac-style say UX.
64
51%
Does it follow best practices?
Impact
91%
3.50x
Average score across 3 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/sag/SKILL.md
Quality
Discovery
22%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description is extremely terse and reads more like a label than a functional skill description. It lacks concrete actions, explicit trigger guidance ('Use when...'), and sufficient natural keywords for reliable skill selection. While 'ElevenLabs' provides some distinctiveness, the description needs significant expansion to be effective.
Suggestions
Add a 'Use when...' clause with explicit triggers, e.g., 'Use when the user asks to convert text to speech, generate audio, read text aloud, or mentions ElevenLabs, TTS, or voice synthesis.'
List specific concrete actions such as 'Converts text to speech audio using the ElevenLabs API, plays audio locally using macOS say-style UX, supports voice selection and audio file export.'
Include common natural trigger terms users would say: 'TTS', 'voice generation', 'read aloud', 'speak text', 'audio output', '.mp3'.
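The suggestions above can be sketched as a small self-check. This is only an illustration, not the rubric Tessl applies: the function name `review_description`, the `TRIGGER_TERMS` list, and the expanded description are assumptions assembled from the suggestions, and the matching is naive substring search.

```python
# Hypothetical trigger terms, drawn from the suggestions in this review.
TRIGGER_TERMS = ["tts", "text-to-speech", "read aloud", "speak", "voice", "audio", "elevenlabs"]


def review_description(description: str) -> dict:
    """Report whether a description has a 'Use when' clause and which trigger terms it mentions.

    Matching is a plain case-insensitive substring test, so it can over-match
    (for example, a short term appearing inside a longer word).
    """
    lowered = description.lower()
    return {
        "has_use_when": "use when" in lowered,
        "trigger_terms": [term for term in TRIGGER_TERMS if term in lowered],
    }


# The description as reviewed, and an expanded variant built from the suggestions.
current = "ElevenLabs text-to-speech with mac-style say UX."
expanded = (
    "Converts text to speech audio using the ElevenLabs API. "
    "Use when the user asks to convert text to speech, generate audio, "
    "read text aloud, or mentions ElevenLabs, TTS, or voice synthesis."
)

if __name__ == "__main__":
    print(review_description(current))
    print(review_description(expanded))
```

Running it on both strings makes the gap concrete: the current description has no "Use when" clause, while the expanded one does and mentions more of the listed terms.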
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description mentions 'text-to-speech' and 'mac-style say UX' but does not list concrete actions like generating audio files, converting text to speech, or specifying supported formats. It's more of a label than a capability description. | 1 / 3 |
| Completeness | The description weakly addresses 'what' (text-to-speech) but completely lacks a 'when' clause or any explicit trigger guidance. Per the rubric, a missing 'Use when...' clause caps completeness at 2, and the 'what' is also too vague to merit a 2. | 1 / 3 |
| Trigger Term Quality | Includes 'text-to-speech', 'ElevenLabs', and 'say' which are relevant keywords users might use, but misses common variations like 'TTS', 'voice generation', 'audio', 'speak', or 'read aloud'. | 2 / 3 |
| Distinctiveness Conflict Risk | Mentioning 'ElevenLabs' and 'mac-style say UX' provides some distinctiveness from generic audio or speech skills, but the description is too terse to clearly carve out a unique niche and could overlap with other TTS or audio generation skills. | 2 / 3 |
| Total | | 6 / 12 Passed |
Implementation
79%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-crafted, concise skill that provides highly actionable CLI commands and specific configuration details for ElevenLabs TTS via the `sag` tool. Its main weaknesses are the lack of validation/error-handling steps in workflows and the absence of external references for more detailed topics. The content is appropriately sized for a single-file skill but could benefit from explicit verification steps when generating audio for chat responses.
Suggestions
Add a validation step after audio generation (e.g., check file exists and has non-zero size before including MEDIA: reference) to improve workflow reliability.
Consider adding brief error handling guidance—e.g., what to do if the API key is not set or if voice generation returns an error.
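A minimal sketch of what those two suggestions could look like in practice, written in Python around the `sag -v ... -o ...` command quoted in this review. Several details are assumptions rather than documented behavior: that `sag` reads its key from an `ELEVENLABS_API_KEY` environment variable, that it exits non-zero on failure, and that the chat reference takes the form `MEDIA:<path>`. The helper names are hypothetical.

```python
import os
import subprocess
from pathlib import Path


def audio_is_usable(path: str) -> bool:
    """Return True only if the file exists and has a non-zero size."""
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0


def generate_voice_reply(text: str, out_path: str, voice: str = "Clawd") -> str:
    """Run sag, validate the output file, and return a MEDIA: reference."""
    # Assumption: the API key is supplied via ELEVENLABS_API_KEY.
    if not os.environ.get("ELEVENLABS_API_KEY"):
        raise RuntimeError("ELEVENLABS_API_KEY is not set")

    # -v and -o are the flags shown in the reviewed skill's example command.
    result = subprocess.run(
        ["sag", "-v", voice, "-o", out_path, text],
        capture_output=True,
        text=True,
    )
    # Assumption: a non-zero exit code signals a failed generation.
    if result.returncode != 0:
        raise RuntimeError(f"sag failed: {result.stderr.strip()}")

    # Validate before handing the file to the chat layer.
    if not audio_is_usable(out_path):
        raise RuntimeError(f"no usable audio at {out_path}")

    # Assumption: the reference format is 'MEDIA:' followed by the path.
    return f"MEDIA:{out_path}"
```

A caller would use `generate_voice_reply("Your message here", "/tmp/voice-reply.mp3")` and fall back to a text reply if a `RuntimeError` is raised.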
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is lean and efficient. It assumes Claude knows how to use CLI tools and doesn't waste tokens explaining what TTS is or how APIs work. Every section delivers actionable information concisely. | 3 / 3 |
| Actionability | Provides concrete, copy-paste-ready commands throughout (e.g., `sag "Hello there"`, `sag -v Clawd -o /tmp/voice-reply.mp3 "Your message here"`). Specific model names, voice IDs, audio tags, and flag options are all directly usable. | 3 / 3 |
| Workflow Clarity | The 'Chat voice responses' section has a clear two-step workflow (generate then include), and pronunciation fix ordering is logical. However, there's no validation or error handling guidance, e.g., what to do if the API key is missing, if voice generation fails, or how to verify the output file before sending it. | 2 / 3 |
| Progressive Disclosure | The content is well-organized into logical sections with clear headers, but everything is inline in a single file. The `sag prompting` command reference is a nice touch for discovery, but there are no references to external files for deeper topics like model comparison or advanced SSML usage, and some sections (v3 audio tags, pronunciation rules) could benefit from being split out if they grow. | 2 / 3 |
| Total | | 10 / 12 Passed |
Validation
72%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 8 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| metadata_version | 'metadata.version' is missing | Warning |
| metadata_field | 'metadata' should map string keys to string values | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 8 / 11 Passed | |
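To make the two `metadata` warnings concrete, here is a rough sketch of the kind of check they describe, applied to an already-parsed frontmatter `metadata` value. It is not Tessl's validator: the function name `check_metadata` and the sample values are hypothetical.

```python
def check_metadata(metadata) -> list:
    """Return warning strings for a parsed 'metadata' mapping (empty list if clean)."""
    if not isinstance(metadata, dict):
        return ["'metadata' should map string keys to string values"]

    warnings = []
    # Mirrors the metadata_version criterion.
    if "version" not in metadata:
        warnings.append("'metadata.version' is missing")
    # Mirrors the metadata_field criterion: every key and value must be a string.
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            warnings.append(f"'metadata.{key}' is not a string-to-string entry")
    return warnings


if __name__ == "__main__":
    # Hypothetical examples: the first is clean, the second trips both checks.
    print(check_metadata({"version": "1.0.0", "author": "example"}))
    print(check_metadata({"tags": ["tts", "audio"]}))
```

Nested values such as lists or mappings are the usual cause of the string-to-string warning; flattening them into plain strings is one way to clear it.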
ec8d4f8