ElevenLabs text-to-speech with mac-style say UX.
64
Does it follow best practices? 51%
Impact: 91% (3.50x), the average score across 3 eval scenarios
Passed: No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/sag/SKILL.md
Quality
Discovery
22%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description is extremely terse, reading more like a label than a functional skill description. It lacks concrete actions, a 'Use when...' clause, and sufficient trigger terms. While 'ElevenLabs' and 'say' provide some specificity, the description would fail to help Claude reliably select this skill from a large pool.
Suggestions
Add a 'Use when...' clause with trigger terms like 'text-to-speech', 'TTS', 'voice generation', 'read aloud', 'audio output', 'ElevenLabs', or 'say command'.
List specific concrete actions such as 'Converts text to speech audio using the ElevenLabs API, plays audio locally via macOS say-style interface, supports voice selection and audio file export'.
Clarify what 'mac-style say UX' means in practical terms (e.g., 'Provides a simple command-line-like interface similar to macOS `say` for generating and playing speech').
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description mentions 'text-to-speech' and 'mac-style say UX' but does not list concrete actions like generating audio files, converting text to speech, or specifying supported formats. It's more of a label than a capability description. | 1 / 3 |
| Completeness | The description weakly addresses 'what' (text-to-speech) but completely lacks a 'when' clause or any explicit trigger guidance. Per the rubric, a missing 'Use when...' clause caps completeness at 2, and the 'what' is also quite thin, warranting a 1. | 1 / 3 |
| Trigger Term Quality | Includes 'text-to-speech', 'ElevenLabs', and 'say', which are relevant keywords users might use, but misses common variations like 'TTS', 'voice generation', 'audio', 'speak', or 'read aloud'. | 2 / 3 |
| Distinctiveness Conflict Risk | Mentioning 'ElevenLabs' and 'mac-style say UX' provides some distinctiveness, but the description is too brief to clearly carve out a niche. It could overlap with other audio or TTS-related skills. | 2 / 3 |
| Total | | 6 / 12 Passed |
Implementation
79%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-crafted, concise skill that provides highly actionable CLI commands and configuration details for ElevenLabs TTS. Its main strengths are token efficiency and concrete examples. The workflow could be improved with explicit validation/fallback steps for voice generation, and the document structure could benefit from consistent heading hierarchy.
Suggestions
Add a brief validation step to the chat voice response workflow (e.g., check if the file was created successfully before referencing it with MEDIA).
Use consistent markdown heading levels (##) for all major sections instead of mixing bold text and plain labels to improve navigation.
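The validation step suggested above could look roughly like the following shell sketch. It is not taken from the skill itself: the helper name `emit_media_if_ready` is hypothetical, and the `MEDIA:<path>` output line is an assumed format used only for illustration, since this review does not show the exact syntax the skill expects. The check itself is just a POSIX `[ -s FILE ]` test, which succeeds only when the file exists and is non-empty.

```shell
#!/bin/sh
# Hypothetical helper: print a media reference only when the audio file
# exists and is non-empty. The "MEDIA:<path>" format is an assumption.
emit_media_if_ready() {
  out="$1"
  if [ -s "$out" ]; then
    printf 'MEDIA:%s\n' "$out"
  else
    # Report the failure on stderr so the caller can fall back to plain text.
    printf 'voice generation failed: %s is missing or empty\n' "$out" >&2
    return 1
  fi
}

# Usage sketch, reusing the sag command quoted in this review:
#   sag -v Clawd -o /tmp/voice-reply.mp3 "Your message here" \
#     && emit_media_if_ready /tmp/voice-reply.mp3
```

Chaining with `&&` means the reference is emitted only if `sag` exits successfully and the output file passes the size check; on failure the agent can respond with text instead.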
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is lean and efficient. It assumes Claude knows how to use CLI tools and doesn't waste tokens explaining what TTS is or how APIs work. Every section delivers actionable information concisely. | 3 / 3 |
| Actionability | Provides concrete, copy-paste-ready commands throughout (e.g., `sag "Hello there"`, `sag -v Clawd -o /tmp/voice-reply.mp3 "Your message here"`). Specific flags, model names, voice IDs, and audio tags are all directly usable. | 3 / 3 |
| Workflow Clarity | The 'Chat voice responses' section has a clear two-step workflow (generate, then include), but the pronunciation/delivery rules section is more of a reference list than a sequenced workflow. There's a note to 'Confirm voice + speaker before long output' but no explicit validation or error recovery steps for when TTS fails or sounds wrong. | 2 / 3 |
| Progressive Disclosure | The content is well-organized into clear sections, but everything is inline in a single file. The `sag prompting` command reference is a nice touch for external discovery, but some content (like the full audio tags list or pronunciation rules) could potentially be split out. For a skill of this size (~50 lines), the inline approach is borderline acceptable, but the lack of any markdown heading hierarchy (most sections use bold or plain text rather than ## headers) slightly hurts navigation. | 2 / 3 |
| Total | | 10 / 12 Passed |
Validation
72%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 8 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| metadata_version | 'metadata.version' is missing | Warning |
| metadata_field | 'metadata' should map string keys to string values | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 8 / 11 Passed | |
a5bf5e0