ElevenLabs text-to-speech with mac-style say UX.
Impact: 91%
3.50x average score across 3 eval scenarios
Passed (no known issues)
Optimize this skill with Tessl:
`npx tessl skill review --optimize ./openclaw/skills/sag/SKILL.md`

Quality
Discovery
22%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description is extremely terse and reads more like a label than a functional skill description. It lacks concrete actions, explicit trigger guidance ('Use when...'), and sufficient natural keywords for reliable skill selection. While 'ElevenLabs' provides some distinctiveness, the description fails to communicate what the skill actually does or when it should be invoked.
Suggestions
Add a 'Use when...' clause with explicit triggers, e.g., 'Use when the user asks to convert text to speech, generate audio, read text aloud, or mentions ElevenLabs, TTS, or voice synthesis.'
List specific concrete actions such as 'Converts text to speech audio using the ElevenLabs API, plays audio locally using macOS say-style UX, supports voice selection and audio file export.'
Include common natural trigger terms users would say: 'TTS', 'voice generation', 'read aloud', 'speak text', 'audio output', '.mp3'.
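Taken together, a revised frontmatter description might look like this (a sketch only; the layout follows the standard SKILL.md frontmatter convention, and the wording is illustrative rather than the skill's actual text):

```yaml
---
name: sag
description: >
  Convert text to speech with the ElevenLabs API using a macOS
  say-style CLI. Use when the user asks to convert text to speech,
  generate audio, read text aloud, speak a message, export an .mp3,
  or mentions ElevenLabs, TTS, voice synthesis, or voice generation.
---
```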
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description mentions 'text-to-speech' and 'mac-style say UX' but does not list concrete actions like generating audio files, converting text to speech, or specifying supported formats. It's more of a label than a capability description. | 1 / 3 |
| Completeness | There is no explicit 'Use when...' clause or equivalent trigger guidance. The 'what' is only weakly implied (text-to-speech), and the 'when' is entirely missing. Per rubric guidelines, missing 'Use when' caps completeness at 2, but the 'what' is also weak, so this scores a 1. | 1 / 3 |
| Trigger Term Quality | Includes 'text-to-speech', 'ElevenLabs', and 'say', which are relevant keywords a user might use, but misses common variations like 'TTS', 'voice generation', 'audio', 'speak', or 'read aloud'. | 2 / 3 |
| Distinctiveness / Conflict Risk | Mentioning 'ElevenLabs' and 'mac-style say' provides some distinctiveness from generic audio or speech skills, but the description is too terse to clearly carve out a niche and could overlap with other TTS or audio generation skills. | 2 / 3 |
| Total | | 6 / 12 (Passed) |
Implementation
79%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-crafted, concise skill that provides highly actionable guidance for using the `sag` CLI tool. Its main strengths are token efficiency and concrete examples. The main weaknesses are the lack of explicit validation/verification steps (e.g., previewing audio before committing to long outputs) and the slightly flat structure that could benefit from clearer separation of quick-start vs. reference content.
Suggestions
Add an explicit verification step after generating audio (e.g., 'Play back short test clip before generating long output') to strengthen the workflow around the existing 'Confirm voice + speaker before long output' note.
Consider separating the audio tags and pronunciation rules into a linked reference section or file to improve progressive disclosure for the main skill overview.
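The suggested verification step could be sketched as a short shell workflow (illustrative only; it uses just the `sag` flags quoted elsewhere in this review, and `afplay` is assumed as the macOS playback command):

```
# 1. Generate a short test clip with the intended voice
sag -v Clawd -o /tmp/voice-test.mp3 "Quick voice and pronunciation check."

# 2. Listen and verify voice + pronunciation before committing to long output
afplay /tmp/voice-test.mp3

# 3. Only then generate the full response
sag -v Clawd -o /tmp/voice-reply.mp3 "Your full message here"
```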
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is lean and efficient. No unnecessary explanations of what TTS is or how ElevenLabs works. Every section delivers actionable information without padding. The format is terse but clear. | 3 / 3 |
| Actionability | Provides concrete, copy-paste-ready commands throughout (e.g., `sag "Hello there"`, `sag -v Clawd -o /tmp/voice-reply.mp3 "Your message here"`). Specific flags, model names, voice IDs, and audio tags are all directly usable. | 3 / 3 |
| Workflow Clarity | The 'Chat voice responses' section has a clear two-step workflow (generate, then include), but the pronunciation/delivery rules section is more of a reference list than a sequenced workflow. For a tool with potential issues (wrong voice, bad pronunciation), there is no validation or feedback loop mentioned: no 'listen and verify' step before sending long outputs, despite the hint to 'confirm voice + speaker before long output.' | 2 / 3 |
| Progressive Disclosure | Content is well-organized into logical sections with good headers, but everything is inline in a single file. The `sag prompting` command reference is a nice touch for progressive disclosure via the CLI itself, but some sections (v3 audio tags, pronunciation rules) could benefit from being split out or more clearly signaled as reference material. | 2 / 3 |
| Total | | 10 / 12 (Passed) |
Validation
72%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 8 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| metadata_version | `metadata.version` is missing | Warning |
| metadata_field | `metadata` should map string keys to string values | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to `metadata` | Warning |
| Total | 8 / 11 Passed | |
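All three warnings concern frontmatter, so a minimal fix might look like this (a sketch only; the skill's real frontmatter keys beyond `name` and `description` are not shown in this report, so the version and any moved keys below are assumed values):

```yaml
---
name: sag
description: ElevenLabs text-to-speech with mac-style say UX.
metadata:
  version: "0.1.0"        # adds the missing metadata.version
  author: "example-user"  # string keys mapped to string values only
---
```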