QA cycling workflow - test, verify, fix, repeat until goal met
40
38%: Does it follow best practices?
Impact: not yet measured; no eval scenarios have been run
Passed: no known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/ultraqa/SKILL.md
Quality
Discovery
14%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is too vague and generic to be effective for skill selection. It lacks concrete actions, specific technologies or domains, and has no explicit 'Use when...' clause. The cycling workflow concept is not differentiated enough from general debugging or testing skills.
Suggestions
Add a 'Use when...' clause specifying explicit triggers, e.g., 'Use when the user asks to iteratively run tests, diagnose failures, and fix code until all tests pass.'
Specify concrete actions and scope, e.g., 'Runs test suites, analyzes failure output, applies targeted code fixes, and re-runs tests in a loop until all assertions pass or a defined quality threshold is met.'
Include natural trigger terms users would say, such as 'run tests', 'fix failing tests', 'test loop', 'all tests passing', 'green build', 'test-driven fixing'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses vague, abstract language like 'test, verify, fix, repeat' without specifying what is being tested, what kinds of fixes, or what tools/technologies are involved. No concrete actions are listed. | 1 / 3 |
| Completeness | The 'what' is only vaguely described as a cycling workflow of test/verify/fix, and there is no explicit 'when' clause or trigger guidance. The description fails to answer either question clearly. | 1 / 3 |
| Trigger Term Quality | Contains some relevant keywords like 'QA', 'test', 'verify', 'fix' that users might naturally say, but lacks common variations such as 'quality assurance', 'debugging', 'automated testing', 'test suite', 'regression', or specific testing frameworks. | 2 / 3 |
| Distinctiveness Conflict Risk | The description is extremely generic: 'test, verify, fix, repeat until goal met' could apply to virtually any debugging, QA, CI/CD, or iterative development workflow, creating high conflict risk with many other skills. | 1 / 3 |
| Total | 5 / 12 Passed | |
Implementation
62%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill defines a clear QA cycling workflow with well-structured exit conditions and observability, earning strong marks for workflow clarity. However, it loses points on actionability because the core commands are abstract templates rather than executable code, and on conciseness due to sections like the Ralph/Team/Ultragoal relationship paragraph and parallel session caveats that add cognitive overhead. The skill would benefit from concrete command discovery patterns and better separation of advanced topics into referenced files.
Suggestions
Make the QA execution step more actionable by showing how to discover and run project-specific commands (e.g., checking package.json scripts, Makefile targets, or common conventions) rather than just saying 'Run the project's test command'.
Move the 'Parallel session caveats' and 'Relationship to /goal, Ralph, Team, and Ultragoal' sections into a referenced file (e.g., REFERENCE.md or INTEGRATION.md) to reduce the main skill's token footprint.
Provide a concrete, end-to-end example of a single complete cycle (with actual shell commands and real output) rather than only template Task() calls and illustrative log output.
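The first suggestion above (command discovery) could be approached along these lines. This is a minimal sketch, not part of the reviewed skill: the function name `discover_test_command` is hypothetical, and the lookup order (package.json, then Makefile, then Python marker files) is an assumption chosen for illustration.

```python
import json
import re
from pathlib import Path
from typing import Optional


def discover_test_command(project_dir: str) -> Optional[list[str]]:
    """Return a best-guess test command for the project, or None if nothing matches.

    Hypothetical helper: the checks and their order are illustrative assumptions.
    """
    root = Path(project_dir)

    # 1. Node projects: a "test" entry under "scripts" in package.json.
    package_json = root / "package.json"
    if package_json.is_file():
        try:
            data = json.loads(package_json.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            data = {}
        scripts = data.get("scripts", {}) if isinstance(data, dict) else {}
        if isinstance(scripts, dict) and "test" in scripts:
            return ["npm", "test"]

    # 2. Make-based projects: a "test" target (not a "test := ..." assignment).
    makefile = root / "Makefile"
    if makefile.is_file():
        text = makefile.read_text(encoding="utf-8")
        if re.search(r"^test\s*:(?!=)", text, re.MULTILINE):
            return ["make", "test"]

    # 3. Python projects: fall back to pytest when a common marker file exists.
    markers = ("pyproject.toml", "pytest.ini", "setup.cfg")
    if any((root / name).is_file() for name in markers):
        return ["python", "-m", "pytest"]

    return None
```

Calling `discover_test_command(".")` from the project root returns an argument list suitable for a process runner, or `None` when no convention matches, in which case the agent would need to ask the user for the command.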
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is moderately efficient but includes some unnecessary sections (e.g., the 'Relationship to /goal, Ralph, Team, and Ultragoal' paragraph is dense and context-heavy, the parallel session caveats section adds bulk). Some explanatory text could be trimmed, but overall the tables and code blocks are reasonably tight. | 2 / 3 |
| Actionability | The skill provides structured Task() invocations and command patterns, but these are pseudocode/template-style rather than fully executable. The actual project commands (test, build, lint) are never specified concretely: it says 'Run the project's test command' without showing how to discover or execute it. The state tracking JSON and observability output are illustrative but not copy-paste ready. | 2 / 3 |
| Workflow Clarity | The cycle workflow is clearly sequenced (run → check → diagnose → fix → repeat) with explicit exit conditions including max cycles, repeated failure detection, and environment errors. The exit conditions table and observability output format provide strong validation checkpoints and feedback loops for error recovery. | 3 / 3 |
| Progressive Disclosure | The skill references docs/REFERENCE.md for workspace resolution but no bundle files are provided to support it. The content is somewhat monolithic: the parallel session caveats, state tracking details, and cancellation info could be split into separate references. However, the use of tables and clear section headers provides reasonable internal organization. | 2 / 3 |
| Total | 9 / 12 Passed | |
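The cycle described under Workflow Clarity (run → check → diagnose → fix → repeat, with exits on max cycles, repeated failures, and environment errors) can be sketched as a small loop. This is an illustration under stated assumptions, not the skill's actual implementation: `CheckResult`, `run_qa_cycles`, the exit labels, and the default limits of 5 cycles and 3 repeats are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    passed: bool
    failure: str = ""                 # short failure signature; empty when passed
    environment_error: bool = False   # e.g. missing tool; not fixable by editing code


def run_qa_cycles(
    run_check: Callable[[], CheckResult],
    apply_fix: Callable[[str], None],
    max_cycles: int = 5,    # assumed default, not taken from the skill
    max_repeats: int = 3,   # assumed default, not taken from the skill
) -> str:
    """Run check/fix cycles and return the reason the loop stopped."""
    last_failure: Optional[str] = None
    repeats = 0
    for _ in range(max_cycles):
        result = run_check()
        if result.passed:
            return "goal_met"
        if result.environment_error:
            return "environment_error"
        # Count consecutive occurrences of the same failure signature.
        if result.failure == last_failure:
            repeats += 1
        else:
            last_failure, repeats = result.failure, 1
        if repeats >= max_repeats:
            return "repeated_failure"
        apply_fix(result.failure)
    return "max_cycles"
```

Keeping the check and fix steps as injected callables lets the same loop drive tests, builds, or lint runs, and makes each exit condition easy to exercise in isolation.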
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation: 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
3e94567