Debugs a failing production call, reproduces the bug with Cekura evaluators, implements a fix, verifies it, runs regression tests, then raises a PR with evidence. Use when the user wants to fix a production call bug, investigate a failing prod call, reproduce and fix a production issue, run regression tests before a PR, or says things like "fix this prod call issue", "debug and fix call ID", "test my fix against prod scenarios", "reproduce this production bug", or "regression test before raising PR".
75
92%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Advisory
Suggest reviewing before use
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly articulates a specific multi-step workflow for debugging production call issues. It provides comprehensive trigger terms covering natural user language, explicitly states both what the skill does and when to use it, and occupies a distinct niche that minimizes conflict with other skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions in sequence: debugs a failing production call, reproduces the bug with Cekura evaluators, implements a fix, verifies it, runs regression tests, and raises a PR with evidence. These are detailed, actionable steps. | 3 / 3 |
| Completeness | Clearly answers both 'what' (debugs, reproduces, fixes, verifies, runs regression tests, raises PR) and 'when' with an explicit 'Use when...' clause listing multiple trigger scenarios and example phrases. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms including 'fix this prod call issue', 'debug and fix call ID', 'reproduce this production bug', 'regression test before raising PR', plus variations like 'production call bug', 'failing prod call', and 'test my fix against prod scenarios'. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with a clear niche: production call debugging with Cekura evaluators, regression testing, and PR creation with evidence. The domain-specific terms like 'Cekura evaluators', 'production call', and the specific workflow make it unlikely to conflict with generic debugging or PR skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
85%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured orchestration skill that excels at workflow clarity and progressive disclosure. The strict gate-based sequential workflow with explicit validation checkpoints is exactly what's needed for a complex multi-phase debugging process. The main weakness is that actionability depends heavily on the phase files (which aren't provided), so the SKILL.md itself reads more as a workflow contract than an executable guide.
Suggestions
Consider adding a minimal concrete example in the overview showing what a typical invocation looks like end-to-end (e.g., a sample call ID, the first API call to fetch it, and what the output looks like) to improve actionability at the top level.
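A top-level example of the kind suggested here might look like the sketch below. Only the path `/test_framework/v1/ai-agents/{id}/` comes from the skill itself; the base URL, the `X-CEKURA-API-KEY` header name, the agent ID `1234`, and the helper name `build_agent_request` are assumptions for illustration and should be checked against the Cekura API documentation. The request is built but not sent.

```python
import urllib.request

# Assumption: the base URL is not stated in the skill; replace as needed.
BASE_URL = "https://api.cekura.ai"
# Path quoted in the review of the skill.
AGENT_PATH = "/test_framework/v1/ai-agents/{id}/"


def build_agent_request(agent_id: int, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a single agent."""
    url = BASE_URL + AGENT_PATH.format(id=agent_id)
    headers = {
        "X-CEKURA-API-KEY": api_key,  # assumption: header name may differ
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


# Hypothetical agent ID and placeholder key, for illustration only.
req = build_agent_request(1234, "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request; the shape of the response is not described in the review, so it is left out here.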
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is lean and efficient. Every section earns its place: the ASCII workflow diagram, the phase table, and the strictness rules all convey unique, non-obvious information. There's no explanation of concepts Claude already knows; it focuses entirely on domain-specific workflow constraints. | 3 / 3 |
| Actionability | The skill provides clear structural guidance and strict rules, but the actual executable details (API endpoints, specific commands, code snippets) are deferred to phase files that aren't provided. The one concrete API call mentioned (GET /test_framework/v1/ai-agents/{id}/) is good, but the main SKILL.md itself lacks copy-paste-ready commands or code examples. | 2 / 3 |
| Workflow Clarity | The 6-phase workflow is clearly sequenced with explicit gates between phases, validation checkpoints (eval must fail in Phase 2, must pass in Phase 4), feedback loops (stop and ask user if uncertain), and a clear no-push-until-Phase-5 rule. The strictness rules add robust error prevention and the sequence rationale is explicitly justified. | 3 / 3 |
| Progressive Disclosure | The SKILL.md serves as a clear overview with a well-organized table linking to six phase-specific files, each one level deep. Navigation is intuitive with the phase table providing both file references and brief descriptions of what happens in each phase. Content is appropriately split between overview and detail files. | 3 / 3 |
| Total | | 11 / 12 Passed |
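The gate rules described under Workflow Clarity (the eval must fail in Phase 2, must pass in Phase 4, and nothing is pushed before Phase 5) can be sketched as a small state machine. This is an illustration of the constraints as the review states them, not the skill's implementation: the `Workflow` and `GateError` names are hypothetical, and the phase files that define the real behavior are not provided.

```python
from dataclasses import dataclass, field
from typing import Dict


class GateError(RuntimeError):
    """Raised when a phase gate is not satisfied (hypothetical name)."""


@dataclass
class Workflow:
    phase: int = 1  # phases run from 1 to 6
    evals: Dict[int, bool] = field(default_factory=dict)

    def record_eval(self, passed: bool) -> None:
        """Store the eval outcome for the current phase."""
        self.evals[self.phase] = passed

    def advance(self) -> int:
        """Move to the next phase only if the current gate is satisfied."""
        if self.phase >= 6:
            raise GateError("already in the final phase")
        # Phase 2 gate: the eval must FAIL, proving the bug is reproduced.
        if self.phase == 2 and self.evals.get(2) is not False:
            raise GateError("Phase 2 requires a failing eval")
        # Phase 4 gate: the eval must PASS, proving the fix works.
        if self.phase == 4 and self.evals.get(4) is not True:
            raise GateError("Phase 4 requires a passing eval")
        self.phase += 1
        return self.phase

    def can_push(self) -> bool:
        """Pushing is allowed only from Phase 5 onward."""
        return self.phase >= 5


wf = Workflow()
wf.advance()                  # 1 -> 2
wf.record_eval(passed=False)  # bug reproduced
wf.advance()                  # 2 -> 3
wf.advance()                  # 3 -> 4
wf.record_eval(passed=True)   # fix verified
wf.advance()                  # 4 -> 5
print(wf.can_push())
```

Calling `advance()` in Phase 2 before a failing eval is recorded raises `GateError`, which mirrors the "stop and ask the user" behavior the review describes for uncertain states.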
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
24ad1d0