Workflow 1.5: Bridge between idea discovery and auto review. Reads EXPERIMENT_PLAN.md, implements experiment code, deploys to GPU, collects initial results. Use when user says "实现实验", "implement experiments", "bridge", "从计划到跑实验", "deploy the plan", or has an experiment plan ready to execute.
72
Quality: 88% (Does it follow best practices?)
Impact: n/a (No eval scenarios have been run)
Advisory: Suggest reviewing before use
Quality
Discovery
100%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly defines a specific workflow step with concrete actions, explicit trigger terms in multiple languages, and a well-defined niche. It effectively communicates both what the skill does and when it should be selected, with minimal risk of conflicting with other skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: reads EXPERIMENT_PLAN.md, implements experiment code, deploys to GPU, collects initial results. These are clear, actionable steps in a defined workflow. | 3 / 3 |
| Completeness | Clearly answers both 'what' (reads experiment plan, implements code, deploys to GPU, collects results) and 'when' (explicit 'Use when' clause with specific trigger phrases and a situational trigger). Both dimensions are well-covered. | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms in both English and Chinese: '实现实验', 'implement experiments', 'bridge', '从计划到跑实验', 'deploy the plan', plus the contextual trigger 'experiment plan ready to execute'. Good coverage of how users would naturally phrase this request. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with a clear niche: it's specifically a bridge workflow (1.5) between idea discovery and auto review, targeting GPU deployment of experiment plans. The specific file reference (EXPERIMENT_PLAN.md) and workflow numbering make it very unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
77%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, well-structured orchestration skill that clearly sequences a complex multi-phase workflow with excellent validation checkpoints and error recovery paths. Its main weakness is length—at 250+ lines it pushes the boundary of conciseness, with some sections (backend lifecycle rules, multiple output templates) that could be extracted to reference files. The actionability is excellent with concrete commands, file paths, templates, and checklists throughout.
Suggestions
Consider extracting the backend lifecycle rules (Vast.ai, Modal, Local/SSH) into a shared reference file to reduce the main skill's length.
The result summary template, compact log template, and handoff template share overlapping information—consider consolidating or referencing a single output format guide.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is quite long (~250+ lines) with some sections that could be tightened (e.g., the Constants section explains each flag verbosely, the ASCII workflow diagram is helpful but adds bulk). However, most content is genuinely instructive and not explaining things Claude already knows, because it is domain-specific workflow orchestration. Some redundancy exists between the checkpoint display, the results template, and the handoff template. | 2 / 3 |
| Actionability | The skill provides concrete, executable guidance throughout: specific file paths to read, exact commands to run (/run-experiment, /experiment-queue, /monitor-experiment), detailed code review prompts with spawn_agent syntax, structured markdown templates for output, and explicit checklists for self-review. The experiment result tables and log formats are copy-paste ready. | 3 / 3 |
| Workflow Clarity | The workflow is exceptionally well-sequenced across 6 phases with clear milestone ordering (sanity → baseline → main → ablation → polish). Validation checkpoints are explicit: Phase 2 has a self-review checklist, Phase 2.5 adds cross-model code review with blocking/non-blocking issue classification, Phase 3 is a dedicated sanity check with failure recovery (including rescue diagnosis), and Phase 4 has an AUTO_DEPLOY checkpoint. Feedback loops are present for error recovery throughout. | 3 / 3 |
| Progressive Disclosure | The skill references shared protocols (output-versioning.md, output-manifest.md, output-language.md) and other skills (/run-experiment, /experiment-queue, /ablation-planner, etc.) appropriately. However, no bundle files are provided to verify these references exist, and the skill itself is quite long; some sections like the detailed result templates and backend lifecycle rules could potentially be split into reference files. The composing section at the end is well-organized. | 2 / 3 |
| Total | | 10 / 12 Passed |
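
The milestone ordering and blocking checkpoints noted under Workflow Clarity can be pictured with a short sketch. It is illustrative only and not taken from the skill itself: `MILESTONES` follows the order quoted in the review, while `run_milestones` and the `runner` callback are hypothetical names, and the rule that any failed milestone halts the remaining ones is an assumption about how such a gate might behave.

```python
from typing import Callable, List, Optional, Tuple

# Order quoted in the review: sanity -> baseline -> main -> ablation -> polish.
MILESTONES: List[str] = ["sanity", "baseline", "main", "ablation", "polish"]


def run_milestones(
    runner: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Run milestones in order and stop at the first failure.

    `runner` is a hypothetical callback that executes one milestone and
    returns True on success. Returns (completed, failed), where `failed`
    is None if every milestone passed.
    """
    completed: List[str] = []
    for name in MILESTONES:
        if not runner(name):
            # Assumed gate: a failed milestone blocks everything after it.
            return completed, name
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    # Simulate a run in which the "main" milestone fails.
    done, failed = run_milestones(lambda name: name != "main")
    print("completed:", done)
    print("stopped at:", failed)
```

In this sketch a failure at `main` leaves `ablation` and `polish` unrun, which mirrors the idea of a cheap sanity stage guarding the more expensive stages; the actual skill may handle recovery differently.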
Validation
100%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
a425a71