Workflow 1.5: Bridge between idea discovery and auto review. Reads EXPERIMENT_PLAN.md, implements experiment code, deploys to GPU, collects initial results. Use when user says "实现实验", "implement experiments", "bridge", "从计划到跑实验", "deploy the plan", or has an experiment plan ready to execute.
Overall Score: 88

Quality: 85% (Does it follow best practices?)

Impact: Pending. No eval scenarios have been run. Advisory: suggest reviewing before use.

Discovery: 100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly defines a specific workflow step in an experiment pipeline. It excels at providing concrete actions, explicit trigger guidance in multiple languages, and a well-defined niche that distinguishes it from other skills. The description is concise yet comprehensive, covering all essential information without unnecessary verbosity.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions: reads EXPERIMENT_PLAN.md, implements experiment code, deploys to GPU, collects initial results. These are clear, actionable steps in a defined workflow. | 3 / 3 |
| Completeness | Clearly answers both 'what' (reads experiment plan, implements code, deploys to GPU, collects results) and 'when' (explicit 'Use when' clause with specific trigger phrases and a situational trigger). Both dimensions are well covered. | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms in both English and Chinese: '实现实验' ("implement the experiments"), 'implement experiments', 'bridge', '从计划到跑实验' ("from plan to running experiments"), 'deploy the plan', plus the contextual trigger 'experiment plan ready to execute'. Good coverage of how users would naturally phrase this request. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with a clear niche: it is specifically a bridge workflow (1.5) between idea discovery and auto review, targeting GPU deployment of experiment plans. The workflow numbering, the specific file reference (EXPERIMENT_PLAN.md), and the bilingual triggers make it very unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation: 70%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured orchestration skill with excellent workflow clarity and progressive disclosure—the phased approach with validation gates is a strength. Its main weaknesses are moderate verbosity (some display templates and backend rules could be more concise) and limited executable code examples, relying heavily on placeholder commands and checklists rather than copy-paste-ready scripts.
Suggestions
- Replace placeholder commands like `/run-experiment [experiment commands]` with at least one concrete, fully executable example showing actual arguments and expected output.
- Tighten the backend lifecycle rules section: consider moving the Vast.ai/Modal/SSH specifics to a referenced file, since they add significant length for conditional content.
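By way of illustration, the first suggestion could be satisfied with something like the following. All flags, values, and paths below are hypothetical; the skill's actual `/run-experiment` interface is not reproduced in this review:

```
# Hypothetical concrete invocation (the real flags are not documented here):
/run-experiment --plan EXPERIMENT_PLAN.md --backend modal --gpus 1 \
    --sanity-steps 50 --output results/run_001/
```

Pairing one such worked example with its expected output in the skill body would make the guidance copy-paste-ready rather than a process-level checklist.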
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is quite long (~300 lines), with some sections that could be tightened, e.g. the backend lifecycle rules, the detailed checkpoint display templates, and the composing section. However, most content is genuinely novel configuration and workflow logic that Claude would not know, so it is not padding with known concepts. It sits between lean and verbose. | 2 / 3 |
| Actionability | The skill provides concrete workflow steps, specific file paths, and example output templates, but the actual executable code is minimal: most commands are placeholders like `/run-experiment [experiment commands]` and `spawn_agent:` pseudo-YAML. The implementation guidance in Phase 2 is a checklist rather than executable code. It is actionable at the process level but lacks copy-paste-ready code. | 2 / 3 |
| Workflow Clarity | The multi-phase workflow is clearly sequenced (Parse → Implement → Code Review → Sanity → Deploy → Collect → Handoff) with explicit validation checkpoints: a sanity check before full deployment, a code-review gate, success-criteria checks, an AUTO_DEPLOY checkpoint, and rescue-on-failure loops. Feedback loops for error recovery are well defined (fix and re-run sanity; re-run code review once for blocking issues). | 3 / 3 |
| Progressive Disclosure | The skill is well structured as an overview with clear references to external files (EXPERIMENT_PLAN.md, FINAL_PROPOSAL.md, shared protocols via links). It delegates to other skills (/run-experiment, /experiment-queue, /monitor-experiment, /ablation-planner, /training-check) without inlining their content. References are one level deep and clearly signaled. | 3 / 3 |
| Total | | 10 / 12 Passed |
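The gated flow credited in the Workflow Clarity row can be sketched in a few lines. This is a minimal illustration with hypothetical stage functions supplied by the caller, not the skill's actual implementation:

```python
# Minimal sketch of the reviewed workflow's gates, assuming hypothetical
# stage callables (review, sanity, fix, deploy) passed in by the caller.

def run_pipeline(code, review, sanity, fix, deploy, max_review_retries=1):
    # Code-review gate: retry once on blocking issues, then stop.
    for attempt in range(max_review_retries + 1):
        issues = review(code)
        if not issues:
            break
        if attempt == max_review_retries:
            return {"status": "blocked", "issues": issues}
        code = fix(code, issues)

    # Sanity check (a cheap short run) before committing GPU time.
    if not sanity(code):
        return {"status": "sanity_failed"}

    # Full deployment happens only after both gates pass.
    return {"status": "done", "results": deploy(code)}
```

Calling this with a review stub that always flags an issue exercises the single-retry gate: after one fix attempt it returns a "blocked" status instead of deploying, matching the rescue-on-failure behavior the review describes.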
Validation: 100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.