experiment-bridge

Workflow 1.5: Bridge between idea discovery and auto review. Reads EXPERIMENT_PLAN.md, implements experiment code, deploys to GPU, collects initial results. Use when user says "实现实验", "implement experiments", "bridge", "从计划到跑实验", "deploy the plan", or has an experiment plan ready to execute.

Quality

88%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Advisory

Suggest reviewing before use

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that clearly defines a specific workflow step with concrete actions, explicit trigger terms in multiple languages, and a well-defined niche. It effectively communicates both what the skill does and when it should be selected, with minimal risk of conflicting with other skills.

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions: reads EXPERIMENT_PLAN.md, implements experiment code, deploys to GPU, collects initial results. These are clear, actionable steps in a defined workflow.	3 / 3
Completeness	Clearly answers both 'what' (reads experiment plan, implements code, deploys to GPU, collects results) and 'when' (explicit 'Use when' clause with specific trigger phrases and a situational trigger). Both dimensions are well-covered.	3 / 3
Trigger Term Quality	Includes strong natural trigger terms in both English and Chinese: '实现实验', 'implement experiments', 'bridge', '从计划到跑实验', 'deploy the plan', plus the contextual trigger 'experiment plan ready to execute'. Good coverage of how users would naturally phrase this request.	3 / 3
Distinctiveness / Conflict Risk	Highly distinctive with a clear niche: it's specifically a bridge workflow (1.5) between idea discovery and auto review, targeting GPU deployment of experiment plans. The specific file reference (EXPERIMENT_PLAN.md) and workflow numbering make it very unlikely to conflict with other skills.	3 / 3
	Total	12 / 12 Passed

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, well-structured orchestration skill that clearly sequences a complex multi-phase workflow with excellent validation checkpoints and error recovery paths. Its main weakness is length: at 250+ lines it pushes the boundary of conciseness, and some sections (backend lifecycle rules, multiple output templates) could be extracted to reference files. Actionability is excellent, with concrete commands, file paths, templates, and checklists throughout.

Suggestions

Consider extracting the backend lifecycle rules (Vast.ai, Modal, Local/SSH) into a shared reference file to reduce the main skill's length.

The result summary template, compact log template, and handoff template share overlapping information—consider consolidating or referencing a single output format guide.

Dimension	Reasoning	Score
Conciseness	The skill is long (250+ lines), and some sections could be tightened: the Constants section explains each flag verbosely, and the ASCII workflow diagram is helpful but adds bulk. Most content is genuinely instructive, though: it is domain-specific workflow orchestration rather than an explanation of things Claude already knows. Some redundancy exists between the checkpoint display, the results template, and the handoff template.	2 / 3
Actionability	The skill provides concrete, executable guidance throughout: specific file paths to read, exact commands to run (/run-experiment, /experiment-queue, /monitor-experiment), detailed code review prompts with spawn_agent syntax, structured markdown templates for output, and explicit checklists for self-review. The experiment result tables and log formats are copy-paste ready.	3 / 3
Workflow Clarity	The workflow is exceptionally well-sequenced across 6 phases with clear milestone ordering (sanity → baseline → main → ablation → polish). Validation checkpoints are explicit: Phase 2 has a self-review checklist, Phase 2.5 adds cross-model code review with blocking/non-blocking issue classification, Phase 3 is a dedicated sanity check with failure recovery (including rescue diagnosis), and Phase 4 has an AUTO_DEPLOY checkpoint. Feedback loops are present for error recovery throughout.	3 / 3
Progressive Disclosure	The skill references shared protocols (output-versioning.md, output-manifest.md, output-language.md) and other skills (/run-experiment, /experiment-queue, /ablation-planner, etc.) appropriately. However, no bundle files are provided to verify these references exist, and the skill itself is quite long—some sections like the detailed result templates and backend lifecycle rules could potentially be split into reference files. The composing section at the end is well-organized.	2 / 3
	Total	10 / 12 Passed
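
The milestone ordering noted under Workflow Clarity (sanity → baseline → main → ablation → polish) can be pictured as a simple gated sequence. The sketch below is only an illustration of that idea, not the skill's actual implementation: the `run_milestones` function and the `runners` mapping are hypothetical names introduced here, and the only detail taken from the review is the stage order.

```python
from typing import Callable, Dict, List

# Stage order as quoted in the review; everything else here is hypothetical.
MILESTONES: List[str] = ["sanity", "baseline", "main", "ablation", "polish"]


def run_milestones(runners: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run milestones in order and stop at the first one that fails.

    `runners` maps a milestone name to a callable that returns True on
    success. Returns the names of the milestones that completed.
    """
    completed: List[str] = []
    for name in MILESTONES:
        runner = runners.get(name)
        if runner is None or not runner():
            break  # a missing or failing stage blocks every later stage
        completed.append(name)
    return completed


if __name__ == "__main__":
    demo = {name: (lambda: True) for name in MILESTONES}
    demo["main"] = lambda: False  # simulate a failing main experiment
    print(run_milestones(demo))
```

In this sketch a failed stage simply halts the sequence; the review describes richer recovery paths (such as rescue diagnosis after a failed sanity check) that are not modeled here.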

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: wanshuiyin/Auto-claude-code-research-in-sleep
Commit: a425a71

Reviewed: about 24 hours ago
