run-experiment

Deploy and run ML experiments on local or remote GPU servers. Use when user says "run experiment", "deploy to server", "跑实验", or needs to launch training jobs.

Quality

83%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Risky

Do not use without reviewing

Quality

Discovery

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a solid skill description that clearly communicates its purpose and includes explicit trigger guidance with natural user phrases. The main weakness is that the 'what' portion could be more specific about the concrete actions performed (e.g., environment setup, job monitoring, log retrieval). The multilingual trigger term adds useful coverage.

Suggestions

Expand the capability list with more specific actions, e.g., 'configure environments, submit training jobs, monitor GPU utilization, retrieve logs' to improve specificity.

Dimension	Reasoning	Score
Specificity	Names the domain (ML experiments, GPU servers) and some actions (deploy, run, launch training jobs), but does not list specific actions such as configuring environments, monitoring jobs, or managing checkpoints.	2 / 3
Completeness	Clearly answers both 'what' (deploy and run ML experiments on local or remote GPU servers) and 'when' (explicit 'Use when' clause with specific trigger phrases including 'run experiment', 'deploy to server', '跑实验', and launching training jobs).	3 / 3
Trigger Term Quality	Includes strong trigger terms that users would naturally say: 'run experiment', 'deploy to server', '跑实验', 'launch training jobs'. The Chinese-language trigger '跑实验' ("run experiment") broadens coverage.	3 / 3
Distinctiveness / Conflict Risk	The combination of ML experiments, GPU servers, and training jobs creates a clear niche. The specific trigger terms like 'run experiment', 'deploy to server', and '跑实验' are distinct enough to avoid conflicts with general coding or deployment skills.	3 / 3
	Total	11 / 12 Passed

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-crafted, highly actionable skill with clear multi-step workflows, explicit validation checkpoints, and concrete executable commands for multiple deployment targets. Its main weakness is length: the skill covers five deployment targets (local CUDA, local MPS, remote SSH, Vast.ai, Modal) plus W&B integration and Feishu notifications in a single file. Some content could be trimmed or split into referenced sub-files for better progressive disclosure.

Suggestions

Split deployment target details (Vast.ai lifecycle, Modal configuration, W&B integration) into separate referenced files to reduce the main SKILL.md to a concise overview with pointers.

Trim the AGENTS.md example block. It is useful, but one or two target examples would suffice, with a note that the other targets follow the same pattern.

Dimension	Reasoning	Score
Conciseness	The skill is fairly long (~200 lines) and includes some sections that could be tightened (e.g., the AGENTS.md example block is extensive, the W&B integration step explains metrics Claude would know to log). However, most content is actionable commands rather than explanatory prose, so it's not egregiously verbose.	2 / 3
Actionability	The skill provides fully executable bash and Python commands throughout — SSH commands, rsync patterns, screen session creation, nvidia-smi queries, wandb integration code, and Modal deployment commands are all copy-paste ready with clear placeholder conventions.	3 / 3
Workflow Clarity	The workflow is clearly sequenced (Steps 1-7) with explicit validation checkpoints: GPU availability check before assignment, launch verification via `screen -ls`, artifact copy verification before Vast.ai destruction, and explicit error-handling guidance (e.g., 'If any artifact copy fails, do not destroy the instance'). Feedback loops are present for risky operations.	3 / 3
Progressive Disclosure	The content is well-structured with clear headers and conditional sections (e.g., 'Remote Only', 'when wandb: true'), but everything is in a single monolithic file with no references to supporting documents. The W&B integration details, Vast.ai lifecycle management, and Modal configuration could be split into separate reference files to keep the main skill leaner.	2 / 3
	Total	10 / 12 Passed
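
To illustrate the kind of validation checkpoints the Workflow Clarity row credits, here is a minimal Python sketch of two of them: checking GPU availability before assignment, and verifying artifact copies before a remote instance is destroyed. This is not the skill's own code (the review describes the skill as using bash and Python commands that are not reproduced here). The function names, the 500 MiB threshold, and the artifact file names are all hypothetical. The input format assumes the output of `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits`.

```python
from pathlib import Path

# Hypothetical threshold: treat a GPU as free when it uses under 500 MiB.
FREE_MEMORY_MIB = 500


def pick_free_gpus(nvidia_smi_csv: str, max_used_mib: int = FREE_MEMORY_MIB) -> list[int]:
    """Return indices of GPUs whose used memory is below the threshold.

    Expects lines of the form "index, memory.used", as printed by:
      nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits
    """
    free = []
    for line in nvidia_smi_csv.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        if int(used) < max_used_mib:
            free.append(int(index))
    return free


def missing_artifacts(local_dir: Path, expected: list[str]) -> list[str]:
    """Return the expected artifact names that are absent or empty locally."""
    missing = []
    for name in expected:
        path = local_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


def safe_to_destroy(local_dir: Path, expected: list[str]) -> bool:
    """Allow instance teardown only when every artifact was copied."""
    return not missing_artifacts(local_dir, expected)
```

A caller would run `pick_free_gpus` on the captured `nvidia-smi` output before launching a job, and would skip the destroy step whenever `safe_to_destroy` returns `False`, mirroring the rule quoted above: if any artifact copy fails, do not destroy the instance.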

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: wanshuiyin/Auto-claude-code-research-in-sleep
Commit: a425a71

Reviewed: 1 day ago
