
run-experiment

Deploy and run ML experiments on local or remote GPU servers. Use when user says "run experiment", "deploy to server", "跑实验", or needs to launch training jobs.


Quality: 83% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security (by Snyk): Risky (Do not use without reviewing)


Quality

Discovery: 89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a solid description that clearly communicates when to use the skill, with explicit trigger terms and multilingual support. Its main weakness is that the 'what' portion could be more specific about the concrete actions performed beyond 'deploy and run'. Overall it performs well for skill selection purposes.

Suggestions

Expand the capability list with more specific actions, e.g., 'configure training environments, monitor job progress, manage GPU resources, sync datasets' to improve specificity.
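Applied to this skill, the suggestion might look like the following frontmatter sketch. This is illustrative only: the capability phrases are taken from the suggestion above, not from the skill's actual description.

```yaml
---
name: run-experiment
description: >
  Deploy and run ML experiments on local or remote GPU servers:
  configure training environments, monitor job progress, manage GPU
  resources, and sync datasets. Use when the user says "run experiment",
  "deploy to server", "跑实验", or needs to launch training jobs.
---
```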

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Names the domain (ML experiments, GPU servers) and some actions (deploy, run, launch training jobs), but doesn't list comprehensive specific actions like configuring environments, monitoring jobs, managing checkpoints, etc. | 2 / 3 |
| Completeness | Clearly answers both 'what' (deploy and run ML experiments on local or remote GPU servers) and 'when' (explicit 'Use when' clause with specific trigger phrases and scenarios). | 3 / 3 |
| Trigger Term Quality | Includes natural trigger terms users would actually say: 'run experiment', 'deploy to server', 'launch training jobs', and even a Chinese equivalent '跑实验'. Good coverage of natural language variations. | 3 / 3 |
| Distinctiveness / Conflict Risk | The combination of ML experiments, GPU servers, and training jobs creates a clear niche. The specific trigger terms like 'run experiment' and 'deploy to server' in the ML context are distinct and unlikely to conflict with general deployment or coding skills. | 3 / 3 |
| Total | | 11 / 12 |

Passed

Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-crafted, highly actionable skill with excellent workflow clarity and validation checkpoints for a complex multi-environment deployment process. Its main weakness is length — at ~200 lines covering five deployment targets (local, remote, Vast.ai, Modal, plus W&B and Feishu integrations), it would benefit from splitting advanced/optional sections into referenced files. The commands are executable and the conditional logic is clearly signaled.

Suggestions

Extract Vast.ai, Modal, and W&B integration sections into separate referenced files (e.g., VASTAI.md, MODAL.md, WANDB.md) to reduce the main skill's token footprint and improve progressive disclosure.

Remove the 'Benefits:' explanation under the git sync option and the W&B setup note at the bottom — Claude doesn't need to be told why git is useful or how W&B dashboards work.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is fairly long but most content is actionable. However, there's some redundancy (e.g., the AGENTS.md example at the bottom partially repeats information already covered in the workflow steps), and some explanatory text like 'Benefits: version-tracked, multi-server sync with one push' is unnecessary for Claude. | 2 / 3 |
| Actionability | The skill provides fully executable bash and Python commands throughout: SSH commands, rsync, screen session creation, nvidia-smi queries, wandb integration code, and Modal deployment commands are all copy-paste ready with clear placeholder conventions. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced (Steps 1-7) with explicit validation checkpoints: GPU availability check before assignment, launch verification via 'screen -ls', artifact copy verification before Vast.ai destruction, and explicit error handling instructions (e.g., 'If any artifact copy fails, do not destroy the instance'). Feedback loops are present for risky operations. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear headers and conditional sections (e.g., the W&B step is skippable), but the entire skill is monolithic: the Vast.ai lifecycle management, Modal deployment, W&B integration, and Feishu notification could each be separate referenced files. The AGENTS.md example section at the bottom adds significant length that could be a separate reference. | 2 / 3 |
| Total | | 10 / 12 |

Passed
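The GPU-availability checkpoint and launch verification praised in the Workflow Clarity row can be sketched in shell. This is a minimal illustration under assumptions, not the skill's actual code: the `nvidia-smi` query flags shown in the comment are real options, but the helper name `pick_free_gpu`, the session name, and the sample values are hypothetical.

```shell
# Select the GPU index with the most free memory.
# Reads CSV lines like "0, 1024" on stdin, as produced by:
#   nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits
pick_free_gpu() {
  sort -t, -k2 -rn | head -n1 | cut -d, -f1 | tr -d ' '
}

# Demo with captured sample output (on a real server, pipe nvidia-smi in):
sample='0, 1024
1, 23028
2, 512'
gpu=$(printf '%s\n' "$sample" | pick_free_gpu)
echo "$gpu"   # prints "1": GPU 1 has the most free memory

# Launch under screen and verify, matching the skill's checkpoint pattern
# (commented out here because it requires a live training script):
#   CUDA_VISIBLE_DEVICES=$gpu screen -dmS exp1 python train.py
#   screen -ls | grep -q exp1 || echo "launch failed"
```

The point of the parse-then-verify shape is that GPU selection happens before any job is assigned, and `screen -ls` confirms the session actually exists after launch, which is exactly the feedback loop the review credits.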

Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 checks passed.

Validation for skill structure: no warnings or errors.

Repository: wanshuiyin/Auto-claude-code-research-in-sleep (Reviewed)
