
run-experiment

Deploy and run ML experiments on local or remote GPU servers. Use when user says "run experiment", "deploy to server", "跑实验", or needs to launch training jobs.


Quality: 83%
Does it follow best practices?

Impact: Pending
No eval scenarios have been run.

Security (by Snyk): Risky
Do not use without reviewing.


Quality

Discovery: 89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a solid skill description that clearly communicates its purpose and provides explicit trigger guidance including multilingual terms. Its main weakness is that the capability listing could be more specific about what concrete actions the skill performs beyond 'deploy and run'. The trigger terms and completeness are strong points.

Suggestions

Add more specific concrete actions to improve specificity, e.g., 'configure training environments, monitor job progress, manage GPU resources, sync datasets'
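
Applied concretely, the sharpened description might look like the following frontmatter sketch (a hypothetical rewrite; the added action verbs are illustrative, not the skill's actual metadata):

```yaml
# Hypothetical SKILL.md frontmatter; the expanded action list is an example only.
name: run-experiment
description: >
  Deploy and run ML experiments on local or remote GPU servers: configure
  training environments, sync code and datasets, launch training jobs,
  monitor GPU usage and job progress. Use when the user says
  "run experiment", "deploy to server", "跑实验", or needs to launch
  training jobs.
```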

Specificity (2 / 3)

Names the domain (ML experiments, GPU servers) and some actions (deploy, run, launch training jobs), but doesn't list comprehensive specific actions like configuring environments, monitoring jobs, managing checkpoints, etc.

Completeness (3 / 3)

Clearly answers both 'what' (deploy and run ML experiments on local or remote GPU servers) and 'when' (explicit 'Use when' clause with specific trigger phrases including 'run experiment', 'deploy to server', '跑实验', or launching training jobs).

Trigger Term Quality (3 / 3)

Includes natural trigger terms users would actually say: 'run experiment', 'deploy to server', 'launch training jobs', and even a Chinese variant '跑实验'. Good coverage of how users naturally phrase these requests.

Distinctiveness / Conflict Risk (3 / 3)

The combination of ML experiments, GPU servers, and training jobs creates a clear niche. The specific trigger terms like 'run experiment' and 'deploy to server' in the ML context are distinct and unlikely to conflict with general deployment or general ML skills.

Total: 11 / 12 (Passed)

Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-crafted, actionable skill with clear multi-step workflow, concrete commands, and good validation checkpoints. Its main weakness is length — the W&B integration details and AGENTS.md example inflate the token cost and could benefit from being split into referenced files. The conditional logic (remote vs local, rsync vs git, wandb on/off) is handled clearly.
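
As a rough illustration of that conditional logic, the command construction might be sketched like this (host names, paths, and flag names are hypothetical placeholders, not taken from the skill itself):

```python
def build_sync_command(host, src, dest, use_rsync=True):
    """Sync code to the server: rsync push, or a git pull in an existing clone."""
    if use_rsync:
        return ["rsync", "-az", "--exclude", ".git", src, f"{host}:{dest}/"]
    return ["ssh", host, f"cd {dest} && git pull"]  # assumes the repo is already cloned

def build_launch_command(host, workdir, train_cmd, session="exp"):
    """Launch training in a detached screen session so it survives disconnects."""
    return ["ssh", host, f"cd {workdir} && screen -dmS {session} {train_cmd}"]

# Example: push local code, then start training remotely.
print(" ".join(build_sync_command("gpu-box", "./", "/home/ml/exp")))
print(" ".join(build_launch_command("gpu-box", "/home/ml/exp", "python train.py")))
```

Returning argument lists rather than single strings keeps the local side safe to hand to `subprocess.run` without shell interpolation; quoting of the remote command is elided here for brevity.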

Suggestions

Move the W&B integration details (Step 3.5) into a separate WANDB.md file and reference it with a one-line link, reducing the main skill's token footprint.

Move the AGENTS.md example to a separate SETUP.md or AGENTS_EXAMPLE.md file — it's reference material, not operational guidance.

Conciseness (2 / 3)

The skill is mostly efficient but includes some unnecessary content like the full AGENTS.md example at the bottom and verbose W&B integration details that could be split into a separate file. The W&B metrics list is somewhat padded. However, most instructions are direct and actionable.

Actionability (3 / 3)

Every step includes concrete, executable bash commands and Python code snippets with clear placeholders. The rsync, SSH, screen, nvidia-smi, and wandb commands are all copy-paste ready with appropriate parameterization.

Workflow Clarity (3 / 3)

The workflow is clearly sequenced (Steps 1-6) with explicit validation checkpoints: GPU availability check before deployment (Step 2), screen session verification after launch (Step 5), and conditional steps clearly gated on AGENTS.md configuration. The W&B step includes a check-before-modify pattern.

Progressive Disclosure (2 / 3)

The content is well-structured with clear headers and conditional sections, but it's a long monolithic file. The W&B integration section and the AGENTS.md example could be split into separate reference files. No external file references are used despite the content length warranting them.

Total: 10 / 12 (Passed)
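
The check-before-modify pattern credited above for the W&B step can be sketched as follows; the file handling and marker string are assumptions for illustration, not the skill's actual code:

```python
from pathlib import Path

def ensure_wandb_init(script: Path, project: str = "my-experiment") -> bool:
    """Prepend a wandb.init call only if the script lacks one; True if modified."""
    text = script.read_text()
    if "wandb.init(" in text:  # check before modifying: leave existing setup alone
        return False
    stub = f'import wandb\nwandb.init(project="{project}")\n'
    script.write_text(stub + text)
    return True
```

Running it twice on the same script is a no-op the second time, which is what makes the pattern safe across repeated agent runs.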

Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: wanshuiyin/Auto-claude-code-research-in-sleep (Reviewed)
