Deploy and run ML experiments on local or remote GPU servers. Use when user says "run experiment", "deploy to server", "跑实验", or needs to launch training jobs.
83%
Does it follow best practices?
Impact: Pending (no eval scenarios have been run)
Risky: do not use without reviewing
Quality
Discovery
89%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a solid skill description that clearly communicates its purpose and provides explicit trigger guidance including multilingual terms. Its main weakness is that the capability listing could be more specific about what concrete actions the skill performs beyond 'deploy and run'. The trigger terms and completeness are strong points.
Suggestions
Add more specific concrete actions to improve specificity, e.g., 'configure training environments, monitor job progress, manage GPU resources, sync datasets'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (ML experiments, GPU servers) and some actions (deploy, run, launch training jobs), but doesn't enumerate more specific actions such as configuring environments, monitoring jobs, or managing checkpoints. | 2 / 3 |
| Completeness | Clearly answers both 'what' (deploy and run ML experiments on local or remote GPU servers) and 'when' (explicit 'Use when' clause with specific trigger phrases including 'run experiment', 'deploy to server', '跑实验', or launching training jobs). | 3 / 3 |
| Trigger Term Quality | Includes natural trigger terms users would actually say: 'run experiment', 'deploy to server', 'launch training jobs', and even a Chinese variant, '跑实验' ("run experiments"). Good coverage of how users naturally phrase these requests. | 3 / 3 |
| Distinctiveness Conflict Risk | The combination of ML experiments, GPU servers, and training jobs creates a clear niche. The specific trigger terms like 'run experiment' and 'deploy to server' in the ML context are distinct and unlikely to conflict with general deployment or general ML skills. | 3 / 3 |
| Total | | 11 / 12 Passed |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-crafted, actionable skill with clear multi-step workflow, concrete commands, and good validation checkpoints. Its main weakness is length — the W&B integration details and AGENTS.md example inflate the token cost and could benefit from being split into referenced files. The conditional logic (remote vs local, rsync vs git, wandb on/off) is handled clearly.
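To make that branching concrete, here is a minimal Python sketch of how the remote/local, rsync/git, and wandb on/off decisions might translate into shell commands. Everything specific in it is hypothetical: the `DeployConfig` fields, `build_commands`, the `.git` exclusion, and switching W&B off through the `WANDB_MODE=disabled` environment variable are illustrative assumptions, not the skill's actual commands. Only the `rsync`, `ssh`, and `screen -dmS` invocations themselves are standard CLI usage.

```python
import shlex
from dataclasses import dataclass


@dataclass
class DeployConfig:
    # Hypothetical settings; the reviewed skill reads its own configuration from AGENTS.md.
    remote: bool
    host: str = ""          # ssh host alias, e.g. "gpu-box" (assumption)
    remote_dir: str = ""    # project directory on the server (assumption)
    sync: str = "rsync"     # "rsync" or "git"
    use_wandb: bool = True


def build_commands(cfg: DeployConfig, local_dir: str, train_cmd: str, session: str) -> list[list[str]]:
    """Return the argv lists to run, in order, for one experiment launch."""
    # When W&B is off, disable it via the WANDB_MODE environment variable.
    run = train_cmd if cfg.use_wandb else f"WANDB_MODE=disabled {train_cmd}"
    # A detached, named screen session keeps the job alive after ssh disconnects.
    launch = f"screen -dmS {shlex.quote(session)} bash -c {shlex.quote(run)}"

    if not cfg.remote:
        return [["bash", "-c", f"cd {shlex.quote(local_dir)} && {launch}"]]

    if cfg.sync == "rsync":
        # A trailing slash on the source copies its contents instead of nesting the directory.
        sync_cmd = ["rsync", "-az", "--exclude", ".git",
                    f"{local_dir.rstrip('/')}/", f"{cfg.host}:{cfg.remote_dir.rstrip('/')}/"]
    elif cfg.sync == "git":
        sync_cmd = ["ssh", cfg.host, f"cd {shlex.quote(cfg.remote_dir)} && git pull"]
    else:
        raise ValueError(f"unknown sync method: {cfg.sync!r}")

    return [sync_cmd, ["ssh", cfg.host, f"cd {shlex.quote(cfg.remote_dir)} && {launch}"]]


if __name__ == "__main__":
    cfg = DeployConfig(remote=True, host="gpu-box", remote_dir="/data/proj", use_wandb=False)
    for argv in build_commands(cfg, "./proj", "python train.py", "exp1"):
        print(shlex.join(argv))
```

Returning argv lists rather than executing them keeps the decision logic testable and lets an agent show the user each command before running it.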
Suggestions
Move the W&B integration details (Step 3.5) into a separate WANDB.md file and reference it with a one-line link, reducing the main skill's token footprint.
Move the AGENTS.md example to a separate SETUP.md or AGENTS_EXAMPLE.md file — it's reference material, not operational guidance.
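Whether the W&B material stays inline or moves to WANDB.md, the check-before-modify pattern credited to it can be illustrated with a small idempotent helper: look for an existing `wandb.init(` call and add instrumentation only when none is found. This is a hypothetical sketch; `has_wandb`, `add_wandb_if_missing`, the regex check, and the prepended header are assumptions rather than the skill's Step 3.5 code. `wandb.init(project=...)` is the standard W&B entry point.

```python
import re

# Matches an existing W&B initialisation call anywhere in the script.
_WANDB_INIT = re.compile(r"\bwandb\.init\(")


def has_wandb(source: str) -> bool:
    """Return True if the training script already calls wandb.init(...)."""
    return bool(_WANDB_INIT.search(source))


def add_wandb_if_missing(source: str, project: str) -> tuple[str, bool]:
    """Check first, modify second: return (new_source, changed).

    Naive by design: the header is prepended, so scripts that start with a
    shebang or a `from __future__` import would need smarter placement.
    """
    if has_wandb(source):
        return source, False  # already instrumented; leave the script untouched
    header = f"import wandb\nwandb.init(project={project!r})\n\n"
    return header + source, True


if __name__ == "__main__":
    script = "import torch\nprint('training')\n"
    once, changed = add_wandb_if_missing(script, "my-experiments")
    twice, changed_again = add_wandb_if_missing(once, "my-experiments")
    print(changed, changed_again)  # True False
```

Because the second call is a no-op, an agent can rerun the step safely without stacking duplicate `wandb.init` calls.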
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is mostly efficient but includes some unnecessary content like the full AGENTS.md example at the bottom and verbose W&B integration details that could be split into a separate file. The W&B metrics list is somewhat padded. However, most instructions are direct and actionable. | 2 / 3 |
| Actionability | Every step includes concrete, executable bash commands and Python code snippets with clear placeholders. The rsync, SSH, screen, nvidia-smi, and wandb commands are all copy-paste ready with appropriate parameterization. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced (Steps 1-6) with explicit validation checkpoints: GPU availability check before deployment (Step 2), screen session verification after launch (Step 5), and conditional steps clearly gated on AGENTS.md configuration. The W&B step includes a check-before-modify pattern. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear headers and conditional sections, but it's a long monolithic file. The W&B integration section and the AGENTS.md example could be split into separate reference files. No external file references are used despite the content length warranting them. | 2 / 3 |
| Total | | 10 / 12 Passed |
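The two validation checkpoints named under Workflow Clarity (GPU availability before deployment, screen session after launch) can be sketched as parsers over standard command output. The `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` and `screen -ls` invocations are real; the 500 MiB "free" threshold, the function names, and the exact-name matching rule are assumptions for illustration, not the skill's own logic.

```python
import re
import subprocess
from typing import Optional

# Standard nvidia-smi query flags; the chosen columns are an illustrative assumption.
GPU_QUERY = ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
             "--format=csv,noheader,nounits"]


def run(argv: list[str], host: Optional[str] = None) -> str:
    """Run a command locally, or through ssh when a host is given; return stdout."""
    if host:
        argv = ["ssh", host, " ".join(argv)]
    # check=False: callers inspect the output text rather than the exit status.
    return subprocess.run(argv, capture_output=True, text=True, check=False).stdout


def free_gpus(csv_text: str, max_used_mib: int = 500) -> list[int]:
    """Indices of GPUs whose used memory is below a hypothetical threshold (MiB)."""
    free = []
    for line in csv_text.strip().splitlines():
        index, used, _total = (field.strip() for field in line.split(","))
        if int(used) < max_used_mib:
            free.append(int(index))
    return free


def session_running(screen_ls_text: str, name: str) -> bool:
    """True if `screen -ls` output lists a session named exactly `name`."""
    # Session lines look like "<tab>12345.exp1<tab>(Detached)": PID, a dot, then the name.
    pattern = re.compile(rf"^\s*\d+\.{re.escape(name)}\s", re.MULTILINE)
    return pattern.search(screen_ls_text) is not None
```

In use, an agent would call `free_gpus(run(GPU_QUERY, "gpu-box"))` before deploying and `session_running(run(["screen", "-ls"], "gpu-box"), "exp1")` after launching, where `gpu-box` and `exp1` are placeholder names. Keeping the parsing separate from `run` means both checks can be tested without a GPU or a screen installation.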
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation: 11 / 11 Passed
Validation for skill structure
No warnings or errors.
dc00dfb