
monitor-experiment

Monitor running experiments, check progress, collect results. Use when user says "check results", "is it done", "monitor", or wants experiment output.

Overall score: 77

Quality: 73% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security (by Snyk): Advisory (Suggest reviewing before use)

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/monitor-experiment/SKILL.md

Quality

Discovery

82%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is functional with a clear 'Use when' clause and natural trigger terms, making it easy for Claude to know when to select it. However, the capabilities listed are somewhat generic ('monitor', 'check progress', 'collect results') and could benefit from more concrete specifics about what kind of experiments and what actions are actually performed. The distinctiveness could be improved by specifying the experiment framework or domain.

Suggestions

Add specifics about what kind of experiments (e.g., ML training runs, A/B tests, scientific simulations) and concrete actions (e.g., 'parse training logs', 'extract metrics from output files').

Improve distinctiveness by specifying the experiment framework or environment to reduce overlap with other monitoring skills.
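Applied to this skill, the suggestions above might produce a frontmatter description like the following. This is a hypothetical sketch: the experiment types and concrete actions are drawn from the backends mentioned later in this review (SSH, Vast.ai, Modal, W&B), not from the skill's actual wording.

```yaml
# Hypothetical revision of the SKILL.md description field.
# Experiment types and actions shown here are illustrative assumptions.
name: monitor-experiment
description: >
  Monitor running ML training experiments over SSH, Vast.ai, or Modal:
  attach to screen sessions, parse training logs, extract metrics from
  JSON result files, and query W&B runs. Use when the user says
  "check results", "is it done", "monitor", or wants experiment output.
```

A revision along these lines would address both the specificity and distinctiveness gaps at once, since naming the infrastructure backends separates this skill from generic CI/CD or job-queue monitoring skills.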

Dimension scores

Specificity (2 / 3): Names the domain (experiments) and lists some actions (monitor, check progress, collect results), but the actions are somewhat generic and don't describe concrete mechanisms or specific operations like 'parse log files' or 'query experiment database'.

Completeness (3 / 3): Clearly answers both 'what' (monitor running experiments, check progress, collect results) and 'when' (explicit 'Use when' clause with specific trigger phrases like 'check results', 'is it done', 'monitor').

Trigger Term Quality (3 / 3): Includes natural trigger terms users would actually say: 'check results', 'is it done', 'monitor', and 'experiment output'. These are realistic phrases covering common variations of how users would ask about experiment status.

Distinctiveness Conflict Risk (2 / 3): The term 'experiments' provides some specificity, but 'monitor', 'check results', and 'is it done' could overlap with other monitoring-related skills (e.g., CI/CD monitoring, job queue monitoring). The description doesn't clarify what type of experiments.

Total: 10 / 12 (Passed)

Implementation

64%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a solid, actionable monitoring skill with concrete commands for multiple infrastructure backends (SSH, Vast.ai, Modal, W&B). Its main weaknesses are the lack of validation/error-recovery checkpoints in the workflow and some verbosity in the W&B section that could be trimmed or split into a separate file. The conditional sections (W&B, Feishu, Modal) make the skill comprehensive but add length that could benefit from progressive disclosure.

Suggestions

Add explicit error handling/validation checkpoints — e.g., what to do when SSH connection fails, when screen sessions don't exist, or when JSON results are malformed/empty.

Extract the W&B monitoring section (Step 3.5) into a separate WANDB_MONITORING.md file and reference it with a one-line link, keeping the main skill leaner.

Remove the explanatory note 'This gives the auto-review-loop richer signal than just screen output — training dynamics, loss curves, and metric trends over time' as it explains rationale Claude doesn't need.
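The first suggestion above (explicit error-handling checkpoints) could be realized as small guard functions that the workflow calls before each step. This is a minimal bash sketch, not the skill's actual implementation; the function names and the host/session arguments are hypothetical placeholders.

```shell
# Hypothetical validation checkpoints for the monitoring workflow.
# Arguments (host, session name, results path) are illustrative.

check_ssh() {
  # Fail fast with a clear message if the host is unreachable.
  ssh -o ConnectTimeout=10 -o BatchMode=yes "$1" true \
    || { echo "ERROR: cannot reach $1" >&2; return 1; }
}

check_screen() {
  # Verify the named screen session is still alive before capturing output.
  ssh "$1" "screen -ls | grep -q '$2'" \
    || { echo "ERROR: screen session '$2' not found on $1" >&2; return 1; }
}

validate_results() {
  # Guard against missing, empty, or malformed JSON result files.
  local f="$1"
  [ -s "$f" ] || { echo "ERROR: $f is missing or empty" >&2; return 1; }
  python3 -m json.tool "$f" > /dev/null 2>&1 \
    || { echo "ERROR: $f is not valid JSON" >&2; return 1; }
  echo "OK: $f"
}
```

Gating each workflow step on checks like these gives the agent a defined recovery path (report the error, retry, or fall back) instead of proceeding on garbage output.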

Dimension scores

Conciseness (2 / 3): The skill is mostly efficient but includes some unnecessary commentary (e.g., 'This gives the auto-review-loop richer signal than just screen output' is explanatory padding). The W&B section is quite lengthy and could be tightened. However, most content is actionable commands rather than explanation.

Actionability (3 / 3): The skill provides fully executable bash commands and Python snippets for every step: SSH commands, screen capture, JSON parsing, W&B API calls, and vastai CLI commands. Commands are copy-paste ready with clear placeholder conventions.

Workflow Clarity (2 / 3): Steps are clearly sequenced and numbered, covering multiple infrastructure types (SSH, Vast.ai, Modal). However, there are no explicit validation checkpoints or error recovery feedback loops, e.g., no guidance on what to do if SSH fails, if screen sessions are dead, or if JSON results are malformed. The 'If hardcopy fails' note is minimal.

Progressive Disclosure (2 / 3): The content is a single monolithic file with no references to external documentation. The W&B section (Step 3.5) is quite long and could be split into a separate reference file. The conditional sections (W&B, Feishu, Modal) add bulk that not all users need inline.

Total: 9 / 12 (Passed)

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 passed

Validation for skill structure

Criteria results

allowed_tools_field (Warning): 'allowed-tools' contains unusual tool name(s)

frontmatter_unknown_keys (Warning): Unknown frontmatter key(s) found; consider removing or moving to metadata

Total: 9 / 11 (Passed)
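The frontmatter_unknown_keys warning suggests nesting unrecognized top-level keys under a metadata block. A sketch of that fix, where the specific keys shown (author, version) are hypothetical examples, not the skill's actual frontmatter:

```yaml
# Before: unknown top-level keys (e.g., author, version) trigger the warning.
# After: nest them under `metadata` so the validator ignores them.
name: monitor-experiment
description: Monitor running experiments, check progress, collect results.
metadata:
  author: wanshuiyin
  version: "1.0"
```

The allowed_tools_field warning is resolved similarly: check each entry in 'allowed-tools' against the tool names the target agent actually exposes, and remove or correct any that don't match.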

Repository: wanshuiyin/Auto-claude-code-research-in-sleep (Reviewed)

