
monitor-experiment

Monitor running experiments, check progress, collect results. Use when user says "check results", "is it done", "monitor", or wants experiment output.

68

Quality

84%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Advisory

Suggest reviewing before use

Quality

Discovery

92%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-constructed skill description that clearly states concrete capabilities and provides explicit trigger terms. The 'Use when...' clause with quoted user phrases is effective for skill selection. The main weakness is that 'monitor' and 'check results' could potentially overlap with other monitoring-related skills, and the description could benefit from slightly more specificity about what kind of experiments are being monitored.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple concrete actions: 'Monitor running experiments', 'check progress', 'collect results'. These are specific, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers both what ('Monitor running experiments, check progress, collect results') and when ('Use when user says "check results", "is it done", "monitor", or wants experiment output') with explicit triggers. | 3 / 3 |
| Trigger Term Quality | Includes natural trigger terms users would actually say: 'check results', 'is it done', 'monitor', and 'experiment output'. These cover common phrasings well. | 3 / 3 |
| Distinctiveness Conflict Risk | The term 'monitor' is somewhat generic and could overlap with system monitoring or other monitoring skills. However, the experiment-specific context ('running experiments', 'experiment output') helps narrow it. Could be more distinctive by specifying the type of experiments or platform. | 2 / 3 |
| Total | | 11 / 12 (Passed) |

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a solid, actionable skill with clear workflow sequencing and executable commands covering multiple deployment targets (SSH, Vast.ai, Modal). Its main weakness is verbosity in the W&B section and some unnecessary explanatory text that doesn't add value for Claude. The skill would benefit from tightening commentary and potentially splitting the W&B monitoring into a referenced sub-file.

Suggestions

Remove explanatory commentary like 'This gives the auto-review-loop richer signal...' and the 'What to extract' bullet descriptions — Claude can infer what metrics matter from the code.

Consider extracting the W&B monitoring section into a separate WANDB_MONITORING.md file referenced from the main skill, reducing the main file's token footprint.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is mostly efficient but includes some unnecessary commentary (e.g., 'This gives the auto-review-loop richer signal than just screen output' is explanatory rather than instructional). The W&B section is quite lengthy with inline Python scripts that could be more condensed. The 'What to extract' bullet list explains concepts Claude already understands. | 2 / 3 |
| Actionability | The skill provides fully executable bash commands and Python code snippets for every step. Commands are copy-paste ready with clear placeholders (<server>, <PORT>, <HOST>, etc.), and concrete examples are given for SSH, screen capture, JSON parsing, W&B API calls, and vastai CLI usage. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced with numbered steps from checking running processes through collecting output, parsing results, summarizing, interpreting, and notifying. It includes validation guidance (Step 5 flags unexpected results, checking logs for errors) and conditional logic (skip W&B if not configured, skip Feishu if absent). The 'if hardcopy fails' fallback and 'if results look wrong, check training logs' provide error recovery paths. | 3 / 3 |
| Progressive Disclosure | The content is reasonably well-structured with clear section headers, but it's a fairly long monolithic document with no references to external files. The W&B section in particular is quite detailed and could be split into a separate reference file. However, for a skill of this complexity, the inline approach is borderline acceptable. | 2 / 3 |
| Total | | 10 / 12 (Passed) |

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 (Passed) |
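
The frontmatter_unknown_keys warning can be reproduced locally with a small check. The sketch below is illustrative only: the validator's actual rule set is not shown in this review, so the `KNOWN_KEYS` allow-list, the helper names, and the sample frontmatter (including the `argument-hint` key) are all assumptions, not the skill's real contents.

```python
# Hypothetical allow-list of top-level frontmatter keys (an assumption;
# the validator's real list is not shown in this review).
KNOWN_KEYS = {"name", "description", "license", "allowed-tools", "metadata"}


def frontmatter_keys(text):
    """Return the top-level keys of a leading '---' delimited frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter present
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        # Top-level keys start at column 0; indented lines, list items,
        # and comments belong to nested values or are ignored.
        if line and not line[0].isspace() and line[0] not in "-#" and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys


def unknown_keys(text, known=KNOWN_KEYS):
    """List frontmatter keys that are not in the allow-list, in file order."""
    return [key for key in frontmatter_keys(text) if key not in known]


# Made-up sample for illustration; not the actual SKILL.md under review.
SAMPLE = """---
name: monitor-experiment
description: Monitor running experiments, check progress, collect results.
argument-hint: [server]
allowed-tools: Bash(ssh *), Read
---
# Body text starts here
"""

if __name__ == "__main__":
    print(unknown_keys(SAMPLE))
```

Any key this check reports could either be removed or moved under `metadata`, as the warning suggests.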

Repository
wanshuiyin/Auto-claude-code-research-in-sleep
Reviewed
