# Skill Review: monitor-experiment

Skill description:

> Monitor running experiments, check progress, collect results. Use when user says "check results", "is it done", "monitor", or wants experiment output.
## Quality: 78%

Does it follow best practices?

Impact: Pending (no eval scenarios have been run). Advisory: suggest reviewing before use.

Optimize this skill with Tessl:

`npx tessl skill review --optimize ./skills/monitor-experiment/SKILL.md`
## Discovery: 92%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a well-structured description that clearly states concrete capabilities and provides explicit trigger terms in a 'Use when' clause. The trigger terms are natural and cover common user phrasings. The main weakness is that 'experiments' is somewhat broad and could overlap with other monitoring or results-related skills; specifying the type of experiments (e.g., ML training runs, A/B tests) would improve distinctiveness.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple concrete actions: 'Monitor running experiments', 'check progress', 'collect results'. These are specific, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers both what ('Monitor running experiments, check progress, collect results') and when ('Use when user says "check results", "is it done", "monitor", or wants experiment output') with explicit triggers. | 3 / 3 |
| Trigger Term Quality | Includes natural trigger terms users would actually say: 'check results', 'is it done', 'monitor', and 'experiment output'. These cover common phrasings well. | 3 / 3 |
| Distinctiveness / Conflict Risk | The terms 'monitor' and 'check results' could overlap with other monitoring or results-checking skills (e.g., CI/CD monitoring, test results). The 'experiment' domain helps narrow it but could still conflict with other experiment-related skills. | 2 / 3 |
| **Total** | | **11 / 12 (Passed)** |
## Implementation: 64%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a solid, actionable monitoring skill with concrete commands for multiple platforms. Its main weaknesses are the lack of error-handling/validation checkpoints in the workflow and the lengthy inline W&B section that could benefit from being extracted to a separate reference. The multi-platform coverage is comprehensive but contributes to length that could be better organized.
### Suggestions

- Add explicit error-handling guidance for common failure modes (SSH connection failures, dead screen sessions, missing result files) to improve workflow robustness.
- Extract the W&B metrics section (Step 3.5) into a separate WANDB_MONITORING.md file and reference it with a one-line link, keeping the main skill leaner.
- Remove the explanatory blockquote ('This gives the auto-review-loop richer signal...'); it explains rationale Claude doesn't need.
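The first suggestion could be implemented as small pre-flight checks in the skill. A minimal sketch follows; the host (`gpu-box`), screen session (`train`), and results file (`results.json`) names are hypothetical placeholders, not from the skill itself:

```shell
# Hedged sketch of error-handling checkpoints for a monitoring workflow.
# All host/session/file names in the commented wiring are hypothetical.

# Return 0 if the remote host is reachable over SSH within 5 seconds.
ssh_ok() {
  ssh -o ConnectTimeout=5 -o BatchMode=yes "$1" true 2>/dev/null
}

# Return 0 if a named screen session is still listed on the remote host.
screen_alive() {
  ssh "$1" "screen -ls 2>/dev/null | grep -q '$2'"
}

# Return 0 if a (local) results file exists and parses as JSON.
json_valid() {
  [ -f "$1" ] && python3 -c 'import json,sys; json.load(open(sys.argv[1]))' "$1" 2>/dev/null
}

# Example wiring (hypothetical names):
# ssh_ok gpu-box             || { echo "ERROR: cannot reach gpu-box over SSH" >&2; exit 1; }
# screen_alive gpu-box train || echo "WARN: screen session 'train' is dead" >&2
# json_valid results.json    || echo "WARN: results.json missing or malformed" >&2
```

Checks like these give the workflow explicit failure branches (retry, warn, or abort) instead of letting a dead session or half-written JSON file surface as a confusing downstream error.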
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is mostly efficient but includes some unnecessary commentary (e.g., 'This gives the auto-review-loop richer signal than just screen output' is explanatory rather than instructional). The W&B section is quite lengthy and could be tightened. However, most content is actionable commands rather than explanation. | 2 / 3 |
| Actionability | Provides concrete, executable bash commands and Python snippets for every step. SSH commands, screen capture, JSON parsing, W&B API calls, and vastai CLI commands are all copy-paste ready with clear parameterization. | 3 / 3 |
| Workflow Clarity | Steps are clearly sequenced and numbered, covering discovery through interpretation. However, there are no explicit validation checkpoints or feedback loops, e.g., no guidance on what to do if SSH fails, if screen sessions are dead, or if JSON results are malformed. For a monitoring workflow that could encounter many failure modes, this is a gap. | 2 / 3 |
| Progressive Disclosure | The content is well-structured with clear headers and steps, but the W&B section (Step 3.5) is quite long and could be split into a separate reference file. The skill is over 100 lines and handles multiple platforms (SSH, Vast.ai, Modal, W&B, Feishu) all inline, making it a somewhat dense monolithic document. | 2 / 3 |
| **Total** | | **9 / 12 (Passed)** |
## Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

### Validation for skill structure: 9 / 11 Passed
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| **Total** | | **9 / 11 Passed** |
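Both warnings point at the SKILL.md frontmatter. One way to clear them, sketched below under assumptions: `name`, `description`, and `allowed-tools` are the spec-recognized keys, the `allowed-tools` values are real tool names in your agent (verify against its tool list), and unrecognized custom keys move under `metadata` as the validator message suggests. The `platforms` entry is a hypothetical example of such a custom key:

```yaml
---
name: monitor-experiment
description: Monitor running experiments, check progress, collect results. Use when user says "check results", "is it done", "monitor", or wants experiment output.
# Keep 'allowed-tools' to tool names the agent actually exposes;
# verify each entry against your agent's tool list.
allowed-tools: Bash, Read
# Custom keys the validator does not recognize belong under 'metadata'
# ('platforms' here is a hypothetical example):
metadata:
  platforms: [ssh, vastai, modal, wandb]
---
```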