Monitor running experiments, check progress, collect results. Use when user says "check results", "is it done", "monitor", or wants experiment output.
Monitor: $ARGUMENTS
SSH server:
ssh <server> "screen -ls"
Vast.ai instance (read ssh_host, ssh_port from vast-instances.json):
ssh -p <PORT> root@<HOST> "screen -ls"
Also check vast.ai instance status:
vastai show instances
Modal (when gpu: modal in CLAUDE.md):
modal app list # List running/recent apps
modal app logs <app> # Stream logs from a running app
Modal apps auto-terminate when done. If an app is not in the list, it has already finished. Check results via modal volume ls <volume> or local output.
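When several sessions are running, it can help to turn the `screen -ls` output into a list of session names before capturing each one. The following is a minimal sketch, not part of the original skill: the function name `parse_screen_ls` is hypothetical, and the regular expression assumes the usual `<pid>.<name>` line layout with a trailing `(Attached)` or `(Detached)` state, which may differ between screen versions.

```python
import re

# Assumed line layout: leading whitespace, "<pid>.<name>", optional
# parenthesised date, then a final parenthesised state such as "(Detached)".
SESSION_RE = re.compile(r"^\s+(\d+)\.(\S+)\s+.*?\(([^()]*)\)\s*$")


def parse_screen_ls(output: str) -> list[dict]:
    """Extract pid, name, and state for each session listed by `screen -ls`."""
    sessions = []
    for line in output.splitlines():
        match = SESSION_RE.match(line)
        if match:
            sessions.append(
                {
                    "pid": int(match.group(1)),
                    "name": match.group(2),
                    "state": match.group(3),
                }
            )
    return sessions
```

Header and footer lines such as "There are screens on:" do not match the pattern and are skipped, so an empty list indicates that no sessions were found.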
For each screen session, capture the last N lines:
ssh <server> "screen -S <name> -X hardcopy /tmp/screen_<name>.txt && tail -50 /tmp/screen_<name>.txt"
If hardcopy fails, check for log files or tee output.
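A hardcopy dump reflects the visible window, so it may end in blank lines when the output does not fill the display (this is an assumption about how the dump is padded, not something the skill states). A small sketch of trimming that padding before taking the last N lines, with the hypothetical helper name `tail_nonblank`:

```python
def tail_nonblank(text: str, n: int = 50) -> list[str]:
    """Return the last n lines of text after dropping trailing blank lines."""
    lines = [line.rstrip() for line in text.splitlines()]
    # Remove empty lines at the end so they do not crowd out real output.
    while lines and not lines[-1]:
        lines.pop()
    return lines[-n:] if n > 0 else []
```

This could be applied to the contents of `/tmp/screen_<name>.txt` once the file has been fetched; blank lines in the middle of the output are kept as they are.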
ssh <server> "ls -lt <results_dir>/*.json 2>/dev/null | head -20"
If JSON results exist, fetch and parse them:
ssh <server> "cat <results_dir>/<latest>.json"
Weights & Biases (when wandb: true in CLAUDE.md):
Skip this step entirely if wandb is not set or is false in CLAUDE.md.
Pull training curves and metrics from Weights & Biases via Python API:
# List recent runs in the project
ssh <server> "python3 -c \"
import wandb
api = wandb.Api()
runs = api.runs('<entity>/<project>', per_page=10)
for r in runs:
    print(f'{r.id} {r.state} {r.name} {r.summary.get(\"eval/loss\", \"N/A\")}')
\""
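# Pull specific metrics from a run (prints the last 10 logged rows; page_size only controls fetch batching)
# Note: scan_history(keys=...) returns only rows in which every listed key is present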
ssh <server> "python3 -c \"
import wandb, json
api = wandb.Api()
run = api.run('<entity>/<project>/<run_id>')
history = list(run.scan_history(keys=['train/loss', 'eval/loss', 'eval/ppl', 'train/lr'], page_size=50))
print(json.dumps(history[-10:], indent=2))
\""
# Pull run summary (final metrics)
ssh <server> "python3 -c \"
import wandb, json
api = wandb.Api()
run = api.run('<entity>/<project>/<run_id>')
print(json.dumps(dict(run.summary), indent=2, default=str))
\""
What to extract:
W&B dashboard link (include in summary for user):
https://wandb.ai/<entity>/<project>/runs/<run_id>
This gives the auto-review-loop a richer signal than screen output alone: training dynamics, loss curves, and metric trends over time.
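To turn the history rows into a compact trend for the summary, something like the following could be used. It is a sketch under stated assumptions: the rows are plain dicts such as those printed by the `scan_history` snippet, the helper name `summarize_metric` is hypothetical, and non-numeric or missing values are simply skipped.

```python
def summarize_metric(rows: list[dict], key: str):
    """Summarize one metric across history rows; returns None if never logged."""
    # Keep only rows where the metric is present and numeric.
    values = [row[key] for row in rows if isinstance(row.get(key), (int, float))]
    if not values:
        return None
    return {
        "first": values[0],
        "last": values[-1],
        "min": min(values),
        "max": max(values),
        "change": values[-1] - values[0],  # negative means the value went down
        "n": len(values),
    }
```

For a loss-like metric, a negative `change` together with `last` close to `min` suggests the run is still improving, while `last` well above `min` may hint at overfitting or instability.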
Present results in a comparison table:
| Experiment | Metric | Delta vs Baseline | Status |
|-----------|--------|-------------------|--------|
| Baseline | X.XX | — | done |
| Method A | X.XX | +Y.Y | done |

After results are collected, check ~/.claude/feishu.json:
experiment_done notification: results summary table, delta vs baseline
"off": skip entirely (no-op)
Vast.ai instances (listed in vast-instances.json): if all experiments on an instance are done, remind the user to run /vast-gpu destroy <instance_id> to stop billing.
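The comparison table format shown earlier can be generated from collected metrics rather than typed by hand. The sketch below is illustrative only: the function name `comparison_table`, the use of `None` for runs that have not reported yet, and the "pending", "running", and "n/a" cell values are assumptions, not part of the skill.

```python
from typing import Optional


def comparison_table(results: dict[str, Optional[float]], baseline: str = "Baseline") -> str:
    """Render a markdown table of metrics with deltas against the baseline."""
    base = results.get(baseline)
    rows = [
        "| Experiment | Metric | Delta vs Baseline | Status |",
        "|-----------|--------|-------------------|--------|",
    ]
    for name, value in results.items():
        if value is None:
            # Assumed convention: None marks a run with no result yet.
            rows.append(f"| {name} | pending | n/a | running |")
            continue
        if name == baseline or base is None:
            delta = "n/a"
        else:
            delta = f"{value - base:+.2f}"
        rows.append(f"| {name} | {value:.2f} | {delta} | done |")
    return "\n".join(rows)
```

Calling `comparison_table({"Baseline": 1.00, "Method A": 1.25})` yields a header, a separator, and one row per experiment, with the delta signed explicitly so improvements and regressions are easy to spot.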