Implement machine learning experiment tracking using MLflow or Weights & Biases. Configures environment and provides code for logging parameters, metrics, and artifacts. Use when asked to "setup experiment tracking" or "initialize MLflow". Trigger with relevant phrases based on skill purpose.
Install with Tessl CLI

```
npx tessl i github:jeremylongshore/claude-code-plugins-plus-skills --skill setting-up-experiment-tracking43
```
Does it follow best practices?
If you maintain this skill, you can automatically optimize it using the Tessl CLI to improve its score:

```
npx tessl skill review --optimize ./path/to/skill
```
Discovery — 67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description effectively identifies its specific domain (ML experiment tracking) and names concrete tools and actions. However, it's weakened by the vague filler phrase 'Trigger with relevant phrases based on skill purpose' which adds no value, and it misses common trigger term variations like 'W&B', 'wandb', or 'track my experiments'.
Suggestions
Remove the vague filler 'Trigger with relevant phrases based on skill purpose' and replace with specific trigger terms like 'W&B', 'wandb', 'track experiments', 'log training runs'
Expand the 'Use when' clause to include more natural user phrases such as 'track my model training', 'log experiment results', or 'compare model runs'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'implement machine learning experiment tracking', 'configures environment', 'logging parameters, metrics, and artifacts'. Names specific tools (MLflow, Weights & Biases). | 3 / 3 |
| Completeness | Has a clear 'what' (implement ML experiment tracking with specific tools) and includes a 'Use when' clause, but the final sentence 'Trigger with relevant phrases based on skill purpose' is vague filler that doesn't add explicit trigger guidance. | 2 / 3 |
| Trigger Term Quality | Includes some natural keywords like 'setup experiment tracking', 'initialize MLflow', but missing common variations users might say like 'W&B', 'wandb', 'track experiments', 'log metrics', or 'ML logging'. | 2 / 3 |
| Distinctiveness / Conflict Risk | Clear niche focused specifically on ML experiment tracking with named tools (MLflow, Weights & Biases). Unlikely to conflict with general ML skills or other tracking/logging skills due to specific domain focus. | 3 / 3 |
| Total | 10 / 12 — Passed | |
Implementation — 7%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is a template/placeholder with no actionable content. It describes what it would do rather than providing executable code, concrete commands, or specific guidance. The content is padded with generic explanations and repeated information while lacking the actual implementation details needed to set up experiment tracking.
Suggestions
Replace the abstract 'How It Works' section with actual executable code for MLflow and W&B setup (pip install commands, initialization code, logging examples)
Remove redundant overview text and generic sections like 'Prerequisites', 'Instructions', 'Output', and 'Error Handling' that contain only placeholder content
Add concrete, copy-paste ready code examples showing parameter logging, metric logging, and artifact saving for both MLflow and W&B
Include validation steps such as verifying package installation and testing connection to tracking servers
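As a sketch of what such copy-paste-ready examples could look like (the function name `log_experiment` and the local-JSON fallback are illustrative, not taken from the skill), the snippet below logs parameters, metrics, and an optional artifact via MLflow when it is installed:

```python
import json
import os
import tempfile

def log_experiment(params, metrics, artifact_path=None):
    """Log one run's params/metrics (and optionally an artifact).

    Uses MLflow when it is installed; otherwise falls back to a
    local JSON file so the call never hard-fails.
    """
    record = {"params": params, "metrics": metrics}
    try:
        import mlflow  # optional dependency
        with mlflow.start_run():
            mlflow.log_params(params)
            for name, value in metrics.items():
                mlflow.log_metric(name, value)
            if artifact_path:
                mlflow.log_artifact(artifact_path)
    except ImportError:
        fallback = os.path.join(tempfile.gettempdir(), "run_log.json")
        with open(fallback, "w") as f:
            json.dump(record, f)
    return record

# Example: one training run's configuration and final loss
run = log_experiment({"lr": 0.01, "epochs": 3}, {"loss": 0.42})
```

The Weights & Biases equivalent would use `wandb.init(project=...)` and `wandb.log({...})` in place of the MLflow calls, and a minimal validation step after installation could be as simple as `python -c "import mlflow"`.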
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose with redundant explanations (overview repeated twice), generic filler content ('This skill provides automated assistance'), and explains concepts Claude already knows like what experiment tracking is and basic tool comparisons. | 1 / 3 |
| Actionability | No executable code provided despite claiming to 'provide code snippets'. Examples describe what the skill 'will do' rather than showing actual commands or code. The Instructions section is completely generic placeholder text with no concrete guidance. | 1 / 3 |
| Workflow Clarity | The 'How It Works' section describes abstract steps without any concrete commands, validation checkpoints, or actual workflow. No feedback loops or error recovery steps for environment configuration, which can easily fail. | 1 / 3 |
| Progressive Disclosure | Has section headers providing some structure, but content is monolithic with no references to external files. The organization exists, but sections contain filler rather than appropriately split content. | 2 / 3 |
| Total | 5 / 12 — Passed | |
Validation — 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 13 / 16 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| metadata_version | 'metadata' field is not a dictionary | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 13 / 16 — Passed | |
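To address these warnings, frontmatter along the following lines would make `metadata` a dictionary and give unknown keys a home under it (field values here are illustrative, not taken from the skill):

```yaml
name: setting-up-experiment-tracking
description: >
  Implement ML experiment tracking using MLflow or Weights & Biases.
allowed-tools: [Bash, Read, Write]   # stick to standard tool names
metadata:                            # a dictionary, as the validator expects
  version: "1.0.0"
  author: jeremylongshore
```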
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.