Implement machine learning experiment tracking using MLflow or Weights & Biases. Configures environment and provides code for logging parameters, metrics, and artifacts. Use when asked to "setup experiment tracking" or "initialize MLflow". Trigger with relevant phrases based on skill purpose.
Overall quality: 37%

Does it follow best practices? Passed. Impact: Pending (no eval scenarios have been run). No known issues.
Optimize this skill with Tessl: `npx tessl skill review --optimize ./plugins/ai-ml/experiment-tracking-setup/skills/setting-up-experiment-tracking/SKILL.md`

Quality
Discovery: 75%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies its niche (ML experiment tracking with MLflow/W&B) and includes explicit 'Use when' triggers, so it works for skill selection. It is weakened, however, by the meaningless filler sentence 'Trigger with relevant phrases based on skill purpose', which adds no value, and it would benefit from more natural trigger-term variations and more concrete examples of what the skill produces.
Suggestions
Remove the vague filler sentence 'Trigger with relevant phrases based on skill purpose' and replace it with additional concrete trigger terms like 'W&B', 'wandb', 'track experiments', 'log model runs', 'experiment logging'.
Add more specific actions to the description, such as 'compare runs', 'visualize training metrics', 'register models', or 'set up run dashboards' to better differentiate capabilities.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (ML experiment tracking) and some actions (configures environment, logging parameters/metrics/artifacts), but the actions are somewhat generic and not comprehensively listed. It mentions MLflow and Weights & Biases, which adds specificity. | 2 / 3 |
| Completeness | Clearly answers both 'what' (implement ML experiment tracking, configure environment, log parameters/metrics/artifacts) and 'when' (explicit 'Use when' clause with trigger phrases like 'setup experiment tracking' and 'initialize MLflow'). | 3 / 3 |
| Trigger Term Quality | Includes some natural keywords like 'experiment tracking', 'MLflow', 'logging parameters, metrics, and artifacts', but misses common variations like 'W&B', 'wandb', 'track experiments', 'log runs', 'model tracking'. The final sentence 'Trigger with relevant phrases based on skill purpose' is meaningless filler that adds no trigger value. | 2 / 3 |
| Distinctiveness / Conflict Risk | The focus on MLflow and Weights & Biases for experiment tracking is a clear niche that is unlikely to conflict with other skills. The specific tool names and domain make it distinctly identifiable. | 3 / 3 |
| Total | | 10 / 12 Passed |
Implementation: 0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is almost entirely generic boilerplate with no actionable content. It describes what it would do in abstract terms but never provides actual code for MLflow or W&B setup, logging, or configuration. The majority of sections (Prerequisites, Instructions, Output, Error Handling, Resources) contain placeholder text that adds no value.
Suggestions
Replace the abstract 'How It Works' and 'Examples' sections with actual executable Python code snippets for both MLflow and W&B setup, including `pip install` commands, initialization code, and parameter/metric/artifact logging examples.
Remove all generic filler sections (Prerequisites, Instructions, Output, Error Handling, Resources, Integration, When to Use) that contain no skill-specific information.
Add concrete environment setup commands (e.g., `mlflow server --backend-store-uri sqlite:///mlflow.db --port 5000`) and validation steps (e.g., 'Verify server is running: `curl http://localhost:5000/api/2.0/mlflow/experiments/list`').
Provide complete, copy-paste-ready code blocks showing a minimal training loop with experiment tracking for both MLflow and W&B, rather than describing what code would be generated.
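One possible shape for such a copy-paste-ready block, a minimal sketch rather than the skill's actual output: the MLflow calls (`set_experiment`, `start_run`, `log_params`, `log_metric`, `log_artifact`) are the real API, while the experiment name, hyperparameters, and training loop are placeholders. The guarded import keeps the sketch runnable even where MLflow is not installed.

```python
# Sketch of a minimal training loop with MLflow tracking.
# Assumes `pip install mlflow`; falls back to printing if the package is absent.
import random

try:
    import mlflow
except ImportError:
    mlflow = None

# Placeholder hyperparameters for illustration.
params = {"learning_rate": 0.01, "epochs": 3}

def train_one_epoch(epoch):
    # Stand-in for a real training step; returns a fake, decreasing loss.
    return 1.0 / (epoch + 1) + random.random() * 0.01

if mlflow is not None:
    mlflow.set_experiment("demo-experiment")      # placeholder experiment name
    with mlflow.start_run():
        mlflow.log_params(params)                 # parameters
        for epoch in range(params["epochs"]):
            loss = train_one_epoch(epoch)
            mlflow.log_metric("loss", loss, step=epoch)  # metrics
        # mlflow.log_artifact("model.pkl")        # artifacts, once a file is saved
else:
    for epoch in range(params["epochs"]):
        print(f"would log loss={train_one_epoch(epoch):.3f} at step {epoch}")
```

The W&B analogue follows the same pattern with `wandb.init(project=...)`, `wandb.config.update(params)`, and `wandb.log({"loss": loss})`.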
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose, with sections like 'How It Works', 'When to Use This Skill', 'Integration', 'Prerequisites', 'Instructions', 'Output', 'Error Handling', and 'Resources' that are all generic filler. The content explains things Claude already knows ('This skill activates when...') and pads extensively without providing any actual executable value. | 1 / 3 |
| Actionability | Despite being about code setup, there is zero executable code anywhere in the skill. The 'Examples' section describes what the skill 'will do' abstractly rather than providing actual code snippets for MLflow or W&B initialization, or for logging parameters, metrics, and artifacts. The 'Instructions' section is completely generic ('Invoke this skill when the trigger conditions are met'). | 1 / 3 |
| Workflow Clarity | The workflow steps are abstract descriptions ('Analyze Context', 'Configure Environment') with no concrete commands, no validation checkpoints, and no error-recovery loops. The 'Instructions' section is four generic bullet points that could apply to literally any skill. | 1 / 3 |
| Progressive Disclosure | No bundle files exist, yet the skill has no concrete content to organize in the first place. The 'Resources' section references vague 'Project documentation' and 'Related skills and commands' with no actual links. The content is a monolithic wall of generic text with no meaningful structure or navigation. | 1 / 3 |
| Total | | 4 / 12 Passed |
Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 Passed |
Revision: 3a2d27d