Implement machine learning experiment tracking using MLflow or Weights & Biases. Configures environment and provides code for logging parameters, metrics, and artifacts. Use when asked to "setup experiment tracking" or "initialize MLflow". Trigger with relevant phrases based on skill purpose.
Does it follow best practices? 37%
Impact: no eval scenarios have been run
Passed: no known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./plugins/ai-ml/experiment-tracking-setup/skills/setting-up-experiment-tracking/SKILL.md
Quality
Discovery
75%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description adequately identifies its niche (ML experiment tracking with MLflow/W&B) and includes explicit 'Use when' triggers, but is weakened by the meaningless filler sentence 'Trigger with relevant phrases based on skill purpose' which adds no information. It could benefit from more specific actions and additional natural trigger term variations like 'wandb', 'W&B', or 'track experiments'.
Suggestions
- Remove the vague filler sentence 'Trigger with relevant phrases based on skill purpose' and replace it with additional concrete trigger terms like 'W&B', 'wandb', 'track experiments', 'log model metrics', 'run tracking'.
- Add more specific concrete actions such as 'compare runs', 'visualize training curves', 'register models', or 'log model checkpoints' to better differentiate capabilities.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (ML experiment tracking) and some actions (configures environment, logging parameters/metrics/artifacts), but the actions are somewhat generic and not comprehensively listed. It mentions two specific tools (MLflow, Weights & Biases) which adds some specificity. | 2 / 3 |
| Completeness | Answers both 'what' (implement ML experiment tracking, configure environment, provide code for logging) and 'when' (explicitly states 'Use when asked to setup experiment tracking or initialize MLflow'). The 'when' clause has explicit trigger phrases, though the final sentence is unhelpful filler. | 3 / 3 |
| Trigger Term Quality | Includes some natural keywords like 'experiment tracking', 'MLflow', 'Weights & Biases', 'logging parameters, metrics, and artifacts', but the final sentence 'Trigger with relevant phrases based on skill purpose' is vague filler that adds no value. Missing common variations like 'W&B', 'wandb', 'model tracking', 'run logging'. | 2 / 3 |
| Distinctiveness Conflict Risk | The description targets a clear niche: ML experiment tracking with specific tools (MLflow, Weights & Biases). This is unlikely to conflict with other skills as it's a specialized domain with distinct trigger terms. | 3 / 3 |
| Total | | 10 / 12 Passed |
Implementation
0%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is essentially a template with placeholder content that provides no actionable guidance for setting up experiment tracking. It contains no executable code, no concrete commands, no specific configuration details, and is padded with verbose sections explaining concepts Claude already understands. Nearly every section could be replaced with actual MLflow/W&B code examples and setup commands.
Suggestions
- Replace the abstract 'How It Works' and 'Examples' sections with actual executable code: pip install commands, MLflow/W&B initialization code, and concrete logging examples (mlflow.log_param, wandb.log, etc.).
- Remove the boilerplate sections (Overview, When to Use, Integration, Prerequisites, Instructions, Output, Error Handling, Resources) that contain no skill-specific information and waste tokens.
- Add concrete workflow steps with validation: e.g., 'Run `mlflow ui` and verify the server is accessible at http://localhost:5000 before proceeding'.
- Provide complete, copy-paste-ready code blocks for both MLflow and W&B setups, including environment variable configuration (MLFLOW_TRACKING_URI, WANDB_API_KEY) and a minimal training loop with logging.
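As an illustration of what such a code block might look like, here is a minimal Python sketch covering both tools. It is not taken from the skill under review: the experiment/project name `demo-experiment`, the helper names `simulated_loss`, `run_with_mlflow`, and `run_with_wandb`, and the stand-in training loop are all hypothetical. It assumes `pip install mlflow wandb`, that MLflow picks up `MLFLOW_TRACKING_URI` from the environment, and that W&B credentials are supplied via `WANDB_API_KEY`.

```python
import math
import os
import tempfile


def simulated_loss(epoch: int) -> float:
    """Stand-in for a real training step: loss decays as epochs increase."""
    return math.exp(-0.5 * epoch)


def run_with_mlflow(params: dict, epochs: int = 3) -> str:
    """Log params, per-epoch metrics, and one artifact to MLflow."""
    import mlflow  # imported lazily so the module loads without mlflow installed

    # MLflow reads MLFLOW_TRACKING_URI from the environment when it is set;
    # otherwise it falls back to its local default store.
    mlflow.set_experiment("demo-experiment")  # hypothetical experiment name
    with mlflow.start_run() as run:
        mlflow.log_params(params)
        for epoch in range(epochs):
            mlflow.log_metric("loss", simulated_loss(epoch), step=epoch)
        # Write a small text file and attach it to the run as an artifact.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "notes.txt")
            with open(path, "w") as f:
                f.write(f"trained for {epochs} epochs\n")
            mlflow.log_artifact(path)
        return run.info.run_id


def run_with_wandb(params: dict, epochs: int = 3) -> None:
    """Log the same params and metrics to Weights & Biases."""
    import wandb  # imported lazily; online runs need WANDB_API_KEY or `wandb login`

    run = wandb.init(project="demo-experiment", config=params)  # hypothetical project
    for epoch in range(epochs):
        run.log({"loss": simulated_loss(epoch)}, step=epoch)
    run.finish()
```

Calling `run_with_mlflow({"lr": 0.01, "batch_size": 32})` and then starting `mlflow ui` gives a concrete validation step of the kind suggested above: the run should appear under the experiment in the UI. The imports are kept inside the functions only so that either tool can be used without the other being installed.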
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose with extensive sections explaining concepts Claude already knows (what experiment tracking is, when to use it, how it works at a high level). The 'Overview', 'How It Works', 'When to Use This Skill', 'Integration', 'Prerequisites', 'Instructions', 'Output', 'Error Handling', and 'Resources' sections are almost entirely filler that adds no actionable value. The content reads like a product brochure rather than a skill instruction. | 1 / 3 |
| Actionability | Despite being about code setup, the skill contains zero executable code, no concrete commands (not even `pip install mlflow`), no actual code snippets for logging parameters/metrics/artifacts, and no specific configuration examples. The 'Examples' section describes what the skill 'will do' rather than providing the actual code. Sections like 'Output: The skill produces structured output relevant to the task' are completely vacuous. | 1 / 3 |
| Workflow Clarity | The workflow steps are abstract descriptions ('Analyze Context', 'Configure Environment') with no concrete commands, no validation checkpoints, and no error recovery steps. The 'Instructions' section is generic boilerplate ('Invoke this skill when the trigger conditions are met') that provides no actual workflow guidance for setting up experiment tracking. | 1 / 3 |
| Progressive Disclosure | The content is a monolithic wall of text with no references to external files, no bundle files to support it, and no meaningful structure beyond generic section headers. Content that should contain detailed code examples and configuration references is instead filled with vague descriptions. There is no layered information architecture. | 1 / 3 |
| Total | | 4 / 12 Passed |
Validation
81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |
6e9558f