setting-up-experiment-tracking

Implement machine learning experiment tracking using MLflow or Weights & Biases. Configures environment and provides code for logging parameters, metrics, and artifacts. Use when asked to "setup experiment tracking" or "initialize MLflow". Trigger with relevant phrases based on skill purpose.

Install with Tessl CLI

npx tessl i github:jeremylongshore/claude-code-plugins-plus-skills --skill setting-up-experiment-tracking


Review — 37%

Does it follow best practices?

If you maintain this skill, you can optimize it automatically with the Tessl CLI to improve its score:

npx tessl skill review --optimize ./path/to/skill


Validation — 13 / 16 Passed

Validation for skill structure

Discovery

67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description clearly identifies its domain (ML experiment tracking) and names concrete tools and actions. However, it is weakened by the filler phrase 'Trigger with relevant phrases based on skill purpose', which gives an agent no usable trigger guidance, and it misses common trigger term variations such as 'W&B', 'wandb', or 'track my experiments'.

Suggestions

Remove the vague filler 'Trigger with relevant phrases based on skill purpose' and replace it with specific trigger terms such as 'W&B', 'wandb', 'track experiments', 'log training runs'

Expand the 'Use when' clause to include more natural user phrases such as 'track my model training', 'log experiment results', or 'compare model runs'
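
As a rough illustration of the two suggestions above, the following Python sketch checks a description for the trigger terms the review names. The helper name `missing_trigger_terms` and the plain substring match are assumptions made for this example; this is not how the Tessl reviewer scores a description.

```python
# Hypothetical helper: a case-insensitive substring check,
# not the scoring logic used by the review itself.
TRIGGER_TERMS = ["W&B", "wandb", "track experiments", "log training runs"]


def missing_trigger_terms(description, terms=TRIGGER_TERMS):
    """Return the trigger terms that do not appear in the description."""
    lowered = description.lower()
    return [term for term in terms if term.lower() not in lowered]


# The skill's description as quoted at the top of this page.
CURRENT_DESCRIPTION = (
    "Implement machine learning experiment tracking using MLflow or Weights & Biases. "
    "Configures environment and provides code for logging parameters, metrics, and artifacts. "
    'Use when asked to "setup experiment tracking" or "initialize MLflow". '
    "Trigger with relevant phrases based on skill purpose."
)

print(missing_trigger_terms(CURRENT_DESCRIPTION))
```

Run against the current description, the check reports all four terms as missing, which matches the reviewer's point about trigger term coverage.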

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions: 'implement machine learning experiment tracking', 'configures environment', 'logging parameters, metrics, and artifacts'. Names specific tools (MLflow, Weights & Biases).	3 / 3
Completeness	Has a clear 'what' (implement ML experiment tracking with specific tools) and includes a 'Use when' clause, but the final sentence 'Trigger with relevant phrases based on skill purpose' is vague filler that doesn't add explicit trigger guidance.	2 / 3
Trigger Term Quality	Includes some natural keywords like 'setup experiment tracking' and 'initialize MLflow', but is missing common variations users might say, such as 'W&B', 'wandb', 'track experiments', 'log metrics', or 'ML logging'.	2 / 3
Distinctiveness Conflict Risk	Clear niche focused specifically on ML experiment tracking with named tools (MLflow, Weights & Biases). Unlikely to conflict with general ML skills or other tracking/logging skills due to specific domain focus.	3 / 3
	Total	10 / 12 Passed

Implementation

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is a template/placeholder with no actionable content. It describes what it would do rather than providing executable code, concrete commands, or specific guidance. The content is padded with generic explanations and repeated information while lacking the actual implementation details needed to set up experiment tracking.

Suggestions

Replace the abstract 'How It Works' section with actual executable code for MLflow and W&B setup (pip install commands, initialization code, logging examples)

Remove redundant overview text and generic sections like 'Prerequisites', 'Instructions', 'Output', and 'Error Handling' that contain only placeholder content

Add concrete, copy-paste ready code examples showing parameter logging, metric logging, and artifact saving for both MLflow and W&B

Include validation steps such as verifying package installation and testing connection to tracking servers
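
To make the suggestions above concrete, here is a minimal sketch of the kind of copy-paste example the skill could include for MLflow. It is an assumption about what such an example might look like, not content taken from the skill: the function name `log_example_run`, the experiment name, the local `file:./mlruns` tracking URI, and all parameter and metric values are illustrative. It assumes `pip install mlflow` has been run; if the package is absent, the demo prints a hint and skips the run.

```python
import tempfile
from pathlib import Path


def log_example_run(tracker, params, losses, notes):
    """Log params, per-step metrics, and one artifact in a single run.

    `tracker` is expected to be the `mlflow` module (or any object
    exposing the same four functions used below).
    """
    with tracker.start_run():
        tracker.log_params(params)
        for step, loss in enumerate(losses):
            tracker.log_metric("loss", loss, step=step)
        # Write a small text file and attach it to the run as an artifact.
        with tempfile.TemporaryDirectory() as tmp:
            notes_path = Path(tmp) / "notes.txt"
            notes_path.write_text(notes)
            tracker.log_artifact(str(notes_path))


if __name__ == "__main__":
    try:
        import mlflow
    except ImportError:
        print("mlflow is not installed; try: pip install mlflow")
    else:
        # Illustrative settings: a local file store and a made-up experiment name.
        mlflow.set_tracking_uri("file:./mlruns")
        mlflow.set_experiment("demo-experiment")
        log_example_run(
            mlflow,
            params={"learning_rate": 0.01, "epochs": 3},
            losses=[0.9, 0.6, 0.4],
            notes="Illustrative run with placeholder values.\n",
        )
```

Passing the tracker in as an argument keeps the logging steps testable without a tracking server. A Weights & Biases version would follow the same shape using `wandb.init`, `wandb.log`, and a `wandb.Artifact` passed to the run's `log_artifact`, and would additionally need an API key or offline mode.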

Dimension	Reasoning	Score
Conciseness	Extremely verbose: redundant explanations (the overview is repeated twice), generic filler content ('This skill provides automated assistance'), and explanations of concepts Claude already knows, such as what experiment tracking is and basic tool comparisons.	1 / 3
Actionability	No executable code provided despite claiming to 'provide code snippets'. Examples describe what the skill 'will do' rather than showing actual commands or code. The Instructions section is completely generic placeholder text with no concrete guidance.	1 / 3
Workflow Clarity	The 'How It Works' section describes abstract steps without any concrete commands, validation checkpoints, or actual workflow. There are no feedback loops or error recovery steps for environment configuration, which can easily fail.	1 / 3
Progressive Disclosure	Has section headers providing some structure, but content is monolithic with no references to external files. The organization exists but sections contain filler rather than appropriately split content.	2 / 3
	Total	5 / 12 Passed
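
The validation checkpoint that the Workflow Clarity row finds missing could start with something as small as the sketch below, which checks whether the tracking packages are importable before any setup code runs. The function names `check_packages` and `report` are hypothetical, and the check covers installation only; it does not test a connection to a tracking server.

```python
import importlib.util


def check_packages(names=("mlflow", "wandb")):
    """Map each top-level package name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report(status):
    """Turn the status mapping into one human-readable line per package."""
    lines = []
    for name, installed in status.items():
        if installed:
            lines.append(f"{name}: installed")
        else:
            lines.append(f"{name}: missing, try: pip install {name}")
    return lines


for line in report(check_packages()):
    print(line)
```

Running a check like this first gives an agent a clear point at which to stop and install a package instead of failing partway through configuration.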

Validation

81%

Only warnings and errors are listed in the table below.

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Criteria	Description	Result
allowed_tools_field	'allowed-tools' contains unusual tool name(s)	Warning
metadata_version	'metadata' field is not a dictionary	Warning
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning

	Total	13 / 16 Passed

Reviewed: about 1 month ago
