
setting-up-experiment-tracking

Implement machine learning experiment tracking using MLflow or Weights & Biases. Configures environment and provides code for logging parameters, metrics, and artifacts. Use when asked to "setup experiment tracking" or "initialize MLflow". Trigger with relevant phrases based on skill purpose.


Quality

37%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/ai-ml/experiment-tracking-setup/skills/setting-up-experiment-tracking/SKILL.md

Quality

Discovery

75%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description adequately identifies its niche (ML experiment tracking with MLflow/W&B) and includes explicit 'Use when' triggers, making it functional for skill selection. However, it is weakened by the meaningless filler sentence 'Trigger with relevant phrases based on skill purpose' which adds no value, and it could benefit from more natural trigger term variations. The specificity of actions could also be improved with more concrete examples of what the skill produces.

Suggestions

Remove the vague filler sentence 'Trigger with relevant phrases based on skill purpose' and replace it with additional concrete trigger terms like 'W&B', 'wandb', 'track experiments', 'log model runs', 'experiment logging'.

Add more specific actions to the description, such as 'compare runs', 'visualize training metrics', 'register models', or 'set up run dashboards' to better differentiate capabilities.

Dimension / Reasoning / Score

Specificity

Names the domain (ML experiment tracking) and some actions (configures environment, logging parameters/metrics/artifacts), but the actions are somewhat generic and not comprehensively listed. Mentioning MLflow and Weights & Biases by name adds specificity.

2 / 3

Completeness

Clearly answers both 'what' (implement ML experiment tracking, configure environment, log parameters/metrics/artifacts) and 'when' (explicit 'Use when' clause with trigger phrases like 'setup experiment tracking' and 'initialize MLflow').

3 / 3

Trigger Term Quality

Includes some natural keywords like 'experiment tracking', 'MLflow', 'logging parameters, metrics, and artifacts', but misses common variations like 'W&B', 'wandb', 'track experiments', 'log runs', 'model tracking'. The final sentence 'Trigger with relevant phrases based on skill purpose' is meaningless filler that adds no trigger value.

2 / 3

Distinctiveness Conflict Risk

The focus on MLflow and Weights & Biases for experiment tracking is a clear niche that is unlikely to conflict with other skills. The specific tool names and domain make it distinctly identifiable.

3 / 3

Total: 10 / 12 (Passed)

Implementation

0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is almost entirely generic boilerplate with no actionable content. It describes what it would do in abstract terms but never provides actual code for MLflow or W&B setup, logging, or configuration. The majority of sections (Prerequisites, Instructions, Output, Error Handling, Resources) contain placeholder text that adds no value.

Suggestions

Replace the abstract 'How It Works' and 'Examples' sections with actual executable Python code snippets for both MLflow and W&B setup, including `pip install` commands, initialization code, and parameter/metric/artifact logging examples.

Remove all generic filler sections (Prerequisites, Instructions, Output, Error Handling, Resources, Integration, When to Use) that contain no skill-specific information.

Add concrete environment setup commands (e.g., `mlflow server --backend-store-uri sqlite:///mlflow.db --port 5000`) and validation steps (e.g., 'Verify server is running: `curl http://localhost:5000/api/2.0/mlflow/experiments/list`').

Provide complete, copy-paste-ready code blocks showing a minimal training loop with experiment tracking for both MLflow and W&B, rather than describing what code would be generated.

Dimension / Reasoning / Score

Conciseness

Extremely verbose with sections like 'How It Works', 'When to Use This Skill', 'Integration', 'Prerequisites', 'Instructions', 'Output', 'Error Handling', and 'Resources' that are all generic filler. The content explains things Claude already knows ('This skill activates when...') and pads extensively without providing any actual executable value.

1 / 3

Actionability

Despite being about code setup, there is zero executable code anywhere in the skill. The 'Examples' section describes what the skill 'will do' abstractly rather than providing actual code snippets for MLflow or W&B initialization, logging parameters, metrics, or artifacts. The 'Instructions' section is completely generic ('Invoke this skill when the trigger conditions are met').

1 / 3

Workflow Clarity

The workflow steps are abstract descriptions ('Analyze Context', 'Configure Environment') with no concrete commands, no validation checkpoints, and no error recovery loops. The 'Instructions' section is four generic bullet points that could apply to literally any skill.

1 / 3

Progressive Disclosure

No bundle files exist, and in any case the skill has no concrete content to organize into them. The 'Resources' section references vague 'Project documentation' and 'Related skills and commands' with no actual links. The content is a monolithic wall of generic text with no meaningful structure or navigation.

1 / 3

Total: 4 / 12 (Passed)

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 Passed

Validation for skill structure

Criteria / Description / Result

allowed_tools_field

'allowed-tools' contains unusual tool name(s)

Warning

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total: 9 / 11 (Passed)

Repository: jeremylongshore/claude-code-plugins-plus-skills (Reviewed)
