
setting-up-experiment-tracking

Implement machine learning experiment tracking using MLflow or Weights & Biases. Configures environment and provides code for logging parameters, metrics, and artifacts. Use when asked to "setup experiment tracking" or "initialize MLflow". Trigger with relevant phrases based on skill purpose.

38

Quality

37%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/ai-ml/experiment-tracking-setup/skills/setting-up-experiment-tracking/SKILL.md

Quality

Discovery

75%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description adequately identifies its niche (ML experiment tracking with MLflow/W&B) and includes explicit 'Use when' triggers, but is weakened by the meaningless filler sentence 'Trigger with relevant phrases based on skill purpose' which adds no information. It could benefit from more specific actions and additional natural trigger term variations like 'wandb', 'W&B', or 'track experiments'.

Suggestions

Remove the vague filler sentence 'Trigger with relevant phrases based on skill purpose' and replace it with additional concrete trigger terms like 'W&B', 'wandb', 'track experiments', 'log model metrics', 'run tracking'.

Add more specific concrete actions such as 'compare runs', 'visualize training curves', 'register models', or 'log model checkpoints' to better differentiate capabilities.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Names the domain (ML experiment tracking) and some actions (configures environment, logging parameters/metrics/artifacts), but the actions are somewhat generic and not comprehensively listed. It mentions two specific tools (MLflow, Weights & Biases), which adds some specificity. | 2 / 3 |
| Completeness | Answers both 'what' (implement ML experiment tracking, configure environment, provide code for logging) and 'when' (explicitly states 'Use when asked to setup experiment tracking or initialize MLflow'). The 'when' clause has explicit trigger phrases, though the final sentence is unhelpful filler. | 3 / 3 |
| Trigger Term Quality | Includes some natural keywords like 'experiment tracking', 'MLflow', 'Weights & Biases', 'logging parameters, metrics, and artifacts', but the final sentence 'Trigger with relevant phrases based on skill purpose' is vague filler that adds no value. Missing common variations like 'W&B', 'wandb', 'model tracking', 'run logging'. | 2 / 3 |
| Distinctiveness Conflict Risk | The description targets a clear niche: ML experiment tracking with specific tools (MLflow, Weights & Biases). This is unlikely to conflict with other skills as it's a specialized domain with distinct trigger terms. | 3 / 3 |
| Total | | 10 / 12 |

Passed

Implementation

0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is essentially a template with placeholder content that provides no actionable guidance for setting up experiment tracking. It contains no executable code, no concrete commands, no specific configuration details, and is padded with verbose sections explaining concepts Claude already understands. Nearly every section could be replaced with actual MLflow/W&B code examples and setup commands.

Suggestions

Replace the abstract 'How It Works' and 'Examples' sections with actual executable code: pip install commands, MLflow/W&B initialization code, and concrete logging examples (mlflow.log_param, wandb.log, etc.)

Remove the boilerplate sections (Overview, When to Use, Integration, Prerequisites, Instructions, Output, Error Handling, Resources) that contain no skill-specific information and waste tokens

Add concrete workflow steps with validation: e.g., 'Run `mlflow ui` and verify the server is accessible at http://localhost:5000 before proceeding'

Provide complete, copy-paste-ready code blocks for both MLflow and W&B setups, including environment variable configuration (MLFLOW_TRACKING_URI, WANDB_API_KEY) and a minimal training loop with logging
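
As an illustration of what the suggestions above are asking for, here is a minimal sketch, not the skill's actual content. It assumes `pip install mlflow wandb`, an MLflow server reachable at the URI in `MLFLOW_TRACKING_URI` (falling back to the `http://localhost:5000` address mentioned above), and `WANDB_API_KEY` set in the environment for W&B. The experiment name, project name, helper names, and the toy loss update are all hypothetical placeholders.

```python
import os


def train_and_log(tracker, learning_rate=0.1, epochs=3):
    """Toy training loop (hypothetical). `tracker` is the mlflow module, or any
    object exposing log_param(key, value) and log_metric(key, value, step=...)."""
    tracker.log_param("learning_rate", learning_rate)
    tracker.log_param("epochs", epochs)
    loss = 1.0
    for epoch in range(epochs):
        loss *= 1.0 - learning_rate  # stand-in for a real optimisation step
        tracker.log_metric("loss", loss, step=epoch)
    return loss


class WandbAdapter:
    """Hypothetical shim mapping the two calls above onto a W&B run object."""

    def __init__(self, run):
        self.run = run

    def log_param(self, key, value):
        self.run.config.update({key: value})

    def log_metric(self, key, value, step=None):
        self.run.log({key: value}, step=step)


def run_with_mlflow():
    import mlflow  # imported lazily so the module loads without mlflow installed

    # Assumed default; point MLFLOW_TRACKING_URI at your own server if it differs.
    mlflow.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5000"))
    mlflow.set_experiment("demo-experiment")  # placeholder experiment name
    with mlflow.start_run():
        return train_and_log(mlflow)


def run_with_wandb():
    import wandb  # picks up WANDB_API_KEY from the environment

    run = wandb.init(project="demo-project")  # placeholder project name
    try:
        return train_and_log(WandbAdapter(run))
    finally:
        run.finish()
```

Neither `run_with_mlflow()` nor `run_with_wandb()` is called automatically; invoke one of them once the corresponding backend is configured. Passing the tracker in as an argument keeps the training loop independent of either library, which also makes it easy to exercise with a stub.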

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | Extremely verbose with extensive sections explaining concepts Claude already knows (what experiment tracking is, when to use it, how it works at a high level). The 'Overview', 'How It Works', 'When to Use This Skill', 'Integration', 'Prerequisites', 'Instructions', 'Output', 'Error Handling', and 'Resources' sections are almost entirely filler that adds no actionable value. The content reads like a product brochure rather than a skill instruction. | 1 / 3 |
| Actionability | Despite being about code setup, the skill contains zero executable code, no concrete commands (not even `pip install mlflow`), no actual code snippets for logging parameters/metrics/artifacts, and no specific configuration examples. The 'Examples' section describes what the skill 'will do' rather than providing the actual code. Sections like 'Output: The skill produces structured output relevant to the task' are completely vacuous. | 1 / 3 |
| Workflow Clarity | The workflow steps are abstract descriptions ('Analyze Context', 'Configure Environment') with no concrete commands, no validation checkpoints, and no error recovery steps. The 'Instructions' section is generic boilerplate ('Invoke this skill when the trigger conditions are met') that provides no actual workflow guidance for setting up experiment tracking. | 1 / 3 |
| Progressive Disclosure | The content is a monolithic wall of text with no references to external files, no bundle files to support it, and no meaningful structure beyond generic section headers. Content that should contain detailed code examples and configuration references is instead filled with vague descriptions. There is no layered information architecture. | 1 / 3 |
| Total | | 4 / 12 |

Passed

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 |

Passed

Repository: jeremylongshore/claude-code-plugins-plus-skills