
tracking-model-versions

This skill enables an AI assistant to track and manage AI/ML model versions using the model-versioning-tracker plugin. It should be used when the user asks to manage model versions, track model lineage, log model performance, or implement version control f... Use when appropriate context is detected. Trigger with relevant phrases based on the skill's purpose.

Install with Tessl CLI

npx tessl i github:jeremylongshore/claude-code-plugins-plus-skills --skill tracking-model-versions

68

1.10x

Quality

20%

Does it follow best practices?

Impact

72%

1.10x

Average score across 12 eval scenarios
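The headline average can be cross-checked against the per-scenario figures listed under the evaluation results. The sketch below is an illustration, not Tessl's scoring code. It assumes that the first percentage in each scenario header is the with-context score and the second is the change relative to the run without context, and that the one scenario showing no second figure had a change of 0. Under those assumptions the with-context mean comes out near 72.9%, consistent with the 72% shown if the page truncates, and the ratio to the without-context mean is about 1.11, close to the reported 1.10x (the page's exact rounding is not stated).

```python
from statistics import mean

# (with-context score in %, change vs. without context), read off the
# scenario headers in page order. ASSUMPTION: the fifth scenario shows no
# change figure on the page, so 0 is used for it here.
SCENARIOS = [
    (70, 40), (62, 6), (25, -21), (64, 3), (80, 0), (86, 7),
    (46, -13), (83, 17), (77, -5), (100, 34), (82, -6), (100, 27),
]

with_context = [score for score, _ in SCENARIOS]
# ASSUMPTION: the change is an absolute difference in percentage points.
without_context = [score - delta for score, delta in SCENARIOS]

avg_with = mean(with_context)
avg_without = mean(without_context)
ratio = avg_with / avg_without

print(f"average with context:    {avg_with:.2f}%")
print(f"average without context: {avg_without:.2f}%")
print(f"ratio:                   {ratio:.2f}x")
```

An unweighted mean is used because the page describes the figure only as an "average score"; any per-scenario weighting would change the result.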

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/ai-ml/model-versioning-tracker/skills/tracking-model-versions/SKILL.md

Evaluation results

70%

40%

Register a New Recommendation Model Version

Model version registration with metadata

| Criteria | Without context | With context |
| --- | --- | --- |
| Plugin import | 0% | 0% |
| Log version call | 0% | 0% |
| Metadata included | 100% | 100% |
| Input validation | 0% | 100% |
| Error handling present | 0% | 100% |
| Failure exit code | 0% | 100% |
| Success confirmation | 100% | 100% |
| No raw model file output | 100% | 100% |

Without context: $0.3609 · 1m 3s · 16 turns · 22 in / 3,124 out tokens

With context: $0.5012 · 1m 41s · 28 turns · 2,101 in / 5,454 out tokens

62%

6%

Compare Model Performance Across Registry Versions

Performance metrics retrieval and comparison

| Criteria | Without context | With context |
| --- | --- | --- |
| Plugin import | 0% | 0% |
| Metrics query call | 0% | 50% |
| Multiple versions queried | 100% | 100% |
| Input validation | 0% | 0% |
| Error handling present | 100% | 100% |
| Informative error messages | 100% | 0% |
| Structured output file | 66% | 100% |
| Numeric metrics present | 100% | 100% |
| No hardcoded mock only | 50% | 83% |

Without context: $0.2322 · 1m 1s · 13 turns · 57 in / 3,670 out tokens

With context: $0.4911 · 1m 36s · 27 turns · 1,598 in / 5,790 out tokens

25%

-21%

Build an End-to-End Model Lifecycle Pipeline

Automated model versioning workflow

| Criteria | Without context | With context |
| --- | --- | --- |
| Plugin import | 0% | 0% |
| Version registration call | 20% | 0% |
| Training metrics logged | 30% | 0% |
| Monitoring loop present | 100% | 0% |
| Metrics retrieval in loop | 0% | 0% |
| Degradation warning | 100% | 50% |
| Error handling present | 50% | 0% |
| Pipeline log written | 100% | 100% |
| Stage comments | 100% | 100% |
| No large files | 0% | 100% |

Without context: $0.5340 · 2m 57s · 23 turns · 26 in / 9,044 out tokens

With context: $0.9761 · 4m 7s · 42 turns · 1,611 in / 13,284 out tokens

64%

3%

Model Promotion Gate for Production Deployment

Deployment promotion gate

| Criteria | Without context | With context |
| --- | --- | --- |
| Plugin import | 0% | 0% |
| Metrics retrieval for candidate | 0% | 0% |
| Metrics retrieval for baseline | 0% | 0% |
| Threshold comparison logic | 100% | 100% |
| Promote or reject decision | 100% | 100% |
| Decision artifact written | 100% | 100% |
| Input validation present | 100% | 100% |
| Error handling present | 50% | 60% |
| Informative failure message | 100% | 100% |
| Deployment workflow integration | 14% | 42% |
| Exit code on rejection | 100% | 100% |

Without context: $0.3387 · 1m 31s · 20 turns · 27 in / 5,516 out tokens

With context: $0.5735 · 2m 6s · 30 turns · 206 in / 7,108 out tokens

80%

Register Fine-Tuned Model with Full Provenance

Model lineage and provenance registration

| Criteria | Without context | With context |
| --- | --- | --- |
| Plugin import | 0% | 0% |
| Version registration call | 0% | 0% |
| Parent/lineage metadata included | 100% | 100% |
| Performance metrics in metadata | 100% | 100% |
| Training provenance metadata | 100% | 100% |
| Input validation present | 100% | 100% |
| Error handling present | 100% | 100% |
| Lineage summary written | 100% | 100% |
| Success confirmation logged | 100% | 100% |
| No large binary files | 100% | 100% |

Without context: $0.3997 · 1m 47s · 26 turns · 33 in / 5,743 out tokens

With context: $0.7379 · 2m 56s · 37 turns · 318 in / 8,512 out tokens

86%

7%

Quarterly Model Registry Audit

Model registry audit report

| Criteria | Without context | With context |
| --- | --- | --- |
| Plugin import | 0% | 12% |
| Metrics queried per version | 0% | 83% |
| Performance threshold check | 100% | 100% |
| Underperformers identified | 100% | 80% |
| Audit report written | 100% | 100% |
| Summary statistics present | 100% | 100% |
| Error handling present | 100% | 100% |
| Informative error logging | 100% | 100% |
| Performance monitoring context | 87% | 62% |
| No large binary files | 100% | 100% |
| Human-readable summary printed | 100% | 100% |

Without context: $0.4887 · 1m 54s · 27 turns · 29 in / 7,508 out tokens

With context: $0.8899 · 3m 10s · 33 turns · 88 in / 12,889 out tokens

46%

-13%

Hyperparameter Sweep Tracking and Best Model Selection

Hyperparameter sweep version tracking

| Criteria | Without context | With context |
| --- | --- | --- |
| Plugin import | 0% | 0% |
| Multiple versions registered | 25% | 0% |
| Hyperparameter metadata in registration | 100% | 66% |
| Performance metrics in metadata | 100% | 70% |
| Metrics retrieval for selection | 25% | 0% |
| Best version identified | 100% | 100% |
| Input validation present | 0% | 0% |
| Error handling present | 0% | 0% |
| Automated sweep loop | 100% | 100% |
| Sweep results artifact written | 100% | 100% |

Without context: $0.4116 · 1m 37s · 24 turns · 31 in / 5,506 out tokens

With context: $0.5728 · 2m 7s · 31 turns · 378 in / 7,116 out tokens

83%

17%

Production Model Regression Recovery

Model regression detection and rollback registration

| Criteria | Without context | With context |
| --- | --- | --- |
| Plugin import | 0% | 14% |
| Metrics retrieval for current version | 50% | 70% |
| Metrics retrieval for previous version | 50% | 70% |
| Regression detection logic | 100% | 100% |
| Rollback registration call | 50% | 75% |
| Rollback metadata included | 90% | 100% |
| Input validation present | 62% | 75% |
| Error handling present | 12% | 100% |
| Rollback decision output | 100% | 100% |
| Rollback report written | 100% | 100% |
| No rollback when no regression | 100% | 100% |

Without context: $0.7716 · 3m 33s · 31 turns · 38 in / 12,049 out tokens

With context: $0.5483 · 2m 25s · 30 turns · 171 in / 8,252 out tokens

77%

-5%

Automated Model Performance Monitoring Service

Scheduled continuous performance monitoring workflow

| Criteria | Without context | With context |
| --- | --- | --- |
| Plugin import | 0% | 14% |
| Repeated metrics polling | 100% | 50% |
| Scheduling or interval logic | 100% | 100% |
| Degradation detection per cycle | 100% | 100% |
| Alert generation | 100% | 100% |
| Monitoring log written | 100% | 100% |
| Multiple models or versions monitored | 100% | 100% |
| Error handling present | 0% | 0% |
| Automation framing in code | 62% | 62% |
| Summary printed at end | 100% | 100% |
| No large binary files | 100% | 100% |

Without context: $0.3544 · 1m 34s · 21 turns · 28 in / 4,716 out tokens

With context: $0.5683 · 2m 18s · 29 turns · 31 in / 7,609 out tokens

100%

34%

Set Up MLflow Tracking for a New Fraud Detection Model

MLflow workflow configuration

| Criteria | Without context | With context |
| --- | --- | --- |
| workflow_name present | 0% | 100% |
| mlflow_tracking_uri present | 75% | 100% |
| artifact_location present | 71% | 100% |
| environment field present | 0% | 100% |
| Model flavor specified | 100% | 100% |
| Training section present | 62% | 100% |
| Training parameters included | 100% | 100% |
| Evaluation metrics thresholds | 100% | 100% |
| Deployment section present | 100% | 100% |
| Stage transition configured | 58% | 100% |
| Model registry name present | 71% | 100% |
| Model URI uses registry format | 0% | 100% |

Without context: $0.1758 · 1m 1s · 8 turns · 13 in / 3,270 out tokens

With context: $0.2210 · 58s · 13 turns · 1,220 in / 3,101 out tokens

82%

-6%

Automate Model Artifact Version Control for a Recommendation Engine

Version control automation for model artifacts

| Criteria | Without context | With context |
| --- | --- | --- |
| Plugin import | 0% | 0% |
| Plugin version registration | 60% | 0% |
| Git operations included | 100% | 100% |
| Git tag creation | 100% | 100% |
| Metadata in registration | 100% | 100% |
| Input validation present | 100% | 100% |
| Error handling present | 100% | 100% |
| Informative error output | 100% | 100% |
| Automation log written | 100% | 100% |
| Commit hash in log or metadata | 100% | 100% |
| Non-zero exit on failure | 100% | 100% |

Without context: $0.3141 · 1m 28s · 13 turns · 16 in / 6,075 out tokens

With context: $0.5488 · 2m 13s · 24 turns · 1,609 in / 8,188 out tokens

100%

27%

Document a Production-Ready NLP Model

Model card documentation creation

| Criteria | Without context | With context |
| --- | --- | --- |
| Intended use section | 100% | 100% |
| Out-of-scope use section | 100% | 100% |
| Architecture details | 100% | 100% |
| Input/output specification | 100% | 100% |
| Training data described | 100% | 100% |
| Performance metrics reported | 100% | 100% |
| Limitations documented | 100% | 100% |
| Bias section present | 62% | 100% |
| Fairness section present | 57% | 100% |
| Privacy section present | 57% | 100% |
| Version history table | 0% | 100% |
| Version history has author column | 0% | 100% |
| License specified | 100% | 100% |

Without context: $0.2677 · 1m 40s · 12 turns · 19 in / 4,325 out tokens

With context: $0.2822 · 1m 24s · 14 turns · 2,125 in / 4,136 out tokens

Evaluated
Agent: Claude Code
Model: Claude Sonnet 4.6
