
experiment-tracker

Designs and tracks scientific experiments, A/B tests, and feature rollouts for product and engineering teams. Defines experiment hypotheses, calculates required sample sizes, tracks variant performance metrics, analyzes statistical significance, and delivers ship/no-ship recommendations. Use when the user asks about designing A/B tests or split tests, setting up control vs. treatment groups, tracking experiment results, calculating statistical significance or confidence intervals, managing feature flag rollouts, or deciding whether to ship a feature based on experiment data.
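As context for the kind of significance analysis the description refers to, here is a hedged sketch of a standard two-proportion z-test. The function name and signature are illustrative assumptions, not code taken from the skill itself:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a/conv_b are conversion counts; n_a/n_b are sample sizes.
    Returns the z statistic and two-sided p-value.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Example: 5.0% vs. 5.6% conversion on 10,000 users per variant
z, p = two_proportion_z_test(500, 10000, 560, 10000)
print(round(z, 2), round(p, 4))
```

With these example numbers the result sits near the conventional p < 0.05 boundary, which is exactly the kind of ship/no-ship judgment call the skill is meant to support.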

Overall score: 93

Quality: 92% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security (by Snyk): Passed (No known issues)


Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong, well-crafted skill description that clearly defines its domain (experiment design and analysis), lists specific concrete capabilities, and provides an explicit 'Use when...' clause with diverse natural trigger terms. It uses proper third-person voice throughout and covers both the 'what' and 'when' comprehensively, making it easy for Claude to select this skill appropriately from a large pool.

Dimension scores

Specificity: 3 / 3
Lists multiple specific, concrete actions: defines hypotheses, calculates sample sizes, tracks variant performance metrics, analyzes statistical significance, and delivers ship/no-ship recommendations.

Completeness: 3 / 3
Clearly answers both 'what' (designs and tracks experiments, calculates sample sizes, analyzes significance, delivers recommendations) and 'when' with an explicit 'Use when...' clause listing six specific trigger scenarios.

Trigger Term Quality: 3 / 3
Excellent coverage of natural terms users would say: 'A/B tests', 'split tests', 'control vs. treatment groups', 'statistical significance', 'confidence intervals', 'feature flag rollouts', 'ship a feature', 'experiment data'.

Distinctiveness / Conflict Risk: 3 / 3
Occupies a clear niche around experimentation and A/B testing, with distinct triggers like 'sample sizes', 'ship/no-ship', 'feature flag rollouts', and 'statistical significance' that are unlikely to conflict with general analytics or data science skills.

Total: 12 / 12

Passed

Implementation

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured experiment tracking skill that excels in workflow clarity, with explicit validation checkpoints at every stage and clear decision criteria. The progressive disclosure is excellent, providing just enough inline detail while pointing to dedicated files for implementations and templates. The main area for improvement is actionability: including at least one inline executable code example, rather than deferring all code to external files, would strengthen it.

Suggestions

Include at least one inline executable code snippet (e.g., the sample_size calculation) so the skill has copy-paste ready code without requiring navigation to STATISTICAL_METHODS.md
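As a hedged illustration of what such an inline snippet might look like, the following sketches a standard two-proportion sample-size formula. The function name, signature, and defaults are assumptions for illustration, not code taken from the skill's STATISTICAL_METHODS.md:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, min_detectable_effect,
                            alpha=0.05, power=0.8):
    """Approximate per-variant sample size for a two-proportion z-test.

    baseline_rate: control conversion rate (e.g. 0.05 for 5%)
    min_detectable_effect: smallest absolute lift worth detecting
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    pooled = (p1 + p2) / 2
    numerator = (z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# 5% baseline conversion, detect a 1-point absolute lift
print(sample_size_per_variant(0.05, 0.01))
```

A snippet like this is copy-paste runnable with only the standard library, which is the kind of inline actionability the suggestion above is asking for.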

Dimension scores

Conciseness: 3 / 3
The content is lean and efficient throughout. It avoids explaining what A/B tests are or how statistics work conceptually, instead jumping straight into actionable workflow steps. Every section earns its place with specific thresholds, criteria, and references rather than padding.

Actionability: 2 / 3
The skill provides specific thresholds (e.g., '< 95% of expected', '> 5% deviation'), concrete metric examples, and references to Python implementations in STATISTICAL_METHODS.md. However, the actual executable code is deferred to external files rather than included inline, and the templates are similarly referenced but not shown. The guidance is concrete but not fully copy-paste ready within this file.

Workflow Clarity: 3 / 3
The four-step workflow is clearly sequenced with explicit validation checkpoints at each stage, including specific trigger conditions (e.g., data collection rate < 95%, split deviation > 5%). It includes feedback loops (halt and fix, reduce scope) and covers the full lifecycle from design through decision with clear go/no-go criteria.

Progressive Disclosure: 3 / 3
The skill provides a clear overview with well-signaled one-level-deep references to STATISTICAL_METHODS.md and TEMPLATES.md. The main file contains enough context (function signatures, test selection table, example values) to be useful standalone while appropriately deferring full implementations and templates to separate files.

Total: 11 / 12

Passed
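The validation checkpoints cited in the workflow assessment (data collection below 95% of expected, variant split deviating more than 5%) can be sketched as a simple health check. The function and its parameter names are illustrative assumptions, not code from the skill:

```python
def check_experiment_health(expected_n, observed_n, split_observed,
                            split_target=0.5, collection_floor=0.95,
                            split_tolerance=0.05):
    """Flag the halt conditions described in the review: collection rate
    under 95% of expected, or split deviating more than 5% from target."""
    issues = []
    if observed_n < collection_floor * expected_n:
        issues.append("data collection rate < 95% of expected")
    if abs(split_observed - split_target) > split_tolerance:
        issues.append("variant split deviates > 5% from target")
    return issues

# Example: under-collected data and a skewed 57/43 split trip both checks
print(check_experiment_health(10000, 9200, 0.57))
```

An empty return list means the experiment can keep running; any flagged issue corresponds to the skill's "halt and fix" feedback loop.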

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria results

frontmatter_unknown_keys: Warning
Unknown frontmatter key(s) found; consider removing or moving to metadata.

Total: 10 / 11

Passed

Repository: OpenRoster-ai/awesome-agents (Reviewed)
