
exp-driven-dev

Builds features with A/B testing in mind using Ronny Kohavi's frameworks and Netflix/Airbnb experimentation culture. Use when implementing feature flags, choosing metrics, designing experiments, or building for fast iteration. Focuses on guardrail metrics, statistical significance, and experiment-driven development.

86

Quality

83%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Security (by Snyk)

Passed

No known issues

SKILL.md

Quality

Discovery

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-structured skill description with strong completeness and distinctiveness. It clearly identifies its niche (experimentation/A/B testing) and provides explicit trigger conditions. The main weakness is that the specific capabilities could be more concrete: it describes the domain well but could better enumerate the actual actions it performs.

Suggestions

Add more concrete actions to improve specificity, e.g., 'calculate sample sizes, define success metrics, implement feature flag patterns, analyze experiment results'

Consider adding file type or code pattern triggers if applicable, e.g., 'when working with experiment configuration files or feature toggle code'
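The first suggestion names sample-size calculation as a concrete action the description could advertise. As an illustration of what that action involves, here is a minimal sketch of the standard per-arm sample size for comparing two conversion rates with a two-sided two-proportion z-test (normal approximation). The function name and defaults (alpha = 0.05, power = 0.80) are hypothetical choices, not taken from the skill itself.

```typescript
// Hypothetical helper: per-arm sample size to detect an absolute lift between
// two conversion rates with a two-sided two-proportion z-test (normal approximation).
// n = (z_alpha + z_beta)^2 * (p1(1 - p1) + p2(1 - p2)) / (p2 - p1)^2
function sampleSizePerArm(
  baselineRate: number,      // control conversion rate, e.g. 0.10
  minDetectableLift: number, // absolute lift to detect, e.g. 0.01
  zAlpha = 1.959964,         // z for two-sided alpha = 0.05
  zBeta = 0.841621,          // z for power = 0.80
): number {
  const p1 = baselineRate;
  const p2 = baselineRate + minDetectableLift;
  if (p1 <= 0 || p1 >= 1 || p2 <= 0 || p2 >= 1 || minDetectableLift === 0) {
    throw new RangeError("rates must stay in (0, 1) and the lift must be non-zero");
  }
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const n = ((zAlpha + zBeta) ** 2 * variance) / minDetectableLift ** 2;
  return Math.ceil(n); // round up so the test is never underpowered
}
```

For a 10% baseline and a one-point absolute lift, `sampleSizePerArm(0.10, 0.01)` comes out at roughly 14,750 users per arm; halving the lift roughly quadruples the requirement, which is why the minimum detectable effect should be fixed before launch.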

Dimension / Reasoning / Score

Specificity

Names the domain (A/B testing, experimentation) and mentions some actions like 'implementing feature flags, choosing metrics, designing experiments' but lacks concrete specific actions like 'calculate sample sizes' or 'set up control groups'. The phrase 'building for fast iteration' is vague.

2 / 3

Completeness

Clearly answers both what ('Builds features with A/B testing in mind using Ronny Kohavi's frameworks') and when ('Use when implementing feature flags, choosing metrics, designing experiments, or building for fast iteration'). Has explicit 'Use when...' clause with multiple trigger scenarios.

3 / 3

Trigger Term Quality

Good coverage of natural terms users would say: 'A/B testing', 'feature flags', 'metrics', 'experiments', 'statistical significance'. Also includes domain-specific but recognizable terms like 'guardrail metrics' and references to known frameworks (Kohavi, Netflix, Airbnb).

3 / 3

Distinctiveness Conflict Risk

Clear niche focused specifically on A/B testing and experimentation culture. The specific references to Kohavi's frameworks and Netflix/Airbnb experimentation culture make this highly distinctive and unlikely to conflict with general coding or analytics skills.

3 / 3

Total: 11 / 12

Passed

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured skill with strong actionability and clear workflows. The templates and code examples are immediately usable. However, it is overly verbose for a skill file: it includes explanatory content about company cultures and quotes that add no actionable value, and it could benefit from moving detailed examples to reference files.

Suggestions

Remove or significantly trim the 'Real-World Examples' section. Claude doesn't need explanations of Netflix/Airbnb culture; keep only the specific test examples if they are useful.

Delete the 'Key Quotes' section entirely; it adds no actionable guidance.

Move 'Common Pitfalls' to a reference file and keep only a brief summary in the main skill

Trim the 'When This Skill Activates' section; it is meta-information Claude can infer from context.

Dimension / Reasoning / Score

Conciseness

The skill contains useful content but is verbose in places: the 'Real-World Examples' section explains company cultures Claude already knows, and sections such as 'Key Quotes' and 'Further Learning' add little actionable value. The core frameworks are reasonably efficient but could be tightened.

2 / 3

Actionability

Provides fully executable code examples for feature flags, experiment assignment, and TypeScript implementations. Templates are copy-paste ready with clear placeholders. The HITS framework and experiment spec template give concrete, actionable guidance.

3 / 3
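The review credits the skill with executable examples for feature flags and experiment assignment, but that code is not reproduced on this page. For readers who want a reference point, here is a minimal sketch of one common approach, deterministic hash-based bucketing, assuming a 32-bit FNV-1a hash salted with the experiment id. `Variant`, `fnv1a32`, and `assignVariant` are hypothetical names, not the skill's API.

```typescript
// Hypothetical variant definition; weights are fractions that must sum to 1.
interface Variant {
  name: string;
  weight: number;
}

// 32-bit FNV-1a hash over UTF-16 code units, returned as an unsigned integer.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0;
}

// Deterministically maps a user to a variant: the same (experimentId, userId)
// pair always lands in the same bucket, so each user sees a stable experience.
function assignVariant(experimentId: string, userId: string, variants: Variant[]): string {
  if (variants.length === 0) throw new Error("at least one variant is required");
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  if (Math.abs(total - 1) > 1e-9) throw new Error("variant weights must sum to 1");

  // Salting with the experiment id decorrelates assignments across experiments.
  const position = fnv1a32(`${experimentId}:${userId}`) / 2 ** 32; // in [0, 1)
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (position < cumulative) return v.name;
  }
  return variants[variants.length - 1].name; // guards against rounding at the top edge
}
```

Hashing instead of storing a random draw keeps assignment stateless and reproducible across services; the trade-off is that changing weights mid-experiment reshuffles some users, so allocations are usually frozen once traffic starts.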

Workflow Clarity

Clear multi-step workflows with explicit validation checkpoints: the experiment checklist has before/during/after phases, the HITS framework provides clear sequencing, and the decision tree offers unambiguous branching logic. Success criteria and decision points are explicit.

3 / 3
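The skill's actual decision tree is not shown on this page. As a hedged illustration of how guardrail metrics and statistical significance can combine into an explicit ship decision, here is a minimal sketch built on a pooled two-proportion z-test. The `decide` rule, its labels, and the assumption that a negative guardrail z means "treatment made things worse" are all hypothetical.

```typescript
// Result of a two-sided two-proportion z-test.
interface TestResult {
  z: number;      // positive when the treatment rate exceeds the control rate
  pValue: number; // two-sided p-value
}

// Standard normal CDF via the Abramowitz & Stegun 7.1.26 erf approximation (|error| < 1.5e-7).
function normalCdf(x: number): number {
  const t = 1 / (1 + (0.3275911 * Math.abs(x)) / Math.SQRT2);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-(x * x) / 2);
  return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Pooled two-proportion z-test comparing control (A) with treatment (B).
function twoProportionZTest(convA: number, nA: number, convB: number, nB: number): TestResult {
  const pooled = (convA + convB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  if (se === 0) return { z: 0, pValue: 1 }; // no variance: both arms all-zero or all-one
  const z = (convB / nB - convA / nA) / se;
  return { z, pValue: 2 * (1 - normalCdf(Math.abs(z))) };
}

type Decision = "ship" | "do-not-ship" | "stop-guardrail-breached" | "inconclusive";

// Hypothetical rule: a significant guardrail degradation overrides everything;
// otherwise the primary metric decides. Guardrails are assumed to be oriented
// so that a negative z means the treatment made the guardrail worse.
function decide(primary: TestResult, guardrails: TestResult[], alpha = 0.05): Decision {
  if (guardrails.some((g) => g.z < 0 && g.pValue < alpha)) return "stop-guardrail-breached";
  if (primary.pValue >= alpha) return "inconclusive";
  return primary.z > 0 ? "ship" : "do-not-ship";
}
```

Checking guardrails before the primary metric encodes the idea that a win on the target metric does not justify a significant regression elsewhere. A production version would also correct for multiple guardrails and for peeking at results before the planned sample size is reached.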

Progressive Disclosure

References external files (references/experiment-design-guide.md, etc.) appropriately, but the main document is quite long with content that could be split out. The 'Real-World Examples' and 'Common Pitfalls' sections could be separate reference files to keep the core skill leaner.

2 / 3

Total: 10 / 12

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 passed

Validation for skill structure

No warnings or errors.

Repository: menkesu/awesome-pm-skills (Reviewed)
