CtrlK
BlogDocsLog inGet started
Tessl Logo

experimentation-platform-orchestrator

A platform decision framework for experimentation. When to use Statsig vs PostHog vs GrowthBook vs Optimizely vs Amplitude vs Eppo vs Kameleoon. How to migrate between them. How to coordinate when multi-platform is genuinely warranted. The decisions that compound for years and the ones you can defer. Triggers on which experimentation platform, choose Statsig vs PostHog, evaluate experimentation tools, switch experimentation platform, migrate from Optimizely, consolidate experimentation tools, multi-platform experimentation, experimentation platform decision, ab test platform selection, feature flag platform vs experiment platform, warehouse-native experiments, vendor lock-in experimentation. Also triggers when a team is asking about cost, governance, or migration cost across experimentation tools, or when an evaluation is starting.

55

Quality

62%

Does it follow best practices?

Impact

No eval scenarios have been run

SecuritybySnyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/experimentation-platform-orchestrator/SKILL.md
SKILL.md
Quality
Evals
Security

Quality

Discovery

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description with excellent trigger term coverage and clear completeness. Its main weakness is that the capability descriptions lean slightly conceptual ('decisions that compound for years') rather than listing concrete deliverables. The extensive trigger term list and explicit 'when to use' guidance make it highly effective for skill selection.

Suggestions

Replace abstract phrases like 'decisions that compound for years and the ones you can defer' with concrete actions such as 'generates platform comparison matrices, produces migration checklists, creates cost-benefit analyses'.

DimensionReasoningScore

Specificity

The description names the domain (experimentation platform selection) and some actions like 'migrate between them', 'coordinate when multi-platform is genuinely warranted', and 'decisions that compound for years', but the actions are more conceptual than concrete. It doesn't list specific deliverables like 'generates comparison matrices' or 'produces migration plans'.

2 / 3

Completeness

Clearly answers both 'what' (platform decision framework for experimentation, comparisons, migration guidance, multi-platform coordination) and 'when' with explicit trigger clauses ('Triggers on...', 'Also triggers when...'). The when guidance is thorough and covers multiple scenarios.

3 / 3

Trigger Term Quality

Excellent coverage of natural trigger terms including specific platform names (Statsig, PostHog, GrowthBook, Optimizely, Amplitude, Eppo, Kameleoon), common user phrases ('which experimentation platform', 'ab test platform selection', 'vendor lock-in'), and contextual triggers ('cost, governance, or migration cost'). These are terms users would naturally use.

3 / 3

Distinctiveness Conflict Risk

Highly distinctive with a very specific niche: experimentation platform selection and comparison. The named platforms and specific domain (A/B testing, feature flags, warehouse-native experiments) make it unlikely to conflict with other skills.

3 / 3

Total

11

/

12

Passed

Implementation

35%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill reads more like a comprehensive blog post or whitepaper than a concise, actionable skill file. Its greatest strength is the breadth of coverage and the opinionated decision matrix that maps contexts to specific platform recommendations. Its greatest weaknesses are extreme verbosity (the 7 considerations and 11 considerations overlap significantly, platform profiles are prose-heavy), lack of executable artifacts (no templates, checklists, or concrete evaluation scripts), and poor content distribution between the main file and references (too much detail is inlined rather than delegated to the referenced files).

Suggestions

Cut the SKILL.md body by 50%+: move platform profiles, cost details, common mistakes, and governance details into their respective reference files, keeping only a 2-3 sentence summary per platform in the main file with a link to the detail.

Consolidate the '7 considerations' and '11 considerations' sections into a single framework — the current duplication adds ~400 words of redundancy.

Add a concrete evaluation template or checklist (e.g., a markdown table with the 11 considerations as rows and columns for 'your answer', 'platform A score', 'platform B score') that Claude can fill out during an actual evaluation.

Add a concrete migration checklist with explicit validation gates (e.g., 'Week 2: confirm metric parity within 5% across both platforms before proceeding to phase 2') rather than deferring all specifics to the reference file.

DimensionReasoningScore

Conciseness

The skill is extremely verbose at ~3000+ words. It explains concepts a senior advisor would already know (what a PDF is equivalent: what experimentation platforms are, what vendor-native means, what governance is). The 7 considerations are listed, then repeated as 11 considerations later with significant overlap. Platform profiles contain marketing-style prose ('strong PM-led ergonomics') rather than decision-relevant facts. Much content could be cut by 50%+ without losing actionable information.

1 / 3

Actionability

The skill provides concrete decision guidance (the decision matrix summary is genuinely useful with specific recommendations per context) and migration effort estimates. However, there are no executable commands, no code examples, no concrete API calls, no template outputs, and no specific steps for actually performing an evaluation or migration. The guidance is directional rather than copy-paste actionable.

2 / 3

Workflow Clarity

The 11-consideration framework provides a clear sequence, and migration patterns include parallel-run validation steps. However, there are no explicit validation checkpoints in the evaluation workflow itself (e.g., 'run a trial, measure these specific metrics, compare against these thresholds'). The migration section mentions three rules but defers the actual step-by-step to a reference file. The evaluation workflow lacks a concrete checklist or decision tree with verification gates.

2 / 3

Progressive Disclosure

Seven reference files are clearly listed with descriptive labels and relative paths, which is good structure. However, no bundle files were provided, so the references are unverifiable. More importantly, the SKILL.md itself is far too long — the per-platform profiles, cost details, common mistakes list, and governance discussion should largely live in the reference files rather than being inlined. The main file should be a concise overview pointing to these references, not a near-complete treatment.

2 / 3

Total

7

/

12

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation10 / 11 Passed

Validation for skill structure

CriteriaDescriptionResult

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total

10

/

11

Passed

Repository
rampstackco/claude-skills
Reviewed

Table of Contents

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.