A platform decision framework for experimentation. When to use Statsig vs PostHog vs GrowthBook vs Optimizely vs Amplitude vs Eppo vs Kameleoon. How to migrate between them. How to coordinate when multi-platform is genuinely warranted. The decisions that compound for years and the ones you can defer. Triggers on which experimentation platform, choose Statsig vs PostHog, evaluate experimentation tools, switch experimentation platform, migrate from Optimizely, consolidate experimentation tools, multi-platform experimentation, experimentation platform decision, ab test platform selection, feature flag platform vs experiment platform, warehouse-native experiments, vendor lock-in experimentation. Also triggers when a team is asking about cost, governance, or migration cost across experimentation tools, or when an evaluation is starting.
55
62%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/experimentation-platform-orchestrator/SKILL.mdQuality
Discovery
89%Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description with excellent trigger term coverage and clear completeness. Its main weakness is that the capability descriptions lean slightly conceptual ('decisions that compound for years') rather than listing concrete deliverables. The extensive trigger term list and explicit 'when to use' guidance make it highly effective for skill selection.
Suggestions
Replace abstract phrases like 'decisions that compound for years and the ones you can defer' with concrete actions such as 'generates platform comparison matrices, produces migration checklists, creates cost-benefit analyses'.
| Dimension | Reasoning | Score |
|---|---|---|
Specificity | The description names the domain (experimentation platform selection) and some actions like 'migrate between them', 'coordinate when multi-platform is genuinely warranted', and 'decisions that compound for years', but the actions are more conceptual than concrete. It doesn't list specific deliverables like 'generates comparison matrices' or 'produces migration plans'. | 2 / 3 |
Completeness | Clearly answers both 'what' (platform decision framework for experimentation, comparisons, migration guidance, multi-platform coordination) and 'when' with explicit trigger clauses ('Triggers on...', 'Also triggers when...'). The when guidance is thorough and covers multiple scenarios. | 3 / 3 |
Trigger Term Quality | Excellent coverage of natural trigger terms including specific platform names (Statsig, PostHog, GrowthBook, Optimizely, Amplitude, Eppo, Kameleoon), common user phrases ('which experimentation platform', 'ab test platform selection', 'vendor lock-in'), and contextual triggers ('cost, governance, or migration cost'). These are terms users would naturally use. | 3 / 3 |
Distinctiveness Conflict Risk | Highly distinctive with a very specific niche: experimentation platform selection and comparison. The named platforms and specific domain (A/B testing, feature flags, warehouse-native experiments) make it unlikely to conflict with other skills. | 3 / 3 |
Total | 11 / 12 Passed |
Implementation
35%Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill reads more like a comprehensive blog post or whitepaper than a concise, actionable skill file. Its greatest strength is the breadth of coverage and the opinionated decision matrix that maps contexts to specific platform recommendations. Its greatest weaknesses are extreme verbosity (the 7 considerations and 11 considerations overlap significantly, platform profiles are prose-heavy), lack of executable artifacts (no templates, checklists, or concrete evaluation scripts), and poor content distribution between the main file and references (too much detail is inlined rather than delegated to the referenced files).
Suggestions
Cut the SKILL.md body by 50%+: move platform profiles, cost details, common mistakes, and governance details into their respective reference files, keeping only a 2-3 sentence summary per platform in the main file with a link to the detail.
Consolidate the '7 considerations' and '11 considerations' sections into a single framework — the current duplication adds ~400 words of redundancy.
Add a concrete evaluation template or checklist (e.g., a markdown table with the 11 considerations as rows and columns for 'your answer', 'platform A score', 'platform B score') that Claude can fill out during an actual evaluation.
Add a concrete migration checklist with explicit validation gates (e.g., 'Week 2: confirm metric parity within 5% across both platforms before proceeding to phase 2') rather than deferring all specifics to the reference file.
| Dimension | Reasoning | Score |
|---|---|---|
Conciseness | The skill is extremely verbose at ~3000+ words. It explains concepts a senior advisor would already know (what a PDF is equivalent: what experimentation platforms are, what vendor-native means, what governance is). The 7 considerations are listed, then repeated as 11 considerations later with significant overlap. Platform profiles contain marketing-style prose ('strong PM-led ergonomics') rather than decision-relevant facts. Much content could be cut by 50%+ without losing actionable information. | 1 / 3 |
Actionability | The skill provides concrete decision guidance (the decision matrix summary is genuinely useful with specific recommendations per context) and migration effort estimates. However, there are no executable commands, no code examples, no concrete API calls, no template outputs, and no specific steps for actually performing an evaluation or migration. The guidance is directional rather than copy-paste actionable. | 2 / 3 |
Workflow Clarity | The 11-consideration framework provides a clear sequence, and migration patterns include parallel-run validation steps. However, there are no explicit validation checkpoints in the evaluation workflow itself (e.g., 'run a trial, measure these specific metrics, compare against these thresholds'). The migration section mentions three rules but defers the actual step-by-step to a reference file. The evaluation workflow lacks a concrete checklist or decision tree with verification gates. | 2 / 3 |
Progressive Disclosure | Seven reference files are clearly listed with descriptive labels and relative paths, which is good structure. However, no bundle files were provided, so the references are unverifiable. More importantly, the SKILL.md itself is far too long — the per-platform profiles, cost details, common mistakes list, and governance discussion should largely live in the reference files rather than being inlined. The main file should be a concise overview pointing to these references, not a near-complete treatment. | 2 / 3 |
Total | 7 / 12 Passed |
Validation
90%Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
Total | 10 / 11 Passed | |
8e70d03
Table of Contents
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.