Experimentation Agent. Handles A/B test design, hypothesis verification, and statistical analysis.
Install with Tessl CLI
```shell
npx tessl i github:shaul1991/shaul-agents-plugin --skill growth-experiment40
```
Does it follow best practices?
If you maintain this skill, you can automatically optimize it using the tessl CLI to improve its score:
```shell
npx tessl skill review --optimize ./path/to/skill
```
Discovery — 32%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a clear domain (experimentation/A/B testing) but lacks explicit trigger guidance and sufficiently specific, concrete actions. The Korean-only text limits trigger-term coverage, and without a 'Use when...' clause Claude cannot easily tell when to select this skill over others.
Suggestions
Add an explicit 'Use when...' clause with trigger scenarios like 'Use when the user asks about A/B tests, experiment design, sample size calculation, statistical significance, or hypothesis testing'
List more specific concrete actions such as 'calculate required sample sizes', 'determine statistical significance', 'design experiment variants', 'analyze conversion rates'
Include both Korean and English trigger terms to improve coverage: 'A/B test', 'split test', 'experiment', 'p-value', 'conversion optimization'
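Taken together, these suggestions could produce a frontmatter description along the following lines (a hypothetical sketch; the field names and wording are illustrative, not the skill's actual file):

```yaml
name: growth-experiment
description: >
  Experimentation Agent for A/B test design, hypothesis verification, and
  statistical analysis. Use when the user asks about A/B tests (A/B 테스트),
  split tests, experiment design, sample size calculation, statistical
  significance, p-values, or conversion-rate analysis.
```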
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (A/B testing, hypothesis verification, statistical analysis) and some actions, but lacks concrete, specific actions such as 'calculate sample sizes', 'determine statistical significance', or 'design control groups'. | 2 / 3 |
| Completeness | Describes what the skill does (A/B test design, hypothesis verification, statistical analysis) but has no 'Use when...' clause or any explicit trigger guidance for when Claude should select it. | 1 / 3 |
| Trigger Term Quality | Includes relevant terms like 'A/B 테스트' (A/B test), '가설 검증' (hypothesis verification), and '통계 분석' (statistical analysis), but misses common variations users might say, such as 'experiment', 'split test', 'significance testing', 'p-value', or other English equivalents. | 2 / 3 |
| Distinctiveness / Conflict Risk | The A/B-testing and experimentation focus provides some distinctiveness, but '통계 분석' (statistical analysis) is broad and could overlap with general data-analysis or statistics skills. | 2 / 3 |
| Total | | 7 / 12 — Passed |
Implementation — 22%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is essentially a role description rather than actionable guidance. It lists what an experimentation agent does but provides no concrete instructions on how to do any of it: no statistical methods, no code for sample-size calculations, no experiment design templates, and no analysis workflows.
Suggestions
Add executable code examples for sample size calculation (e.g., using scipy.stats or statsmodels with specific parameters)
Include a concrete workflow with validation steps: hypothesis → sample size → randomization → data collection → statistical test → decision criteria
Provide specific statistical thresholds and decision rules (e.g., 'Use α=0.05, power=0.8, minimum detectable effect of X%')
Add an example experiment design template showing required fields and expected output format
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is brief but lacks substance; it is concise by omission rather than by efficient information density. The bullet points are high-level categories without actionable detail. | 2 / 3 |
| Actionability | Completely vague and abstract. No concrete code, commands, statistical formulas, sample-size calculations, or specific examples of how to design or analyze A/B tests. Describes responsibilities rather than instructing. | 1 / 3 |
| Workflow Clarity | Lists tasks as bullet points but provides no sequence, no validation checkpoints, and no guidance on how to actually execute any step. Missing critical details such as statistical-significance thresholds or decision criteria. | 1 / 3 |
| Progressive Disclosure | References an output location, which is good, but the skill itself is too sparse to need progressive disclosure. No links to detailed methodology, statistical references, or example experiments. | 2 / 3 |
| Total | | 6 / 12 — Passed |
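To close the loop on the suggested workflow (hypothesis → sample size → randomization → data collection → statistical test → decision criteria), the final test-and-decide step could be sketched with a two-proportion z-test; the conversion counts below are made up for illustration:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results: [control, treatment]
conversions = [230, 270]
visitors = [2000, 2000]

# Two-sided z-test on the difference in conversion rates
z_stat, p_value = proportions_ztest(conversions, visitors, alternative="two-sided")

# Decision rule using the suggested threshold of alpha = 0.05
decision = "ship treatment" if p_value < 0.05 else "no significant difference"
print(f"z = {z_stat:.2f}, p = {p_value:.3f} -> {decision}")
```

Pairing an explicit decision rule like this with the test keeps the agent from declaring a winner on a non-significant result.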
Validation — 90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| Total | | 10 / 11 Passed |
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.