
alonso-skills/arm-bandits-expert

Implements, evaluates, and deploys multi-armed bandit algorithms — including Thompson Sampling, UCB, epsilon-greedy, LinUCB, EXP3, and contextual bandits. Covers algorithm selection, experiment harnesses, offline evaluation (IPS, Doubly Robust), infrastructure patterns, and correctness verification. Use when the user asks about multi-armed bandits, exploration-exploitation tradeoffs, adaptive experiments, A/B testing alternatives, online optimization, bandit-based recommendation or personalization systems, or contextual bandits.

Quality: 94%
Does it follow best practices?

Impact: Pending
No eval scenarios have been run

Security by Snyk: Passed
No known issues


Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly defines a specific technical domain with concrete actions, named algorithms, and evaluation methods. It includes a comprehensive 'Use when...' clause with diverse natural trigger terms covering both technical jargon and user-friendly alternatives like 'A/B testing alternatives'. The description is concise yet thorough, and occupies a clearly distinct niche.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions and algorithms: 'Thompson Sampling, UCB, epsilon-greedy, LinUCB, EXP3, contextual bandits' along with specific activities like 'algorithm selection, experiment harnesses, offline evaluation (IPS, Doubly Robust), infrastructure patterns, and correctness verification.' | 3 / 3 |
| Completeness | Clearly answers both 'what' (implements, evaluates, and deploys multi-armed bandit algorithms with specific algorithm names and activities) and 'when' (explicit 'Use when...' clause listing multiple trigger scenarios). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'multi-armed bandits', 'exploration-exploitation tradeoffs', 'adaptive experiments', 'A/B testing alternatives', 'online optimization', 'bandit-based recommendation or personalization systems', 'contextual bandits'. These cover both technical and more casual phrasings. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive niche focused on multi-armed bandit algorithms specifically. The named algorithms (Thompson Sampling, UCB, LinUCB, EXP3) and domain-specific terms (IPS, Doubly Robust, exploration-exploitation) make it very unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-crafted routing and decision-framework skill that excels at progressive disclosure and conciseness. It efficiently directs Claude to the right reference material based on user intent and provides a thorough algorithm selection framework. The main weakness is that actionability depends entirely on the referenced files — the skill itself contains no executable code or concrete implementation guidance, though this is appropriate for its role as an overview/router.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The content is lean and well-structured. It avoids explaining what bandits are or how exploration-exploitation works; it assumes Claude already knows. Every section serves a routing or decision-making purpose with no filler. | 3 / 3 |
| Actionability | The decision framework tables are concrete and useful for algorithm selection, but the skill itself contains no executable code, no specific commands, and delegates all implementation details to reference files. The routing instructions are clear but the skill alone doesn't let Claude execute anything. | 2 / 3 |
| Workflow Clarity | The routing section provides clear entry paths based on user intent, the decision framework is a well-sequenced 4-step process with explicit tables at each step, and the build phases provide a clear 3-stage progression. For a routing/overview skill, this is excellent workflow clarity. | 3 / 3 |
| Progressive Disclosure | Exemplary progressive disclosure: the SKILL.md is a concise overview and routing layer that points to 6 clearly-described reference files, all one level deep. The reference table at the bottom provides clear navigation with content descriptions for each file. | 3 / 3 |
| Total | | 11 / 12 |

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

