Name: alonso-skills/arm-bandits-expert
Rating: 71.2 (1 review)
Author: alonso-skills

Implements, evaluates, and deploys multi-armed bandit algorithms — including Thompson Sampling, UCB, epsilon-greedy, LinUCB, EXP3, and contextual bandits. Covers algorithm selection, experiment harnesses, offline evaluation (IPS, Doubly Robust), infrastructure patterns, and correctness verification. Use when the user asks about multi-armed bandits, exploration-exploitation tradeoffs, adaptive experiments, A/B testing alternatives, online optimization, bandit-based recommendation or personalization systems, or contextual bandits.

Quality

89%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Quality

Content

72%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured routing/decision-making skill that excels at progressive disclosure and conciseness. Its main weakness is the lack of any concrete executable examples in the main file — all implementation is deferred to references — and the absence of validation checkpoints in the workflow. The decision framework tables are a strong feature that provides genuinely useful, dense guidance for algorithm selection.

Suggestions

Add at least one minimal executable code example (e.g., a basic epsilon-greedy implementation) in the main skill to improve actionability, even if detailed implementations live in reference files.
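
As an illustration of the kind of example this suggestion has in mind, here is a minimal epsilon-greedy sketch. It is not taken from the skill or its reference files; the class name `EpsilonGreedy`, its method names, and the default `epsilon=0.1` are all assumptions chosen for brevity.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy bandit over a fixed set of arms (illustrative sketch only)."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore uniformly with probability epsilon, otherwise exploit
        # the arm with the highest estimated mean (ties go to the lowest index).
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

A caller would alternate `arm = bandit.select_arm()` and `bandit.update(arm, reward)` once per round; keeping the estimates as running means avoids storing the reward history.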

Add validation/verification checkpoints to the Build Phases section — e.g., 'verify algorithm correctness with known distributions before proceeding to experiment harness' — to strengthen workflow clarity.
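
One way such a checkpoint could look is sketched below: run a policy against Bernoulli arms whose true means are known, and refuse to proceed unless most pulls go to the best arm. The function names `best_arm_pull_fraction` and `checkpoint`, the inlined epsilon-greedy policy, and the 0.5 threshold are hypothetical choices for this sketch, not part of the reviewed skill.

```python
import random


def best_arm_pull_fraction(true_means, rounds=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on Bernoulli arms with known means and return
    the fraction of pulls that went to the truly best arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    values = [0.0] * n_arms
    for _ in range(rounds):
        # Explore with probability epsilon, otherwise exploit.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=values.__getitem__)
        # Simulated Bernoulli reward drawn from the known mean.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    best = max(range(n_arms), key=true_means.__getitem__)
    return counts[best] / rounds


def checkpoint(true_means, min_fraction=0.5):
    """Hypothetical gate: raise before the experiment-harness phase if the
    policy does not concentrate its pulls on the best arm."""
    fraction = best_arm_pull_fraction(true_means)
    if fraction < min_fraction:
        raise AssertionError(
            f"best arm received only {fraction:.1%} of pulls; fix before proceeding"
        )
    return fraction
```

Calling `checkpoint([0.2, 0.8])` at the end of the implementation phase would serve as the gate; the threshold and round count are tunable and would need to be tightened for arms with closer means.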

Dimension	Reasoning	Score
Conciseness	The content is lean and efficient. It avoids explaining what bandits are or how algorithms work conceptually; it assumes Claude already knows. Every section serves a routing or decision-making purpose. The tables are information-dense, with no filler.	3 / 3
Actionability	The decision framework tables are concrete and useful for algorithm selection, but the skill itself contains no executable code, no concrete commands, and no copy-paste examples. It delegates all implementation details to reference files. The routing and build phases are directional rather than executable.	2 / 3
Workflow Clarity	The Build Phases section provides a clear 3-step sequence, and the routing section gives clear entry paths. However, there are no validation checkpoints, no feedback loops for error recovery, and no verification steps — which matters for a skill involving implementation and deployment of algorithms.	2 / 3
Progressive Disclosure	Excellent progressive disclosure structure. The SKILL.md serves as a concise overview and routing hub, with clearly signaled one-level-deep references to six well-organized reference files. The reference table at the bottom provides clear navigation with descriptions of each file's contents.	3 / 3
	Total	10 / 12 Passed

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that hits all the marks. It provides highly specific capabilities with named algorithms and techniques, includes a comprehensive 'Use when' clause with natural trigger terms spanning both technical and colloquial phrasings, and occupies a clearly distinct niche that minimizes conflict risk with other skills.

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions and algorithms: 'Thompson Sampling, UCB, epsilon-greedy, LinUCB, EXP3, contextual bandits' along with specific activities like 'algorithm selection, experiment harnesses, offline evaluation (IPS, Doubly Robust), infrastructure patterns, and correctness verification.'	3 / 3
Completeness	Clearly answers both 'what' (implements, evaluates, and deploys bandit algorithms with specific techniques listed) and 'when' (explicit 'Use when...' clause with multiple trigger scenarios covering a broad range of user intents).	3 / 3
Trigger Term Quality	Excellent coverage of natural terms users would say: 'multi-armed bandits', 'exploration-exploitation tradeoffs', 'adaptive experiments', 'A/B testing alternatives', 'online optimization', 'bandit-based recommendation or personalization systems', 'contextual bandits'. These cover both technical and more casual phrasings.	3 / 3
Distinctiveness Conflict Risk	Highly distinctive niche — multi-armed bandits and exploration-exploitation is a very specific domain. The named algorithms (Thompson Sampling, UCB, LinUCB, EXP3) and evaluation methods (IPS, Doubly Robust) make it clearly distinguishable from general ML, A/B testing, or optimization skills.	3 / 3
	Total	12 / 12 Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Reviewed

22 days ago
