Production-ready reinforcement learning algorithms (PPO, SAC, DQN, TD3, DDPG, A2C) with scikit-learn-like API. Use for standard RL experiments, quick prototyping, and well-documented algorithm implementations. Best for single-agent RL with Gymnasium environments. For high-performance parallel training, multi-agent systems, or custom vectorized environments, use pufferlib instead.
Score: 91 (86%) — Does it follow best practices?

Impact: 95% — 1.07x average score across 6 eval scenarios. Passed; no known issues.

Quality
Discovery — 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that covers all key dimensions thoroughly. It lists specific algorithms, provides clear trigger terms, explicitly states both what the skill does and when to use it, and even includes negative triggers (when to use an alternative skill instead). The description is concise yet comprehensive.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific algorithms (PPO, SAC, DQN, TD3, DDPG, A2C), mentions the API style (scikit-learn-like), and describes concrete use cases (standard RL experiments, quick prototyping, well-documented implementations). | 3 / 3 |
| Completeness | Clearly answers 'what' (production-ready RL algorithms with scikit-learn-like API) and 'when' (standard RL experiments, quick prototyping, single-agent RL with Gymnasium environments). Also explicitly states when NOT to use it (high-performance parallel training, multi-agent systems), which further clarifies the 'when'. | 3 / 3 |
| Trigger Term Quality | Includes highly relevant natural keywords users would say: specific algorithm names (PPO, SAC, DQN, TD3, DDPG, A2C), 'reinforcement learning', 'RL', 'Gymnasium environments', 'prototyping', and 'scikit-learn-like API'. These cover the terms a user working in RL would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive by specifying the exact algorithms, the single-agent scope, Gymnasium environments, and explicitly differentiating from pufferlib for multi-agent/parallel scenarios. This clear boundary-drawing makes it very unlikely to conflict with other skills. | 3 / 3 |
| **Total** | | **12 / 12 — Passed** |
Implementation — 72%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured skill with strong actionability and excellent progressive disclosure. The main weaknesses are moderate verbosity (purpose/description sentences that Claude doesn't need) and a workflow section that lists steps without explicit validation checkpoints or feedback loops for handling common failure modes like training instability.
Suggestions

- Remove 'Purpose:' paragraphs and descriptive sentences that explain what things are (e.g., 'Callbacks enable monitoring metrics...', 'Vectorized environments run multiple environment instances in parallel...') — Claude already knows these concepts.
- Add explicit validation checkpoints to the workflow, e.g., 'After step 5, verify reward is increasing over the first 1000 steps; if not, check reward scaling and hyperparameters before continuing.'
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly comprehensive but includes some unnecessary explanations (e.g., explaining what vectorized environments are, what callbacks enable). Several sections describe purpose before giving instructions, adding tokens without value for Claude. The content could be tightened by ~30% while preserving all actionable information. | 2 / 3 |
| Actionability | The skill provides fully executable, copy-paste-ready code examples throughout — training, custom environments, vectorized envs, callbacks, evaluation, video recording, learning rate schedules, HER, and TensorBoard integration. Key constraints and gotchas (e.g., uint8 images, Discrete start != 0, replay buffer not saved) are concrete and specific. | 3 / 3 |
| Workflow Clarity | The 'Starting a New RL Project' workflow provides a clear 8-step sequence, and the custom environment section includes a validation step with check_env(). However, the workflow lacks explicit validation checkpoints and feedback loops — there's no 'if training diverges, do X' or 'verify reward is improving before proceeding' guidance. For a domain where training instability is common, this is a gap. | 2 / 3 |
| Progressive Disclosure | Excellent structure with clear overview content in the main file and well-signaled one-level-deep references to scripts/ and references/ directories. Each section ends with a pointer to the relevant detailed file. The Resources section provides a clean navigation index. | 3 / 3 |
| **Total** | | **10 / 12 — Passed** |
Validation — 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure — 10 / 11 Passed
| Criteria | Description | Result |
|---|---|---|
| metadata_version | 'metadata.version' is missing | Warning |
| **Total** | 10 / 11 — Passed | |