Production-ready reinforcement learning algorithms (PPO, SAC, DQN, TD3, DDPG, A2C) with scikit-learn-like API. Use for standard RL experiments, quick prototyping, and well-documented algorithm implementations. Best for single-agent RL with Gymnasium environments. For high-performance parallel training, multi-agent systems, or custom vectorized environments, use pufferlib instead.
Score: 91 (86%) — Does it follow best practices?

Impact: 95% — 1.07x average score across 6 eval scenarios. Passed; no known issues.

Quality
Discovery — 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that covers all key dimensions thoroughly. It lists specific algorithms, provides clear trigger terms, explicitly states both what the skill does and when to use it, and even includes negative triggers (when to use an alternative skill instead). The description is concise yet comprehensive.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific algorithms (PPO, SAC, DQN, TD3, DDPG, A2C), mentions the API style (scikit-learn-like), and describes concrete use cases (standard RL experiments, quick prototyping, well-documented implementations). | 3 / 3 |
| Completeness | Clearly answers 'what' (production-ready RL algorithms with scikit-learn-like API) and 'when' (standard RL experiments, quick prototyping, single-agent RL with Gymnasium environments). Also explicitly states when NOT to use it (high-performance parallel training, multi-agent systems), which further clarifies the 'when'. | 3 / 3 |
| Trigger Term Quality | Includes highly relevant natural keywords users would say: specific algorithm names (PPO, SAC, DQN, TD3, DDPG, A2C), 'reinforcement learning', 'RL', 'Gymnasium environments', 'prototyping', and 'scikit-learn-like API'. These cover the terms a user working in RL would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive by specifying the exact algorithms, the single-agent scope, Gymnasium environments, and explicitly differentiating from pufferlib for multi-agent/parallel scenarios. This clear boundary-drawing makes it very unlikely to conflict with other skills. | 3 / 3 |
| **Total** | | **12 / 12 — Passed** |
Implementation — 72%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured skill with strong actionability and excellent progressive disclosure. The main weaknesses are moderate verbosity (purpose/description sentences that Claude doesn't need) and a workflow section that lists steps without explicit validation checkpoints or feedback loops for handling common failure modes like training instability.
Suggestions

- Remove 'Purpose:' paragraphs and descriptive sentences that explain what things are (e.g., 'Callbacks enable monitoring metrics...', 'Vectorized environments run multiple environment instances in parallel...') — Claude already knows these concepts.
- Add explicit validation checkpoints to the workflow, e.g., 'After step 5, verify reward is increasing over the first 1000 steps; if not, check reward scaling and hyperparameters before continuing.'
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly comprehensive but includes some unnecessary explanations (e.g., explaining what vectorized environments are, what callbacks enable). Several sections describe purpose before giving instructions, adding tokens without value for Claude. The content could be tightened by ~30% while preserving all actionable information. | 2 / 3 |
| Actionability | The skill provides fully executable, copy-paste-ready code examples throughout — training, custom environments, vectorized envs, callbacks, evaluation, video recording, learning rate schedules, HER, and TensorBoard integration. Key constraints and gotchas (e.g., uint8 images, Discrete start != 0, replay buffer not saved) are concrete and specific. | 3 / 3 |
| Workflow Clarity | The 'Starting a New RL Project' workflow provides a clear 8-step sequence, and the custom environment section includes a validation step with check_env(). However, the workflow lacks explicit validation checkpoints and feedback loops — there's no 'if training diverges, do X' or 'verify reward is improving before proceeding' guidance. For a domain where training instability is common, this is a gap. | 2 / 3 |
| Progressive Disclosure | Excellent structure with clear overview content in the main file and well-signaled one-level-deep references to scripts/ and references/ directories. Each section ends with a pointer to the relevant detailed file. The Resources section provides a clean navigation index. | 3 / 3 |
| **Total** | | **10 / 12 — Passed** |
Validation — 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure — 10 / 11 Passed
| Criteria | Description | Result |
|---|---|---|
| metadata_version | 'metadata.version' is missing | Warning |
| **Total** | 10 / 11 — Passed | |