
stable-baselines3

Production-ready reinforcement learning algorithms (PPO, SAC, DQN, TD3, DDPG, A2C) with scikit-learn-like API. Use for standard RL experiments, quick prototyping, and well-documented algorithm implementations. Best for single-agent RL with Gymnasium environments. For high-performance parallel training, multi-agent systems, or custom vectorized environments, use pufferlib instead.

Overall score: 91 (1.07x)

Quality: 86% (Does it follow best practices?)

Impact: 95% (1.07x), average score across 6 eval scenarios

Security (by Snyk): Passed, no known issues


Quality

Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that covers all key dimensions thoroughly. It lists specific algorithms, provides clear trigger terms, explicitly states both what it does and when to use it, and even includes negative triggers (when to use an alternative skill instead). The description is concise yet comprehensive.

Specificity: 3/3. Lists multiple specific algorithms (PPO, SAC, DQN, TD3, DDPG, A2C), mentions the API style (scikit-learn-like), and describes concrete use cases (standard RL experiments, quick prototyping, well-documented implementations).

Completeness: 3/3. Clearly answers 'what' (production-ready RL algorithms with scikit-learn-like API) and 'when' (standard RL experiments, quick prototyping, single-agent RL with Gymnasium environments). Also explicitly states when NOT to use it (high-performance parallel training, multi-agent systems), which further clarifies the 'when'.

Trigger Term Quality: 3/3. Includes highly relevant natural keywords users would say: specific algorithm names (PPO, SAC, DQN, TD3, DDPG, A2C), 'reinforcement learning', 'RL', 'Gymnasium environments', 'prototyping', and 'scikit-learn-like API'. These cover the terms a user working in RL would naturally use.

Distinctiveness / Conflict Risk: 3/3. Highly distinctive by specifying the exact algorithms, the single-agent scope, Gymnasium environments, and explicitly differentiating from pufferlib for multi-agent/parallel scenarios. This clear boundary-drawing makes it very unlikely to conflict with other skills.

Total: 12 / 12 (Passed)

Implementation: 72%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a solid, well-organized skill that provides comprehensive, actionable guidance for using Stable Baselines3. Its main strengths are excellent code examples and good progressive disclosure to supporting files. Its weaknesses are moderate verbosity (explanatory text the agent doesn't need) and a workflow section that would benefit from explicit validation checkpoints and error-recovery loops for training instability.

Suggestions:

- Trim explanatory prose: remove 'Purpose' paragraphs and descriptions of what things are (e.g., 'Callbacks enable monitoring metrics...') in favor of just showing how to use them.
- Add explicit validation checkpoints to the workflow, such as 'After N timesteps, verify mean reward is improving via evaluate_policy(); if not, check reward scaling and hyperparameters before continuing.'

Conciseness: 2/3. The skill is reasonably well-structured but includes some unnecessary explanations (e.g., 'Purpose' descriptions for vectorized environments and callbacks that the agent would already understand). Section headers like 'Overview' with a sentence restating what SB3 is add little value. The content could be tightened by ~20-30% without losing information.

Actionability: 3/3. The skill provides fully executable, copy-paste-ready code examples throughout: training, custom environments, vectorized envs, callbacks, evaluation, learning rate schedules, HER, and TensorBoard integration. Key constraints and gotchas (e.g., uint8 images, Discrete start != 0, replay buffer not saved) are concrete and specific.

Workflow Clarity: 2/3. The 'Starting a New RL Project' workflow provides a clear 8-step sequence, and the custom environment section includes a validation step with check_env(). However, the workflow lacks explicit validation checkpoints and feedback loops: there is no 'if training diverges, do X' or 'verify reward is improving before proceeding' guidance. For a domain where training instability is common, this is a gap.

Progressive Disclosure: 3/3. The skill has a clear overview structure with well-signaled one-level-deep references to scripts/ and references/ directories. The Resources section cleanly lists all supporting files with descriptions. Content is appropriately split between the main skill (patterns and quick reference) and detailed references (algorithms.md, callbacks.md, etc.).

Total: 10 / 12 (Passed)

Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 10 / 11 checks passed

metadata_version: Warning. 'metadata.version' is missing.

Total: 10 / 11 (Passed)

Repository: K-Dense-AI/claude-scientific-skills (Reviewed)
