
stable-baselines3

Production-ready reinforcement learning algorithms (PPO, SAC, DQN, TD3, DDPG, A2C) with scikit-learn-like API. Use for standard RL experiments, quick prototyping, and well-documented algorithm implementations. Best for single-agent RL with Gymnasium environments. For high-performance parallel training, multi-agent systems, or custom vectorized environments, use pufferlib instead.

Overall: 91 (1.07x)

Quality: 86%
Does it follow best practices?

Impact: 95% (1.07x)
Average score across 6 eval scenarios

Security (by Snyk): Passed. No known issues.


Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that covers all key dimensions thoroughly. It lists specific algorithms, provides clear trigger terms, explicitly states both what the skill does and when to use it, and even includes negative triggers (when to use an alternative skill instead). The description is concise yet comprehensive.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific algorithms (PPO, SAC, DQN, TD3, DDPG, A2C), mentions the API style (scikit-learn-like), and describes concrete use cases (standard RL experiments, quick prototyping, well-documented implementations). | 3 / 3 |
| Completeness | Clearly answers 'what' (production-ready RL algorithms with scikit-learn-like API) and 'when' (standard RL experiments, quick prototyping, single-agent RL with Gymnasium environments). Also explicitly states when NOT to use it (high-performance parallel training, multi-agent systems), which further clarifies the 'when'. | 3 / 3 |
| Trigger Term Quality | Includes highly relevant natural keywords users would say: specific algorithm names (PPO, SAC, DQN, TD3, DDPG, A2C), 'reinforcement learning', 'RL', 'Gymnasium environments', 'prototyping', and 'scikit-learn-like API'. These cover the terms a user working in RL would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive by specifying the exact algorithms, the single-agent scope, Gymnasium environments, and explicitly differentiating from pufferlib for multi-agent/parallel scenarios. This clear boundary-drawing makes it very unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation

72%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured skill with strong actionability and excellent progressive disclosure. The main weaknesses are moderate verbosity (purpose and description sentences that Claude doesn't need) and a workflow section that lists steps without explicit validation checkpoints or feedback loops for handling common failure modes such as training instability.

Suggestions

Remove 'Purpose:' paragraphs and descriptive sentences that explain what things are (e.g., 'Callbacks enable monitoring metrics...', 'Vectorized environments run multiple environment instances in parallel...') — Claude already knows these concepts.

Add explicit validation checkpoints to the workflow, e.g., 'After step 5, verify reward is increasing over first 1000 steps; if not, check reward scaling and hyperparameters before continuing.'

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is fairly comprehensive but includes some unnecessary explanations (e.g., explaining what vectorized environments are, what callbacks enable). Several sections describe purpose before giving instructions, adding tokens without value for Claude. The content could be tightened by ~30% while preserving all actionable information. | 2 / 3 |
| Actionability | The skill provides fully executable, copy-paste-ready code examples throughout: training, custom environments, vectorized envs, callbacks, evaluation, video recording, learning rate schedules, HER, and TensorBoard integration. Key constraints and gotchas (e.g., uint8 images, Discrete start != 0, replay buffer not saved) are concrete and specific. | 3 / 3 |
| Workflow Clarity | The 'Starting a New RL Project' workflow provides a clear 8-step sequence, and the custom environment section includes a validation step with check_env(). However, the workflow lacks explicit validation checkpoints and feedback loops: there is no 'if training diverges, do X' or 'verify reward is improving before proceeding' guidance. For a domain where training instability is common, this is a gap. | 2 / 3 |
| Progressive Disclosure | Excellent structure with clear overview content in the main file and well-signaled one-level-deep references to scripts/ and references/ directories. Each section ends with a pointer to the relevant detailed file. The Resources section provides a clean navigation index. | 3 / 3 |
| Total | | 10 / 12 |

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| metadata_version | 'metadata.version' is missing | Warning |
| Total | | 10 / 11 |

Passed

Repository: K-Dense-AI/claude-scientific-skills (Reviewed)

