Production-ready reinforcement learning algorithms (PPO, SAC, DQN, TD3, DDPG, A2C) with a scikit-learn-like API. Use it for standard RL experiments, quick prototyping, and well-documented algorithm implementations. Best suited to single-agent RL with Gymnasium environments. For high-performance parallel training, multi-agent systems, or custom vectorized environments, use pufferlib instead.
Score: 91
Does it follow best practices? 86%
Impact: 95% (1.07x average score across 6 eval scenarios)
Status: Passed; no known issues

Each check below lists two pass rates: baseline (without the skill), then with the skill.
Image-based custom environment
  gymnasium.Env inheritance: 100% / 100%
  reset() return tuple: 100% / 100%
  step() return 5-tuple: 100% / 100%
  Image dtype uint8: 100% / 100%
  Channel-first image shape: 0% / 100%
  check_env called: 100% / 100%
  CnnPolicy used: 100% / 100%
  Discrete space starts at 0: 100% / 100%
  self.np_random used: 70% / 100%
  Image pixel bounds [0, 255]: 100% / 100%
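A minimal sketch that satisfies these checks, assuming a hypothetical GridWorldEnv with random 3x64x64 observations and a 4-action discrete space; the environment is illustrative, while the Gymnasium and Stable-Baselines3 calls are the real API:

```python
# Minimal sketch for this scenario. GridWorldEnv, its 3x64x64 observation,
# and the 4-action space are hypothetical; the API calls are real.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env


class GridWorldEnv(gym.Env):
    """Toy image env: channel-first uint8 observations bounded to [0, 255]."""

    def __init__(self):
        super().__init__()
        # Channel-first (C, H, W) uint8 image with pixel bounds [0, 255].
        self.observation_space = spaces.Box(
            low=0, high=255, shape=(3, 64, 64), dtype=np.uint8
        )
        self.action_space = spaces.Discrete(4)  # actions 0..3, starting at 0

    def _get_obs(self):
        # Use the env's seeded RNG (self.np_random), never np.random directly.
        return self.np_random.integers(0, 256, size=(3, 64, 64), dtype=np.uint8)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        return self._get_obs(), {}  # Gymnasium API: (obs, info) tuple

    def step(self, action):
        # Gymnasium API: 5-tuple (obs, reward, terminated, truncated, info)
        return self._get_obs(), 0.0, False, False, {}


env = GridWorldEnv()
check_env(env)  # validate the env against the Gymnasium/SB3 API
model = PPO("CnnPolicy", env, n_steps=64, batch_size=64)
model.learn(total_timesteps=256)
```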
Vectorized off-policy training with callbacks
make_vec_env used
100%
100%
gradient_steps=-1
0%
0%
eval_freq adjusted for n_envs
100%
100%
save_freq adjusted for n_envs
100%
100%
Separate eval environment
100%
100%
Multiple callbacks chained
100%
100%
No DDPG used
100%
100%
model.save() called
100%
100%
SAC or TD3 for continuous control
100%
100%
VecEnv type used
100%
100%
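A minimal sketch covering these checks, assuming Pendulum-v1 and illustrative frequency and timestep values; the per-env frequency arithmetic and gradient_steps=-1 are the parts the checks target:

```python
# Minimal sketch: vectorized SAC with chained callbacks. Pendulum-v1 and
# all frequency/timestep values are illustrative choices.
import gymnasium as gym
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback

n_envs = 4
env = make_vec_env("Pendulum-v1", n_envs=n_envs)  # returns a VecEnv
eval_env = gym.make("Pendulum-v1")  # separate environment for evaluation

# Callback frequencies are counted per env.step() call, so divide by
# n_envs to keep the intended frequency in total timesteps.
eval_callback = EvalCallback(eval_env, eval_freq=max(5_000 // n_envs, 1))
checkpoint_callback = CheckpointCallback(
    save_freq=max(10_000 // n_envs, 1), save_path="./checkpoints/"
)

# gradient_steps=-1 runs as many gradient updates as transitions collected,
# keeping the update-to-data ratio constant as n_envs changes.
model = SAC("MlpPolicy", env, train_freq=1, gradient_steps=-1)
model.learn(total_timesteps=20_000, callback=[eval_callback, checkpoint_callback])
model.save("sac_pendulum")
```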
Model persistence and VecEnv API
  Static model load: 100% / 100%
  evaluate_policy used: 100% / 100%
  deterministic=True in evaluate_policy: 100% / 100%
  VecEnv step() returns 4-tuple: 100% / 100%
  VecEnv reset() returns obs only: 100% / 100%
  terminal_observation accessed correctly: 0% / 50%
  Replay buffer caveat noted: 100% / 100%
  Installation uses uv pip install: 0% / 100%
  results.json written: 100% / 100%
  No DDPG used: 100% / 100%
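A minimal sketch of the load-and-evaluate flow and the VecEnv API quirks, reusing the hypothetical sac_pendulum file from the previous sketch; the checks also expect installation via uv pip install stable-baselines3:

```python
# Minimal sketch: static load, evaluation, and the VecEnv API differences.
# The "sac_pendulum" file and Pendulum-v1 task are illustrative.
import json
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

vec_env = make_vec_env("Pendulum-v1", n_envs=2)

# Static load: SAC.load is a classmethod, no prior model instance needed.
# Caveat: model.save() does not persist the replay buffer; use
# model.save_replay_buffer() / load_replay_buffer() for that.
model = SAC.load("sac_pendulum", env=vec_env)

mean_reward, std_reward = evaluate_policy(
    model, vec_env, n_eval_episodes=10, deterministic=True
)

# VecEnv API differs from Gymnasium: reset() returns only the observations,
# and step() returns a 4-tuple with combined done flags.
obs = vec_env.reset()
actions, _ = model.predict(obs, deterministic=True)
obs, rewards, dones, infos = vec_env.step(actions)
for done, info in zip(dones, infos):
    if done:
        # VecEnvs auto-reset; the final real observation is stored here.
        last_obs = info["terminal_observation"]

with open("results.json", "w") as f:
    json.dump({"mean_reward": float(mean_reward), "std": float(std_reward)}, f)
```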
HER goal-conditioned training
  HerReplayBuffer class used: 100% / 100%
  SAC or TD3 base algorithm: 100% / 100%
  MultiInputPolicy used: 100% / 100%
  Goal obs structure keys: 100% / 100%
  compute_reward method defined: 100% / 100%
  goal_selection_strategy set: 100% / 100%
  n_sampled_goal set: 100% / 100%
  gymnasium.Env inheritance: 100% / 100%
  check_env called: 100% / 100%
  model.save() called: 100% / 100%
  No DDPG used: 100% / 100%
  uv pip install: 0% / 100%
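A minimal sketch of the HER setup, assuming a toy ReachEnv whose dynamics and success threshold are invented for illustration; the Dict observation keys, HerReplayBuffer, and MultiInputPolicy are the real SB3/Gymnasium API:

```python
# Minimal sketch: HER with SAC on a hypothetical goal-conditioned ReachEnv.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC, HerReplayBuffer
from stable_baselines3.common.env_checker import check_env


class ReachEnv(gym.Env):
    """Goal-conditioned env exposing the Dict keys HER expects."""

    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Dict({
            "observation": spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32),
            "achieved_goal": spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32),
            "desired_goal": spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32),
        })
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def _get_obs(self):
        return {
            "observation": self.pos.copy(),
            "achieved_goal": self.pos.copy(),
            "desired_goal": self.goal.copy(),
        }

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = np.zeros(2, dtype=np.float32)
        self.goal = self.np_random.uniform(-1, 1, size=2).astype(np.float32)
        return self._get_obs(), {}

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Must be vectorized: HER relabels goals in batches.
        return -np.linalg.norm(achieved_goal - desired_goal, axis=-1)

    def step(self, action):
        self.pos = np.clip(self.pos + 0.1 * action, -1.0, 1.0).astype(np.float32)
        obs = self._get_obs()
        reward = float(
            self.compute_reward(obs["achieved_goal"], obs["desired_goal"], {})
        )
        return obs, reward, bool(reward > -0.05), False, {}


env = ReachEnv()
check_env(env)
model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
)
model.learn(total_timesteps=10_000)
model.save("sac_her_reach")
```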
VecNormalize persistence and learning rate schedule
  VecNormalize applied: 100% / 100%
  VecNormalize stats saved separately: 100% / 100%
  VecNormalize training disabled on load: 100% / 100%
  norm_reward disabled on load: 100% / 100%
  linear_schedule returns progress_remaining * value: 100% / 100%
  Schedule passed to model: 100% / 100%
  VecNormalize.load used: 100% / 100%
  model.save() called: 100% / 100%
  Separate eval environment: 100% / 100%
  evaluate_policy called: 100% / 100%
  uv pip install used: 0% / 100%
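A minimal sketch for this scenario, assuming Pendulum-v1; the file names and the 3e-4 initial learning rate are illustrative:

```python
# Minimal sketch: VecNormalize persistence plus a linear LR schedule.
# Pendulum-v1, the file names, and 3e-4 are illustrative choices.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecNormalize


def linear_schedule(initial_value):
    # SB3 schedules receive progress_remaining, which decays from 1 to 0.
    def schedule(progress_remaining):
        return progress_remaining * initial_value
    return schedule


env = VecNormalize(make_vec_env("Pendulum-v1", n_envs=4))
model = PPO("MlpPolicy", env, learning_rate=linear_schedule(3e-4))
model.learn(total_timesteps=20_000)

# VecNormalize statistics live outside the model zip; save them separately.
model.save("ppo_pendulum")
env.save("vec_normalize.pkl")

# Reload the stats into a fresh eval env, then freeze them.
eval_env = VecNormalize.load(
    "vec_normalize.pkl", make_vec_env("Pendulum-v1", n_envs=1)
)
eval_env.training = False     # stop updating the running statistics
eval_env.norm_reward = False  # report raw rewards at evaluation time

model = PPO.load("ppo_pendulum", env=eval_env)
mean_reward, std_reward = evaluate_policy(model, eval_env, deterministic=True)
```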
Custom callbacks with early stopping
  BaseCallback subclass: 100% / 100%
  _on_step returns bool: 100% / 100%
  self.num_timesteps used: 100% / 100%
  self.model or self.training_env used: 0% / 0%
  self.logger.record used: 100% / 100%
  StopTrainingOnRewardThreshold via callback_on_new_best: 100% / 100%
  EvalCallback used: 100% / 100%
  Multiple callbacks combined: 100% / 100%
  Callback log file or printed output: 100% / 100%
  uv pip install used: 0% / 100%
  No DDPG used: 100% / 100%
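A minimal sketch for this scenario, assuming Pendulum-v1; the -200 reward threshold, logging interval, and eval frequency are illustrative:

```python
# Minimal sketch: custom BaseCallback plus reward-threshold early stopping.
# Pendulum-v1, the -200 threshold, and the frequencies are illustrative.
import gymnasium as gym
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import (
    BaseCallback,
    EvalCallback,
    StopTrainingOnRewardThreshold,
)


class ProgressCallback(BaseCallback):
    """Logs replay buffer size; returning False from _on_step stops training."""

    def __init__(self, log_every=1_000, verbose=1):
        super().__init__(verbose)
        self.log_every = log_every

    def _on_step(self) -> bool:
        if self.num_timesteps % self.log_every == 0:
            # self.model exposes the algorithm being trained.
            self.logger.record("custom/buffer_size", self.model.replay_buffer.size())
            if self.verbose:
                print(f"step={self.num_timesteps}")  # printed output for the log
        return True  # keep training


env = gym.make("Pendulum-v1")
eval_env = gym.make("Pendulum-v1")  # separate env for EvalCallback

# Stop as soon as an evaluation reports a new best mean reward above -200.
stop_callback = StopTrainingOnRewardThreshold(reward_threshold=-200, verbose=1)
eval_callback = EvalCallback(
    eval_env, callback_on_new_best=stop_callback, eval_freq=2_000
)

model = SAC("MlpPolicy", env)
model.learn(total_timesteps=50_000, callback=[ProgressCallback(), eval_callback])
```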