
# stable-baselines3

Production-ready reinforcement learning algorithms (PPO, SAC, DQN, TD3, DDPG, A2C) with scikit-learn-like API. Use for standard RL experiments, quick prototyping, and well-documented algorithm implementations. Best for single-agent RL with Gymnasium environments. For high-performance parallel training, multi-agent systems, or custom vectorized environments, use pufferlib instead.

**Score: 91 (1.07x)**

- Quality: 86% — Does it follow best practices?
- Impact: 95% (1.07x) — average score across 6 eval scenarios

**Security (by Snyk):** Passed — no known issues


## Evaluation results

### Visual Inspection Environment for Defect Detection (100% · 15%)

*Image-based custom environment*

| Criteria | Without context | With context |
| --- | --- | --- |
| gymnasium.Env inheritance | 100% | 100% |
| reset() return tuple | 100% | 100% |
| step() return 5-tuple | 100% | 100% |
| Image dtype uint8 | 100% | 100% |
| Channel-first image shape | 0% | 100% |
| check_env called | 100% | 100% |
| CnnPolicy used | 100% | 100% |
| Discrete space starts at 0 | 100% | 100% |
| self.np_random used | 70% | 100% |
| Image pixel bounds [0, 255] | 100% | 100% |

### High-Throughput Pendulum Training Pipeline (85%)

*Vectorized off-policy training with callbacks*

| Criteria | Without context | With context |
| --- | --- | --- |
| make_vec_env used | 100% | 100% |
| gradient_steps=-1 | 0% | 0% |
| eval_freq adjusted for n_envs | 100% | 100% |
| save_freq adjusted for n_envs | 100% | 100% |
| Separate eval environment | 100% | 100% |
| Multiple callbacks chained | 100% | 100% |
| No DDPG used | 100% | 100% |
| model.save() called | 100% | 100% |
| SAC or TD3 for continuous control | 100% | 100% |
| VecEnv type used | 100% | 100% |

### RL Agent Benchmarking with Per-Episode Statistics (96% · 12%)

*Model persistence and VecEnv API*

| Criteria | Without context | With context |
| --- | --- | --- |
| Static model load | 100% | 100% |
| evaluate_policy used | 100% | 100% |
| deterministic=True in evaluate_policy | 100% | 100% |
| VecEnv step() returns 4-tuple | 100% | 100% |
| VecEnv reset() returns obs only | 100% | 100% |
| terminal_observation accessed correctly | 0% | 50% |
| Replay buffer caveat noted | 100% | 100% |
| Installation uses uv pip install | 0% | 100% |
| results.json written | 100% | 100% |
| No DDPG used | 100% | 100% |

### Robotic Arm Pick-and-Place Trainer (100% · 5%)

*HER goal-conditioned training*

| Criteria | Without context | With context |
| --- | --- | --- |
| HerReplayBuffer class used | 100% | 100% |
| SAC or TD3 base algorithm | 100% | 100% |
| MultiInputPolicy used | 100% | 100% |
| Goal obs structure keys | 100% | 100% |
| compute_reward method defined | 100% | 100% |
| goal_selection_strategy set | 100% | 100% |
| n_sampled_goal set | 100% | 100% |
| gymnasium.Env inheritance | 100% | 100% |
| check_env called | 100% | 100% |
| model.save() called | 100% | 100% |
| No DDPG used | 100% | 100% |
| uv pip install | 0% | 100% |

### Continuous Control Agent with Adaptive Training Configuration (100% · 5%)

*VecNormalize persistence and learning rate schedule*

| Criteria | Without context | With context |
| --- | --- | --- |
| VecNormalize applied | 100% | 100% |
| VecNormalize stats saved separately | 100% | 100% |
| VecNormalize training disabled on load | 100% | 100% |
| norm_reward disabled on load | 100% | 100% |
| linear_schedule returns progress_remaining * value | 100% | 100% |
| Schedule passed to model | 100% | 100% |
| VecNormalize.load used | 100% | 100% |
| model.save() called | 100% | 100% |
| Separate eval environment | 100% | 100% |
| evaluate_policy called | 100% | 100% |
| uv pip install used | 0% | 100% |

### Instrumented RL Training Pipeline with Automatic Stopping (92% · 7%)

*Custom callbacks with early stopping*

| Criteria | Without context | With context |
| --- | --- | --- |
| BaseCallback subclass | 100% | 100% |
| _on_step returns bool | 100% | 100% |
| self.num_timesteps used | 100% | 100% |
| self.model or self.training_env used | 0% | 0% |
| self.logger.record used | 100% | 100% |
| StopTrainingOnRewardThreshold via callback_on_new_best | 100% | 100% |
| EvalCallback used | 100% | 100% |
| Multiple callbacks combined | 100% | 100% |
| Callback log file or printed output | 100% | 100% |
| uv pip install used | 0% | 100% |
| No DDPG used | 100% | 100% |

- Repository: K-Dense-AI/claude-scientific-skills
- Evaluated
- Agent: Claude Code
- Model: Claude Sonnet 4.6
