Production-ready reinforcement learning algorithms (PPO, SAC, DQN, TD3, DDPG, A2C) with a scikit-learn-like API. Use it for standard RL experiments, quick prototyping, and well-documented algorithm implementations. Best suited to single-agent RL with Gymnasium environments. For high-performance parallel training, multi-agent systems, or custom vectorized environments, use pufferlib instead.
Score: 91
Does it follow best practices? 86%
Impact: 95% (1.07x average score across 6 eval scenarios)
Status: Passed; no known issues

Each check below lists two pass rates: baseline (without the skill), then with the skill.
Image-based custom environment
  gymnasium.Env inheritance: 100% / 100%
  reset() return tuple: 100% / 100%
  step() return 5-tuple: 100% / 100%
  Image dtype uint8: 100% / 100%
  Channel-first image shape: 0% / 100%
  check_env called: 100% / 100%
  CnnPolicy used: 100% / 100%
  Discrete space starts at 0: 100% / 100%
  self.np_random used: 70% / 100%
  Image pixel bounds [0, 255]: 100% / 100%
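A minimal sketch that satisfies these checks, assuming a hypothetical GridWorldEnv with random 3x64x64 observations and a 4-action discrete space; the environment is illustrative, while the Gymnasium and Stable-Baselines3 calls are the real API:

```python
# Minimal sketch for this scenario. GridWorldEnv, its 3x64x64 observation,
# and the 4-action space are hypothetical; the API calls are real.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env


class GridWorldEnv(gym.Env):
    """Toy image env: channel-first uint8 observations bounded to [0, 255]."""

    def __init__(self):
        super().__init__()
        # Channel-first (C, H, W) uint8 image with pixel bounds [0, 255].
        self.observation_space = spaces.Box(
            low=0, high=255, shape=(3, 64, 64), dtype=np.uint8
        )
        self.action_space = spaces.Discrete(4)  # actions 0..3, starting at 0

    def _get_obs(self):
        # Use the env's seeded RNG (self.np_random), never np.random directly.
        return self.np_random.integers(0, 256, size=(3, 64, 64), dtype=np.uint8)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        return self._get_obs(), {}  # Gymnasium API: (obs, info) tuple

    def step(self, action):
        # Gymnasium API: 5-tuple (obs, reward, terminated, truncated, info)
        return self._get_obs(), 0.0, False, False, {}


env = GridWorldEnv()
check_env(env)  # validate the env against the Gymnasium/SB3 API
model = PPO("CnnPolicy", env, n_steps=64, batch_size=64)
model.learn(total_timesteps=256)
```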
Vectorized off-policy training with callbacks
make_vec_env used
100%
100%
gradient_steps=-1
0%
0%
eval_freq adjusted for n_envs
100%
100%
save_freq adjusted for n_envs
100%
100%
Separate eval environment
100%
100%
Multiple callbacks chained
100%
100%
No DDPG used
100%
100%
model.save() called
100%
100%
SAC or TD3 for continuous control
100%
100%
VecEnv type used
100%
100%
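A minimal sketch covering these checks, assuming Pendulum-v1 and illustrative frequency and timestep values; the per-env frequency arithmetic and gradient_steps=-1 are the parts the checks target:

```python
# Minimal sketch: vectorized SAC with chained callbacks. Pendulum-v1 and
# all frequency/timestep values are illustrative choices.
import gymnasium as gym
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback

n_envs = 4
env = make_vec_env("Pendulum-v1", n_envs=n_envs)  # returns a VecEnv
eval_env = gym.make("Pendulum-v1")  # separate environment for evaluation

# Callback frequencies are counted per env.step() call, so divide by
# n_envs to keep the intended frequency in total timesteps.
eval_callback = EvalCallback(eval_env, eval_freq=max(5_000 // n_envs, 1))
checkpoint_callback = CheckpointCallback(
    save_freq=max(10_000 // n_envs, 1), save_path="./checkpoints/"
)

# gradient_steps=-1 runs as many gradient updates as transitions collected,
# keeping the update-to-data ratio constant as n_envs changes.
model = SAC("MlpPolicy", env, train_freq=1, gradient_steps=-1)
model.learn(total_timesteps=20_000, callback=[eval_callback, checkpoint_callback])
model.save("sac_pendulum")
```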
Model persistence and VecEnv API
  Static model load: 100% / 100%
  evaluate_policy used: 100% / 100%
  deterministic=True in evaluate_policy: 100% / 100%
  VecEnv step() returns 4-tuple: 100% / 100%
  VecEnv reset() returns obs only: 100% / 100%
  terminal_observation accessed correctly: 0% / 50%
  Replay buffer caveat noted: 100% / 100%
  Installation uses uv pip install: 0% / 100%
  results.json written: 100% / 100%
  No DDPG used: 100% / 100%
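A minimal sketch of the load-and-evaluate flow and the VecEnv API quirks, reusing the hypothetical sac_pendulum file from the previous sketch; the checks also expect installation via uv pip install stable-baselines3:

```python
# Minimal sketch: static load, evaluation, and the VecEnv API differences.
# The "sac_pendulum" file and Pendulum-v1 task are illustrative.
import json
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

vec_env = make_vec_env("Pendulum-v1", n_envs=2)

# Static load: SAC.load is a classmethod, no prior model instance needed.
# Caveat: model.save() does not persist the replay buffer; use
# model.save_replay_buffer() / load_replay_buffer() for that.
model = SAC.load("sac_pendulum", env=vec_env)

mean_reward, std_reward = evaluate_policy(
    model, vec_env, n_eval_episodes=10, deterministic=True
)

# VecEnv API differs from Gymnasium: reset() returns only the observations,
# and step() returns a 4-tuple with combined done flags.
obs = vec_env.reset()
actions, _ = model.predict(obs, deterministic=True)
obs, rewards, dones, infos = vec_env.step(actions)
for done, info in zip(dones, infos):
    if done:
        # VecEnvs auto-reset; the final real observation is stored here.
        last_obs = info["terminal_observation"]

with open("results.json", "w") as f:
    json.dump({"mean_reward": float(mean_reward), "std": float(std_reward)}, f)
```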
HER goal-conditioned training
  HerReplayBuffer class used: 100% / 100%
  SAC or TD3 base algorithm: 100% / 100%
  MultiInputPolicy used: 100% / 100%
  Goal obs structure keys: 100% / 100%
  compute_reward method defined: 100% / 100%
  goal_selection_strategy set: 100% / 100%
  n_sampled_goal set: 100% / 100%
  gymnasium.Env inheritance: 100% / 100%
  check_env called: 100% / 100%
  model.save() called: 100% / 100%
  No DDPG used: 100% / 100%
  uv pip install: 0% / 100%
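A minimal sketch of the HER setup, assuming a toy ReachEnv whose dynamics and success threshold are invented for illustration; the Dict observation keys, HerReplayBuffer, and MultiInputPolicy are the real SB3/Gymnasium API:

```python
# Minimal sketch: HER with SAC on a hypothetical goal-conditioned ReachEnv.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC, HerReplayBuffer
from stable_baselines3.common.env_checker import check_env


class ReachEnv(gym.Env):
    """Goal-conditioned env exposing the Dict keys HER expects."""

    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Dict({
            "observation": spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32),
            "achieved_goal": spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32),
            "desired_goal": spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32),
        })
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def _get_obs(self):
        return {
            "observation": self.pos.copy(),
            "achieved_goal": self.pos.copy(),
            "desired_goal": self.goal.copy(),
        }

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = np.zeros(2, dtype=np.float32)
        self.goal = self.np_random.uniform(-1, 1, size=2).astype(np.float32)
        return self._get_obs(), {}

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Must be vectorized: HER relabels goals in batches.
        return -np.linalg.norm(achieved_goal - desired_goal, axis=-1)

    def step(self, action):
        self.pos = np.clip(self.pos + 0.1 * action, -1.0, 1.0).astype(np.float32)
        obs = self._get_obs()
        reward = float(
            self.compute_reward(obs["achieved_goal"], obs["desired_goal"], {})
        )
        return obs, reward, bool(reward > -0.05), False, {}


env = ReachEnv()
check_env(env)
model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
)
model.learn(total_timesteps=10_000)
model.save("sac_her_reach")
```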
VecNormalize persistence and learning rate schedule
  VecNormalize applied: 100% / 100%
  VecNormalize stats saved separately: 100% / 100%
  VecNormalize training disabled on load: 100% / 100%
  norm_reward disabled on load: 100% / 100%
  linear_schedule returns progress_remaining * value: 100% / 100%
  Schedule passed to model: 100% / 100%
  VecNormalize.load used: 100% / 100%
  model.save() called: 100% / 100%
  Separate eval environment: 100% / 100%
  evaluate_policy called: 100% / 100%
  uv pip install used: 0% / 100%
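A minimal sketch for this scenario, assuming Pendulum-v1; the file names and the 3e-4 initial learning rate are illustrative:

```python
# Minimal sketch: VecNormalize persistence plus a linear LR schedule.
# Pendulum-v1, the file names, and 3e-4 are illustrative choices.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecNormalize


def linear_schedule(initial_value):
    # SB3 schedules receive progress_remaining, which decays from 1 to 0.
    def schedule(progress_remaining):
        return progress_remaining * initial_value
    return schedule


env = VecNormalize(make_vec_env("Pendulum-v1", n_envs=4))
model = PPO("MlpPolicy", env, learning_rate=linear_schedule(3e-4))
model.learn(total_timesteps=20_000)

# VecNormalize statistics live outside the model zip; save them separately.
model.save("ppo_pendulum")
env.save("vec_normalize.pkl")

# Reload the stats into a fresh eval env, then freeze them.
eval_env = VecNormalize.load(
    "vec_normalize.pkl", make_vec_env("Pendulum-v1", n_envs=1)
)
eval_env.training = False     # stop updating the running statistics
eval_env.norm_reward = False  # report raw rewards at evaluation time

model = PPO.load("ppo_pendulum", env=eval_env)
mean_reward, std_reward = evaluate_policy(model, eval_env, deterministic=True)
```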
Custom callbacks with early stopping
  BaseCallback subclass: 100% / 100%
  _on_step returns bool: 100% / 100%
  self.num_timesteps used: 100% / 100%
  self.model or self.training_env used: 0% / 0%
  self.logger.record used: 100% / 100%
  StopTrainingOnRewardThreshold via callback_on_new_best: 100% / 100%
  EvalCallback used: 100% / 100%
  Multiple callbacks combined: 100% / 100%
  Callback log file or printed output: 100% / 100%
  uv pip install used: 0% / 100%
  No DDPG used: 100% / 100%
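A minimal sketch for this scenario, assuming Pendulum-v1; the -200 reward threshold, logging interval, and eval frequency are illustrative:

```python
# Minimal sketch: custom BaseCallback plus reward-threshold early stopping.
# Pendulum-v1, the -200 threshold, and the frequencies are illustrative.
import gymnasium as gym
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import (
    BaseCallback,
    EvalCallback,
    StopTrainingOnRewardThreshold,
)


class ProgressCallback(BaseCallback):
    """Logs replay buffer size; returning False from _on_step stops training."""

    def __init__(self, log_every=1_000, verbose=1):
        super().__init__(verbose)
        self.log_every = log_every

    def _on_step(self) -> bool:
        if self.num_timesteps % self.log_every == 0:
            # self.model exposes the algorithm being trained.
            self.logger.record("custom/buffer_size", self.model.replay_buffer.size())
            if self.verbose:
                print(f"step={self.num_timesteps}")  # printed output for the log
        return True  # keep training


env = gym.make("Pendulum-v1")
eval_env = gym.make("Pendulum-v1")  # separate env for EvalCallback

# Stop as soon as an evaluation reports a new best mean reward above -200.
stop_callback = StopTrainingOnRewardThreshold(reward_threshold=-200, verbose=1)
eval_callback = EvalCallback(
    eval_env, callback_on_new_best=stop_callback, eval_freq=2_000
)

model = SAC("MlpPolicy", env)
model.learn(total_timesteps=50_000, callback=[ProgressCallback(), eval_callback])
```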