High-performance reinforcement learning framework optimized for speed and scale. Use when you need fast parallel training, vectorized environments, multi-agent systems, or integration with game environments (Atari, Procgen, NetHack). Achieves 2-10x speedups over standard implementations. For quick prototyping or standard algorithm implementations with extensive documentation, use stable-baselines3 instead.
Overall score: 78

Quality: 71% (Does it follow best practices?)

Impact: 87%. 1.50x average score across 3 eval scenarios. Passed, no known issues.

Optimize this skill with Tessl:
npx tessl skill review --optimize ./scientific-skills/pufferlib/SKILL.md

Quality
Discovery
100%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly communicates specific capabilities, includes natural trigger terms, explicitly states both what the skill does and when to use it, and even provides guidance on when to use an alternative. The inclusion of concrete environment names and performance benchmarks makes it highly distinctive and actionable.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete capabilities: fast parallel training, vectorized environments, multi-agent systems, integration with game environments (Atari, Procgen, NetHack), and quantifies performance (2-10x speedups). | 3 / 3 |
| Completeness | Clearly answers both 'what' (high-performance RL framework with parallel training, vectorized environments, multi-agent systems, game environment integration) and 'when' (explicit 'Use when you need fast parallel training, vectorized environments, multi-agent systems, or integration with game environments'). Also includes a negative trigger for when NOT to use it. | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms users would say: 'reinforcement learning', 'parallel training', 'vectorized environments', 'multi-agent', 'Atari', 'Procgen', 'NetHack', 'speedups', and even references the alternative 'stable-baselines3' for disambiguation. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with a clear niche (high-performance RL), specific environment names (Atari, Procgen, NetHack), and explicit disambiguation from stable-baselines3, making it very unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
42%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill has an excellent progressive disclosure structure with clear references to detailed guides, but it is significantly too verbose: it could be cut by 40-50% without losing actionable content. The code examples are illustrative but not fully executable, and the workflows lack validation checkpoints. The 'Tips for Success', 'Common Use Cases', and 'When to Use This Skill' sections add substantial bulk with minimal unique value.
Suggestions
- Remove or drastically reduce the 'When to Use This Skill', 'Tips for Success', and 'Common Use Cases' sections. These largely duplicate content already present in the core capabilities sections and contain generic advice Claude already knows.
- Make code examples fully executable by defining all variables (e.g., show how to get obs_dim and num_actions from spaces, define a complete minimal policy for the training example) or explicitly mark them as templates requiring customization.
- Add validation checkpoints to workflows, e.g., 'Test environment with env.reset() and env.step() before vectorizing' and 'Verify training convergence by checking reward curves after N iterations before scaling up.'
- Consolidate the Resources section descriptions. The bullet-point summaries of each reference file repeat information already provided in the 'For complete X, read references/Y.md' callouts throughout the document.
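The validation-checkpoint suggestion above can be sketched as a pre-vectorization smoke test. The `StubEnv` below is a hypothetical stand-in for a Gymnasium-style environment (it is not part of PufferLib); the point is that `smoke_test` exercises `reset()` and `step()` on a single instance before any vectorization layer is involved.

```python
class StubEnv:
    """Hypothetical stand-in for a Gymnasium-style environment."""

    def reset(self, seed=None):
        self._t = 0
        return [0.0, 0.0], {}  # (observation, info)

    def step(self, action):
        self._t += 1
        obs = [float(self._t), 0.0]
        reward, terminated, truncated = 1.0, self._t >= 3, False
        return obs, reward, terminated, truncated, {}


def smoke_test(env, steps=5):
    """Run a short rollout and sanity-check the env's API contract."""
    obs, info = env.reset(seed=0)
    assert obs is not None, "reset() must return an observation"
    for _ in range(steps):
        obs, reward, terminated, truncated, info = env.step(0)
        assert isinstance(reward, (int, float)), "reward must be numeric"
        if terminated or truncated:
            obs, info = env.reset()
    return True


print(smoke_test(StubEnv()))  # → True
```

A checkpoint like this catches contract violations (wrong tuple arity, non-numeric rewards) with one environment, where the error is easy to read, rather than inside a vectorized rollout.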
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~350+ lines. It includes a 'When to Use This Skill' section that restates the description, a 'Tips for Success' section with 10 generic tips Claude already knows (e.g., 'start simple', 'profile early'), 'Common Use Cases' that largely duplicate earlier examples, and extensive resource listings that repeat what's already described in the progressive disclosure sections. The 'Overview' paragraph also restates information Claude doesn't need explained. | 1 / 3 |
| Actionability | The skill provides code examples that appear plausible but several are likely not fully executable as-is (e.g., `PuffeRL` import path, `pufferlib.make` with string identifiers, `pufferlib.emulate` usage patterns may not match actual API). The training loop and environment examples give reasonable structure but use undefined variables (my_policy, num_iterations, obs_dim, num_actions) without showing how to obtain them, making them pseudocode-like rather than truly copy-paste ready. | 2 / 3 |
| Workflow Clarity | The 'Quick Start Workflow' section provides numbered steps for four different workflows, which is helpful. However, none of the workflows include validation checkpoints or feedback loops: there is no 'verify environment works before scaling', no 'check training is converging before running full experiment', and no error recovery guidance. For operations involving custom environment development and training at scale, this is a notable gap. | 2 / 3 |
| Progressive Disclosure | The skill does an excellent job of structuring content with a clear overview, inline code snippets for quick reference, and well-signaled one-level-deep references to detailed files (references/training.md, references/environments.md, etc.). Each reference is clearly described with bullet points of what it contains, and template scripts are also referenced appropriately. | 3 / 3 |
| Total | | 8 / 12 Passed |
Validation
90%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| metadata_version | 'metadata.version' is missing | Warning |
| Total | | 10 / 11 Passed |