
pufferlib

High-performance reinforcement learning framework optimized for speed and scale. Use when you need fast parallel training, vectorized environments, multi-agent systems, or integration with game environments (Atari, Procgen, NetHack). Achieves 2-10x speedups over standard implementations. For quick prototyping or standard algorithm implementations with extensive documentation, use stable-baselines3 instead.

Overall score: 78

Quality: 71% (Does it follow best practices?)
Impact: 87% (1.50x, average score across 3 eval scenarios)

Security (by Snyk): Passed, no known issues

Optimize this skill with Tessl:

`npx tessl skill review --optimize ./scientific-skills/pufferlib/SKILL.md`

Quality

Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly communicates specific capabilities, includes natural trigger terms, explicitly states both what the skill does and when to use it, and even provides guidance on when to use an alternative. The inclusion of concrete environment names and performance benchmarks makes it highly distinctive and actionable.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific, concrete capabilities: fast parallel training, vectorized environments, multi-agent systems, integration with game environments (Atari, Procgen, NetHack), and quantifies performance (2-10x speedups). | 3 / 3 |
| Completeness | Clearly answers both 'what' (high-performance RL framework with parallel training, vectorized environments, multi-agent systems, game environment integration) and 'when' (explicit 'Use when you need fast parallel training, vectorized environments, multi-agent systems, or integration with game environments'). Also includes a negative trigger for when NOT to use it. | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms users would say: 'reinforcement learning', 'parallel training', 'vectorized environments', 'multi-agent', 'Atari', 'Procgen', 'NetHack', 'speedups', and even references the alternative 'stable-baselines3' for disambiguation. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with a clear niche (high-performance RL), specific environment names (Atari, Procgen, NetHack), and explicit disambiguation from stable-baselines3, making it very unlikely to conflict with other skills. | 3 / 3 |

Total: 12 / 12 (Passed)

Implementation: 42%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill has excellent progressive disclosure structure with clear references to detailed guides, but is significantly too verbose - it could be cut by 40-50% without losing actionable content. The code examples are illustrative but not fully executable, and the workflows lack validation checkpoints. The 'Tips for Success', 'Common Use Cases', and 'When to Use This Skill' sections add substantial bulk with minimal unique value.

Suggestions

- Remove or drastically reduce the 'When to Use This Skill', 'Tips for Success', and 'Common Use Cases' sections. These largely duplicate content already present in the core capabilities sections and contain generic advice Claude already knows.
- Make code examples fully executable by defining all variables (e.g., show how to get `obs_dim` and `num_actions` from spaces, define a complete minimal policy for the training example) or explicitly mark them as templates requiring customization.
- Add validation checkpoints to workflows, e.g., 'Test environment with `env.reset()` and `env.step()` before vectorizing' and 'Verify training convergence by checking reward curves after N iterations before scaling up.'
- Consolidate the Resources section descriptions: the bullet-point summaries of each reference file repeat information already provided in the 'For complete X, read references/Y.md' callouts throughout the document.
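The second and third suggestions can be sketched concretely. The snippet below uses a hypothetical `ToyEnv` stand-in rather than pufferlib's real API (actual entry points may differ); it shows deriving `obs_dim` and `num_actions` from declared spaces and smoke-testing `reset()`/`step()` before vectorizing:

```python
import numpy as np

class ToyEnv:
    """Minimal stand-in for a Gym-style environment (illustrative only)."""
    def __init__(self):
        self.observation_shape = (4,)
        self.num_actions = 2
        self.rng = np.random.default_rng()

    def reset(self, seed=None):
        self.rng = np.random.default_rng(seed)
        return self.rng.standard_normal(self.observation_shape), {}

    def step(self, action):
        assert 0 <= action < self.num_actions, "invalid action"
        obs = self.rng.standard_normal(self.observation_shape)
        # obs, reward, terminated, truncated, info (Gymnasium-style tuple)
        return obs, 1.0, False, False, {}

env = ToyEnv()

# Derive shapes from the environment instead of leaving them undefined.
obs_dim = int(np.prod(env.observation_shape))
num_actions = env.num_actions

# Smoke-test reset/step before wrapping in a vectorized runner.
obs, info = env.reset(seed=0)
assert obs.shape == env.observation_shape
obs, reward, terminated, truncated, info = env.step(0)
```

A skill example structured this way is copy-paste runnable, and the checkpoint fails fast if the environment's contract is broken.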

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is extremely verbose at ~350+ lines. It includes a 'When to Use This Skill' section that restates the description, a 'Tips for Success' section with 10 generic tips Claude already knows (e.g., 'start simple', 'profile early'), 'Common Use Cases' that largely duplicate earlier examples, and extensive resource listings that repeat what's already described in the progressive disclosure sections. The 'Overview' paragraph also restates information Claude doesn't need explained. | 1 / 3 |
| Actionability | The skill provides code examples that appear plausible but several are likely not fully executable as-is (e.g., the `PuffeRL` import path, `pufferlib.make` with string identifiers, and `pufferlib.emulate` usage patterns may not match the actual API). The training loop and environment examples give reasonable structure but use undefined variables (my_policy, num_iterations, obs_dim, num_actions) without showing how to obtain them, making them pseudocode-like rather than truly copy-paste ready. | 2 / 3 |
| Workflow Clarity | The 'Quick Start Workflow' section provides numbered steps for four different workflows, which is helpful. However, none of the workflows include validation checkpoints or feedback loops: there is no 'verify environment works before scaling', no 'check training is converging before running full experiment', and no error recovery guidance. For operations involving custom environment development and training at scale, this is a notable gap. | 2 / 3 |
| Progressive Disclosure | The skill does an excellent job of structuring content with a clear overview, inline code snippets for quick reference, and well-signaled one-level-deep references to detailed files (references/training.md, references/environments.md, etc.). Each reference is clearly described with bullet points of what it contains, and template scripts are also referenced appropriately. | 3 / 3 |

Total: 8 / 12 (Passed)
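The convergence checkpoint the Workflow Clarity row asks for can be sketched in a few lines. The window size and threshold below are illustrative, not part of pufferlib:

```python
def converging(returns, window=10, min_improvement=0.0):
    """Compare the mean of the earliest and latest `window` episode returns."""
    if len(returns) < 2 * window:
        return True  # not enough data to judge yet; keep training
    early = sum(returns[:window]) / window
    late = sum(returns[-window:]) / window
    return late - early > min_improvement

# Stand-in for episode returns logged during a short pilot run.
pilot_returns = [0.1 * i for i in range(30)]   # steadily improving
flat_returns = [1.0] * 30                      # no improvement

assert converging(pilot_returns)       # improving: safe to scale up
assert not converging(flat_returns)    # flat: investigate before scaling
```

Dropping a check like this between 'run a pilot' and 'run the full experiment' gives the workflow the feedback loop the review found missing.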

Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 passed

Validation for skill structure:

| Criteria | Description | Result |
| --- | --- | --- |
| metadata_version | 'metadata.version' is missing | Warning |

Total: 10 / 11 (Passed)

Repository: K-Dense-AI/claude-scientific-skills (Reviewed)
