pufferlib

High-performance reinforcement learning framework optimized for speed and scale. Use when you need fast parallel training, vectorized environments, multi-agent systems, or integration with game environments (Atari, Procgen, NetHack). Achieves 2-10x speedups over standard implementations. For quick prototyping or standard algorithm implementations with extensive documentation, use stable-baselines3 instead.

Overall score: 75 (1.50x)

Quality: 67%. Does it follow best practices?

Impact: 87% (1.50x). Average score across 3 eval scenarios.

Security (by Snyk): Passed. No known issues.


Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly communicates specific capabilities, includes natural trigger terms, explicitly states both what the skill does and when to use it, and even provides guidance on when to use an alternative. The inclusion of concrete examples (Atari, Procgen, NetHack), quantified performance claims (2-10x speedups), and a disambiguation clause against stable-baselines3 make this a strong, well-crafted description.

Specificity (3 / 3)
Lists multiple specific concrete capabilities: fast parallel training, vectorized environments, multi-agent systems, integration with game environments (Atari, Procgen, NetHack), and quantifies performance (2-10x speedups).

Completeness (3 / 3)
Clearly answers both 'what' (high-performance RL framework with parallel training, vectorized environments, multi-agent systems, game environment integration) and 'when' (explicit 'Use when you need fast parallel training, vectorized environments, multi-agent systems, or integration with game environments'). Also includes a 'when NOT to use' clause pointing to stable-baselines3.

Trigger Term Quality (3 / 3)
Includes strong natural trigger terms users would say: 'reinforcement learning', 'parallel training', 'vectorized environments', 'multi-agent', 'Atari', 'Procgen', 'NetHack', 'speedups'. Also mentions the alternative 'stable-baselines3', which helps with disambiguation.

Distinctiveness / Conflict Risk (3 / 3)
Highly distinctive, with a clear niche (high-performance RL) and explicit differentiation from stable-baselines3. The specific game environments (Atari, Procgen, NetHack) and performance focus make it unlikely to conflict with other skills.

Total: 12 / 12. Passed.

Implementation

35%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill covers PufferLib comprehensively with reasonable structure and code examples, but suffers significantly from verbosity — repeating information across sections, including generic advice Claude doesn't need, and providing incomplete code examples that fall short of being truly executable. The progressive disclosure structure is conceptually sound with references to external files, but the main file itself contains too much redundant content that undermines the overview-to-detail pattern.

Suggestions

Cut the 'When to Use This Skill', 'Tips for Success', and 'Common Use Cases' sections entirely — they repeat information already covered in the core capabilities and workflows, and Claude can infer appropriate usage from the skill description.

Make code examples fully executable: define `obs_dim`, `num_actions`, `num_iterations`, and `my_policy` in examples, or use concrete values. Replace placeholder methods in PufferEnv with minimal working implementations.
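As an illustrative sketch of what that could look like, the following defines the names the review flags as undefined with concrete values (the values and the linear-policy shape are assumptions for illustration, not PufferLib's actual example code):

```python
import numpy as np

# Illustrative concrete values (assumptions, not taken from the skill)
obs_dim = 8
num_actions = 4
num_iterations = 10

rng = np.random.default_rng(0)

# Minimal linear policy: weights map an observation vector to action logits
W = rng.normal(scale=0.1, size=(obs_dim, num_actions))
b = np.zeros(num_actions)

def my_policy(obs):
    """Return a greedy action for a single observation vector."""
    logits = obs @ W + b
    return int(np.argmax(logits))

# Every name used in a training loop is now defined and runnable
for _ in range(num_iterations):
    obs = rng.normal(size=obs_dim)
    action = my_policy(obs)
    assert 0 <= action < num_actions
```

Even a toy definition like this lets a reader copy, run, and then swap in a real network, which is the gap the suggestion is pointing at.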

Add validation checkpoints to workflows: e.g., 'Test environment with `env.reset()` and manual `step()` calls before vectorizing' and 'Verify training convergence by checking reward curves after 100k steps'.
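A checkpoint of that kind can be sketched with a stub environment (the `CountdownEnv` class below is hypothetical, standing in for a real PufferEnv subclass; only the reset/step contract named in the suggestion is exercised):

```python
# Hypothetical stand-in for a user environment; a real checkpoint would
# target the actual PufferEnv subclass before vectorizing it.
class CountdownEnv:
    def __init__(self, start=3):
        self.start = start
        self.t = None

    def reset(self):
        self.t = self.start
        return self.t  # observation

    def step(self, action):
        self.t -= 1
        obs, reward = self.t, 1.0
        done = self.t <= 0
        return obs, reward, done, {}

# Validation checkpoint: reset and step manually before vectorizing
env = CountdownEnv()
obs = env.reset()
assert obs == 3

total_reward, done = 0.0, False
while not done:
    obs, reward, done, info = env.step(0)
    total_reward += reward
assert done and total_reward == 3.0
```

Running a loop like this once, by hand, catches broken reset/step contracts long before they surface as opaque failures inside a vectorized training run.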

Consolidate the 'Resources' section into the existing reference links within each capability section — the detailed file descriptions are redundant with the 'For complete X, read references/Y.md' patterns already present.

Conciseness (1 / 3)
The skill is extremely verbose at ~350+ lines. It includes extensive 'When to Use This Skill' bullets, a 'Tips for Success' list of 10 items of generic advice ('Start simple', 'Profile early'), redundant 'Common Use Cases' that repeat earlier examples, a 'Resources' section that re-describes what each reference file contains (already listed inline), and explanatory text Claude doesn't need (e.g., 'PufferLib is a high-performance reinforcement learning library designed for...'). Much of this could be cut by 50%+ without losing actionable content.

Actionability (2 / 3)
Code examples are provided and appear mostly executable, but several are incomplete or uncertain: the Python training loop references `my_policy` and `num_iterations` without definition, the PufferEnv example has placeholder methods (`_get_observation`, `_compute_reward`, `_is_done`) that aren't implemented, and the Policy example uses undefined `obs_dim` and `num_actions`. These are closer to pseudocode than copy-paste ready.

Workflow Clarity (2 / 3)
The 'Quick Start Workflow' section provides numbered steps for four different workflows, which is helpful. However, none include validation checkpoints or feedback loops: there's no 'verify environment works before vectorizing', no 'check training is converging before scaling', and no error-recovery guidance. For a framework involving complex multi-step processes (environment creation → vectorization → training), this is a significant gap.

Progressive Disclosure (2 / 3)
The skill references multiple external files (references/training.md, references/environments.md, etc.) and scripts, which is good structure. However, no bundle files are provided, so we can't verify these exist. The main file itself is bloated with content that should be in those reference files (e.g., the full Resources section re-describing each file, the 10 tips, the Common Use Cases section). The inline content doesn't achieve a clean overview-to-detail split.

Total: 7 / 12. Passed.
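To illustrate the 'minimal working implementations' the Actionability row asks for, the three placeholder methods it names could be filled in as follows (the goal-position state is invented for this sketch; the real example's internals are not shown in this review):

```python
class MinimalEnv:
    """Sketch of filling in the review's named placeholder methods."""

    def __init__(self, goal=5):
        self.goal = goal
        self.pos = 0

    def _get_observation(self):
        # Observation is just the current position
        return self.pos

    def _compute_reward(self):
        # +1 only when the goal position is reached
        return 1.0 if self.pos >= self.goal else 0.0

    def _is_done(self):
        return self.pos >= self.goal

    def step(self, action):
        # action 1 moves forward; anything else stays in place
        self.pos += 1 if action == 1 else 0
        return self._get_observation(), self._compute_reward(), self._is_done(), {}

env = MinimalEnv()
for _ in range(5):
    obs, reward, done, info = env.step(1)
assert done and reward == 1.0 and obs == 5
```

Even trivial bodies like these turn the example from pseudocode into something a reader can run and then replace piecewise with real logic.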

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

metadata_version: 'metadata.version' is missing. Result: Warning.

Total: 10 / 11. Passed.

Repository: K-Dense-AI/claude-scientific-skills (Reviewed)
