
vaex

Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.

Overall score: 84

Quality: 86% (Does it follow best practices?)

Impact: 72% (1.22x, average score across 3 eval scenarios)

Security (by Snyk): Advisory. Suggest reviewing before use.


Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that clearly identifies the tool (Vaex), its specific capabilities (out-of-core operations, lazy evaluation, fast aggregations, visualization, ML), and explicit trigger conditions (large files, memory constraints, specific file formats). The only minor issue is the use of second-person 'Use this skill' at the start, though the rest of the description uses appropriate third-person framing. Overall it is well-structured and highly distinguishable.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization, machine learning on large datasets, processing CSV/HDF5/Arrow/Parquet files, fast statistics, and ML pipelines. | 3 / 3 |
| Completeness | Clearly answers both 'what' (out-of-core DataFrame operations, lazy evaluation, fast aggregations, visualization, ML on large datasets) and 'when' with explicit triggers ('Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory'). | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'large tabular datasets', 'billions of rows', 'exceed available RAM', 'CSV/HDF5/Arrow/Parquet files', 'massive datasets', 'big data', 'ML pipelines', 'not fit in memory', and 'Vaex'. These cover many natural variations of how users describe large-data problems. | 3 / 3 |
| Distinctiveness Conflict Risk | Clearly occupies a distinct niche: Vaex-specific, out-of-core processing for datasets exceeding RAM. The emphasis on billions of rows, specific file formats (HDF5/Arrow/Parquet), and memory constraints makes it unlikely to conflict with general data analysis or pandas-based skills. | 3 / 3 |
| **Total** | | 12 / 12 |

Passed

Implementation

72%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured skill with strong progressive disclosure and actionable code examples. Its main weaknesses are moderate verbosity (the 'When to Use' and 'Core Capabilities' sections could be significantly tightened since they largely restate what's obvious from context) and the lack of validation/error-handling guidance in workflows. The common patterns section is a highlight, providing practical, executable examples.

Suggestions

- Trim the 'When to Use This Skill' section significantly — it largely duplicates the skill description and explains things Claude can infer. Consider removing it entirely or reducing to 2-3 bullet points of non-obvious guidance.
- Condense the 'Core Capabilities' section to a simple table or compact list mapping capability areas to reference files, rather than repeating sub-bullet descriptions of what each reference contains.
- Add validation/error-handling guidance to workflows — e.g., checking row counts after CSV-to-HDF5 conversion, handling corrupt files, or verifying that lazy operations produce expected results with `.head()` before full computation.
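The row-count check suggested above could look something like the following minimal sketch. The helper and file contents are illustrative, not part of the skill: with Vaex, the converted side of the comparison would typically come from the length of the reopened DataFrame (e.g. `len(vaex.open("data.hdf5"))`), which is stubbed out here so the example is self-contained.

```python
import csv
import io

def count_csv_rows(text):
    """Count data rows in CSV text, excluding the header row."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)              # skip header
    return sum(1 for _ in reader)

raw = "a,b\n1,2\n3,4\n5,6\n"
source_rows = count_csv_rows(raw)

# Stand-in for len(vaex.open("data.hdf5")) after conversion:
converted_rows = 3

assert source_rows == converted_rows, "rows lost during conversion"
```

A check like this catches truncated or partially converted files before any expensive downstream computation runs.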

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The 'Overview' and 'When to Use This Skill' sections repeat information Claude already knows or that's in the YAML description. The core capabilities section is somewhat verbose with bullet-point descriptions that could be more compact. However, the code examples are lean and the best practices are concise. | 2 / 3 |
| Actionability | The Quick Start Pattern and Common Patterns sections provide fully executable, copy-paste ready Python code with clear comments. The examples cover the most important operations (loading, virtual columns, filtering, aggregations, visualization, export) with concrete syntax. | 3 / 3 |
| Workflow Clarity | The Quick Start Pattern provides a clear numbered sequence for typical usage, and the Common Patterns show specific multi-step workflows. However, there are no validation checkpoints or error recovery steps — e.g., no guidance on what to do if a CSV load fails, how to verify data integrity after conversion, or how to handle memory issues during processing. | 2 / 3 |
| Progressive Disclosure | Excellent progressive disclosure structure: the SKILL.md serves as a clear overview with well-signaled, one-level-deep references to six specific reference files. The 'Working with References' section provides task-based navigation guidance for which reference to load based on the user's need. | 3 / 3 |
| **Total** | | 10 / 12 |

Passed
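The virtual-column and lazy-evaluation behavior the review refers to can be illustrated with a conceptual sketch. This is not Vaex's actual implementation, and the class and method names are invented for illustration only: the point is that adding a virtual column stores a recipe rather than materializing data, and an aggregation later evaluates it in a single streaming pass.

```python
class LazyColumn:
    """A stored expression: a recipe, not materialized values."""
    def __init__(self, fn, name):
        self.fn, self.name = fn, name

class TinyFrame:
    """Toy stand-in for an out-of-core DataFrame (illustration only)."""
    def __init__(self, columns):
        self.columns = dict(columns)   # name -> list of values
        self.virtual = {}              # name -> LazyColumn

    def add_virtual(self, name, fn):
        # Nothing is computed here: only the recipe is stored.
        self.virtual[name] = LazyColumn(fn, name)

    def mean(self, name):
        if name in self.virtual:
            fn = self.virtual[name].fn
            n = len(next(iter(self.columns.values())))
            values = (fn({k: v[i] for k, v in self.columns.items()})
                      for i in range(n))
        else:
            values = self.columns[name]
        total = count = 0
        for v in values:               # one streaming pass, O(1) extra memory
            total += v
            count += 1
        return total / count

df = TinyFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
df.add_virtual("r", lambda row: row["x"] + row["y"])
print(df.mean("r"))  # 7.0, computed only now, row by row
```

In real Vaex the same idea lets an expression over billions of rows cost nothing until an aggregation forces evaluation, which is also why the review suggests spot-checking lazy results with `.head()` before a full computation.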

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| metadata_version | `metadata.version` is missing | Warning |
| **Total** | | 10 / 11 Passed |
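The warning above could presumably be resolved by adding a version field under `metadata` in the SKILL.md frontmatter. The layout below is an assumption inferred from the warning text, not taken from the spec, and the version number is a placeholder:

```yaml
---
name: vaex
description: Use this skill for processing and analyzing large tabular datasets...
metadata:
  version: 1.0.0   # placeholder; the field the validator reports as missing
---
```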

Repository: K-Dense-AI/claude-scientific-skills (Reviewed)
