Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.
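The out-of-core approach the description refers to can be illustrated independently of Vaex: statistics are computed by streaming fixed-size chunks so only one chunk is ever in memory. A minimal sketch in pure Python (the helper names are hypothetical, not from the skill; the Vaex calls in the closing comments are standard API):

```python
from typing import Iterable, Iterator, List

def chunked(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield fixed-size chunks so only one chunk is in memory at a time."""
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def streaming_mean(values: Iterable[float], chunk_size: int = 1_000_000) -> float:
    """Out-of-core mean: accumulate a running sum and count per chunk."""
    total, count = 0.0, 0
    for chunk in chunked(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    return total / count

# Vaex applies the same principle to on-disk data, e.g.:
#   import vaex
#   df = vaex.open("big.hdf5")   # memory-mapped, nothing loaded yet
#   df.mean(df.x)                # streamed aggregation, evaluated lazily
```

The same sum-and-count accumulation pattern generalizes to other aggregations (min, max, variance via running moments), which is why Vaex can report statistics on billions of rows without loading them.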
Overall score: 84
Does it follow best practices? 86%
Impact: 72% (1.22x average score across 3 eval scenarios)
Advisory: Suggest reviewing before use
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly identifies the tool (Vaex), its specific capabilities (out-of-core operations, lazy evaluation, fast aggregations, visualization, ML), and explicit trigger conditions (large files, memory constraints, specific file formats). The only minor issue is the use of second-person 'Use this skill' at the start, though the rest of the description uses appropriate third-person framing. Overall it is well-structured and highly distinguishable.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization, machine learning on large datasets, processing CSV/HDF5/Arrow/Parquet files, fast statistics, and ML pipelines. | 3 / 3 |
| Completeness | Clearly answers both 'what' (out-of-core DataFrame operations, lazy evaluation, fast aggregations, visualization, ML on large datasets) and 'when' with explicit triggers ('Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory'). | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'large tabular datasets', 'billions of rows', 'exceed available RAM', 'CSV/HDF5/Arrow/Parquet files', 'massive datasets', 'big data', 'ML pipelines', 'not fit in memory', and 'Vaex'. These cover many natural variations of how users describe large-data problems. | 3 / 3 |
| Distinctiveness / Conflict Risk | Clearly occupies a distinct niche: Vaex-specific, out-of-core processing for datasets exceeding RAM. The emphasis on billions of rows, specific file formats (HDF5/Arrow/Parquet), and memory constraints makes it unlikely to conflict with general data analysis or pandas-based skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
72%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured skill with strong progressive disclosure and actionable code examples. Its main weaknesses are moderate verbosity (the 'When to Use' and 'Core Capabilities' sections could be significantly tightened since they largely restate what's obvious from context) and the lack of validation/error-handling guidance in workflows. The common patterns section is a highlight, providing practical, executable examples.
Suggestions
Trim the 'When to Use This Skill' section significantly — it largely duplicates the skill description and explains things Claude can infer. Consider removing it entirely or reducing to 2-3 bullet points of non-obvious guidance.
Condense the 'Core Capabilities' section to a simple table or compact list mapping capability areas to reference files, rather than repeating sub-bullet descriptions of what each reference contains.
Add validation/error-handling guidance to workflows — e.g., checking row counts after CSV-to-HDF5 conversion, handling corrupt files, or verifying that lazy operations produce expected results with `.head()` before full computation.
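The last suggestion can be sketched as a small post-conversion check. The helper below is a hypothetical illustration, not code from the skill; the Vaex calls shown in the comments (`vaex.from_csv`, `vaex.open`) are standard API:

```python
def check_conversion(source_rows: int, converted_rows: int) -> None:
    """Fail fast if a CSV-to-HDF5 conversion dropped or duplicated rows."""
    if source_rows != converted_rows:
        raise ValueError(
            f"Row count mismatch after conversion: "
            f"CSV had {source_rows}, HDF5 has {converted_rows}"
        )

# Typical use with Vaex (assumes vaex is installed):
#   import vaex
#   df_csv = vaex.from_csv("data.csv", convert="data.hdf5", chunk_size=5_000_000)
#   df_hdf5 = vaex.open("data.hdf5")
#   check_conversion(len(df_csv), len(df_hdf5))
#   df_hdf5.head(5)  # inspect a sample before running full lazy pipelines
```

Raising on mismatch rather than logging keeps a workflow from silently proceeding with corrupted or truncated data.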
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The 'Overview' and 'When to Use This Skill' sections repeat information Claude already knows or that's in the YAML description. The core capabilities section is somewhat verbose with bullet-point descriptions that could be more compact. However, the code examples are lean and the best practices are concise. | 2 / 3 |
| Actionability | The Quick Start Pattern and Common Patterns sections provide fully executable, copy-paste ready Python code with clear comments. The examples cover the most important operations (loading, virtual columns, filtering, aggregations, visualization, export) with concrete syntax. | 3 / 3 |
| Workflow Clarity | The Quick Start Pattern provides a clear numbered sequence for typical usage, and the Common Patterns show specific multi-step workflows. However, there are no validation checkpoints or error recovery steps, e.g., no guidance on what to do if a CSV load fails, how to verify data integrity after conversion, or how to handle memory issues during processing. | 2 / 3 |
| Progressive Disclosure | Excellent progressive disclosure structure: the SKILL.md serves as a clear overview with well-signaled, one-level-deep references to six specific reference files. The 'Working with References' section provides task-based navigation guidance for which reference to load based on the user's need. | 3 / 3 |
| Total | | 10 / 12 Passed |
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| metadata_version | 'metadata.version' is missing | Warning |
| Total | 10 / 11 Passed | |