
hypogenic

Automated LLM-driven hypothesis generation and testing on tabular datasets. Use when you want to systematically explore hypotheses about patterns in empirical data (e.g., deception detection, content analysis). Combines literature insights with data-driven hypothesis testing. For manual hypothesis formulation use hypothesis-generation; for creative ideation use scientific-brainstorming.

69 · 1.19x

Quality: 62% (Does it follow best practices?)

Impact: 74%, 1.19x (average score across 3 eval scenarios)

Security by Snyk: Advisory. Suggest reviewing before use.

Optimize this skill with Tessl:

npx tessl skill review --optimize ./scientific-skills/hypogenic/SKILL.md

Quality

Discovery: 89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong description that clearly defines its niche in automated hypothesis generation/testing on tabular data, includes explicit 'Use when' guidance, and thoughtfully disambiguates from related skills. The main weakness is that the specific capabilities could be more concretely enumerated (e.g., what statistical tests, what outputs are produced). The use of second person 'you' in the trigger clause is a minor style issue but doesn't significantly harm clarity.

Suggestions

Add more concrete action verbs describing specific capabilities, e.g., 'Generates hypotheses from literature, runs statistical tests, produces summary reports on tabular datasets.'

Scores by dimension:

Specificity (2/3): The description names the domain ('tabular datasets', 'hypothesis generation and testing') and some actions ('explore hypotheses about patterns'), but doesn't list multiple concrete actions like 'generate hypotheses, run statistical tests, produce reports'. The actions remain somewhat abstract.

Completeness (3/3): Clearly answers both 'what' (automated LLM-driven hypothesis generation and testing on tabular datasets, combining literature insights with data-driven testing) and 'when' (explicit 'Use when you want to systematically explore hypotheses about patterns in empirical data'). Also includes disambiguation guidance for related skills.

Trigger Term Quality (3/3): Includes strong natural keywords users would say: 'hypothesis generation', 'hypothesis testing', 'tabular datasets', 'deception detection', 'content analysis', 'patterns in empirical data'. The examples in parentheses add useful trigger terms, and the cross-references to related skills ('hypothesis-generation', 'scientific-brainstorming') further clarify scope.

Distinctiveness / Conflict Risk (3/3): Highly distinctive with a clear niche (automated, LLM-driven hypothesis testing on tabular data) and explicitly differentiates itself from related skills ('hypothesis-generation' for manual formulation, 'scientific-brainstorming' for creative ideation), reducing conflict risk significantly.

Total: 11 / 12 (Passed)

Implementation: 35%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is comprehensive in coverage but severely bloated, containing extensive marketing content, repeated instructions, full academic citations, and explanations that don't help Claude perform the task. The actionable guidance is present but often speculative or incomplete, and the monolithic structure makes it hard to navigate. Significant trimming and restructuring would dramatically improve its utility.

Suggestions

Cut the content by at least 60%: remove BibTeX citations, performance marketing stats, GitHub community info, and repository structure listing. Move publications and detailed examples to separate reference files.

Eliminate redundancy: installation and dataset cloning instructions appear 3+ times. Consolidate into a single Installation section.

Add explicit validation checkpoints to workflows: e.g., verify dataset format before generation, check hypothesis output file exists and is valid JSON before inference, validate config.yaml structure.

Replace vague CLI parameter descriptions ('Key parameters: Task configuration file path') with actual flag names and example values (e.g., `--config ./data/task/config.yaml --method hypogenic --num_hypotheses 20 --output ./output/`).
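The validation checkpoints suggested above could be implemented as small pre-flight checks that run before generation and inference. A minimal sketch in Python; the file names, dataset layout, and hypothesis output format here are illustrative assumptions, not the actual hypogenic interface:

```python
import csv
import json
import tempfile
from pathlib import Path

def dataset_ok(path):
    """Checkpoint 1: dataset file exists and has a header row plus at least one data row."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open(newline="") as f:
        rows = list(csv.reader(f))
    return len(rows) >= 2

def hypotheses_ok(path):
    """Checkpoint 2: hypothesis output exists and parses as non-empty JSON before inference."""
    p = Path(path)
    if not p.is_file():
        return False
    try:
        data = json.loads(p.read_text())
    except json.JSONDecodeError:
        return False
    return bool(data)

# Demo with throwaway files (paths and contents are illustrative).
tmp = Path(tempfile.mkdtemp())
(tmp / "train.csv").write_text("label,text\n1,hello\n")
(tmp / "hypotheses.json").write_text('["Deceptive reviews use fewer first-person pronouns"]')

checks = {
    "dataset": dataset_ok(tmp / "train.csv"),
    "hypotheses": hypotheses_ok(tmp / "hypotheses.json"),
    "missing": dataset_ok(tmp / "nope.csv"),
}
```

A workflow would abort (rather than proceed to inference) whenever one of these checks returns False; an equivalent check on config.yaml would verify the expected top-level keys before the run starts.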

Scores by dimension:

Conciseness (1/3): The skill is extremely verbose at ~400+ lines. It includes extensive marketing-style content (proven results, community stats, GitHub stars), full BibTeX citations, explanations of concepts Claude already knows, and significant redundancy (installation and dataset cloning instructions repeated multiple times). The 'Related Publications' section alone is massive and adds little actionable value.

Actionability (2/3): The skill provides some concrete code examples and CLI commands, but many are incomplete or uncertain: Python API examples use speculative method signatures (e.g., `BaseTask` constructor, `task.generate_hypotheses()`) without confirming they match the actual library API. CLI parameters are described vaguely ('Key parameters' lists descriptions rather than actual flag names). The config.yaml example is illustrative but not fully executable.

Workflow Clarity (2/3): The workflow examples (Examples 1-3) provide reasonable step sequences, but lack explicit validation checkpoints. There's no guidance on verifying hypothesis quality, checking for errors in generation output, or validating dataset format before running. The literature processing workflow has clear steps but no error recovery beyond the troubleshooting section at the end.

Progressive Disclosure (2/3): The skill mentions `references/config_template.yaml` as an external reference, which is good. However, the vast majority of content is inlined in a single monolithic document: the publications, repository structure, workflow examples, and detailed API usage could all be split into separate reference files. The document would benefit greatly from being an overview that points to detailed materials.

Total: 7 / 12 (Passed)

Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure (criteria, description, result):

skill_md_line_count: SKILL.md is long (654 lines); consider splitting into references/ and linking. (Warning)

metadata_version: 'metadata.version' is missing. (Warning)

Total: 9 / 11 (Passed)

Repository: K-Dense-AI/claude-scientific-skills (Reviewed)
