Automated LLM-driven hypothesis generation and testing on tabular datasets. Use when you want to systematically explore hypotheses about patterns in empirical data (e.g., deception detection, content analysis). Combines literature insights with data-driven hypothesis testing. For manual hypothesis formulation use hypothesis-generation; for creative ideation use scientific-brainstorming.
Overall score: 69
Quality: 62% (does it follow best practices?)
Impact: 74% (1.19x average score across 3 eval scenarios)
Advisory: suggest reviewing before use
Optimize this skill with Tessl: `npx tessl skill review --optimize ./scientific-skills/hypogenic/SKILL.md`

Quality
Discovery: 89%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong description that clearly defines its niche in automated hypothesis generation/testing on tabular data, includes explicit 'Use when' guidance, and thoughtfully disambiguates from related skills. The main weakness is that the specific capabilities could be more concretely enumerated (e.g., what statistical tests, what outputs are produced). The use of second person 'you' in the trigger clause is a minor style issue but doesn't significantly harm clarity.
Suggestions
Add more concrete action verbs describing specific capabilities, e.g., 'Generates hypotheses from literature, runs statistical tests, produces summary reports on tabular datasets.'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain ('tabular datasets', 'hypothesis generation and testing') and some actions ('explore hypotheses about patterns'), but doesn't list multiple concrete actions like 'generate hypotheses, run statistical tests, produce reports'. The actions remain somewhat abstract. | 2 / 3 |
| Completeness | Clearly answers both 'what' (automated LLM-driven hypothesis generation and testing on tabular datasets, combining literature insights with data-driven testing) and 'when' (explicit 'Use when you want to systematically explore hypotheses about patterns in empirical data'). Also includes disambiguation guidance for related skills. | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'hypothesis generation', 'hypothesis testing', 'tabular datasets', 'deception detection', 'content analysis', 'patterns in empirical data'. The examples in parentheses add useful trigger terms, and the cross-references to related skills ('hypothesis-generation', 'scientific-brainstorming') further clarify scope. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with a clear niche (automated/LLM-driven hypothesis testing on tabular data) and explicitly differentiates itself from related skills ('hypothesis-generation' for manual formulation, 'scientific-brainstorming' for creative ideation), reducing conflict risk significantly. | 3 / 3 |
| Total | 11 / 12 Passed | |
Implementation: 35%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is comprehensive in coverage but severely bloated, containing extensive marketing content, repeated instructions, full academic citations, and explanations that don't help Claude perform the task. The actionable guidance is present but often speculative or incomplete, and the monolithic structure makes it hard to navigate. Significant trimming and restructuring would dramatically improve its utility.
Suggestions
Cut the content by at least 60%: remove BibTeX citations, performance marketing stats, GitHub community info, and repository structure listing. Move publications and detailed examples to separate reference files.
Eliminate redundancy: installation and dataset cloning instructions appear 3+ times. Consolidate into a single Installation section.
Add explicit validation checkpoints to workflows: e.g., verify dataset format before generation, check hypothesis output file exists and is valid JSON before inference, validate config.yaml structure.
Replace vague CLI parameter descriptions ('Key parameters: Task configuration file path') with actual flag names and example values (e.g., `--config ./data/task/config.yaml --method hypogenic --num_hypotheses 20 --output ./output/`).
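The validation-checkpoint suggestion above can be sketched in a few lines of Python. The file path, and the assumption that generation output is JSON shaped like `{"hypotheses": [...]}` (or a bare list), are illustrative guesses about HypoGeniC's output format, not confirmed against the library:

```python
import json
import sys
from pathlib import Path

def validate_hypotheses(path: str, min_count: int = 1) -> list:
    """Fail fast if the hypothesis-generation output is missing or malformed."""
    p = Path(path)
    if not p.exists():
        sys.exit(f"ERROR: hypothesis file not found: {path}")
    try:
        data = json.loads(p.read_text())
    except json.JSONDecodeError as e:
        sys.exit(f"ERROR: {path} is not valid JSON: {e}")
    # Accept either a bare list or a {"hypotheses": [...]} wrapper (assumed schema).
    hypotheses = data if isinstance(data, list) else data.get("hypotheses", [])
    if len(hypotheses) < min_count:
        sys.exit(f"ERROR: expected at least {min_count} hypotheses, got {len(hypotheses)}")
    return hypotheses
```

Running a check like this between the generation and inference steps turns silent failures (empty or truncated output files) into immediate, explainable errors.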
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~400+ lines. It includes extensive marketing-style content (proven results, community stats, GitHub stars), full BibTeX citations, explanations of concepts Claude already knows, and significant redundancy (installation and dataset cloning instructions repeated multiple times). The 'Related Publications' section alone is massive and adds little actionable value. | 1 / 3 |
| Actionability | The skill provides some concrete code examples and CLI commands, but many are incomplete or uncertain. Python API examples use speculative method signatures (e.g., `BaseTask` constructor, `task.generate_hypotheses()`) without confirming they match the actual library API. CLI parameters are described vaguely ('Key parameters' lists descriptions rather than actual flag names). The config.yaml example is illustrative but not fully executable. | 2 / 3 |
| Workflow Clarity | The workflow examples (Examples 1-3) provide reasonable step sequences, but lack explicit validation checkpoints. There's no guidance on verifying hypothesis quality, checking for errors in generation output, or validating dataset format before running. The literature processing workflow has clear steps but no error recovery beyond the troubleshooting section at the end. | 2 / 3 |
| Progressive Disclosure | The skill mentions `references/config_template.yaml` as an external reference, which is good. However, the vast majority of content is inlined in a single monolithic document. The publications, repository structure, workflow examples, and detailed API usage could all be split into separate reference files. The document would benefit greatly from being an overview that points to detailed materials. | 2 / 3 |
| Total | 7 / 12 Passed | |
Validation: 81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation for skill structure: 9 / 11 Passed
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (654 lines); consider splitting into references/ and linking | Warning |
| metadata_version | `metadata.version` is missing | Warning |
| Total | 9 / 11 Passed | |
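The `metadata_version` warning can usually be cleared by declaring a version in SKILL.md's frontmatter. The exact key layout below is an illustrative sketch and should be checked against the skill spec; the `name` value is inferred from the directory path above:

```yaml
---
name: hypogenic
description: Automated LLM-driven hypothesis generation and testing on tabular datasets. ...
metadata:
  version: 1.0.0
---
```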
Revision: b58ad7e