Automated LLM-driven hypothesis generation and testing on tabular datasets. Use when you want to systematically explore hypotheses about patterns in empirical data (e.g., deception detection, content analysis). Combines literature insights with data-driven hypothesis testing. For manual hypothesis formulation use hypothesis-generation; for creative ideation use scientific-brainstorming.
Overall score: 69
Quality: 62% (does it follow best practices?)
Impact: 74% (1.19x average score across 3 eval scenarios)
Advisory: suggest reviewing before use
Optimize this skill with Tessl: `npx tessl skill review --optimize ./scientific-skills/hypogenic/SKILL.md`

Quality
Discovery: 89%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong description that clearly defines its niche in automated hypothesis generation/testing on tabular data, includes explicit 'Use when' guidance, and thoughtfully disambiguates from related skills. The main weakness is that the specific capabilities could be more concretely enumerated (e.g., what statistical tests, what outputs are produced). The use of second person 'you' in the trigger clause is a minor style issue but doesn't significantly harm clarity.
Suggestions
Add more concrete action verbs describing specific capabilities, e.g., 'Generates hypotheses from literature, runs statistical tests, produces summary reports on tabular datasets.'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain ('tabular datasets', 'hypothesis generation and testing') and some actions ('explore hypotheses about patterns'), but doesn't list multiple concrete actions like 'generate hypotheses, run statistical tests, produce reports'. The actions remain somewhat abstract. | 2 / 3 |
| Completeness | Clearly answers both 'what' (automated LLM-driven hypothesis generation and testing on tabular datasets, combining literature insights with data-driven testing) and 'when' (explicit 'Use when you want to systematically explore hypotheses about patterns in empirical data'). Also includes disambiguation guidance for related skills. | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'hypothesis generation', 'hypothesis testing', 'tabular datasets', 'deception detection', 'content analysis', 'patterns in empirical data'. The examples in parentheses add useful trigger terms, and the cross-references to related skills ('hypothesis-generation', 'scientific-brainstorming') further clarify scope. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with a clear niche (automated/LLM-driven hypothesis testing on tabular data) and explicitly differentiates itself from related skills ('hypothesis-generation' for manual formulation, 'scientific-brainstorming' for creative ideation), reducing conflict risk significantly. | 3 / 3 |
| Total | 11 / 12 Passed | |
Implementation: 35%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is comprehensive in coverage but severely bloated, containing extensive marketing content, repeated instructions, full academic citations, and explanations that don't help Claude perform the task. The actionable guidance is present but often speculative or incomplete, and the monolithic structure makes it hard to navigate. Significant trimming and restructuring would dramatically improve its utility.
Suggestions
Cut the content by at least 60%: remove BibTeX citations, performance marketing stats, GitHub community info, and repository structure listing. Move publications and detailed examples to separate reference files.
Eliminate redundancy: installation and dataset cloning instructions appear 3+ times. Consolidate into a single Installation section.
Add explicit validation checkpoints to workflows: e.g., verify dataset format before generation, check hypothesis output file exists and is valid JSON before inference, validate config.yaml structure.
Replace vague CLI parameter descriptions ('Key parameters: Task configuration file path') with actual flag names and example values (e.g., `--config ./data/task/config.yaml --method hypogenic --num_hypotheses 20 --output ./output/`).
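The validation-checkpoint suggestion above can be sketched in a few lines of Python. The file path, and the assumption that generation output is JSON shaped like `{"hypotheses": [...]}` (or a bare list), are illustrative guesses about HypoGeniC's output format, not confirmed against the library:

```python
import json
import sys
from pathlib import Path

def validate_hypotheses(path: str, min_count: int = 1) -> list:
    """Fail fast if the hypothesis-generation output is missing or malformed."""
    p = Path(path)
    if not p.exists():
        sys.exit(f"ERROR: hypothesis file not found: {path}")
    try:
        data = json.loads(p.read_text())
    except json.JSONDecodeError as e:
        sys.exit(f"ERROR: {path} is not valid JSON: {e}")
    # Accept either a bare list or a {"hypotheses": [...]} wrapper (assumed schema).
    hypotheses = data if isinstance(data, list) else data.get("hypotheses", [])
    if len(hypotheses) < min_count:
        sys.exit(f"ERROR: expected at least {min_count} hypotheses, got {len(hypotheses)}")
    return hypotheses
```

Running a check like this between the generation and inference steps turns silent failures (empty or truncated output files) into immediate, explainable errors.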
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~400+ lines. It includes extensive marketing-style content (proven results, community stats, GitHub stars), full BibTeX citations, explanations of concepts Claude already knows, and significant redundancy (installation and dataset cloning instructions repeated multiple times). The 'Related Publications' section alone is massive and adds little actionable value. | 1 / 3 |
| Actionability | The skill provides some concrete code examples and CLI commands, but many are incomplete or uncertain. Python API examples use speculative method signatures (e.g., `BaseTask` constructor, `task.generate_hypotheses()`) without confirming they match the actual library API. CLI parameters are described vaguely ('Key parameters' lists descriptions rather than actual flag names). The config.yaml example is illustrative but not fully executable. | 2 / 3 |
| Workflow Clarity | The workflow examples (Examples 1-3) provide reasonable step sequences, but lack explicit validation checkpoints. There's no guidance on verifying hypothesis quality, checking for errors in generation output, or validating dataset format before running. The literature processing workflow has clear steps but no error recovery beyond the troubleshooting section at the end. | 2 / 3 |
| Progressive Disclosure | The skill mentions `references/config_template.yaml` as an external reference, which is good. However, the vast majority of content is inlined in a single monolithic document. The publications, repository structure, workflow examples, and detailed API usage could all be split into separate reference files. The document would benefit greatly from being an overview that points to detailed materials. | 2 / 3 |
| Total | 7 / 12 Passed | |
Validation: 81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation for skill structure: 9 / 11 Passed
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (654 lines); consider splitting into references/ and linking | Warning |
| metadata_version | `metadata.version` is missing | Warning |
| Total | 9 / 11 Passed | |
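The `metadata_version` warning can usually be cleared by declaring a version in SKILL.md's frontmatter. The exact key layout below is an illustrative sketch and should be checked against the skill spec; the `name` value is inferred from the directory path above:

```yaml
---
name: hypogenic
description: Automated LLM-driven hypothesis generation and testing on tabular datasets. ...
metadata:
  version: 1.0.0
---
```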
Revision: b58ad7e