Automated LLM-driven hypothesis generation and testing on tabular datasets. Use when you want to systematically explore hypotheses about patterns in empirical data (e.g., deception detection, content analysis). Combines literature insights with data-driven hypothesis testing. For manual hypothesis formulation use hypothesis-generation; for creative ideation use scientific-brainstorming.
Quality: 58% (Does it follow best practices?)
Impact: — (No eval scenarios have been run)
Advisory: Suggest reviewing before use
Optimize this skill with Tessl: `npx tessl skill review --optimize ./scientific-skills/hypogenic/SKILL.md`

Quality
Discovery: 89%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong description that clearly defines its niche in automated hypothesis generation/testing on tabular data, provides explicit 'Use when' guidance with concrete examples, and thoughtfully disambiguates from related skills. The main weakness is that the specific capabilities could be more concretely enumerated (e.g., what statistical tests, what outputs are produced). The use of second person 'you' in 'Use when you want' is a minor style issue but follows common patterns seen in good examples.
Suggestions
- Add more concrete action verbs describing specific capabilities, e.g., 'Generates hypotheses from literature, runs statistical tests, produces significance reports on tabular datasets.'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain ('hypothesis generation and testing on tabular datasets') and some actions ('explore hypotheses about patterns'), but doesn't list multiple concrete actions like 'generate hypotheses, run statistical tests, produce reports'. The actions remain somewhat abstract. | 2 / 3 |
| Completeness | Clearly answers both 'what' (automated LLM-driven hypothesis generation and testing on tabular datasets, combining literature insights with data-driven testing) and 'when' (explicit 'Use when you want to systematically explore hypotheses about patterns in empirical data'). Also includes disambiguation guidance for related skills. | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'hypothesis generation', 'hypothesis testing', 'tabular datasets', 'deception detection', 'content analysis', 'patterns in empirical data'. The examples and cross-references to related skills ('hypothesis-generation', 'scientific-brainstorming') also serve as useful trigger differentiators. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with a clear niche (automated LLM-driven hypothesis testing on tabular data) and explicitly differentiates itself from related skills ('hypothesis-generation' for manual formulation, 'scientific-brainstorming' for creative ideation), reducing conflict risk significantly. | 3 / 3 |
| Total | | 11 / 12 Passed |
Implementation: 27%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is excessively verbose, containing marketing copy, full academic citations, repository structure documentation, and repeated content that inflates token usage without proportional value. While it provides some concrete code examples and CLI commands, the actionability is undermined by placeholder-heavy examples and lack of validation steps in workflows. The monolithic structure with dead references to non-existent bundle files represents poor progressive disclosure.
Suggestions
- Reduce content by 60-70%: remove BibTeX citations, performance statistics, community stats, repository structure, and concept explanations. Keep only actionable instructions and code.
- Split detailed content into bundle files: move the dataset format specification to references/dataset_format.md, configuration details to references/config_guide.md, literature processing to references/literature_setup.md, and workflow examples to references/workflows.md.
- Add explicit validation checkpoints to workflows: after hypothesis generation, verify that the output file exists and contains valid JSON, and check that the hypothesis count matches expectations before proceeding to inference (see the sketch after this list).
- Provide one complete, minimal end-to-end working example with a specific dataset (e.g., the headline click prediction example) rather than multiple placeholder-heavy examples.
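The third suggestion can be made concrete with a small gate script. Below is a minimal sketch in Python; the output path, the JSON shape (a flat list of hypothesis strings), and the minimum count are illustrative assumptions, not HypoGeniC's verified output format.

```python
# Hypothetical validation checkpoint between hypothesis generation and
# inference. The path, output shape, and threshold are illustrative
# assumptions, not HypoGeniC's documented interface.
import json
import sys
from pathlib import Path

OUTPUT_PATH = Path("./outputs/hypotheses.json")  # assumed output location
MIN_HYPOTHESES = 5                               # illustrative expectation

if not OUTPUT_PATH.exists():
    sys.exit(f"Checkpoint failed: no output file at {OUTPUT_PATH}")

try:
    hypotheses = json.loads(OUTPUT_PATH.read_text(encoding="utf-8"))
except json.JSONDecodeError as err:
    sys.exit(f"Checkpoint failed: output is not valid JSON ({err})")

if not isinstance(hypotheses, list):
    sys.exit(f"Checkpoint failed: expected a JSON list, got {type(hypotheses).__name__}")
if len(hypotheses) < MIN_HYPOTHESES:
    sys.exit(f"Checkpoint failed: expected at least {MIN_HYPOTHESES} hypotheses, got {len(hypotheses)}")

print(f"Checkpoint passed: {len(hypotheses)} hypotheses; safe to run inference.")
```

Wiring a gate like this between the generation and inference steps would turn the skill's reactive troubleshooting section into the in-workflow validation that the Workflow Clarity dimension below penalizes it for lacking.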
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~400+ lines. Includes extensive marketing-style content (proven results, community stats, GitHub stars), full BibTeX citations, explanations of concepts Claude already knows, repeated installation instructions, and sections like 'Key Features' that describe rather than instruct. The Related Publications section alone is massive and adds little actionable value. | 1 / 3 |
| Actionability | Provides concrete CLI commands and Python code examples that appear executable, but many code snippets use placeholder paths like './data/your_task/config.yaml' without a complete working example. The Python API examples reference functions like `BaseTask` with methods that may not match the actual API (no verification). CLI examples show `--help` rather than concrete parameter values. | 2 / 3 |
| Workflow Clarity | Multi-step workflows are listed (e.g., Creating Custom Tasks has 5 steps, Literature Processing has 3 steps) but lack explicit validation checkpoints. For example, after hypothesis generation there is no step to verify output quality or check for errors before proceeding to inference. The troubleshooting section is reactive rather than integrated into workflows as validation gates. | 2 / 3 |
| Progressive Disclosure | The content is a monolithic wall of text with everything inlined. References to 'references/config_template.yaml' and 'scripts/' and 'assets/' directories are mentioned but no bundle files are provided, making these dead references. The document includes full BibTeX entries, repository structure, dataset format specs, and workflow examples all in one massive file that should be split across multiple reference documents. | 1 / 3 |
| Total | | 6 / 12 Passed |
Validation: 81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (656 lines); consider splitting into references/ and linking | Warning |
| metadata_version | 'metadata.version' is missing | Warning |
| Total | 9 / 11 Passed | |
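The metadata_version warning suggests the fix is a version field nested under metadata in SKILL.md's YAML frontmatter. Below is a sketch of a local pre-submission check, assuming that layout; the field names are inferred from the warning text rather than from Tessl's documented schema.

```python
# Sketch of a local check for the metadata_version warning above.
# Assumes SKILL.md opens with YAML frontmatter delimited by '---' and
# that the expected layout is metadata: {version: ...}; both details are
# inferred from the warning text, not taken from Tessl's schema.
import sys
import yaml  # PyYAML


def frontmatter(path: str) -> dict:
    text = open(path, encoding="utf-8").read()
    parts = text.split("---", 2)  # ['', <yaml block>, <body>] when present
    if not text.startswith("---") or len(parts) < 3:
        return {}
    return yaml.safe_load(parts[1]) or {}


meta = frontmatter("./scientific-skills/hypogenic/SKILL.md")
if "version" not in (meta.get("metadata") or {}):
    sys.exit("metadata.version is missing from the frontmatter")
print("metadata.version present:", meta["metadata"]["version"])
```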