
hypogenic

Automated LLM-driven hypothesis generation and testing on tabular datasets. Use when you want to systematically explore hypotheses about patterns in empirical data (e.g., deception detection, content analysis). Combines literature insights with data-driven hypothesis testing. For manual hypothesis formulation use hypothesis-generation; for creative ideation use scientific-brainstorming.

68

Quality: 62% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security (by Snyk): Advisory (Suggest reviewing before use)

Optimize this skill with Tessl

npx tessl skill review --optimize ./scientific-skills/hypogenic/SKILL.md

Quality

Discovery: 89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong description that clearly defines its niche in automated hypothesis generation/testing on tabular data, includes an explicit 'Use when' clause with concrete examples, and helpfully distinguishes itself from related skills. The main weakness is that the specific capabilities could be more concretely enumerated (e.g., what statistical tests, what outputs are produced) rather than staying at a somewhat abstract level.

Suggestions

Add more concrete action verbs describing specific capabilities, e.g., 'generates hypotheses from literature, runs statistical tests, produces significance reports on tabular datasets'
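As an illustration of this suggestion, the skill's frontmatter description could enumerate capabilities directly. The sketch below is hypothetical wording, not the skill's actual metadata; it simply folds the suggested action verbs into the existing description:

```yaml
# Hypothetical SKILL.md frontmatter sketch (illustrative only)
description: >
  Automated LLM-driven hypothesis generation and testing on tabular
  datasets: generates hypotheses from literature, runs statistical
  tests, and produces significance reports. Use when you want to
  systematically explore patterns in empirical data (e.g., deception
  detection, content analysis). For manual hypothesis formulation use
  hypothesis-generation; for creative ideation use scientific-brainstorming.
```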

Dimension scores:

Specificity (2 / 3): The description names the domain ('tabular datasets', 'hypothesis generation and testing') and some actions ('explore hypotheses about patterns'), but doesn't list multiple concrete actions like 'generate hypotheses, run statistical tests, produce reports'. The actions remain somewhat abstract.

Completeness (3 / 3): Clearly answers both 'what' (automated LLM-driven hypothesis generation and testing on tabular datasets, combining literature insights with data-driven testing) and 'when' (explicit 'Use when' clause plus differentiation guidance for when to use alternative skills instead).

Trigger Term Quality (3 / 3): Includes strong natural keywords users would say: 'hypothesis generation', 'hypothesis testing', 'tabular datasets', 'deception detection', 'content analysis', 'patterns in empirical data'. Also differentiates from related skills with terms like 'hypothesis-generation' and 'scientific-brainstorming'.

Distinctiveness / Conflict Risk (3 / 3): Highly distinctive with a clear niche (automated LLM-driven hypothesis testing on tabular data) and explicitly differentiates itself from related skills ('hypothesis-generation' for manual formulation, 'scientific-brainstorming' for creative ideation), reducing conflict risk significantly.

Total: 11 / 12 (Passed)

Implementation: 35%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is comprehensive in coverage but severely bloated, containing marketing copy, full academic citations, community statistics, and repeated information that wastes token budget. While it provides some concrete code examples and clear workflow steps, it lacks validation checkpoints and could be dramatically improved by splitting into a concise overview with references to detailed sub-documents. The content reads more like a README/documentation page than a focused skill instruction.

Suggestions

Reduce content by 60-70%: remove BibTeX citations, marketing stats (GitHub stars, contributor counts), 'When to Use This Skill' section, 'Expected Outcomes' performance numbers, and 'Additional Resources' community section—none of these help Claude execute the skill.

Add validation checkpoints to workflows: after dataset preparation (validate JSON format), after hypothesis generation (check output file exists and contains expected structure), and after inference (verify results format).

Split into overview + reference files: move Configuration details, Dataset Format specification, Literature Processing steps, and Creating Custom Tasks into separate referenced documents, keeping SKILL.md as a concise quick-start with pointers.

Provide one complete, end-to-end executable example with a specific dataset rather than multiple partial examples using placeholder paths like 'your_task'.
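To make the validation-checkpoint suggestion concrete, a workflow could insert lightweight checks between steps. The sketch below is hypothetical: the file layout and required keys (`train`) are placeholders, not hypogenic's actual schema.

```python
import json
from pathlib import Path


def check_dataset(path, required_keys=("train",)):
    """Checkpoint after dataset preparation: file exists, parses as JSON,
    and contains the top-level keys the task expects (keys are assumed)."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"dataset not found: {path}")
    data = json.loads(p.read_text())
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise ValueError(f"dataset missing keys: {missing}")
    return data


def check_hypotheses(path):
    """Checkpoint after hypothesis generation: output file exists and
    holds a non-empty list of hypotheses."""
    hyps = json.loads(Path(path).read_text())
    if not isinstance(hyps, list) or not hyps:
        raise ValueError(f"no hypotheses found in {path}")
    return hyps
```

Calling these checks between the preparation, generation, and inference steps turns silent format mismatches into immediate, actionable errors.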

Dimension scores:

Conciseness (1 / 3): The skill is extremely verbose at ~400+ lines. It includes extensive marketing-style content (proven results, community stats, GitHub stars), full BibTeX citations, explanations of concepts Claude already knows, repeated installation instructions, and sections like 'When to Use This Skill' and 'Key Features' that add little actionable value. The Related Publications section alone is massive and unnecessary for a skill file.

Actionability (2 / 3): The skill provides concrete CLI commands and Python code examples that appear executable, but many examples use placeholder paths like './data/your_task/config.yaml' without a complete working example. The Python API examples reference methods (task.generate_hypotheses, task.inference) whose exact signatures are uncertain, and the config.yaml template is incomplete with '{...}' placeholders.

Workflow Clarity (2 / 3): The workflow examples (Examples 1-3) provide clear step sequences, and the 'Creating Custom Tasks' section has a logical 5-step process. However, there are no validation checkpoints or error recovery steps in any workflow. For a multi-step process involving LLM API calls, dataset preparation, and PDF processing, the absence of verification steps (e.g., validate dataset format, check hypothesis output quality) is a notable gap.

Progressive Disclosure (2 / 3): The skill mentions 'references/config_template.yaml' as an external reference, which is good. However, the vast majority of content is inline in one monolithic file: the configuration details, all three workflow examples, full BibTeX citations, repository structure, and troubleshooting could all be split into separate reference files. The document would benefit greatly from being an overview that points to detailed sub-documents.

Total: 7 / 12 (Passed)

Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation checks: 9 / 11 passed

Validation for skill structure

skill_md_line_count (Warning): SKILL.md is long (656 lines); consider splitting into references/ and linking.

metadata_version (Warning): 'metadata.version' is missing.

Total: 9 / 11 (Passed)

Repository: K-Dense-AI/claude-scientific-skills (Reviewed)
