Skill description under review:

> Instinct-based learning system that observes sessions via hooks, creates atomic instincts with confidence scoring, and evolves them into skills/commands/agents. v2.1 adds project-scoped instincts to prevent cross-project contamination.
## Summary: 38%

- Quality (does it follow best practices?): Passed, no known issues
- Impact: Pending, no eval scenarios have been run
## Discovery: 17%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description reads like a technical architecture summary or release note rather than a skill description designed for selection. It lacks natural trigger terms users would say, has no explicit 'Use when...' guidance, and relies heavily on internal jargon ('atomic instincts', 'confidence scoring', 'hooks') that obscures what the skill actually does for the user.
### Suggestions

- Add an explicit 'Use when...' clause describing the situations that should trigger this skill, e.g., 'Use when the user wants to automatically learn patterns from sessions and generate reusable skills or commands.'
- Replace jargon with natural language trigger terms users might say, such as 'learn from my workflow', 'auto-generate skills', 'remember patterns', or 'create automation from observation'.
- Rewrite the 'what' portion to describe user-facing outcomes rather than internal architecture, e.g., 'Automatically learns recurring patterns from user sessions and generates reusable skills, commands, and agents with confidence-based promotion.'
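Applied together, these suggestions might yield frontmatter along these lines — a sketch only; the `name` value is hypothetical and the wording is illustrative, not the skill's actual metadata:

```yaml
---
name: homunculus  # hypothetical name, for illustration only
description: >
  Automatically learns recurring patterns from your sessions and generates
  reusable skills, commands, and agents with confidence-based promotion.
  Use when the user wants to learn from their workflow, auto-generate
  skills, remember patterns, or create automation from observation.
---
```

Note how the rewrite leads with user-facing outcomes and ends with natural trigger phrases, rather than internal architecture terms.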
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names some actions like 'observes sessions via hooks', 'creates atomic instincts with confidence scoring', and 'evolves them into skills/commands/agents', but these are domain-specific jargon rather than concrete user-facing actions. The description is more about internal architecture than what it concretely does for the user. | 2 / 3 |
| Completeness | While there is a partial 'what' (though expressed in abstract/architectural terms), there is no 'Use when...' clause or any explicit guidance on when Claude should select this skill. The description reads more like a changelog entry than a skill selector. | 1 / 3 |
| Trigger Term Quality | The description uses highly technical jargon ('atomic instincts', 'confidence scoring', 'hooks', 'cross-project contamination') that users would almost never naturally say. There are no natural trigger terms a user would use when needing this skill. | 1 / 3 |
| Distinctiveness / Conflict Risk | The concept of 'instinct-based learning' is fairly unique and unlikely to conflict with common skills, but the mention of 'skills/commands/agents' is broad enough to potentially overlap with other meta-learning or automation skills. | 2 / 3 |
| **Total** | | **6 / 12 (Passed)** |
## Implementation: 27%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is comprehensive in coverage but severely over-documented for a SKILL.md file. It reads more like a README/documentation page than an actionable skill, with extensive version comparison tables, architectural diagrams, and conceptual explanations that bloat the token budget. The actionable content (setup steps, commands) is buried among reference material that should be split into separate files.
### Suggestions

- Reduce the main file to Quick Start + Commands + Config essentials (~80 lines), moving version comparisons, file structure, scope guide, confidence scoring, and promotion details into separate reference files (e.g., REFERENCE.md, SCOPE-GUIDE.md)
- Add validation checkpoints: after enabling hooks, show how to verify observations are being captured (e.g., 'Run a command, then check `cat ~/.claude/homunculus/projects/<hash>/observations.jsonl | tail -1`')
- Remove the v1 vs v2 comparison table entirely — it's historical context that wastes tokens and isn't actionable
- Add a concrete end-to-end example: trigger an observation, show the resulting instinct YAML, then demonstrate /instinct-status output
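The end-to-end example suggested above could include a sample instinct record. The field names below are assumptions about the general shape, not the skill's actual schema:

```yaml
# Hypothetical instinct record, for illustration only; these field names
# are assumptions, not the skill's actual on-disk schema.
id: prefer-rg-over-grep
trigger: "user searches file contents"
action: "use rg instead of grep"
confidence: 0.7     # raised by confirming observations
observations: 4
scope: project      # v2.1 project-scoped instinct
```

Even a short sample like this gives the reader something concrete to compare against when verifying the system is recording observations.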
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~300+ lines. Includes extensive version comparison tables (v1 vs v2, v2 vs v2.1), explanations of concepts Claude can infer (why hooks vs skills, what an instinct is), marketing-style prose ('teaching Claude your patterns'), and a full ASCII architecture diagram. Much of this is reference documentation that doesn't need to be in the main skill file. | 1 / 3 |
| Actionability | Provides concrete setup steps (JSON config for hooks, mkdir commands, CLI commands), but the core learning/observation process is abstract — there's no executable example of how to actually create an instinct, run the observer, or verify the system is working. The CLI commands are listed but not demonstrated with real input/output. | 2 / 3 |
| Workflow Clarity | The Quick Start section provides a reasonable 3-step setup sequence, and the architecture diagram shows the flow. However, there are no validation checkpoints — no way to verify hooks are firing correctly, no way to confirm observations are being recorded, and no error recovery guidance if something fails. | 2 / 3 |
| Progressive Disclosure | This is a monolithic wall of text with everything inline — version comparison tables, full file structure trees, scope decision guides, confidence scoring details, backward compatibility notes, and privacy information all in one file. The scope guide, confidence scoring, file structure, and promotion details should be in separate reference files with clear links from the main skill. | 1 / 3 |
| **Total** | | **6 / 12 (Passed)** |
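The confidence scoring and promotion logic criticized above could be made concrete in the skill with a short sketch. The names, thresholds, and update rule below are illustrative assumptions, not the skill's actual implementation:

```python
# Hypothetical sketch of confidence-based promotion. Class names,
# thresholds, and the update rule are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class Instinct:
    pattern: str
    confidence: float   # 0.0-1.0, adjusted on each observation
    observations: int


def reinforce(inst: Instinct, confirmed: bool, step: float = 0.1) -> Instinct:
    """Nudge confidence toward 1.0 on a confirming observation, else toward 0.0."""
    delta = step if confirmed else -step
    new_conf = min(1.0, max(0.0, inst.confidence + delta))
    return Instinct(inst.pattern, new_conf, inst.observations + 1)


def should_promote(inst: Instinct, threshold: float = 0.8, min_obs: int = 5) -> bool:
    """Promote an instinct to a skill only after enough confident observations."""
    return inst.confidence >= threshold and inst.observations >= min_obs


inst = Instinct("run tests before commit", confidence=0.5, observations=0)
for _ in range(5):
    inst = reinforce(inst, confirmed=True)
print(should_promote(inst))  # prints True after five confirmations
```

A worked example of this kind, inline in the skill, would address the Actionability gap without adding much length.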
## Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

### Skill structure: 10 / 11 passed
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| **Total** | | **10 / 11 (Passed)** |
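One way to clear the `frontmatter_unknown_keys` warning, per its own suggestion, is to nest nonstandard keys under a metadata map. The key names here are hypothetical, since the offending keys aren't listed in the report:

```yaml
---
name: homunculus        # hypothetical values, for illustration only
description: >
  Automatically learns recurring patterns from sessions and generates
  reusable skills, commands, and agents.
metadata:
  version: "2.1"        # assumed example of a formerly top-level unknown key
---
```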