Generate and rank research ideas given a broad direction. Use when user says "找idea", "brainstorm ideas", "generate research ideas", "what can we work on", or wants to explore a research area for publishable directions.
68
83%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Advisory
Suggest reviewing before use
Quality
Discovery
89%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a well-constructed skill description with strong trigger term coverage (including bilingual terms) and a clear 'Use when' clause that explicitly defines activation conditions. The main weakness is that the 'what' portion could be more specific about the concrete actions performed beyond 'generate and rank.' Overall, it would perform well in a multi-skill selection scenario.
Suggestions
Expand the capability description with more specific actions, e.g., 'Generate, evaluate novelty of, and rank research ideas with feasibility assessments' to improve specificity.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain (research ideas) and two actions (generate and rank), but doesn't elaborate on specific concrete sub-actions like evaluating novelty, assessing feasibility, producing structured comparisons, or outputting ranked lists with justifications. | 2 / 3 |
| Completeness | Clearly answers both 'what' (generate and rank research ideas given a broad direction) and 'when' (explicit 'Use when' clause with multiple trigger phrases and a general condition about exploring research areas). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms including bilingual phrases ('找idea'), common phrasings ('brainstorm ideas', 'generate research ideas', 'what can we work on'), and a contextual description ('explore a research area for publishable directions'). These are terms users would naturally say. | 3 / 3 |
| Distinctiveness Conflict Risk | The description targets a clear niche: research idea generation and ranking for publishable directions, with distinct triggers like '找idea' and 'generate research ideas' that are unlikely to conflict with general brainstorming or writing skills. | 3 / 3 |
| Total | | 11 / 12 Passed |
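
The total row above is the sum of the per-dimension scores. A minimal sketch of that tally follows; the `tally` helper and the dictionary layout are illustrative assumptions, not part of the review tooling. Note that the raw ratio 11 / 12 is about 92%, so the 89% headline figure is evidently derived some other way that this page does not state.

```python
def tally(scores):
    """Sum (earned, maximum) pairs per dimension into one (earned, maximum) total."""
    earned = sum(e for e, _ in scores.values())
    maximum = sum(m for _, m in scores.values())
    return earned, maximum


# Scores copied from the Discovery table above.
discovery = {
    "Specificity": (2, 3),
    "Completeness": (3, 3),
    "Trigger Term Quality": (3, 3),
    "Distinctiveness Conflict Risk": (3, 3),
}

earned, maximum = tally(discovery)
print(f"Total: {earned} / {maximum}")  # prints "Total: 11 / 12"
```
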
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured, highly actionable research idea generation skill with a clear multi-phase workflow, concrete templates, and good validation checkpoints. Its main weakness is length — at ~300 lines it pushes the boundary of what should be in a single SKILL.md, and some content (report templates, wiki update details) could be offloaded to referenced files. The workflow design is excellent with proper budget guards, timeout handling, and empirical validation through pilot experiments.
Suggestions
Move the detailed report template (Phase 6 markdown block) and wiki update instructions (Phase 7) into separate referenced files to reduce the main SKILL.md length by ~30%.
Remove minor redundancies: pilot budget constants are defined in Constants then re-explained in Phase 5 step 1; review tracing instructions appear both inline and in a separate section at the bottom.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is quite long (~300 lines) and includes some redundancy (e.g., pilot constants are defined then re-explained in Phase 5, review tracing is mentioned twice). However, most content is genuinely instructive and not explaining things Claude already knows. The verbosity is borderline justified by the complexity of the multi-phase workflow, but could be tightened by ~20-30%. | 2 / 3 |
| Actionability | The skill provides highly concrete, executable guidance: specific bash scripts for wiki resolution, exact spawn_agent/send_input message templates, concrete report templates with markdown structure, specific GPU allocation examples, clear metric thresholds (e.g., 'if metric improves by > 1%'), and precise constants. The output format is fully specified with copy-paste-ready markdown templates. | 3 / 3 |
| Workflow Clarity | The 7-phase workflow is clearly sequenced with explicit validation checkpoints: Phase 3 filters ideas before deep validation, Phase 4 runs novelty checks and critical review before piloting, Phase 5 has timeout/budget guards (PILOT_MAX_HOURS, PILOT_TIMEOUT_HOURS, MAX_TOTAL_GPU_HOURS), and there are clear feedback loops (re-rank based on pilot results, kill and collect partial results on timeout). The skill also handles edge cases like missing wiki, too-broad directions, and unavailable GPUs. | 3 / 3 |
| Progressive Disclosure | The skill references several external files (shared-references/wiki-helper-resolution.md, review-tracing.md, output-versioning.md, output-manifest.md, output-language.md) and composes with other skills (/novelty-check, /research-review, /run-experiment), which is good structure. However, no bundle files were provided to verify these references exist, and the main SKILL.md itself is quite long; the detailed report template and wiki update instructions could arguably be split into separate reference files to keep the main skill leaner. | 2 / 3 |
| Total | | 10 / 12 Passed |
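
The Workflow Clarity row credits the skill with per-pilot and overall GPU budget guards. The sketch below shows one plausible reading of how such guards could gate which pilots run. It is not taken from the skill itself: the constant values, the `admit_pilots` helper, and the idea names are all hypothetical, and only the constant names appear in the review.

```python
# Hypothetical values: the review names these constants but does not give their values.
PILOT_MAX_HOURS = 2.0       # assumed cap on a single pilot's estimated GPU hours
MAX_TOTAL_GPU_HOURS = 8.0   # assumed cap on GPU hours across all pilots


def admit_pilots(estimates, per_pilot_cap=PILOT_MAX_HOURS, total_cap=MAX_TOTAL_GPU_HOURS):
    """Walk ideas in ranked order; return (admitted, skipped) idea names.

    An idea is skipped if its own estimate exceeds the per-pilot cap, or if
    admitting it would push the running total past the overall cap.
    """
    admitted, skipped, used = [], [], 0.0
    for name, hours in estimates:
        if hours > per_pilot_cap or used + hours > total_cap:
            skipped.append(name)
            continue
        admitted.append(name)
        used += hours
    return admitted, skipped


# Illustrative ranked list of (idea, estimated GPU hours); the names are placeholders.
ranked = [
    ("idea-A", 2.0),
    ("idea-B", 3.5),  # over the per-pilot cap
    ("idea-C", 1.5),
    ("idea-D", 2.0),
    ("idea-E", 2.0),
    ("idea-F", 1.0),  # would exceed the overall cap
]

admitted, skipped = admit_pilots(ranked)
print("admitted:", admitted)
print("skipped:", skipped)
```

Checking the caps before launch, not only at timeout, means an over-budget pilot never occupies a GPU. A runtime timeout such as the `PILOT_TIMEOUT_HOURS` guard mentioned above would still be needed for pilots that run longer than estimated.
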
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
a425a71