analyze-results

Analyze ML experiment results, compute statistics, generate comparison tables and insights. Use when user says "analyze results", "compare", or needs to interpret experimental data.

Quality

—

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Quality

Content

87%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

A compact, well-structured analysis workflow that respects token budget and gives a concrete framework. The main gap is the absence of explicit validation/verification checkpoints in the multi-step process.

Suggestions

Add an explicit validation checkpoint (e.g., verify files are complete and metrics are present before building the comparison table) and a feedback loop for handling missing or malformed results.

Specify how to parse the JSON/CSV results (e.g., a concrete command or library) so the "Locate Results" step is copy-paste ready rather than directional.
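As an illustration of both suggestions, here is a minimal sketch of a load-then-validate checkpoint that uses only the Python standard library. The function names `load_results` and `validate_results`, and the metric names in `REQUIRED_METRICS`, are hypothetical; the skill's actual result layout and metrics are not shown in this review, so treat them as placeholders.

```python
import csv
import json
from pathlib import Path

# Hypothetical metric names; substitute whatever the experiments actually log.
REQUIRED_METRICS = ("accuracy", "loss")


def load_results(path):
    """Load one results file (JSON or CSV) into a list of row dicts."""
    path = Path(path)
    if path.suffix == ".json":
        data = json.loads(path.read_text(encoding="utf-8"))
        # Accept either a single run (dict) or a list of runs.
        return data if isinstance(data, list) else [data]
    if path.suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    raise ValueError(f"Unsupported results format: {path.name}")


def validate_results(rows, required=REQUIRED_METRICS):
    """Return a list of problems; an empty list means the checkpoint passes."""
    problems = []
    if not rows:
        problems.append("no rows found")
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            problems.append(f"row {i}: expected an object, got {type(row).__name__}")
            continue
        for name in required:
            value = row.get(name)
            if value is None or value == "":
                problems.append(f"row {i}: missing metric '{name}'")
                continue
            try:
                float(value)  # CSV values arrive as strings, so test convertibility
            except (TypeError, ValueError):
                problems.append(f"row {i}: metric '{name}' is not numeric: {value!r}")
    return problems
```

One way to use this as a gate: call `validate_results(load_results(path))` for each file before building the comparison table, and if the returned list is non-empty, report the problems and ask for corrected files instead of continuing.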

Dimension	Reasoning	Score
Conciseness	Lean and efficient — uses terse bullets, names specific directories and metrics, and assumes Claude's competence without explaining basic concepts like JSON or statistics.	3 / 3
Actionability	Provides a concrete framework for an instruction-only task: specific directories to check, variables to organize by, statistical practices, and a four-part insight structure with a defined output format.	3 / 3
Workflow Clarity	Five steps are clearly sequenced, but there are no explicit validation checkpoints or feedback loops; checks like "flag outliers" are implicit rather than structured validate-then-retry gates.	2 / 3
Progressive Disclosure	A short (under 50 lines), single-purpose skill with well-organized sections and no need for external references, which satisfies the simple-skill allowance for a top score.	3 / 3
	Total	11 / 12 Passed

Description

92%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

A concise, third-person description that clearly states capabilities and provides explicit usage triggers. Its only weakness is the generic "compare" trigger, which raises mild conflict risk.

Suggestions

Tie the "compare" trigger more tightly to the ML-results context (e.g., "compare models or runs") to reduce overlap with general comparison requests.

Dimension	Reasoning	Score
Specificity	Lists multiple concrete actions — "compute statistics", "generate comparison tables", "generate... insights" — matching the anchor for listing several specific concrete actions.	3 / 3
Completeness	Explicitly answers both what (analyze results, compute stats, generate tables/insights) and when ("Use when user says..."), with explicit triggers.	3 / 3
Trigger Term Quality	Natural phrases users would say — "analyze results", "compare", "interpret experimental data" — give good coverage of likely user requests.	3 / 3
Distinctiveness / Conflict Risk	The ML-results niche is somewhat distinct, but the trigger "compare" is generic and could fire for unrelated comparison requests, leaving moderate overlap risk.	2 / 3
	Total	11 / 12 Passed

Validation

100%


Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 16 / 16 Passed

Validation for skill structure

No warnings or errors.

Repository: wanshuiyin/Auto-claude-code-research-in-sleep
Commit: fe5963c

Reviewed: about 18 hours ago
