custom-index-eval

Iterative evaluation of Fusion Framework MCP search quality against documented domain patterns. Loads domain files from eval/index/, queries Fusion MCP for each pattern, validates recall against must/should requirements, and produces a human-readable pass/fail report. USE FOR: eval core, eval all, evaluate MCP index accuracy, validate search recall for a domain, check index freshness. DO NOT USE FOR: writing domain patterns, populating eval/index/ files, running CI pipelines, or batch automation.

63

Quality

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Advisory

Suggest reviewing before use

Quality

Content

50%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

Well-structured workflow with concrete tooling and good error handling, but weakened by a dangling reference to the missing agents/query-judge.md (which holds the judging logic), redundant trigger sections, and no batch-level verification checkpoint.

Suggestions

Add the missing agents/query-judge.md (or inline its pass/partial/fail judging criteria) so the core evaluation logic is actually available rather than delegated to a file that does not exist.

Drop the "When to use" / "When not to use" sections or shrink them — they duplicate the description's USE FOR / DO NOT USE FOR triggers and add tokens without new information.

For "eval all" batch runs, add an explicit verification/aggregation checkpoint (e.g., confirm every domain file produced a verdict before generating the summary) to satisfy batch-workflow validation.
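The batch checkpoint suggested above could look roughly like the sketch below. It is an illustration under stated assumptions, not the skill's implementation: the function names, the shape of the `verdicts` dictionary, and the idea of keying verdicts by file stem are all hypothetical. Only the `eval/index/<domain>.md` layout and the pass/partial/fail vocabulary come from the review.

```python
from pathlib import Path

# Verdict vocabulary taken from the review; everything else is assumed.
VALID_VERDICTS = {"pass", "partial", "fail"}


def find_missing_verdicts(domain_files, verdicts):
    """Return the domains that have no valid verdict yet.

    domain_files: iterable of paths such as "eval/index/core.md"
    verdicts: dict mapping a domain name (the file stem) to a verdict string
    """
    missing = []
    for path in domain_files:
        domain = Path(path).stem
        # A missing key yields None, which is not a valid verdict.
        if verdicts.get(domain) not in VALID_VERDICTS:
            missing.append(domain)
    return sorted(missing)


def ready_for_summary(domain_files, verdicts):
    """True only when every domain file has produced a valid verdict."""
    return not find_missing_verdicts(domain_files, verdicts)
```

An "eval all" run would call `ready_for_summary` before rendering the report and list the output of `find_missing_verdicts` if the check fails, instead of silently summarizing a partial batch.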

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | It assumes Claude's intelligence (no preamble on what MCP or recall is), but the "When to use" / "When not to use" sections duplicate the description's USE FOR/DO NOT USE FOR triggers, "Expected output" restates the report template, and the example markdown table is lengthy. The body is efficient but could be tightened. | 2 / 3 |
| Actionability | It names concrete tools and paths ("mcp_fusion_search_framework (preferred) or mcp_fusion_search", "eval/index/<domain>.md", "assets/report-template.md"), but the core judging logic is delegated to "agents/query-judge.md", which is not present, and the inline fallback ("check results against each must/should bullet") is vague on how pass/partial/fail is decided. | 2 / 3 |
| Workflow Clarity | A clear 5-step sequence with explicit error checkpoints (skip empty files, stop on MCP failure, mark ambiguous as partial, do not retry in a loop), but for a batch operation ("eval all") there is no verification step on intermediate results, and the per-query verification is delegated to the missing sub-agent, capping clarity per the batch guideline. | 2 / 3 |
| Progressive Disclosure | The body is a clear overview with well-signaled one-level-deep references ("agents/query-judge.md", "assets/report-template.md"), but agents/query-judge.md does not exist in the bundle (the agents/ directory is absent and only assets/report-template.md is present), so navigation to a critical resource is broken. | 2 / 3 |
| Total | | 8 / 12 (Passed) |
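The Actionability row notes that the inline fallback never says how pass/partial/fail is decided. One plausible rule is sketched below purely as an assumption, since agents/query-judge.md is not available to confirm the real criteria: an unmet must requirement fails the query, an unmet should requirement downgrades it to partial, and anything else passes. The function name `judge` and its boolean-list inputs are hypothetical.

```python
def judge(must_hits, should_hits):
    """Combine per-bullet results into a single verdict.

    must_hits / should_hits: lists of booleans, one per requirement bullet,
    True when the search results satisfied that bullet.
    ASSUMED rule, not taken from agents/query-judge.md.
    """
    if not all(must_hits):
        return "fail"      # at least one must requirement was missed
    if not all(should_hits):
        return "partial"   # every must met, but a should was missed
    return "pass"
```

Whatever rule the maintainers actually intend, stating it this explicitly in SKILL.md (or restoring the sub-agent file) would address the gap the review describes.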

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

A strong description: concrete actions, explicit use/avoid triggers, third-person voice, and a clear distinct niche with no over-claims or vague fluff.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple concrete actions ("Loads domain files from eval/index/", "queries Fusion MCP for each pattern", "validates recall against must/should requirements", and "produces a human-readable pass/fail report"), matching the anchor for several specific concrete actions. | 3 / 3 |
| Completeness | Clearly answers both what (iterative evaluation: load, query, validate, report) and when via an explicit "USE FOR" clause, plus a "DO NOT USE FOR" boundary, satisfying the top anchor for explicit triggers. | 3 / 3 |
| Trigger Term Quality | The "USE FOR" block surfaces natural phrases a user would say ("eval core", "eval all", "evaluate MCP index accuracy", "validate search recall for a domain", "check index freshness"), giving good coverage of real trigger terms. | 3 / 3 |
| Distinctiveness Conflict Risk | It targets a narrow niche (Fusion Framework MCP search-index evaluation) with distinct triggers and an explicit "DO NOT USE FOR" list, making it unlikely to fire for the wrong skill. | 3 / 3 |
| Total | | 12 / 12 (Passed) |

Validation

93%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 15 / 16 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| metadata_field | 'metadata' should map string keys to string values | Warning |
| Total | 15 / 16 | Passed |
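The metadata_field warning can be reproduced locally with a check like the one below. This is a minimal sketch: the review does not show which metadata entry triggered the warning, so the function name and the sample dictionary in the usage note are hypothetical.

```python
def non_string_metadata_entries(metadata):
    """Return the metadata entries whose key or value is not a string.

    An empty result means the mapping satisfies the
    "string keys to string values" rule quoted in the validation table.
    """
    return {
        key: value
        for key, value in metadata.items()
        if not (isinstance(key, str) and isinstance(value, str))
    }
```

For example, a hypothetical `{"version": "1.0", "tags": ["eval"]}` would report the `tags` entry, because its value is a list rather than a string. Converting such values to strings should clear the warning.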

Repository
equinor/fusion-framework
Reviewed
