custom-index-eval

Iterative evaluation of Fusion Framework MCP search quality against documented domain patterns. Loads domain files from eval/index/, queries Fusion MCP for each pattern, validates recall against must/should requirements, and produces a human-readable pass/fail report. USE FOR: eval core, eval all, evaluate MCP index accuracy, validate search recall for a domain, check index freshness. DO NOT USE FOR: writing domain patterns, populating eval/index/ files, running CI pipelines, or batch automation.

Quality

—

Does it follow best practices?

Impact

—

No eval scenarios have been run

Securityby

Advisory

Suggest reviewing before use

Evaluate Fusion MCP Search Index

When to use

Use this skill to evaluate whether Fusion MCP search returns accurate, relevant results for documented framework patterns.

Typical triggers:

eval core — evaluate all patterns in eval/index/core.md
eval http-services — evaluate patterns in eval/index/http-services.md
eval all — evaluate every domain file in eval/index/
"check MCP index accuracy for auth patterns"
"validate search recall for the state-data domain"
"how well does the index cover framework initialization?"
"run an eval pass against the index"

When not to use

Do not use this skill for:

Writing or populating domain pattern files (use normal editing workflow)
Creating new domain files in eval/index/ (follow the README template)
CI/CD integration or scheduled evaluation runs (future phase)
Batch automation without human review (future phase)
Searching the index for application development answers (use Fusion MCP directly)

Required inputs

Collect before execution:

Domain target: a specific domain file name (e.g., core, http-services) or all
Evaluation strictness (optional): strict (must patterns only) or full (must + should); default is full

If the user says only eval with no domain, ask which domain to evaluate or whether to run all.

Instructions

Step 1 — Resolve domain files

If the target is a specific domain (e.g., core), read eval/index/<domain>.md
If the target is all, list all .md files in eval/index/ except README.md and process each one sequentially
If the file does not exist or has no ## query sections, report it as empty and skip

Step 2 — Parse the domain file

Domain files have a simple structure:

# Domain Name — the domain heading
Paragraph below # — judgement instructions for how to evaluate results across all queries in this domain
## <query> — each ## heading is a search query to run against MCP
- must ... / - should ... — expectations for each query's results

Extract the judgement instructions once (use them as context when evaluating every query). Then build a list of queries, each with its must and should expectations.

Step 3 — Evaluate each query via judge sub-agent

For each ## heading, spin up a query-judge sub-agent (see agents/query-judge.md). Pass it:

The heading text (search query)
The must and should bullets for that query
The domain-level judgement instructions

The judge sub-agent will:

Search MCP using the heading text
Check results against each expectation
Return a verdict (pass / partial / fail) with counts and explanation

If the runtime does not support sub-agents, follow the same workflow inline:

Use the heading text as the search query
Call mcp_fusion_search_framework (preferred) or mcp_fusion_search
Check results against each must / should bullet
Record verdict with explanation

Collect all verdicts before producing the report.

Step 4 — Generate the evaluation report

Produce a report following the template in assets/report-template.md. The report includes:

Header: domain name, evaluation date, strictness level
Pattern results table: one row per pattern with name, requirement, verdict, and explanation
Summary statistics: total patterns, pass/partial/fail counts, pass rate
Recommendations: actionable next steps for failed or partial patterns

Step 5 — Present results

Print the full report to the conversation
For eval all, present a per-domain summary first, then offer to show detailed results for any domain
Highlight must failures prominently — these indicate critical index gaps
Suggest concrete remediation for each failure (e.g., "re-index package X", "add documentation for Y")

Expected output

A structured evaluation report containing:

Per-pattern pass/partial/fail verdicts with explanations
Summary statistics (total, pass rate, must vs should breakdown)
Prioritized recommendations for improving index coverage
Clear identification of stale or missing content

Example: `eval core`

User: eval core

Workflow:

Read eval/index/core.md
Note judgement instructions: "Results should reference @equinor/fusion-framework and @equinor/fusion-framework-module..."
Extract 5 queries from ## headings
For each, search MCP and check must/should bullets
Produce report:

## Evaluation Report: core

Date: 2026-03-14
Strictness: full
Domain: eval/index/core.md

| # | Query | Verdict | Explanation |
|---|-------|---------|-------------|
| 1 | How to initialize Fusion Framework | pass | Results mention FrameworkConfigurator, init, configureMsal |
| 2 | How to create a custom module | pass | Module interface, BaseConfigBuilder, BaseModuleProvider all covered |
| 3 | Module lifecycle phases | partial | Lifecycle order shown but postConfigure/postInitialize hooks not detailed |
| 4 | How to configure an app with AppModuleInitiator | fail | No results reference AppModuleInitiator |
| 5 | How to listen to framework events | pass | addEventListener, dispatchEvent, event module all returned |

### Summary
- Queries: 5 | Pass: 3 | Partial: 1 | Fail: 1
- Must expectations met: 14/17 (82%)

### Recommendations
1. **[CRITICAL]** Query 4: Re-index `packages/app/src/types.ts` and `cookbooks/app-react/src/config.ts`
2. **[IMPROVE]** Query 3: Enrich lifecycle section in `packages/modules/module/README.md`

Safety & constraints

This skill is read-only with respect to the repository — it never modifies domain files or index content
MCP search calls are read-only queries; no mutations are performed
Do not fabricate pass verdicts — if results are ambiguous, mark as partial with explanation
If MCP is unavailable or rate-limited, report the failure clearly and stop; do not retry in a loop
Evaluation results reflect index state at query time; they are not cached across sessions

Repository: equinor/fusion-framework
Commit: 54fb2a7

Last updated: 4 days ago
Created: 4 days ago

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.