Name: sharaf/migrate-to-tessl
Rating: 100 (1 review)
Author: sharaf

Use when migrating an existing Claude skill into a Tessl tile, or when restructuring, publishing, or auditing one; converting flat .md files or SKILL.md bundles; fixing Tessl Quality, Impact, Uplift, frontmatter, metadata, tile.json summary, README, markdown reference links, registry-vs-local Quality gaps, artifact anchors, or auto-eval wait discipline; or pushing tile scores from 88-99% to 100%.

Quality: 100% (Does it follow best practices?)
Impact: 100% (1.11x; average score across 4 eval scenarios)
Security: Advisory (suggest reviewing before use)

{
  "context": "Tests whether the agent applies the migrate-to-tessl eval-count aggregate rule in a noisier score-triage setting and can design a stronger future eval scenario for the same failure mode. The output must diagnose that having only two eval scenarios caps the aggregate at 0.9333 even though every score-bearing dimension is perfect, reject the security advisory, manual verification, and wording churn as fixes, and propose a non-obvious scenario repair.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Writes requested files",
      "description": "score-triage.md and scenario-repair-plan.md both exist and cover the requested diagnosis and scenario-repair sections",
      "max_score": 6
    },
    {
      "name": "Identifies eval count as blocker",
      "description": "score-triage.md states that the aggregate score is capped because the target tile has fewer than 3 eval scenarios, not because the skill content is still weak",
      "max_score": 10
    },
    {
      "name": "Cites exact count and aggregate",
      "description": "score-triage.md cites `scores.evals.count: 2` and `scores.aggregate: 0.9333` (or 93%) for sharaf/release-note-architect from inputs/registry-search.json as key evidence",
      "max_score": 8
    },
    {
      "name": "Recognizes perfect score-bearing dimensions",
      "description": "score-triage.md notes that the target tile's `scores.quality`, `scores.impact`, and `scores.evals.average` are all `1`, so those dimensions are already satisfied",
      "max_score": 8
    },
    {
      "name": "Explains three-scenario threshold",
      "description": "score-triage.md explains that observed aggregate caps are 0.8667 for one scenario, 0.9333 for two scenarios, and 1.0 for three or more scenarios when other score-bearing fields are perfect",
      "max_score": 8
    },
    {
      "name": "Uses comparison evidence",
      "description": "score-triage.md uses the comparison entries in registry-search.json (one-scenario 0.8667 and three-scenario 1.0 examples) to support the eval-count diagnosis",
      "max_score": 8
    },
    {
      "name": "Rejects wrong hypotheses",
      "description": "score-triage.md explicitly rejects security advisory, missing manual verification, Quality, Impact, README/SKILL wording churn, and other broad content-polish passes as the next fix for this specific aggregate gap",
      "max_score": 8
    },
    {
      "name": "Prescribes adding a real third scenario",
      "description": "score-triage.md recommends adding at least one real, quality-checked eval scenario so the target tile has at least 3 scenarios total",
      "max_score": 6
    },
    {
      "name": "Includes publish and auto-eval wait",
      "description": "score-triage.md says to publish a new version with the added scenario and wait for the exact auto-eval run to complete before judging the score",
      "max_score": 8
    },
    {
      "name": "Verifies with registry search JSON",
      "description": "score-triage.md uses `tessl search --json release-note-architect` or an equivalent command as the final verification source and says to confirm an aggregate of 1 and an eval count of at least 3",
      "max_score": 5
    },
    {
      "name": "Uses tile info appropriately",
      "description": "score-triage.md treats `tessl tile info` as useful summary evidence for Quality/Security status, but not as the source of the total aggregate score",
      "max_score": 4
    },
    {
      "name": "Identifies answer leakage risk",
      "description": "scenario-repair-plan.md explains that a direct prompt asking why the score is 93% with an obvious `evals.count: 2` field is too easy and can inflate baseline performance",
      "max_score": 5
    },
    {
      "name": "Designs harder fixtures",
      "description": "scenario-repair-plan.md proposes noisy fixture inputs such as registry search results with comparison tiles, tile-info output with security distractors, teammate hypotheses, and/or eval inventory rather than a single obvious score object",
      "max_score": 7
    },
    {
      "name": "Proposes concrete task framing",
      "description": "scenario-repair-plan.md frames the future eval as score triage or migration incident response that requires diagnosing and planning the fix, rather than merely answering a leading question",
      "max_score": 4
    },
    {
      "name": "Proposes weighted rubric totaling 100",
      "description": "scenario-repair-plan.md includes a proposed weighted rubric whose checks total 100 points and cover eval-count diagnosis, wrong-hypothesis rejection, add-scenario fix, publish/wait verification, and non-obvious scenario design; answer-leakage risk may be handled in the rubric or adjacent scenario-design guidance",
      "max_score": 5
    }
  ]
}

evals/scenario-4/criteria.json
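As a rough illustration of the eval-count diagnosis this rubric grades, here is a minimal Python sketch. It assumes a registry search entry is a dict exposing the `scores.aggregate`, `scores.quality`, `scores.impact`, `scores.evals.count`, and `scores.evals.average` fields that the checklist names. The rest of the entry's shape, the `OBSERVED_CAPS` table, and the `diagnose_aggregate_gap` helper are hypothetical and are not part of the Tessl CLI. The cap values are only the observed ones quoted in the rubric (0.8667 for one scenario, 0.9333 for two, 1.0 for three or more), not a documented scoring formula.

```python
from typing import Any

# Observed aggregate caps quoted in the rubric when every other score-bearing
# field is perfect. These are observations, not a documented Tessl formula.
OBSERVED_CAPS = {1: 0.8667, 2: 0.9333}
MIN_SCENARIOS_FOR_FULL_SCORE = 3  # three or more scenarios observed to reach 1.0


def diagnose_aggregate_gap(entry: dict[str, Any]) -> str:
    """Return a one-line triage hint for a (hypothetically shaped) registry entry."""
    scores = entry["scores"]
    evals = scores["evals"]
    count = evals["count"]
    aggregate = scores["aggregate"]

    # Already at target: aggregate 1 with at least three scenarios.
    if aggregate == 1 and count >= MIN_SCENARIOS_FOR_FULL_SCORE:
        return "ok: aggregate 1 with at least 3 eval scenarios"

    # If Quality, Impact, or the eval average is below 1, the gap is not
    # explained by eval count alone.
    dims_perfect = all(
        value == 1 for value in (scores["quality"], scores["impact"], evals["average"])
    )
    if not dims_perfect:
        return "content gap: a score-bearing dimension is below 1"

    # All dimensions perfect and the aggregate matches an observed cap:
    # the fix is more real scenarios, not content polish or security work.
    cap = OBSERVED_CAPS.get(count)
    if cap is not None and abs(aggregate - cap) < 1e-4:
        needed = MIN_SCENARIOS_FOR_FULL_SCORE - count
        return (
            f"eval-count cap: {count} scenario(s) caps aggregate at {cap}; "
            f"add {needed} real scenario(s), publish, and wait for the auto-eval run"
        )
    return "unexplained gap: re-check registry search JSON after the auto-eval completes"


# Sample entry built from the values the rubric cites; "name" is illustrative.
target = {
    "name": "sharaf/release-note-architect",
    "scores": {
        "aggregate": 0.9333,
        "quality": 1,
        "impact": 1,
        "evals": {"count": 2, "average": 1},
    },
}
print(diagnose_aggregate_gap(target))
# eval-count cap: 2 scenario(s) caps aggregate at 0.9333; add 1 real scenario(s), publish, and wait for the auto-eval run
```

The sample entry reuses the values the rubric attributes to sharaf/release-note-architect. The returned string is only a triage hint; per the rubric, the final check is still the registry JSON from `tessl search --json release-note-architect` once the exact auto-eval run has completed.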