
hannaklim/ai-eval-report

Generates AI quality evaluation reports for LLM and ML-powered products — designs golden datasets, defines accuracy metrics, tracks quality across iterations, and produces stakeholder-ready summaries that explain probabilistic behaviour in business language. Use when evaluating AI or LLM output quality, building an eval framework or golden dataset, benchmarking accuracy between releases or prompt versions, reporting AI quality to clients or executives, or when the user asks "how good is our AI", "accuracy report", "eval results", "benchmark the model", or "why does the AI give different answers to the same question".

80

Quality

100%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

tile.json

{
  "name": "hannaklim/ai-eval-report",
  "version": "0.1.0",
  "summary": "Generates AI quality evaluation reports for LLM and ML-powered products — designs golden datasets, defines accuracy metrics, tracks quality across iterations, and produces stakeholder-ready summaries that explain probabilistic behaviour in business language. Use when evaluating AI or LLM output quality, building an eval framework or golden dataset, benchmarking accuracy between releases or prompt versions, reporting AI quality to clients or executives, or when the user asks \"how good is our AI\", \"accuracy report\", \"eval results\", \"benchmark the model\", or \"why does the AI give different answers to the same question\".",
  "skills": {
    "ai-eval-report": {
      "path": "SKILL.md"
    }
  },
  "private": false
}
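
As a rough illustration of how a manifest like this could be consumed, here is a minimal Python sketch that parses it and lists each skill with the file it points to. It assumes only the fields visible in the manifest above (`name`, `version`, `skills`, `private`); the helper name `list_skills` is hypothetical and not part of any Tessl tooling, and the `summary` field is left out of the inline copy for brevity.

```python
import json

# Inline copy of the manifest shown above (the "summary" field is omitted for brevity).
MANIFEST = """
{
  "name": "hannaklim/ai-eval-report",
  "version": "0.1.0",
  "skills": {
    "ai-eval-report": {
      "path": "SKILL.md"
    }
  },
  "private": false
}
"""


def list_skills(manifest_text):
    """Return (skill_name, path) pairs from a tile.json-style manifest.

    Hypothetical helper: the schema is inferred only from the fields
    visible in the manifest above, not from an official specification.
    """
    manifest = json.loads(manifest_text)
    skills = manifest.get("skills", {})
    return [(name, entry["path"]) for name, entry in skills.items()]


for skill_name, skill_path in list_skills(MANIFEST):
    print(f"{skill_name} -> {skill_path}")
```

A real consumer would probably also validate `version` and resolve each `path` relative to the directory that holds the manifest; both steps are omitted here because the page does not describe them.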
