A/B testing, side-by-side comparison, and preference ranking for AI outputs.
Absolute quality scores are useful but limited. Comparative evaluation, in which outputs are placed side by side and a judge picks the better one, often reveals quality differences that rubrics miss.
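As a minimal sketch of what a side-by-side comparison can look like in code, the snippet below tallies pairwise preferences between two systems and reports a win rate. It is illustrative only: the names `compare_pairwise`, `win_rate`, and `longer_is_better` are hypothetical and not part of this skill, and the judge is a toy stand-in for a human rater or a model-based judge. The presentation order is randomized per item, on the assumption that judges may favor whichever output they see first.

```python
import random
from collections import Counter
from typing import Callable, Sequence

# A judge sees the prompt and two outputs in presentation order and
# returns "first", "second", or "tie". (Hypothetical interface.)
Judge = Callable[[str, str, str], str]


def compare_pairwise(
    prompts: Sequence[str],
    outputs_a: Sequence[str],
    outputs_b: Sequence[str],
    judge: Judge,
    seed: int = 0,
) -> Counter:
    """Tally wins for systems A and B, randomizing which one is shown first."""
    rng = random.Random(seed)
    tally: Counter = Counter()
    for prompt, a, b in zip(prompts, outputs_a, outputs_b):
        swapped = rng.random() < 0.5  # if True, B is shown first
        first, second = (b, a) if swapped else (a, b)
        verdict = judge(prompt, first, second)
        if verdict == "tie":
            tally["tie"] += 1
        elif (verdict == "first") != swapped:
            tally["A"] += 1  # the preferred slot held A's output
        else:
            tally["B"] += 1
    return tally


def win_rate(tally: Counter, system: str = "A") -> float:
    """Share of decided (non-tie) comparisons won by `system`."""
    decided = tally["A"] + tally["B"]
    return tally[system] / decided if decided else 0.5


def longer_is_better(prompt: str, first: str, second: str) -> str:
    """Toy judge for demonstration only: prefers the longer output."""
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"


if __name__ == "__main__":
    prompts = ["p1", "p2", "p3"]
    outputs_a = ["long answer", "x", "same"]
    outputs_b = ["short", "longer", "same"]
    tally = compare_pairwise(prompts, outputs_a, outputs_b, longer_is_better)
    print(dict(tally), win_rate(tally))
```

Ties are excluded from the win-rate denominator here; that is one design choice among several, and counting a tie as half a win for each side is an equally reasonable alternative.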
A/B testing AI is different from A/B testing UI:
For human evaluation of AI outputs: