Run task evals across multiple Claude models, compare results side-by-side, and identify which skill gaps are model-specific versus universal
96
Quality
97%
Does it follow best practices?
Impact
96%
1.65x Average score across 3 eval scenarios
Passed
No known issues
{
  "name": "tessl-labs/review-model-performance",
  "version": "0.1.2",
  "summary": "Run task evals across multiple Claude models, compare results side-by-side, and identify which skill gaps are model-specific versus universal",
  "private": false,
  "skills": {
    "review-model-performance": {
      "path": "skills/review-model-performance/SKILL.md"
    }
  }
}
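
The summary's distinction between model-specific and universal skill gaps can be illustrated with a small sketch. This is not the skill's implementation: the function name `classify_gaps`, the shape of the results dictionary, and the model and criterion names are all hypothetical, chosen only to show the idea that a gap is "universal" when every model fails a criterion and "model-specific" when only some do.

```python
from typing import Dict, Set

# Hypothetical sketch: these names and this data shape are assumptions,
# not part of the review-model-performance skill.
def classify_gaps(results: Dict[str, Dict[str, bool]]) -> Dict[str, Set[str]]:
    """Split failed criteria into universal and model-specific gaps.

    `results` maps a model name to {criterion: passed}. A criterion that is
    missing for a model is treated as failed for that model.
    """
    models = list(results)
    # Collect every criterion seen for any model.
    criteria: Set[str] = set().union(*(r.keys() for r in results.values()))

    universal: Set[str] = set()
    model_specific: Set[str] = set()
    for criterion in criteria:
        failing = [m for m in models if not results[m].get(criterion, False)]
        if not failing:
            continue  # every model passed, so there is no gap
        if len(failing) == len(models):
            universal.add(criterion)       # all models fail
        else:
            model_specific.add(criterion)  # only some models fail
    return {"universal": universal, "model_specific": model_specific}


# Made-up example data, for illustration only.
sample = {
    "model-a": {"cites_sources": True, "handles_errors": False, "writes_tests": False},
    "model-b": {"cites_sources": True, "handles_errors": True, "writes_tests": False},
}
gaps = classify_gaps(sample)
```

With this made-up input, `writes_tests` fails for both models and lands in the universal set, while `handles_errors` fails only for `model-a` and lands in the model-specific set. A universal gap suggests the skill's instructions need work; a model-specific one suggests the issue lies with how a particular model follows them.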