Generate eval scenarios from repo commits, configure multi-agent runs, execute baseline + with-context evals, and compare results — the full setup pipeline before improvement begins
Validation for skill structure: does it follow best practices?
{
  "name": "experiments/eval-setup",
  "version": "0.3.0",
  "summary": "Generate eval scenarios from repo commits, configure multi-agent runs, execute baseline + with-context evals, and compare results — the full setup pipeline before improvement begins",
  "private": false,
  "skills": {
    "eval-setup": {
      "path": "skills/eval-setup/SKILL.md"
    }
  }
}
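A quick structural check of the manifest can catch common mistakes (missing fields, a skill path that does not point at a SKILL.md) before the eval pipeline runs. The sketch below is illustrative: the required-field list and path rule are assumptions drawn from the manifest above, not an official schema.

```python
import json

# The manifest from above, embedded for a self-contained check.
MANIFEST = """
{
  "name": "experiments/eval-setup",
  "version": "0.3.0",
  "summary": "Generate eval scenarios from repo commits, configure multi-agent runs, execute baseline + with-context evals, and compare results",
  "private": false,
  "skills": {
    "eval-setup": {
      "path": "skills/eval-setup/SKILL.md"
    }
  }
}
"""

def validate(manifest: dict) -> list[str]:
    """Return a list of structural problems; empty means the manifest passes.

    Assumed rules: top-level name/version/summary/skills must exist, and
    every skill entry's path must end in SKILL.md.
    """
    errors = []
    for field in ("name", "version", "summary", "skills"):
        if field not in manifest:
            errors.append(f"missing required field: {field}")
    for skill_name, entry in manifest.get("skills", {}).items():
        path = entry.get("path", "")
        if not path.endswith("SKILL.md"):
            errors.append(f"skill '{skill_name}': path must end in SKILL.md, got '{path}'")
    return errors

print(validate(json.loads(MANIFEST)))  # → []
```

Running this against the manifest above reports no errors; deleting the `skills` block or mistyping a path surfaces a message per problem.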