Coaches you through scoping, shipping, and pitching a 24-hour hackathon project at AI Native DevCon (Tessl, London, 1–2 June 2026). Spec-first, track-aware, demo-obsessed. Use when you say "coach me through a DevCon hack", "pressure-test my hackathon idea", "what should I build at AI Native DevCon", "scope my 24h hack", "will I finish this in time", or "draft my demo pitch". Refuses to let you write code before a one-page spec exists.
Does it follow best practices? 100%
Impact: 100% (1.69x), average score across 5 eval scenarios
Passed: no known issues
{
  "context": "Tests whether the agent correctly runs Phase 2 of the coaching workflow — proposing exactly three hack angles with required elements, enforcing the spec template fields, capping features at three, requiring a stage-directions demo moment, and meeting the exit gate before moving on. Sam starts with five features and wants to skip to the timeline.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Phase announcement",
      "description": "The coach explicitly announces Phase 2 by name (e.g. 'Phase 2 — Spec it' or equivalent) before beginning Step A",
      "max_score": 5
    },
    {
      "name": "Skip-ahead refused",
      "description": "The coach does NOT jump straight to the timeline when Sam asks to — Phase 2 steps are carried out first",
      "max_score": 8
    },
    {
      "name": "Exactly three angles proposed",
      "description": "The coach proposes exactly three (not two, not four) hack angles in Step A",
      "max_score": 10
    },
    {
      "name": "Angles include one-line description",
      "description": "Each of the three proposed angles includes a one-line description of the concept",
      "max_score": 7
    },
    {
      "name": "Angles include demo moment as stage directions",
      "description": "Each of the three proposed angles includes a demo moment written as stage directions (Judge does X / system does Y / judge sees Z), not as an abstract concept",
      "max_score": 10
    },
    {
      "name": "Angles include feasibility note",
      "description": "Each of the three proposed angles includes an explanation of what makes it feasible within 24 hours",
      "max_score": 7
    },
    {
      "name": "Feature cap enforced",
      "description": "The final spec's 'What's in' section contains at most 3 items — Sam's original list of 5 features is cut down",
      "max_score": 10
    },
    {
      "name": "All spec fields filled",
      "description": "The produced spec.md contains all required fields: Goal, User, Demo moment, What's in, What's out, Success in 24h, Red flags — none are blank",
      "max_score": 10
    },
    {
      "name": "Concrete demo moment in spec",
      "description": "The Demo moment in spec.md is written as concrete stage directions (what the judge does, what the system does, what the judge sees) — NOT an abstract description",
      "max_score": 12
    },
    {
      "name": "What's out names temptations",
      "description": "The 'What's out' section in spec.md explicitly names specific features that were cut (e.g. the dashboard, Slack notifications, or VS Code extension from Sam's original list)",
      "max_score": 8
    },
    {
      "name": "Exit gate confirmed",
      "description": "The session log contains a Phase 2 Complete section confirming all fields are filled and the demo moment meets the stage-directions requirement",
      "max_score": 8
    },
    {
      "name": "No code started",
      "description": "The coach does NOT suggest writing any code or implementation steps during Phase 2",
      "max_score": 5
    }
  ]
}
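
The scenario above declares `"type": "weighted_checklist"` but does not say how the per-item scores are combined. The sketch below is one plausible reading, not the grader's actual method: it assumes the result is the sum of awarded points divided by the sum of `max_score` values, expressed as a percentage. The function names (`total_weight`, `score_run`) and the `awarded` mapping are hypothetical; only the item names and weights come from the checklist.

```python
# Names and max_score values copied from the checklist above (descriptions omitted).
CHECKLIST = [
    {"name": "Phase announcement", "max_score": 5},
    {"name": "Skip-ahead refused", "max_score": 8},
    {"name": "Exactly three angles proposed", "max_score": 10},
    {"name": "Angles include one-line description", "max_score": 7},
    {"name": "Angles include demo moment as stage directions", "max_score": 10},
    {"name": "Angles include feasibility note", "max_score": 7},
    {"name": "Feature cap enforced", "max_score": 10},
    {"name": "All spec fields filled", "max_score": 10},
    {"name": "Concrete demo moment in spec", "max_score": 12},
    {"name": "What's out names temptations", "max_score": 8},
    {"name": "Exit gate confirmed", "max_score": 8},
    {"name": "No code started", "max_score": 5},
]


def total_weight(checklist):
    """Sum of all max_score values, i.e. the best possible raw score."""
    return sum(item["max_score"] for item in checklist)


def score_run(checklist, awarded):
    """Return a percentage score for one eval run.

    `awarded` maps item name -> points given by the grader (hypothetical shape).
    Missing items count as 0, and points are clamped to [0, max_score].
    Assumption: the final score is earned points over total weight.
    """
    earned = 0
    for item in checklist:
        points = awarded.get(item["name"], 0)
        earned += max(0, min(points, item["max_score"]))
    return 100 * earned / total_weight(checklist)


if __name__ == "__main__":
    # A run that earns full marks everywhere except two items.
    awarded = {item["name"]: item["max_score"] for item in CHECKLIST}
    awarded["Exactly three angles proposed"] = 0
    awarded["Concrete demo moment in spec"] = 0
    print(total_weight(CHECKLIST))        # 100
    print(score_run(CHECKLIST, awarded))  # 78.0
```

The twelve weights sum to 100, so under this assumption each `max_score` reads directly as a percentage point. The heaviest single item is the concrete demo moment in the spec (12 points), followed by the three-angle, stage-directions, feature-cap, and all-fields items at 10 each.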