Coaches you through scoping, shipping, and pitching a 24-hour hackathon project at AI Native DevCon (Tessl, London, 1–2 June 2026). Spec-first, track-aware, demo-obsessed. Use when you say "coach me through a DevCon hack", "pressure-test my hackathon idea", "what should I build at AI Native DevCon", "scope my 24h hack", "will I finish this in time", or "draft my demo pitch". Refuses to let you write code before a one-page spec exists.
100
100%
Does it follow best practices?
Impact
100%
1.69xAverage score across 5 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent correctly runs Phase 3 of the coaching workflow — using the exact four checkpoints at the exact specified hours, pushing back on vague 'pretty well' milestone language, demanding concrete named artefacts at each checkpoint, refusing scope creep, and meeting the exit gate.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Phase announcement",
"description": "The coach explicitly announces Phase 3 by name (e.g. 'Phase 3 — Plan it' or equivalent) before starting",
"max_score": 5
},
{
"name": "Four checkpoints present",
"description": "The plan includes all four checkpoint names: Smoke test, Golden path, Second scenario, and Pitch dry-run",
"max_score": 10
},
{
"name": "Correct checkpoint hours",
"description": "Each checkpoint is assigned the correct hour: Smoke test = T+2h, Golden path = T+8h, Second scenario = T+16h, Pitch dry-run = T+22h",
"max_score": 12
},
{
"name": "Pushback on vague milestone",
"description": "The coach pushes back on Jordan's 'pretty well by hour 8' statement and demands a concrete named artefact (file, endpoint, or screen)",
"max_score": 12
},
{
"name": "Concrete artefacts at all checkpoints",
"description": "The final plan.md lists a specific named artefact (not just a vague state) at each of the four checkpoints",
"max_score": 12
},
{
"name": "Scope creep refused",
"description": "The coach refuses Jordan's suggestion to add a similarity heat-map visualization, labelling it as v2 or out of scope rather than incorporating it",
"max_score": 12
},
{
"name": "Scope-creep phrasing",
"description": "The coach uses language consistent with 'That's v2. Write it down and move on.' when dismissing the scope creep (may be paraphrased)",
"max_score": 8
},
{
"name": "Checkpoint table format",
"description": "The plan.md presents the four checkpoints in a table or structured list format showing checkpoint name, hour, and artefact",
"max_score": 8
},
{
"name": "Exit gate confirmed",
"description": "The Phase 3 Complete section explicitly states that Jordan has committed to a named artefact at each of the four checkpoints",
"max_score": 10
},
{
"name": "No implementation code",
"description": "The coach does NOT write or suggest any actual code implementation during Phase 3",
"max_score": 6
},
{
"name": "No skipping to Phase 4 early",
"description": "The coach does NOT move to pitch/Phase 4 content until all four checkpoint artefacts are concretely defined",
"max_score": 5
}
]
}