AI Native DevCon 2026 London — all conference sessions as interactive skills
Simon Obstbaum and Rob Willoughby
[inferred from filename and transcript] "Why Evals Are Hard and How We're Solving It" is an AI Native DevCon session covering AI evals, evaluation design, agent testing, measurement, quality gates, and agent reliability.
[inferred] The talk's main contribution is its framing of AI evals, evaluation design, and agent testing for practitioners working on AI-native software development.
The source is timestamped speech-to-text output. Speaker labels, punctuation, and some technical terms may be imperfect. Use timestamps when citing.
| # | Timestamp range | Section | Summary |
|---|---|---|---|
| 1 | 00:00-04:40 | Opening and framing | I mentioned it before, and if you haven't, I've been in the room when I was talking about it. |
| 2 | 04:44-09:35 | Main discussion 1 | The first piece that we covered was Repair By published a piece on AI engineers, found that roughly 10% of the engineers. |
| 3 | 09:40-14:37 | Main discussion 2 | Looking out inside of the teams and looking at the individuals, we see that, you know, and I think everyone here in this room encountered someone. |
| 4 | 14:40-18:57 | Main discussion 3 | Yeah, it feels like the tooling and instrumentation is essential to it. |
| 5 | 19:01-23:05 | Main discussion 4 | But the things that you care about, the structure of these skills or changes, how it does it and how well it does it. |
| 6 | 23:11-27:43 | Main discussion 5 | Unique testing is getting structure to what other countries are doing and that different is better. |
| 7 | 27:47-31:57 | Main discussion 6 | So there is a bunch of data on because under some regulation, no data. |
| 8 | 32:02-36:34 | Closing points | There is a code structure quality that you unlock, or you've come to consensus, or there's some idea that the quality of the code matters or has an influence in the outcome t… |
transcript.md around the relevant timestamp.
Last observed timestamp: 36:34
.tessl-plugin
talk-azriel-executable-specs-agentic-coding
talk-batey-building-product-teams-age-of-ai
talk-birgitta-closing-keynote
talk-cormack-tests-lie-observability-ai-honest
talk-debois-agent-enablement
talk-douglas-training-ai-on-your-own-code
talk-dubnov-merge-rate-ai-adoption
talk-farley-vibe-coding-best-we-can-do
talk-firtman-web-mcp-agentic-web
talk-foxwell-reinvention-dev-team
talk-graziano-spec-driven-development
talk-groetzinger-skills-everywhere
talk-jones-odevo-ai-native-transformation
talk-jourdan-pipelines-to-prompts
talk-katsioloudes-code-security-ai
talk-kerr-bipolar-disorder-dysregulation-ai
talk-lamis-context-engineering-dreaming
talk-lawson-agent-experience
talk-lopopolo-harness-engineering-humans-steer-agents-execute
talk-luebken-embedding-pi-coding-agent
talk-maleix-collective-intelligence
talk-marsden-agent-desktops
talk-martinelli-spec-driven-development
talk-moss-skills-team-workflow
talk-obstbaum-willoughby-evals-hard
talk-overweg-one-brain-no-filtering
talk-podjarny-skills-are-the-new-code
talk-roberts-ai-native-brownfield
talk-roberts-brownfield-ai-native
talk-scheire-artificial-intelligence
talk-selajev-docker-sandboxes-agents
talk-sloan-harness-engineering-beyond-code
talk-smith-connecting-context-future-transports
talk-stack-humans-architect-ai-writes-code
talk-stoneham-product-brain
talk-syme-agentic-repository-automation
talk-tal-skills-security
talk-thomas-ai-native-engineering
talk-trieloff-browser-agents
talk-walter-runtime-intelligence-agents
talk-wilson-cq-stack-overflow-for-agents
talk-wotherspoon-humans-vs-slop