ainativedev/latest-aidevcon-speakers-london-2026

AI Native DevCon 2026 London — all conference sessions as interactive skills

Quality: 89% (Does it follow best practices?)
Impact: no eval scenarios have been run
Security: Risky (do not use without reviewing)

Outline - Why Evals Are Hard and How We're Solving It

Name: ainativedev/latest-aidevcon-speakers-london-2026
Rating: 71.77 (1 review)
Author: ainativedev

Speaker

Simon Obstbaum and Rob Willoughby

Abstract

[inferred from filename and transcript] Why Evals Are Hard and How We're Solving It is an AI Native DevCon session covering AI evals, evaluation design, testing agents, measurement, quality gates, and agent reliability.

Thesis

[inferred] The talk's main contribution is its framing of AI evals, evaluation design, and agent testing for practitioners working on AI-native software development.

Transcript Status

The source is timestamped speech-to-text output. Speaker labels, punctuation, and some technical terms may be imperfect. Use timestamps when citing.

Timeline

#	Timestamp range	Section	Summary
1	00:00-04:40	Opening and framing	I mentioned it before, and if you haven't, I've been in the room when I was talking about it.
2	04:44-09:35	Main discussion 1	The first piece that we covered was Repair By published a piece on AI engineers, found that roughly 10% of the engineers.
3	09:40-14:37	Main discussion 2	Looking out inside of the teams and looking at the individuals, we see that, you know, and I think everyone here in this room encountered someone.
4	14:40-18:57	Main discussion 3	yeah, it just it feels like so the tooling and instrumentation is essential to it.
5	19:01-23:05	Main discussion 4	But the things that you care about, the structure of these skills or changes, how it does it and how well it does it.
6	23:11-27:43	Main discussion 5	Unique testing is getting structure to what other countries are doing and that different is better.
7	27:47-31:57	Main discussion 6	So so there is a bunch of data on because under some regulation, no data.
8	32:02-36:34	Closing points	there is a code structure quality that you unlock, or you've come to consensus or there's some, some idea that the quality of the code matters or has an influence in the outcome t…

Named Concepts / Search Anchors

AI Evals - Topic named in the talk metadata and used as a search anchor for transcript Q&A.
Evaluation Design - Topic named in the talk metadata and used as a search anchor for transcript Q&A.
Testing Agents - Topic named in the talk metadata and used as a search anchor for transcript Q&A.
Measurement - Topic named in the talk metadata and used as a search anchor for transcript Q&A.
Quality Gates - Topic named in the talk metadata and used as a search anchor for transcript Q&A.
Agent Reliability - Topic named in the talk metadata and used as a search anchor for transcript Q&A.

Useful Search Terms

AI evals
evaluation design
testing agents
measurement
quality gates
agent reliability

Open Questions / Limits

The outline is generated from timestamped transcript text rather than speaker-provided slides.
Some transcript terms may be speech-to-text artifacts.
For precise claims, inspect transcript.md around the relevant timestamp.

Duration Marker

Last observed timestamp: 36:34
