Output evals

They're going to be talking about "From Vibes to Metrics" and how to actually measure what your agents do. Over to you.
>> Cool. So, what we're here to talk to you

Source: L0054-L0058

Trajectory evals

the levels and we correlate the levels with the output measurement that we showed in the beginning. So when we look at, okay, why do we trust the four levels? So in terms of

Source: L0302-L0306
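One way to check whether a set of ordinal levels tracks an output measurement is a rank correlation. The sketch below is an illustration only: the quote does not say which statistic the speakers used, and Spearman's rank correlation is swapped in here as an assumption. The function names and the sample numbers are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (spread_x * spread_y)


def spearman(levels, outputs):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    if len(levels) != len(outputs) or len(levels) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(levels), average_ranks(outputs))


# Made-up illustration data, not results from the talk:
# one level (1-4) and one output score per team.
team_levels = [1, 2, 2, 3, 4]
team_outputs = [0.8, 1.1, 1.0, 1.4, 1.9]
print(round(spearman(team_levels, team_outputs), 3))
```

A value near 1 would mean higher levels go together with higher measured output; it says nothing about causation.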

Agent instrumentation

different views on how to assess those metrics: one kind of top-down, looking at correlational studies across a big part of the industry, one bottom-up

Source: L0078-L0082

Skill activation

lot of time thinking about how we could even measure output and, subsequently, productivity. So what we found to work is that we have the engineer, who writes the code, and then we have a panel

Source: L0139-L0143

Compliance measurement

we're starting to see it now. So people that know how to orchestrate agents, people that know how to work with AI, they achieve significantly better outcomes.

Source: L0243-L0247

Coverage metrics

duplication goes down, cognitive complexity goes down. So all the metrics that we analyzed are actually improving today when applying AI, and that wasn't always so, like in the

Source: L0332-L0336

Output evals

On instruction following, these are the rubrics, the metrics that are grounded specifically in what the skill is telling the agent to do. So if you have your own internal design

Source: L0422-L0426
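A rubric grounded in a skill's instructions can be sketched as a list of named checks, each testing one instruction against the agent's output. This is a minimal illustration, not the speakers' tooling: `RubricItem`, `instruction_following_score`, and the two design-system rules are hypothetical, and real rubrics often rely on an LLM judge rather than string checks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RubricItem:
    """One instruction from the skill, paired with a check on the agent output."""
    name: str
    check: Callable[[str], bool]  # True when the output follows the instruction


def instruction_following_score(output: str, rubric: list[RubricItem]) -> tuple[float, list[str]]:
    """Return the fraction of rubric items passed and the names of failed items."""
    failed = [item.name for item in rubric if not item.check(output)]
    passed = len(rubric) - len(failed)
    score = passed / len(rubric) if rubric else 0.0
    return score, failed


# Hypothetical rubric, as if derived from an internal design-system skill.
rubric = [
    RubricItem("uses the design-system Button", lambda out: "<Button" in out),
    RubricItem("no inline styles", lambda out: "style=" not in out),
]

score, failed = instruction_following_score('<Button variant="primary">Save</Button>', rubric)
print(score, failed)
```

Returning the failed item names alongside the score keeps the metric traceable to the specific instruction that was not followed.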

Trajectory evals

that unique flavor of how you as a company want your agents to be operating? So this is kind of a concrete example that I found super interesting. So I use a lot of hugging

Source: L0495-L0499

ainativedev/aidevcon-2026-ldn

Quotes -- From Vibes to Metrics: How to Actually Measure What Your AI Agents Do
