ainativedev/aidevcon-2026-ldn

AI Native DevCon 2026 London — all conference sessions as interactive skills


Quotes -- Benchmarking the Agent Era: Measuring Performance Beyond the LLM

Name: ainativedev/aidevcon-2026-ldn
Rating: 70.61 (1 reviews)
Author: ainativedev

Short excerpts selected from the transcript for grounding answers. Preserve transcript artifacts when quoting.

Agent-era benchmarking

So you're going to be talking about benchmarking the benchmarking the agent era and you're ready to go. So I will leave you to it.

Source: L0007-L0011

Inference performance

three things. One, what does the agentic benchmark look agentic workload looks like? Second, what are the optimizations that already exist to run these agentic workloads efficiently? And third, what

Source: L0032-L0036

Workflow complexity

these applications were not using tools. Now in the agentic workloads what we are seeing is dozens of turns and I'll give you concrete examples of what what I mean. U the LLMs are getting called

Source: L0060-L0064

Tool calls

talk will mostly focus on the coding agent side of things. This workload might diff look different if you're talking about different kinds of agent but here the focus is primarily on the

Source: L0135-L0139

Context length

replicas of the same model that you're using in your agentic workloads. So turn one end up on replica A. If you do a roundroin kind of a setup, the turn two might end up on some other replica, but

Source: L0189-L0193
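The routing behaviour described in this excerpt can be illustrated with a small sketch. It is not taken from the talk: the replica names, the session identifier, and the hash-based session-affinity router shown for contrast are all assumptions made for illustration. The sketch only shows that round-robin sends successive turns of one session to different replicas, while a sticky scheme keeps them on one.

```python
import hashlib
from itertools import cycle

# Hypothetical replica names, chosen only for this illustration.
REPLICAS = ["replica-A", "replica-B", "replica-C"]


def round_robin_router(replicas):
    """Return a router that ignores the session and cycles through replicas."""
    pool = cycle(replicas)

    def route(session_id):
        return next(pool)

    return route


def affinity_router(replicas):
    """Return a router that maps a session to one replica via a stable hash.

    This is an assumed alternative for contrast, not a method from the talk.
    """

    def route(session_id):
        digest = hashlib.sha256(session_id.encode()).digest()
        return replicas[int.from_bytes(digest[:8], "big") % len(replicas)]

    return route


rr = round_robin_router(REPLICAS)
sticky = affinity_router(REPLICAS)

# Three consecutive turns of the same (hypothetical) agent session.
rr_turns = [rr("session-1") for _ in range(3)]
sticky_turns = [sticky("session-1") for _ in range(3)]

print("round robin:", rr_turns)
print("affinity:   ", sticky_turns)
```

With round-robin, each turn of the session lands on a different replica; with the assumed affinity router, all three turns land on the same one.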

Real-world workload measurement

trajectory which is completely different from what we see in agentic workloads on the agentic workloads. Now what we miss there's no fixed shape as I saw the trajectory keeps growing as you are

Source: L0318-L0322

Agent-era benchmarking

So all this slide is trying to say in the agentic workload you can't just talk about mean median you have to look at distributions you have to talk about metrics in terms of distributions in

Source: L0379-L0383
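The point about distributions in this excerpt can be made concrete with a minimal sketch. The latency values below are invented for illustration and are not measurements from the talk. The sketch reports tail percentiles alongside the mean and median, so that a single slow turn is visible rather than averaged away.

```python
import statistics


def summarize(latencies_ms):
    """Return mean, median, and tail percentiles for a list of latencies."""
    ordered = sorted(latencies_ms)
    # quantiles(n=100) returns 99 cut points: index 89 is p90, index 98 is p99.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p90": cuts[89],
        "p99": cuts[98],
        "max": ordered[-1],
    }


# Hypothetical per-turn latencies in milliseconds, with one long outlier.
latencies = [120, 130, 125, 140, 135, 128, 132, 138, 126, 4000]
summary = summarize(latencies)

for name, value in summary.items():
    print(f"{name:>6}: {value:.1f} ms")
```

In this made-up sample the median stays close to the typical turn while the mean and the p99 are pulled far upward by the single outlier, which is why a lone summary number can mislead.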

Inference performance

representing your workload correctly especially in agentic workloads those gray regions where this stuff is being run on CPU is very important and that kind of helps you getting the right

Source: L0500-L0504

