
From IBM Acquisition to AI-Native Observability | Dash0 CEO

with Mirko Novakovic

Chapters

Trailer
[00:00:00]
Introduction
[00:01:15]
Deep Dive into AI Native Observability
[00:02:31]
Understanding OpenTelemetry and Its Benefits
[00:04:06]
AI Agents in Observability
[00:08:44]
Building and Integrating AI Agents
[00:16:35]
The Future of AI in Observability
[00:22:12]
Auto-Resolution and Team Collaboration
[00:28:33]
User Experience and Collaborative Troubleshooting
[00:34:08]
AI Agents as Primary Interaction Points
[00:38:52]
Rethinking Dashboards and User Experience
[00:41:10]
Adopting AI in Incident Resolution
[00:48:31]
Career Advice for Aspiring Software Engineers
[00:52:45]

In this episode

"Charts are good for users, not good for agents. Agents look at the underlying data and do deep analysis."

Mirko Novakovic built Instana, sold it to IBM, and now he's building Dash0, rethinking observability for agents, not humans.

In conversation with Guy Podjarny, he explains:
• why OpenTelemetry turned out to be perfect for AI
• how UX changes when agents are your primary users
• why interactive collaboration beats static chat outputs
• the survival question for observability vendors in the AI era

Only 2-3 people in most companies can truly debug production. That knowledge lives in their heads and disappears when they leave. Mirko's betting agents will change that.

Why Context Engineering Matters for AI-Native Observability

The observability space is undergoing a quiet transformation. As AI agents become central to how developers troubleshoot and monitor production systems, the question of how those agents consume and understand telemetry data has become critical. Context engineering, it turns out, may matter as much in DevOps as it does in coding assistants.

Mirko Novakovic brings a unique perspective to this shift. As the founder of Instana (acquired by IBM) and now CEO of Dash0, he has spent years thinking about how observability platforms should evolve. In a recent conversation on The AI Native Dev podcast, he shared how OpenTelemetry became an unexpected foundation for AI-native observability, and why designing for agents first is reshaping product development.

OpenTelemetry as Context for AI Agents

Before OpenTelemetry (OTel), every observability vendor created proprietary data formats. Instana had its own agent, Datadog had another, and telemetry data lived in silos. OTel changed that by standardizing not just the format of logs, metrics, and traces, but also the tagging system through Semantic Conventions.

What Mirko and his team discovered was surprising: LLMs already understand OpenTelemetry. Because OTel is open source, well-documented, and widely adopted, foundation models have been trained on its documentation and sample data. When you feed a trace into Claude, it recognizes that host.name is a hostname, that HTTP status code 404 indicates a problem, and can reason about the relationships between services.

"OpenTelemetry turned out to be really useful because all the models by default understand the format," Mirko explained. "It is like code. It has a syntax, it has semantics. And so it can actually do interesting things and analyze telemetry data."

This matters for context engineering in observability. The richer and more standardized the context you provide to an AI agent, the better its analysis. A trace decorated with proper Kubernetes metadata, service names, and semantic tags gives an agent everything it needs to pinpoint anomalies. Without that context, the agent is working blind.
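To make this concrete, here is a minimal sketch of what a semantically tagged span looks like and why standardized keys help. The attribute names follow the OpenTelemetry Semantic Conventions; the span values and the `flag_errors` helper are illustrative, not taken from any real system.

```python
# A minimal OTLP-style span. The attribute keys come from the
# OpenTelemetry Semantic Conventions, so any consumer (human, tool,
# or LLM) can interpret them without vendor-specific documentation.
span = {
    "name": "GET /checkout",
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "attributes": {
        "service.name": "checkout-service",     # which service emitted it
        "host.name": "ip-10-0-3-17",            # where it ran
        "k8s.pod.name": "checkout-7d9f4-x2lq",  # Kubernetes context
        "http.request.method": "GET",
        "http.response.status_code": 404,       # a 4xx/5xx signals a problem
    },
}

def flag_errors(spans):
    """Flag spans whose standardized status-code attribute signals an error."""
    return [
        s["name"] for s in spans
        if s["attributes"].get("http.response.status_code", 0) >= 400
    ]

print(flag_errors([span]))  # ['GET /checkout']
```

Because the key is always `http.response.status_code` rather than some vendor-specific name, the same reasoning works across every OTel-instrumented service.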

Building AI Agents for Root Cause Analysis

Dash0's Agent Zero platform includes multiple specialized agents, with "The Seeker" focused specifically on troubleshooting. When a 3 AM incident fires, The Seeker helps SREs identify root causes by navigating system dependencies, correlating logs with traces, and surfacing anomalies across millions of data points.

The key insight is that LLMs excel at different tasks than humans. An AI agent struggles to process a million traces looking for anomalies directly, but give it the right tools and it performs remarkably well. Dash0 built a triage feature that compares traces and identifies patterns, like noticing that all errors share a specific customer ID. This tool is exposed through an MCP server (/blog/what-is-model-context-protocol), allowing external agents like Claude Code or Cursor to invoke it.

"We provide that tool to the agent through an MCP server," Mirko noted. "And the AI agent can now use that triage tool autonomously. It will say, okay, there is a problem, let's figure out if there are any anomalies."

This pattern of building specialized analytical tools that agents can invoke represents a shift in how observability platforms should think about their APIs. The question is no longer just "Can a human read this dashboard?" but "Can an agent call this function and reason about the result?"
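The triage idea itself is straightforward to sketch. The function below is a hypothetical stand-in for the kind of pattern-finding tool Mirko describes (not Dash0's actual implementation): it compares error traces against healthy ones and reports attribute values shared by every error, which is exactly the sort of result an agent could request through an MCP server and then reason about.

```python
def triage(traces):
    """Report attribute values that appear in *every* error trace but are
    not universal across healthy traces. Hypothetical sketch of the
    pattern-finding step an observability backend might expose as a tool."""
    errors = [t for t in traces if t["status"] == "error"]
    healthy = [t for t in traces if t["status"] != "error"]
    if not errors:
        return {}
    suspects = {}
    for key in errors[0]["attributes"]:
        values = {t["attributes"].get(key) for t in errors}
        if len(values) == 1:                           # shared by every error
            (value,) = values
            healthy_hits = sum(
                1 for t in healthy if t["attributes"].get(key) == value
            )
            if healthy_hits < len(healthy):            # not universal overall
                suspects[key] = value
    return suspects

traces = [
    {"status": "error", "attributes": {"customer.id": "C-42", "region": "eu"}},
    {"status": "error", "attributes": {"customer.id": "C-42", "region": "us"}},
    {"status": "ok",    "attributes": {"customer.id": "C-7",  "region": "eu"}},
]
print(triage(traces))  # {'customer.id': 'C-42'}
```

An agent invoking this tool does not need to scan a million traces itself; it just needs to interpret the returned suspect, here the shared customer ID.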

Designing Products Agent-First

Perhaps the most striking insight from the conversation was Dash0's approach to product design. Rather than building for human users and then adding AI features, they now design every feature by first asking: "Can an agent do this work?"

Consider dashboards. Traditionally, a service dashboard displays charts showing request volume, response times, and error rates. Humans are good at visually spotting spikes in charts, which is exactly why we built dashboards this way. But agents do not need charts. They can analyze the underlying 5,000 data points directly and identify anomalies faster than any human scanning a visualization.

"We created charts because we, as humans, are really good at looking at the chart and seeing a spike," Mirko observed. "The agent will actually look at the underlying data and do a deep analysis. So now the charts get useless, because you don't need them anymore."

This does not mean charts disappear entirely, but their purpose changes. In an agent-first world, the primary interface might be a textual summary: "Your service is operating within normal parameters, but two new error types appeared in the last hour." The agent has already done the analysis. The human's role shifts to validation and decision-making.
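A simple statistical pass illustrates the point. The z-score check below is a toy stand-in for whatever analysis a real agent would run; it finds in the raw points the same spike a human would spot on a chart.

```python
import statistics

def find_spikes(points, threshold=3.0):
    """Return indices of points more than `threshold` standard deviations
    above the mean -- the spikes a human would look for on a chart."""
    mean = statistics.fmean(points)
    stdev = statistics.pstdev(points)
    if stdev == 0:
        return []
    return [i for i, p in enumerate(points) if (p - mean) / stdev > threshold]

# 5,000 flat latency samples with one injected spike at index 1234
points = [100.0] * 5000
points[1234] = 900.0
print(find_spikes(points))  # [1234]
```

No visualization is needed for the detection step; the chart, if rendered at all, becomes evidence for the human reviewing the agent's conclusion.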

Democratizing Observability Expertise

A persistent challenge in large organizations is that only a handful of experts truly understand system-wide dependencies. When incidents occur, these are the people called into war rooms because they know that when everything turns red, the real culprit is usually that one finicky database connected to everything else.

AI agents appear well-suited to democratize this expertise. By understanding system topology and having access to historical context, an agent can guide any developer through troubleshooting, not just the senior SRE who has memorized the dependency graph.

"Agents are good at understanding the wider scope and narrowing it down," Mirko explained. "Now we can enable everyone to basically have that knowledge about the overall system because the agent can give you that context."

The implications extend beyond incident response. Dash0 also builds agents for removing toil, such as automatically updating dashboards across fifty instances when a service changes, or keeping alert rules synchronized. These are tasks that humans find tedious and error-prone, but that agents handle reliably.

Where This Is Headed

The conversation surfaced an existential question for observability vendors: if users primarily interact with observability data through external agents in their IDE or terminal, does the observability platform become just a database? The value would shift to wherever the agent lives, triggering a race to the bottom on storage pricing.

Mirko's response is to make the interaction layer compelling enough that users want to work within the observability tool itself. That means building collaborative experiences where humans and agents work together, not just chat interfaces that produce answers. The agent surfaces insights, but the human can filter, adjust, and request re-analysis. It is interactive, not transactional.

For developers building AI-native applications, this conversation highlights a broader pattern. The tools we build need context engineering strategies that make them consumable by AI agents. That might mean adopting standards like OpenTelemetry, exposing MCP servers for tool invocation, or rethinking UX entirely around human-agent collaboration.

The shift is already happening. As Mirko noted, customers are rewriting their incident playbooks to start with asking the AI agent rather than following manual checklists. It is a change worth watching as observability continues to evolve.
