
Why Every Developer Needs to Know About WebMCP Now

with Maximiliano Firtman


Chapters

Trailer
[00:00:00]
AI DevCon
[00:01:19]
Introduction and Vanilla Web
[00:02:13]
WebMCP: what it is and why it matters
[00:06:25]
How agents browse websites today
[00:08:01]
Why screenshot-based browsing is inefficient
[00:10:20]
Exposing JavaScript functions to agents
[00:11:43]
User confirmation and trust delegation
[00:14:41]
WebMCP vs traditional APIs
[00:16:52]
Client-side use cases like Apple Pay
[00:18:00]
Headless browsers and Playwright
[00:19:33]
WebMCP vs standard MCP
[00:22:47]
Declarative HTML version
[00:25:57]
Detecting when an agent controls your site
[00:27:16]
Who should adopt WebMCP first
[00:29:08]
WebMCP for end-to-end testing
[00:33:03]
Local models in the browser
[00:36:44]
Chrome's built-in Gemini Nano
[00:39:12]
WebAssembly, WebGPU and neural network APIs
[00:42:38]
Cost, latency and privacy benefits
[00:45:47]
Small models for specific tasks
[00:47:03]
Who's using client-side AI today
[00:50:28]
Browser sandboxing and security
[00:53:55]
The future of web apps and local AI
[00:56:49]
Wrap-up
[00:59:37]

In this episode

An agent cannot read your website. And that needs to change.

In this episode of AI Native Dev, Guy Podjarny sits down with Maximiliano Firtman, a web developer of 30 years and the author of 14 books, to talk about what building for the web looks like when traffic comes from both agents and humans.


They get into:

  • why AI agents taking screenshots of your website is inefficient and expensive
  • what WebMCP is and how it gives agents a direct API into your website
  • how a 200MB Apple model running offline is opening up a whole new category of web apps
  • why every vibe coded app is a web app and what that means for the future of the web

Your next visitor might not be human. Are you ready for that?

WebMCP and Client-Side AI: The Browser as Agent Interface

Agents navigating websites today face an awkward problem: they have to reverse-engineer interfaces designed for human eyes. Screenshots get analyzed, coordinates get calculated, and buttons get clicked based on visual inference. It works, but it is expensive, slow, and fragile. When JavaScript moves a button between screenshots, the whole process breaks down.

In a recent episode of the AI Native Dev podcast, Guy Podjarny sat down with Maximiliano Firtman, a web developer with three decades of experience who has been tracking the intersection of web technologies and AI since the early ChatGPT days. The conversation explored two emerging capabilities: WebMCP, which gives agents a programmatic interface to websites, and client-side AI, which runs models directly in the browser.

Why Agents Struggle with Modern Websites

The current approach to agent browsing involves either taking screenshots and using image models to interpret them, or parsing the DOM directly. Neither works particularly well. Screenshot-based approaches suffer from timing issues and consume significant tokens. DOM parsing struggles because modern web development, particularly the React era, has produced pages full of non-semantic divs that offer little meaning to automated systems.

"When you look at the DOM that we are shipping to the user, it's not semantic," Max explained. "It's just a list of a hundred divs. So understanding what's there, the DOM might not be useful on every website."

Accessibility trees offer one alternative, similar to how screen readers navigate applications. But tests have shown inconsistent results, pushing most agent implementations back to the inefficient screenshot approach. WebMCP emerges as a potential solution: let website developers explicitly expose functions that agents can call directly.

How WebMCP Works

WebMCP, currently experimental in Chrome 146 behind a flag, allows web developers to register JavaScript functions as tools that agents can discover and invoke. The API is straightforward: call navigator.modelContext.registerTool with a name, a natural language description, and the function to execute. Agents browsing a WebMCP-enabled site see available services, match them against their goals, and call them directly rather than navigating the visual interface.
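The episode describes the API only at this high level, so the sketch below follows that description: a plain cart function registered as an agent-callable tool. The `add-to-cart` tool, its handler, and the exact option names passed to `registerTool` are illustrative assumptions; the experimental spec may differ.

```javascript
// Illustrative client-side state: a cart that lives in the page, not on a server.
const cart = [];

// Plain JavaScript handler, kept separate from registration so the
// logic works (and is testable) outside the browser too.
function addToCart(sku, quantity) {
  // Validate before touching client-side state.
  if (!sku || quantity < 1) throw new Error("invalid item");
  cart.push({ sku, quantity });
  return { ok: true, items: cart.length };
}

// Register the function as a tool where the experimental API exists
// (Chrome behind a flag). The option names here follow the episode's
// description and may not match the final spec exactly.
if (typeof navigator !== "undefined" && navigator.modelContext) {
  navigator.modelContext.registerTool({
    name: "add-to-cart",
    description: "Add a product to the shopping cart by SKU and quantity.",
    async execute({ sku, quantity }) {
      return addToCart(sku, quantity);
    },
  });
}
```

An agent that discovers this tool can call it directly instead of locating and clicking the visual "Add to cart" button.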

The functions execute client-side, which opens interesting possibilities. They can interact with local storage, trigger payment flows like Apple Pay that must happen in the browser, or maintain user authentication context. There is also a declarative HTML version where forms can be marked up with attributes that expose them as agent-callable actions without writing JavaScript.

This differs from traditional APIs in important ways. A REST endpoint requires the agent to manage authentication, understand the API structure, and make server requests. WebMCP functions live within the authenticated browser session, can access client-side state, and follow existing user flows. For shopping carts that live in the browser rather than the server, or payment processes that require client-side verification, WebMCP provides capabilities that server APIs cannot.

The specification also introduces new events: toolActivated and toolCanceled fire when agents take control, letting developers adjust the UI or notify users. CSS pseudo-classes allow styling changes when agents are operating. This creates transparency that does not exist with current browser automation approaches.
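The episode names the `toolActivated` and `toolCanceled` events but not where they dispatch; the sketch below assumes window-level events and uses a made-up `agent-controlled` CSS class, purely to illustrate the "show the user an agent is driving" pattern.

```javascript
// Toggle a visual "agent is in control" state. The class name and the
// assumption that the events fire on window are illustrative only.
function setAgentControlled(active) {
  if (typeof document !== "undefined") {
    document.documentElement.classList.toggle("agent-controlled", active);
  }
  // Return the state so the logic is testable without a DOM.
  return active ? "agent-active" : "agent-idle";
}

if (typeof window !== "undefined") {
  window.addEventListener("toolActivated", () => setAgentControlled(true));
  window.addEventListener("toolCanceled", () => setAgentControlled(false));
}
```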

E-Commerce as the First Adoption Vector

The conversation surfaced e-commerce as the likely first major use case. Retailers want sales regardless of whether the buyer is human or agent. Exposing search, cart management, and checkout as WebMCP tools makes purchases faster and more reliable for agentic shoppers.

"You actually don't care if it's a human or an agent; you want the money from the consumer," Max observed. "That's a motivation to offer as many tools as possible as fast as possible so they can purchase your products or services quickly."

Content sites face different incentives. Publishers worried about agents extracting content without driving traffic may resist making their sites more agent-friendly. But for any site with forms, transactions, or service interactions, the value proposition is clearer. Support ticket systems, reservation platforms, and configuration interfaces all stand to benefit from structured agent access.

Client-Side AI: Models in the Browser

The second major topic was running AI models directly in user browsers rather than making cloud API calls. Chrome now supports built-in AI through Gemini Nano, a smaller model that downloads on first use (around four to eight gigabytes) and remains available for subsequent requests from any website.
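Chrome's built-in model is exposed through the experimental Prompt API, whose surface has shifted across Chrome versions; the `LanguageModel` global and method names below reflect one recent shape and should be treated as a sketch. The `summarizeLocally` helper and its truncation fallback are assumptions for illustration, feature-detected so the code degrades gracefully where no built-in model exists.

```javascript
// Trivial offline fallback used when no built-in model is available.
function fallbackSummary(text) {
  return text.length > 100 ? text.slice(0, 100) + "…" : text;
}

async function summarizeLocally(text) {
  // LanguageModel is the Prompt API global in recent Chrome builds;
  // it is experimental, so feature-detect and fall back.
  if (typeof LanguageModel === "undefined") return fallbackSummary(text);
  const session = await LanguageModel.create();
  return session.prompt(`Summarize in one sentence: ${text}`);
}
```

Because the model is shared by the browser, the multi-gigabyte download happens once, not once per site.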

Beyond built-in models, open-source libraries can load models like Llama or Gemma using WebAssembly for CPU, WebGPU for graphics processors, or WebNN for dedicated AI chips. These approaches work across browsers, including Safari and Firefox, enabling inference without any cloud dependency.
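Libraries that run models in the page typically pick the fastest backend the browser offers. The detection order below roughly mirrors that idea using the real entry points (`navigator.gpu` for WebGPU, `navigator.ml` for WebNN, the `WebAssembly` global for CPU); the function itself is a sketch, and does detection only, no inference.

```javascript
// Choose the best available local-inference backend.
// Takes an environment object so the logic is testable anywhere.
function detectBackend(env = globalThis) {
  if (env.navigator && env.navigator.gpu) return "webgpu"; // GPU compute
  if (env.navigator && env.navigator.ml) return "webnn";   // dedicated AI hardware
  if (typeof env.WebAssembly !== "undefined") return "wasm"; // CPU fallback
  return "none";
}
```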

The use cases differ from general-purpose chat. "In a lot of cases, you don't need that power," Max noted about frontier models. Translation, content categorization, hate speech detection, and support chatbots can run on much smaller models. A 500-megabyte model running locally can handle specific tasks that would otherwise require cloud tokens.

The cost implications are significant. Support bots in particular have driven interest because token bills can escalate quickly, especially when users discover prompt manipulation. Moving inference client-side eliminates per-query costs entirely. "If you want to hack your prompt, go ahead; it's your computer," as Max put it. "You're hacking your own model, so I don't care if you do that."

Security and Privacy Advantages

Client-side execution also enables functionality that cloud APIs cannot safely support. End-to-end encrypted applications cannot send user data to external servers for processing. Running models locally keeps sensitive content within the browser sandbox, opening possibilities for AI features in privacy-focused applications.

The browser sandbox provides security boundaries that desktop AI tools lack. Each tab operates independently. Malicious websites cannot access another tab's model interactions. The same isolation that protects web browsing extends to local AI inference.

The Browser as AI Sandbox

The conversation pointed toward a future where browsers serve as comprehensive AI interfaces. WebMCP provides structured ways for external agents to interact with web applications. Client-side AI provides local inference capabilities for web applications to use. Together, they position the browser as a key platform for agentic computing.

For web developers, the practical implications are emerging. E-commerce sites should track WebMCP development and prepare to expose purchasing flows as agent-callable functions. Applications with significant token costs should evaluate whether client-side models can handle specific tasks more economically. Anyone building interactive web applications should consider how agents will navigate and interact with their interfaces.

The technology is experimental today, probably suited for pioneers and teams with specific requirements around cost or agent interaction. But in AI development, experimental tends to become standard faster than traditional timelines suggest. The browser has survived many predicted deaths. As more applications become web-based, partly because agents are building them, the intersection of web technology and AI capabilities seems likely to matter more, not less.

The full conversation covers additional ground on responsive web applications, the trajectory of context engineering (https://claude.ai/blog/context-engineering-guide) for web interfaces, and the technical details of model loading. Worth a listen for anyone building web applications that agents will need to use.
