Cloudflare introduces “Agent Memory” to help AI agents remember across sessions
20 Apr 2026 · 7 minute read

AI agents can now run for longer, take on more work, and operate at something approaching production quality. But one of their main limitations is memory.
That problem sits at the centre of a new release from Cloudflare, which introduces “Agent Memory”, a managed service designed to give agents persistent memory across sessions.
Agents generally operate within a limited context window, and even as those limits increase, more information doesn’t necessarily mean better results. Too much context can degrade output, while trimming this context risks losing important information.
The company describes this as a trade-off that existing systems struggle to resolve.
“A natural tension emerges between two bad options: keep everything in context and watch quality degrade, or aggressively prune and risk losing information the agent needs later,” Cloudflare’s Tyson Trautmann and Rob Sutter wrote in a blog post.
Cloudflare’s approach is to move memory out of the prompt entirely. Instead of keeping everything in context, Agent Memory extracts useful information from conversations and stores it separately, making it available when needed without filling up the model’s working window.
Cloudflare exposes that memory through a small set of operations — ingesting conversations, storing specific facts, recalling what is needed, listing stored memories, and forgetting outdated information.
A typical interaction might look like this:
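Cloudflare’s exact API is not reproduced here, so the sketch below is an assumption: a minimal, self-contained in-memory stand-in for the five operations the post describes (ingest, store, recall, list, forget). The class and method names are illustrative, not Cloudflare’s actual interface.

```typescript
interface Memory {
  id: string;
  text: string;
}

// Illustrative in-memory stand-in for a managed memory service.
class AgentMemory {
  private memories: Memory[] = [];
  private nextId = 1;

  // Ingest a conversation: extract candidate facts (here, naively one per line).
  ingest(conversation: string): void {
    for (const line of conversation.split("\n")) {
      const text = line.trim();
      if (text) this.store(text);
    }
  }

  // Store a specific fact, deduplicating exact repeats.
  store(text: string): void {
    if (!this.memories.some((m) => m.text === text)) {
      this.memories.push({ id: String(this.nextId++), text });
    }
  }

  // Recall memories relevant to a query (here, a simple substring match;
  // the real service would use semantic retrieval).
  recall(query: string): Memory[] {
    return this.memories.filter((m) =>
      m.text.toLowerCase().includes(query.toLowerCase())
    );
  }

  // List everything currently stored.
  list(): Memory[] {
    return [...this.memories];
  }

  // Forget outdated information by id.
  forget(id: string): void {
    this.memories = this.memories.filter((m) => m.id !== id);
  }
}

// Usage: ingest a conversation, then recall a fact in a later session
// without putting the whole transcript back into the prompt.
const memory = new AgentMemory();
memory.ingest("User prefers TypeScript\nDeploys run on Fridays");
const recalled = memory.recall("typescript").map((m) => m.text);
// recalled is ["User prefers TypeScript"]
```

The point of the shape, rather than the naive implementations, is that the agent calls a handful of verbs and never manages storage or retrieval strategy itself.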

Memory as a managed layer
At a technical level, Agent Memory sits alongside the agent rather than inside it.
The system handles the underlying work of extracting, deduplicating, and retrieving memory, returning a synthesised result when queried. This matters because memory isn’t just storage. It’s about deciding what is worth keeping, and when to bring it back.
Cloudflare’s Agent Memory is a managed service, where that logic is handled by the platform rather than left to the agent itself. That contrasts with approaches where agents are given direct access to a database or filesystem and must decide how to store and retrieve information, often burning tokens on that process instead of the task at hand.
Cloudflare positions this as a layer that can sit across a range of agent setups, from coding agents and managed services to custom-built systems that run autonomously in the background.
It also introduces the idea of shared memory profiles, where multiple agents and developers contribute to the same pool of knowledge — allowing teams to retain context such as coding decisions, review feedback, and internal conventions over time. Notably, Cloudflare says this can sit alongside individual agent systems such as Claude Code, as well as services like Anthropic’s Managed Agents, which include their own memory but leave storage and retrieval strategy to the agent rather than handling it as a separate system.
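The shared-profile idea can be sketched in a few lines. Everything below is an assumption for illustration, not Cloudflare’s API: two agents are simply handed the same pool, so knowledge written by one is visible to the other.

```typescript
// A shared memory profile modelled as a plain key/value pool.
type Profile = Map<string, string>;

// Any agent holding a reference to the profile can contribute to it.
function remember(profile: Profile, key: string, fact: string): void {
  profile.set(key, fact);
}

const teamProfile: Profile = new Map();
remember(teamProfile, "style", "Use tabs, not spaces");      // coding agent
remember(teamProfile, "review", "Avoid any in public APIs"); // review agent

// A third agent, or a developer, reads what both have written.
const convention = teamProfile.get("style");
```

The design choice this illustrates is that the pool, not any single agent, owns the knowledge, which is how coding decisions and review feedback outlive individual sessions.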
From stateless tools to long-running systems
Harnessing memory effectively changes what an agent is. Without it, important information is often lost as context is trimmed or reset between interactions. With it, agents can accumulate knowledge over time — user preferences, system behaviour, past decisions — and use that to inform future actions.
Cloudflare frames this as a way to support agents that run over longer periods, rather than focusing on individual, short-lived tasks. Agent Memory plugs into that model by preserving information when context is compacted, instead of discarding it.
“Agents running for weeks or months against real codebases and production systems need memory that stays useful as it grows — not just memory that performs well on a clean benchmark dataset that may fit entirely into a newer model's context window,” Trautmann and Sutter wrote.
That direction aligns with what other vendors are building. Anthropic, as noted earlier, recently rolled out Claude Managed Agents, where execution, state, and orchestration are handled as part of the service. Those systems include their own memory, but typically leave how it is stored and retrieved up to the developer rather than handling it as a separate service.
Other efforts are approaching the same problem from a different angle. Projects like LanceDB are focusing on versioned context and retrieval systems, while frameworks such as Letta treat memory as a core component of the agent itself rather than an external service.
The missing piece
There’s a reason so many of these systems are converging on memory.
As agents become more capable, failures are less often about the model and more often about context — what the agent knows, what it remembers, and what it forgets.
In a recent Tessl podcast, Oracle’s Richmond Alake argued that most agent failures come down to memory, rather than model quality or infrastructure.
Cloudflare’s answer is to treat memory as a first-class service: something managed, persistent, and shared across agents, users, and tools.
The bet is that as agents move closer to real-world use, memory stops being an implementation detail and becomes part of the product itself.