Context architecture is replacing RAG as agentic AI pushes enterprise retrieval to its limits

Redis built its name as a caching layer that keeps web applications from collapsing under load. The problem it is addressing now has a similar structure but is harder to solve: production AI agents fail not because the models are flawed, but because the underlying data is fragmented, stale and designed for humans rather than machines. Retrieval pipelines built for a single query cannot absorb the volume agents generate.
The gap Redis is targeting is structural: agents make orders of magnitude more data requests than human users, but many retrieval layers are still designed for human-scale load. Redis Iris, unveiled Monday, is the company’s answer: a context and memory platform that sits between an agent and the data it needs to operate. The platform includes real-time data ingestion, a semantic interface that automatically generates MCP tools from business data models, and an agent memory server built on Redis Flex, a rewritten storage engine that keeps 99% of data in flash at a tenth of the cost of in-memory storage alone.
The announcement comes as enterprise RAG infrastructure is in active transition. VentureBeat’s Q1 2026 VB Pulse RAG Infrastructure Market Tracker found that buyer intent to adopt hybrid retrieval tripled, from 10.3% to 33.3%, between January and March. Retrieval optimization overtook evaluation as the top investment priority for the first time. Custom in-house retrieval stacks rose from 24.1% to 35.6% as enterprises ran out of options. Redis isn’t the only infrastructure vendor reading those signs: several data platform providers have also moved toward agent context layers in recent weeks.
A mismatch of scale is the structural argument behind the launch.
"There will be orders of magnitude more agents than people," Rowan Trollope, CEO of Redis, told VentureBeat. "Orders of magnitude more agents than people means orders of magnitude more load on backend systems."
From cache to context
Trollope traces the parallel back to the mobile era: when legacy databases built for branch accountants suddenly had to serve a million smartphone users, Redis became the caching layer that absorbed the load without a full rebuild.
The difference this time is that agents cannot write their own middleware. In the mobile era, the developer sat with the database administrator, identified the queries the application needed and hard-coded the cache logic into the middleware layer. Agents can’t do that. They need to find the right data at runtime, through predefined interfaces, or they stall.
"It’s like the grocery store and refrigerator analogy," he said. "If every time you want to make a sandwich you have to run to the grocery store to get food, that’s not very efficient. You put a fridge in every house, you keep a little food there. And that’s where we still tend to be in the infrastructure stack."
Inside Redis Iris
Iris ships with five components spanning data ingestion, semantic access, memory and caching.
Redis Data Integration. Now generally available. RDI uses change data capture pipelines to sync data continuously from relational databases, warehouses and document stores into Redis, with connectors for Oracle, Snowflake, Databricks and Postgres.
Context Retriever. Now in preview. Developers define a semantic model of business data using Pydantic models, and Redis automatically generates MCP tools that agents use to query that data directly, with row-level access controls enforced on the server side. Trollope describes the departure from classical RAG as a reversal of direction. "It’s just a flip to let the agent pull the data instead of guessing and putting it into the pipeline," he said.
Agent Memory. Now in preview. It maintains short-term and long-term state across sessions so that agents carry context forward without re-retrieving it at each turn.
Redis Flex. A rewritten storage engine that keeps 99% of data on SSDs and 1% in RAM, delivering petabyte-scale retrieval at sub-millisecond latency.
Redis Search and LangCache. Core retrieval and semantic caching underneath the platform. LangCache reduces redundant model calls by caching prompt responses.
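The semantic caching idea behind a component like LangCache can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the LangCache API: the SemanticCache class, the toy bag-of-words embed function, the 0.8 similarity threshold and the answer helper are all hypothetical, and a production system would use a real embedding model and a vector index instead of a linear scan.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache that matches prompts by similarity, not exact text."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cutoff; tune per workload
        self.entries = []           # list of (vector, response) pairs

    def lookup(self, prompt):
        vec = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]
        return None

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))


def answer(prompt, cache, call_model):
    # Return (response, served_from_cache); call the model only on a miss.
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached, True
    response = call_model(prompt)
    cache.store(prompt, response)
    return response, False
```

In this sketch, a second prompt that is worded slightly differently from a stored one is served from the cache, so the model is called once rather than twice. The threshold sets the trade-off between saved calls and the risk of returning an answer to a question that only looks similar.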
What analysts say
The data industry is broadly moving in the same direction. Every major data vendor is now making a context layer argument.
Traditional database vendors, including Oracle, are adding context and memory layers to bring relational data into the agentic AI era. Purpose-built vector database vendors, including Pinecone, are doing the same, building a new knowledge layer for agentic AI context. Standalone context layers such as Hindsight are also part of the emerging landscape.
Trollope positions Redis as architecturally distinct from the competition.
"For us to win, no one else has to lose," he said. Most Redis deployments already use MongoDB or Oracle as the backend system of record. Iris fronts and caches those systems rather than replacing them. Redis is also bringing Iris to the Snowflake marketplace with native connectors.
Stephanie Walter, AI Stack Practice Leader at HyperFRAME Research, sets out the market context clearly. "The market is converging on the same conclusion: agents don’t just need more tokens or better models. They need governed, current, low-latency context," Walter said.
Her read on Redis’ differentiation centers on where Redis already sits in the stack: close to runtime, critical workloads and real-time data.
"The pitch is not ‘better RAG’ so much as ‘agents need live context, memory, and quick retrieval while working,’" she said.
Whether it comes from Redis or another vendor, any context layer technology will have to meet a governance challenge to succeed.
"Agentic AI will not scale in the enterprise if every agent becomes a new cost center, a new data access risk, and a new governance variable," she said. "The winning context layers are the ones that make agents faster, cheaper, and safer to use."
In real-time clinical AI, getting the context wrong is not an option
Mangoes.ai is one company that has already had to answer those questions in production, under conditions where the cost of getting the context wrong is measured in patient outcomes.
Amit Lamba, founder and CEO of Mangoes.ai, runs a real-time voice AI platform deployed across major healthcare facilities, where patients and clinicians ask live questions about treatment, planning and case history. Mangoes.ai built its stack natively on Redis from the start.
"Retrieval, memory, and session state all run through Redis, so we don’t stitch together different tools and hope they talk to each other," Lamba said.
The problem that Iris’ Agent Memory addresses is one that comes up in every complex session.
"Consider a one-hour group therapy session," Lamba said. "You need to know who said what, when, and be able to present the relevant information to the therapist in the moment. That is not a simple retrieval problem."
The platform runs multiple specialized agents in parallel: one for entity identification, one for relationship inference and one for case history aggregation.
"The Agent Memory capability maps almost exactly to the problem we are solving," Lamba said.
What this means for enterprises
For enterprises that have built their AI stack around RAG, the retrieval layer that got them into production is no longer enough to keep them there.
The RAG era is giving way to context architecture. The classic RAG model pushed data to the agent before the model was called. Production deployments are changing: agents pull what they need at runtime with tool calls, treating the data layer as a live resource rather than a preloaded payload. Teams still building RAG pipelines are solving last year’s problem.
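The pull pattern, combined with the model-to-tool generation described for Context Retriever, can be sketched as follows. This is an illustration under assumptions rather than Redis’ implementation: it uses standard-library dataclasses as a stand-in for Pydantic so it runs without dependencies, and the Order entity, the make_query_tool helper, the same_region access rule and the descriptor layout are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Order:
    # Hypothetical business entity; field names are illustrative only.
    order_id: str
    customer_id: str
    region: str
    status: str


ROWS = [
    Order("o1", "c1", "emea", "open"),
    Order("o2", "c2", "amer", "open"),
    Order("o3", "c1", "emea", "closed"),
]


def make_query_tool(model, rows, allow_row):
    """Derive a tool descriptor (name, parameters, handler) from a data model."""
    params = {f.name: getattr(f.type, "__name__", str(f.type)) for f in fields(model)}

    def handler(caller, **criteria):
        unknown = sorted(set(criteria) - set(params))
        if unknown:
            raise ValueError(f"unknown fields: {unknown}")
        # Row-level access is checked here, on the "server" side,
        # alongside whatever field filters the agent asked for.
        return [
            row for row in rows
            if allow_row(caller, row)
            and all(getattr(row, k) == v for k, v in criteria.items())
        ]

    return {
        "name": f"query_{model.__name__.lower()}",
        "parameters": params,
        "handler": handler,
    }


def same_region(caller, row):
    # Hypothetical access rule: callers only see rows from their own region.
    return caller["region"] == row.region


tool = make_query_tool(Order, ROWS, same_region)
# The agent decides at runtime which call to make; nothing is preloaded.
visible = tool["handler"]({"region": "emea"}, customer_id="c1")
hidden = tool["handler"]({"region": "amer"}, customer_id="c1")
```

Here the same query returns two orders for a caller in the matching region and nothing for a caller outside it, because the access rule runs inside the handler rather than in the agent’s prompt. The design point is that the schema, not a hand-written retrieval pipeline, determines what the agent is able to ask for.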
The semantic layer is now production infrastructure. The model that describes the entities, their relationships and the access rules between them needs to be created, versioned and maintained in the same way as a data pipeline. Many organizations have never staffed or planned for such work. Enterprises that define their context architecture now are the ones that won’t need to rebuild when agent workloads scale.
The budget is already moving. VB Pulse Q1 2026 data shows investment in retrieval optimization rising from 19% to 28.9% across the quarter, surpassing evaluation spending for the first time. Organizations that spent the last year measuring their retrieval quality are now spending money to fix it. The context layer is an active purchasing decision, not a roadmap item.
"A buyer’s first question shouldn’t be ‘Do I need a vector database, long context, memory, or a context engine?’ It should be ‘What does the agent need to know, how recent should that information be, who is allowed access to it, and how much does it cost to retrieve it all?’" Walter said.



