Enterprise AI agents often fail because they forget what they learned

Retrieval-augmented generation (RAG) architectures are good at one thing: retrieving relevant documents. That is where they stop.

A framework called the decision context graph aims to close that gap by giving agents structured memory, time-aware reasoning, and explicit decision-making. Rippletide, a startup in the Neo4j ecosystem, has built one. Its key selling point: agents that don’t regress, that can lock in a proven sequence of actions and build on it over time.

“The key point is that you don’t want to regress: How do you make sure that, when the agent generates something new, it builds on previous findings?” said Yann Bilien, founder and chief scientific officer of Rippletide.

Why RAG doesn’t go far enough

Enterprise context is spread across ERP tools, logs, databases, vector stores, and policy documents. Generative AI tools can retrieve all of it, using keyword search, SQL queries, or full RAG pipelines, but retrieval has a ceiling.

For one, retrieved data may not be relevant to the decision at hand (producing false positives); and even when agents pull in the right data, they often lack the guidance to reason their way to a sound decision.

In other words, RAG retrieves documents, not decision context. “Everybody starts with RAG: Pull the right documents, put them in the prompt, let the model figure it out,” says Wyatt Mayham of Northwest AI Consulting.

While that works well for chatbots, it “breaks quickly” for agents that need to make decisions and take actions, he pointed out. “The biggest thing builders struggle with is the gap between retrieval and use.”

A retrieved document does not tell the agent whether it is still valid, whether it has been superseded, or whether it conflicts with something that takes precedence, Mayham said. “Agents need the context of a decision, not just the details.”

In practice, that might mean knowing that a pricing exception has expired, that a security policy applies only to certain locations, or that a standard operating procedure was updated last month. “Miss any of that, and the agent is confidently doing the wrong thing,” Mayham said.

Without formal decision context, agents mix incompatible rules, improvise constraints to fill gaps, and rely on what Bilien calls “probabilistic predictions over unstructured data.” Errors are hard to reproduce because developers cannot trace why an agent made a particular choice.

Compounding error is a real problem, too, Mayham said: A small miss rate per step becomes “catastrophic” in a multistep workflow. “That’s why most enterprise agents never make it out of the testing phase.”
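A quick back-of-the-envelope calculation shows why small per-step misses compound. This sketch assumes every step succeeds independently with the same probability, which is a simplification (real workflow steps are rarely independent); under that assumption, an n-step workflow finishes cleanly with probability p raised to the n.

```python
def workflow_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step succeeds, ASSUMING independent steps
    with identical per-step accuracy (a simplification for illustration)."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


# Compare a few per-step accuracies across short and long workflows.
for p in (0.95, 0.99, 0.99999):
    for n in (1, 10, 20):
        print(f"per-step {p:.3%}, {n:2d} steps -> {workflow_success_rate(p, n):.3%} end-to-end")
```

At 95% per step, a 10-step workflow completes without error only about 60% of the time, and a 20-step one about 36% of the time.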

How decision context graphs arrive at the correct answer

A decision context graph addresses this by encoding a structured map of what applies, what the rules are, and when they hold.

The framework is designed around one question: “Given this situation, what is the current context?” Time is treated as a first-class dimension, and every rule, regulation, and exception is retrieved only where it applies.
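To make the question concrete, here is a minimal sketch of a time-aware lookup, assuming each rule carries a scope (here, a set of regions) and a validity interval. The `Rule` class, its fields, and `current_context` are hypothetical names for illustration, not Rippletide’s API; a production system would presumably store these as nodes and properties in a graph database rather than a Python list. The example rules mirror the cases Mayham describes: an expired pricing exception, a location-specific security policy, and a recently updated procedure.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule record: what it says, where it applies, when it is valid.
    rule_id: str
    text: str
    regions: frozenset[str]          # empty set means "applies everywhere"
    valid_from: date
    valid_to: Optional[date] = None  # None means "still in force"

    def applies(self, region: str, at: date) -> bool:
        in_scope = not self.regions or region in self.regions
        # Half-open interval: valid_from <= at < valid_to
        in_time = self.valid_from <= at and (self.valid_to is None or at < self.valid_to)
        return in_scope and in_time


def current_context(rules: list[Rule], region: str, at: date) -> list[Rule]:
    """Return only the rules that are in scope and in force at time `at`."""
    return [r for r in rules if r.applies(region, at)]


rules = [
    Rule("price-exc-7", "10% discount for Acme", frozenset(), date(2024, 1, 1), date(2024, 6, 30)),
    Rule("sec-eu", "Encrypt exports at rest", frozenset({"EU"}), date(2023, 1, 1)),
    Rule("sop-v1", "Refunds need manager sign-off", frozenset(), date(2022, 1, 1), date(2025, 5, 1)),
    Rule("sop-v2", "Refunds under $100 are auto-approved", frozenset(), date(2025, 5, 1)),
]

today = date(2025, 6, 1)
print([r.rule_id for r in current_context(rules, "US", today)])  # ['sop-v2']
print([r.rule_id for r in current_context(rules, "EU", today)])  # ['sec-eu', 'sop-v2']
# "What was true then versus what is true now": ask about an earlier date.
print([r.rule_id for r in current_context(rules, "US", date(2024, 3, 1))])  # ['price-exc-7', 'sop-v1']
```

Because validity is stored rather than inferred, the same query can answer both “what applies now” and “what applied when that decision was made.”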

“The goal is to address missing, inconsistent, or conflicting data directly when constructing the graph, to avoid potential [errors] when the agent is running,” said Bilien.
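One way to read Bilien’s point is that data problems are caught when the graph is built, not when the agent runs. The sketch below assumes a simple, hypothetical record format (id, subject, value, validity interval) and flags two kinds of issues before anything is loaded: records missing required fields, and records about the same subject whose validity intervals overlap but whose values disagree. `validate_records` and the field names are illustrative assumptions.

```python
from datetime import date
from itertools import combinations

# Hypothetical schema: fields every record must have before it enters the graph.
REQUIRED = ("id", "subject", "value", "valid_from")


def _overlaps(a: dict, b: dict) -> bool:
    # A missing end date is treated as open-ended ("still in force").
    a_end = a.get("valid_to") or date.max
    b_end = b.get("valid_to") or date.max
    return a["valid_from"] < b_end and b["valid_from"] < a_end


def validate_records(records: list[dict]) -> list[str]:
    """Return human-readable issues found before records are loaded."""
    issues, complete = [], []
    for rec in records:
        missing = [f for f in REQUIRED if rec.get(f) is None]
        if missing:
            issues.append(f"{rec.get('id', '?')}: missing {', '.join(missing)}")
        else:
            complete.append(rec)
    # Pairwise check is O(n^2): fine for a sketch, not for millions of records.
    for a, b in combinations(complete, 2):
        if a["subject"] == b["subject"] and a["value"] != b["value"] and _overlaps(a, b):
            issues.append(f"{a['id']} conflicts with {b['id']} on '{a['subject']}'")
    return issues


records = [
    {"id": "r1", "subject": "refund_limit", "value": 100,
     "valid_from": date(2024, 1, 1), "valid_to": date(2025, 1, 1)},
    {"id": "r2", "subject": "refund_limit", "value": 250, "valid_from": date(2024, 6, 1)},
    {"id": "r3", "subject": "refund_limit", "value": 250, "valid_from": date(2025, 1, 1)},
    {"id": "r4", "subject": "data_region", "value": "EU"},
]

for issue in validate_records(records):
    print(issue)
```

Here `r4` is rejected for lacking a start date, and `r1`/`r2` are flagged because two different refund limits would be in force at the same time; `r1` and `r3` do not conflict because one ends exactly when the other begins.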

The system is built on three principles:

Applicability: Logic is explicitly encoded so that the agent knows which rules to recall and apply in a given situation. Context is returned only if its conditions match the situation.
Time-sensitive memory: Every rule, decision, and exception is timestamped. This allows agents to reason about “what was true then versus what is true now,” and to reproduce or explain their decisions.
Decision traces: The system can explain how it got from A to B and the “why” behind its reasoning (for example, why one piece of context was included and another was not). Agents are also given examples of how similar cases were handled in the past.
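The idea of recording why context was included or excluded, and attaching how similar cases were handled before, can be sketched as follows. Everything here (the `Trace` structure, the `"when"` condition format, and matching precedents by an exact case type) is a hypothetical simplification for illustration, not Rippletide’s implementation; a real system would likely use a richer notion of similarity than an exact key match.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    # Hypothetical decision trace: what was used, what was skipped, and why.
    included: list[tuple[str, str]] = field(default_factory=list)
    excluded: list[tuple[str, str]] = field(default_factory=list)
    precedents: list[str] = field(default_factory=list)


def build_context(situation: dict, rules: list[dict], history: list[dict]) -> Trace:
    trace = Trace()
    for rule in rules:
        # Each rule lists the situation attributes it requires, e.g. {"tier": "gold"}.
        mismatches = [k for k, v in rule["when"].items() if situation.get(k) != v]
        if mismatches:
            trace.excluded.append((rule["id"], f"condition not met: {', '.join(mismatches)}"))
        else:
            trace.included.append((rule["id"], "all conditions met"))
    # Precedents: past decisions on the same kind of case (exact match, for simplicity).
    trace.precedents = [h["decision"] for h in history if h["case_type"] == situation["case_type"]]
    return trace


rules = [
    {"id": "gold-refund", "when": {"tier": "gold"}},
    {"id": "eu-privacy", "when": {"region": "EU"}},
]
history = [
    {"case_type": "late_delivery", "decision": "refund shipping fee"},
    {"case_type": "damaged_item", "decision": "replace item"},
]
situation = {"case_type": "late_delivery", "tier": "gold", "region": "US"}

trace = build_context(situation, rules, history)
print("included:", trace.included)
print("excluded:", trace.excluded)
print("precedents:", trace.precedents)
```

Keeping the exclusion reasons alongside the inclusions is what makes a decision explainable after the fact: a reviewer can see that the EU privacy rule was skipped because the customer is in the US, not because it was overlooked.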

During setup, unstructured data is ingested and organized into an ontology: what entities exist, what rules apply, and what counts as an exception. Neuro-symbolic AI handles pattern recognition and turns it into structured, machine-readable logic. Over time, the system refines its knowledge base as new decisions are made.

“Neuro-symbolic brings two parts: a neural part that gives agents more autonomy, and a symbolic part that reduces the amount of data needed and brings control,” said Bilien.
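A common way to combine the two halves Bilien describes is to let a neural model propose structured facts from unstructured text and have a symbolic layer accept only those that fit the ontology. In this sketch, `neural_extract` is a hypothetical placeholder for an LLM call (it returns canned output so the example runs offline), and the ontology is a hand-written dictionary of allowed relations; neither reflects Rippletide’s actual pipeline.

```python
# Hypothetical ontology: the relations we allow and the value type each expects.
ONTOLOGY: dict[str, type] = {
    "max_refund_usd": int,
    "applies_to_region": str,
}


def neural_extract(text: str) -> list[tuple[str, str, str]]:
    """Placeholder for the neural part (for example, an LLM prompt that reads `text`).
    It returns canned (subject, relation, raw_value) triples; `text` is ignored."""
    return [
        ("refund_policy", "max_refund_usd", "100"),
        ("refund_policy", "applies_to_region", "EU"),
        ("refund_policy", "max_refund_usd", "about a hundred"),
        ("refund_policy", "tone", "strict"),
    ]


def symbolic_check(triples: list[tuple[str, str, str]]):
    """The symbolic part: keep only triples that fit the ontology's schema."""
    accepted, rejected = [], []
    for subject, relation, raw in triples:
        expected = ONTOLOGY.get(relation)
        if expected is None:
            rejected.append((subject, relation, raw, "unknown relation"))
            continue
        try:
            # Coerce the raw string to the type the ontology expects.
            accepted.append((subject, relation, expected(raw)))
        except ValueError:
            rejected.append((subject, relation, raw, f"expected {expected.__name__}"))
    return accepted, rejected


accepted, rejected = symbolic_check(neural_extract("Refunds are capped at $100 in the EU."))
print("accepted:", accepted)
print("rejected:", rejected)
```

The neural side can be as flexible as it likes; only facts that pass the symbolic check become machine-readable logic the agent relies on.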

Agents are tested at build time (before production) to verify their behavior or identify improvements. This reduces risk and computational requirements, he noted.

Agents learn, rather than regress

When it comes to scalability, the key is combining intelligence (models) with knowledge (shared across agents), says Bilien. It is important for agents to be able to explore: when they don’t know how to accomplish a task, they can try different possibilities, usually in a controlled environment or simulation (like a support bot trying out multiple response patterns).

Then, “once the solution has been evaluated as satisfactory, the graph captures that sequence of actions,” Bilien said. Future exploration then starts from this “stable base of proven behavior,” which prevents newly acquired skills from overwriting previously learned behaviors.
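A minimal way to picture “capturing” a validated sequence is a store keyed by situation: once a sequence of actions passes evaluation it is locked in, later exploration starts from it, and a new candidate cannot silently replace it. The `SkillStore` class, its methods, and keying by a plain situation string are hypothetical illustrations, not Rippletide’s design.

```python
class SkillStore:
    """Hypothetical store of validated action sequences, keyed by situation."""

    def __init__(self) -> None:
        self._validated: dict[str, tuple[str, ...]] = {}

    def capture(self, situation: str, actions: list[str], satisfactory: bool) -> bool:
        # Store only sequences that passed evaluation, and never overwrite
        # an already-validated sequence for the same situation.
        if not satisfactory or situation in self._validated:
            return False
        self._validated[situation] = tuple(actions)
        return True

    def starting_point(self, situation: str) -> list[str]:
        # Exploration for a known situation begins from the proven sequence;
        # an unknown situation starts from scratch (empty list).
        return list(self._validated.get(situation, ()))


store = SkillStore()
store.capture("reset_password", ["verify_identity", "send_reset_link"], satisfactory=True)
store.capture("reset_password", ["send_reset_link"], satisfactory=True)  # refused: would overwrite
print(store.starting_point("reset_password"))  # ['verify_identity', 'send_reset_link']
print(store.starting_point("close_account"))   # []
```

Refusing outright overwrites is the simplest way to show the “no regression” property; a real system would presumably allow a stored sequence to be replaced, but only after the replacement passes some regression check.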

Before an agent acts or interacts with a customer, it checks against the graph: Is it violating a rule? Is it hallucinating? Is it staying within limits? Would it produce the same solution across similar situations?
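The pre-action questions can be read as a gate the agent must pass before executing anything. The sketch below assumes a proposed action is a dict with a type, an optional amount, the situation it responds to, and the facts it cites. The graph layout, the limits, and the check that cited facts must exist in the graph (a crude proxy for catching hallucinations) are all hypothetical.

```python
def pre_action_check(action: dict, graph: dict) -> list[str]:
    """Return reasons to block `action`; an empty list means it may proceed."""
    problems = []
    # 1. Rule violations: the action type must not be forbidden.
    if action["type"] in graph["forbidden_actions"]:
        problems.append(f"violates rule: '{action['type']}' is forbidden")
    # 2. Hallucination proxy: every fact the agent cites must exist in the graph.
    unknown = [f for f in action["cited_facts"] if f not in graph["facts"]]
    if unknown:
        problems.append(f"unsupported facts: {unknown}")
    # 3. Limits: amounts must stay within the ceiling for this action type.
    if action.get("amount", 0) > graph["limits"].get(action["type"], float("inf")):
        problems.append("exceeds limit")
    # 4. Consistency: if a validated action exists for this situation, match it.
    expected = graph["validated"].get(action["situation"])
    if expected is not None and expected != action["type"]:
        problems.append(f"inconsistent with validated action '{expected}'")
    return problems


graph = {
    "forbidden_actions": {"delete_account"},
    "facts": {"order_123_late", "customer_gold_tier"},
    "limits": {"refund": 100},
    "validated": {"late_delivery": "refund"},
}

ok = {"type": "refund", "amount": 40, "situation": "late_delivery",
      "cited_facts": ["order_123_late"]}
bad = {"type": "refund", "amount": 500, "situation": "late_delivery",
       "cited_facts": ["order_123_lost"]}
print(pre_action_check(ok, graph))   # []
print(pre_action_check(bad, graph))
```

Returning every failed check, rather than stopping at the first, gives reviewers the full picture of why an action was blocked.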

At the macro level, the system evaluates outcomes: Did the behavior improve long-term performance? Does it generalize across similar situations? Does it retain previous capabilities?
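The three macro-level questions map naturally onto an acceptance gate over test suites. This sketch assumes three hypothetical suites of pass/fail results (the target situations, held-out similar situations, and previously handled cases) and accepts a new behavior only if it improves on the target, does well on the held-out cases, and loses nothing that used to pass. The 0.9 threshold is an arbitrary placeholder, not a figure from Rippletide.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of cases that passed (0.0 for an empty suite)."""
    return sum(results) / len(results) if results else 0.0


def accept_new_behavior(old: dict[str, list[bool]], new: dict[str, list[bool]],
                        generalization_threshold: float = 0.9) -> bool:
    """Accept `new` only if it improves, generalizes, and retains old capabilities.

    Each dict maps a suite name to per-case pass/fail results, aligned case by case:
    'target' (situations being improved), 'similar' (held-out similar situations),
    and 'regression' (cases the old behavior already handled).
    """
    improved = pass_rate(new["target"]) > pass_rate(old["target"])
    generalizes = pass_rate(new["similar"]) >= generalization_threshold
    # Retention: any case that passed before must still pass.
    retained = all(n or not o for o, n in zip(old["regression"], new["regression"]))
    return improved and generalizes and retained


old = {"target": [True, False, False], "similar": [True] * 9 + [False],
       "regression": [True, True, False]}
candidate_a = {"target": [True, True, False], "similar": [True] * 10,
               "regression": [True, True, True]}
candidate_b = {"target": [True, True, True], "similar": [True] * 10,
               "regression": [True, False, True]}  # better on target, but breaks case 2

print(accept_new_behavior(old, candidate_a))  # True
print(accept_new_behavior(old, candidate_b))  # False
```

Candidate B scores higher on the target suite yet is rejected, because it fails a case the old behavior handled: exactly the kind of regression the gate is meant to stop.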

“This is key for agents to drive reliability at scale,” said Bilien. It leads to consistent, predictable, explainable behavior, and enables strong control and legibility.

“You want your agents to be able to learn for themselves if they are faced with something they don’t know,” he said. “You want them to be able to explore and find new solutions.”

Moving beyond “episodic” memory

While the team originally thought they would use reinforcement learning (RL) everywhere, “that is very difficult in the business environment,” Bilien said. “Data is lacking for some use cases and dirty for others.”

Historically, turning raw data into something reliable enough for prediction has been a manual, time-consuming challenge, but “now with agents we’ve entered a new era where it’s possible to build ontologies automatically,” says Bilien.

Traditional supervised fine-tuning can lead to drift, where models forget previously learned skills while learning the next one. Overall, learning is fragmented, behavior can be “surprising,” and models improve “intermittently” rather than continuously, so they often fail on new or abstract tasks.

As Bilien put it: “You’ll never have a fully self-learning model if you’re always going backwards.”

In enterprise applications such as banking, where millions of transactions are processed per day, a high level of reliability is essential, he noted. “One question I ask every customer: Is 95% enough? For most use cases, it’s not enough. You need 99.999%. 1% off is too much.”
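The gap between 95% and 99.999% is easier to feel in absolute numbers. Assuming, purely for illustration, one million transactions a day (the article says only “millions”), the expected number of failed transactions per day is simply the volume multiplied by the error rate.

```python
def expected_daily_failures(transactions_per_day: int, reliability: float) -> float:
    """Expected failed transactions per day at a given success rate."""
    return transactions_per_day * (1.0 - reliability)


volume = 1_000_000  # illustrative assumption; the article says only "millions"
for reliability in (0.95, 0.99, 0.99999):
    failures = expected_daily_failures(volume, reliability)
    print(f"{reliability:.3%} reliable -> ~{failures:,.0f} failed transactions/day")
```

At 95%, that is roughly 50,000 failures a day; at 99%, roughly 10,000; at 99.999%, roughly 10.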

Decision context graphs can close that gap, he argues: If the same customer support question is asked over and over again, the agent will return a “satisfactory” answer predictably and without regression, all while maintaining autonomy.

Encoding applicability and temporal validity in a structured graph, rather than relying on the LLM to reason about them, is a “sound method” for tackling a real limitation of existing retrieval systems, Mayham said. An open question is whether automated ontology generation can cope with the messy, heterogeneous data that enterprises actually have. “It’s always hard,” he said.

Mosegas