Stanford’s DeLM reduces the cost of multi-agent tasks by 50% – without a central orchestrator

Mosegas 3 hours ago

One of the assumptions behind today’s AI agent frameworks is that agents need a “manager” at the center: an orchestrator that runs the show, routes requests, and keeps the entire system from descending into chaos.

That assumption may be wrong, and the cost of carrying it can be measured in inference dollars and communication delays. Stanford’s new framework, called the decentralized language model, or DeLM, is built on the premise that agents can communicate directly, without having to route every update through a central controller.

DeLM’s shared knowledge base serves as a “common communication component” so that agents can build on each other’s progress without having to route all interactions through a central agent to “aggregate, filter, and redistribute,” Yuzhen Mao and Azalia Mirhoseini, the framework’s developers, explain in a research paper.

Such a system is not only feasible but, in certain situations, desirable. “Agents can build on previous findings, avoid repeated failures, maintain constraints, and obtain detailed evidence only when needed.”

Challenges of traditional multi-agent systems

In a typical centralized multi-agent system, the master agent divides a task into sub-tasks, assigns them to multiple sub-agents in parallel, waits for responses, aggregates and summarizes the progress centrally, and initiates the next wave of assignments based on the collected context.

Although this is a natural way to scale LLM reasoning, the Stanford researchers argue that it is not a good fit. All useful findings, partial findings, and failures must be reported back to the master agent, which then decides what information should be aggregated and redistributed to the sub-agents.

“As the number of subtasks increases, this controller becomes a bottleneck for communication and integration,” Mao and Mirhoseini wrote. In addition, the master orchestrator may “minimize, omit, or distort” useful information, leading to a loss of progress.

This bottleneck also occurs in long-context reasoning. Once it has received the reports from the sub-agents, the master agent will usually group related concepts, data points, and other important material together in an unsupervised fashion. It may hand these “collections of evidence” to sub-agents in advance, before knowing which material is relevant or whether it has been assembled properly.

If a sub-agent receives insufficient context, it will get stuck and return to the master agent, initiating another retrieval or dispatch cycle. “This back-and-forth makes communication slower, more repetitive, and forced by an overloaded single agent,” the researchers wrote.

What DeLM is about and how it works

DeLM, in contrast, is built around parallel agents, a shared context, and a task queue.

The shared context is essentially a curated store of “cases,” or snippets of information that other agents may find useful. These include confirmed, evidence-backed findings, partial findings, and documented failures. Entries also point to detailed evidence that agents can pull up when their specific task calls for it.

The task queue holds the pending sub-tasks, which agents can claim independently.

“Agents write unified, validated updates into a shared context that agents can read directly,” the researchers wrote. Useful findings, failures, and bottlenecks accumulate as a “shared problem situation,” rather than passing through a central controller.

The pipeline looks like this:

Initialization: Inputs are divided into separate work units and added to the queue.
Parallel execution: Agents work independently and in parallel, pulling tasks and reading the shared context as they go.
Compression and validation: Results are compressed into reusable “hypotheses” that are tested against supporting evidence. Only fully verified claims are shared with the group.
Additional work (if needed): When the queue is empty, the last agent to return a response examines the entire shared context to determine whether further work is required.
Final step: Once the final agent determines that no more steps are needed, it returns a final response.
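The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the researchers’ implementation: the names `SharedContext`, `solve`, `review`, and `run` are hypothetical, `solve` stands in for an LLM call, and validation is reduced to “a claim must come with evidence.” For brevity, the follow-up review runs after all workers finish rather than inside the last agent to return.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class SharedContext:
    """Hypothetical shared store; only validated entries are kept."""
    entries: list = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def publish(self, claim: str, evidence: str) -> bool:
        # Stand-in validation (assumption): a claim needs non-empty evidence.
        if not evidence:
            return False
        with self._lock:
            self.entries.append((claim, evidence))
        return True

    def snapshot(self) -> list:
        # Return a copy so readers never see a list that is being mutated.
        with self._lock:
            return list(self.entries)


def solve(task: str, known: list) -> tuple:
    """Placeholder for an LLM call; returns (claim, evidence)."""
    return (f"{task}: done", f"saw {len(known)} prior entries")


def review(ctx: SharedContext) -> list:
    """Placeholder for the final check; returns follow-up tasks, if any."""
    return []


def agent(tasks: queue.Queue, ctx: SharedContext) -> None:
    while True:
        try:
            task = tasks.get_nowait()  # each agent claims work independently
        except queue.Empty:
            return
        claim, evidence = solve(task, ctx.snapshot())  # read shared context
        ctx.publish(claim, evidence)  # write back only if validated


def run(inputs: list, n_agents: int = 3) -> list:
    ctx = SharedContext()
    pending = list(inputs)
    while pending:
        tasks = queue.Queue()
        for item in pending:  # split the input into queued work units
            tasks.put(item)
        workers = [threading.Thread(target=agent, args=(tasks, ctx))
                   for _ in range(n_agents)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        pending = review(ctx)  # queue is empty: is more work needed?
    return ctx.snapshot()
```

Calling `run(["bug-1", "bug-2", "bug-3", "bug-4"])` returns one validated entry per task. No agent ever hands a result to another agent or to a manager; all exchange happens through the shared store.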

Agents “exchange progress on a shared state, equally search for optimal tasks, and dynamically balance as the number of subtasks increases,” the researchers explain.

How DeLM works in the field

With DeLM, agents can avoid redundant exploration, reuse and build on each other’s findings and failures, and focus on unresolved issues.

The framework can be particularly useful for test-time scaling on software engineering tasks, where models are given more time to “think” in order to improve their reasoning and problem-solving. Different agents can test their own ideas or follow separate reasoning paths in parallel while sharing intermediate progress. One example is parallel debugging.

DeLM is also suited to long-context reasoning and question answering over many documents. Agents can examine their own collections of evidence (documents, code, or other materials) at the same time, while maintaining a “global unified view” of the collected evidence.

The researchers argue that DeLM makes agentic work both more accurate and cheaper. Its performance on real-world benchmarks supports this: On SWE-bench Verified, which evaluates how well AI models and agents solve real-world software engineering problems, it performed 10.5% better than the strongest baseline and reduced cost per task by almost 50%.

But it can go beyond coding: On LongBench‑v2 Multi-Doc QA, which tests LLMs’ ability to handle long-context, real-world problems, DeLM had the highest accuracy across all four model families tested: GPT-5.4, Claude Sonnet, Gemini Flash, and DeepSeek‑V4-Pro.

DeLM outperforms other approaches on SWE-bench for several reasons, as Mao explained on X.

First, agents share failures. In a typical parallel run, when one agent follows the wrong path, that failure remains private, and subsequent agents may waste time (and money) pursuing the same dead end. But with DeLM, failed hypotheses are written to the shared context.

“Over time the agents can read them as constraints, avoid repeated checks, and redirect their search to promising fixes,” Mao said.
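A toy comparison shows why shared failures cut redundant work. The function below is a hypothetical illustration, not part of DeLM: it runs agents one after another for determinism, gives each the same list of candidate fixes, and counts how many checks are performed when failed candidates are kept private versus written to a shared set.

```python
def count_checks(candidates, correct, n_agents, share_failures):
    """Count hypothesis checks across agents (toy model, runs sequentially)."""
    shared_failed = set()
    checks = 0
    for _ in range(n_agents):
        # With sharing, every agent reads and writes the same failure set;
        # without it, each agent starts from an empty private set.
        failed = shared_failed if share_failures else set()
        for hypothesis in candidates:
            if hypothesis in failed:
                continue  # already ruled out: skip the costly check
            checks += 1
            if hypothesis == correct:
                break
            failed.add(hypothesis)
    return checks


fixes = ["patch-a", "patch-b", "patch-c"]
private = count_checks(fixes, "patch-c", n_agents=2, share_failures=False)
shared = count_checks(fixes, "patch-c", n_agents=2, share_failures=True)
print(private, shared)  # prints "6 4"
```

In this toy setup the second agent skips the two patches the first agent already ruled out, so the shared run needs four checks instead of six. The gap widens as more agents or more dead ends are added.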

Additionally, constraints, once confirmed, are quickly added to the agents’ shared context, where they become a binding part of the shared state. “Later agents inherit them, build around them, and avoid repeating the world’s illegal simplifications,” Mao said.

Importantly, DeLM keeps shared progress compact enough to be reused. Entries are collapsible, meaning agents see brief notes by default but can choose to expand them into more detailed summaries and raw evidence.

As the researchers note, sharing all raw documents and traces gives agents the most information, but it can exceed their context windows and ultimately increase costs.

“If agents share a full path, each task will need to read long histories, file dumps, failed edits, and intermediate reasoning, turning communication into another long content problem,” Mao said.

On the other hand, while sharing only compressed summaries is cheap, important information and evidence can be lost, leading to less reliable conclusions.

Expansion on demand therefore gives agents “coarse to fine” access to shared progress, which can improve both accuracy and cost.
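One way to picture collapsible entries is as records with three levels of detail, where an agent’s context holds the brief note unless the agent asks for more. The sketch below is an assumption-laden illustration: the `Entry` fields, the level names, and `context_for_agent` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

LEVELS = ("note", "summary", "evidence")  # coarse to fine


@dataclass(frozen=True)
class Entry:
    note: str      # brief note, shown by default
    summary: str   # longer summary, read on demand
    evidence: str  # raw evidence, read only when needed


def view(entry: Entry, level: str = "note") -> str:
    """Return one level of detail for an entry."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return getattr(entry, level)


def context_for_agent(entries, expanded=None):
    """Build an agent's view: notes by default, expanded where requested.

    `expanded` maps an entry index to the level the agent wants to see.
    """
    expanded = expanded or {}
    return [view(e, expanded.get(i, "note")) for i, e in enumerate(entries)]


entries = [
    Entry("parser bug is in tokenizer", "Tokenizer drops trailing commas.",
          "tokenizer.py: failing test log ..."),
    Entry("cache fix rejected", "Disabling the cache did not change output.",
          "diff and test output ..."),
]
print(context_for_agent(entries))                   # two short notes
print(context_for_agent(entries, {1: "evidence"}))  # second entry expanded
```

An agent working on the cache pays the token cost of raw evidence only for the entry it cares about, while every other entry stays a one-line note.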

Finally, with a framework like DeLM, agents can be more efficient, because they do not repeatedly read the same documents or re-run the same failed analysis; more effective, because useful findings are distributed across parallel threads; and more robust, because they share only verified claims.

For enterprise developers, DeLM challenges a key assumption: that all multi-agent workflows require a central controller. The results on SWE-bench and LongBench-v2 suggest that the decentralized model is not only theoretically clean but also more accurate, at about half the cost per task.
