Cohere breaks down lossless scaling and native citations with the first full Apache 2.0 open source model with the Command A+ license

Mosegas 3 hours ago

0 0 6 minutes read

Cohere breaks down lossless scaling and native citations with the first full Apache 2.0 open source model with the Command A+ license

Canadian AI lab Cohere made waves recently by announcing a merger with German AI startup Aleph Alpha, but now it’s reserved for global entrepreneurs: today, the firm was co-founded by a former Googler as well. "Attention Is All You Need" co-author Aidan Gomez revealed Command A+, a highly developed, 218 billion parameter language model designed for complex reasoning, cross-document processing, and agent workflows.

The most important aspect of the release is not just the skills of the model; it is its accessibility.

By releasing the free-weight model on the popular AI code-sharing platform, Hugging Face under the highly permissive Apache 2.0 open source license — a first for the company, according to a post by Gomez, now Cohere’s CEO, on X — Cohere is making a calculated bet "the main AI"—the idea that businesses, governments, and developers should have the ability to use, control, and adapt edge-to-edge AI entirely within their secure environment, without sacrificing functionality.

A small structure with extreme quantization

At the architectural level, the Command A+ represents a major evolution from Cohere’s previous compact models. It is a decoder-only Sparse Mixture-of-Experts Transformer (MoE) Transformer.

While the model stores a total of 218 billion parameters, fewer – only 25 billion – are applicable to any given production step. It is a very simple step and requires much less computing resources to work intelligently (working on the model in production environments to end users or by using agents) than the US giants such as OpenAI’s GPT-5.5 and Anthropic’s Claude Opus 4.7, which are estimated by third-party observers to be in the billions of parameters.

This small structure is the key to the model’s efficiency. In plain words, the MoE model routes incoming queries only to specific ones "an expert" neural networks are more suitable to handle them, leaving the rest of the model ineffective.

This is the standard construction followed by many leading LLMs these days, which allows models to retain the large knowledge base and multidimensional reasoning capabilities of a giant, but with faster speed and reduced computational and power requirements for a much smaller model, since only a small fraction of parameters are active at any time.

But where Cohere has gone a step further than most of Command A+ is that it focuses more on hardware efficiency by using quantization—a process that compresses the model’s memory by reducing accuracy of its boundaries.

Command A+ is available in 16-bit (BF16), 8-bit (FP8), and highly compressed 4-bit (W4A4) formats.

W4A4 quantization is the core technology of this release. In general, conceptual models deal with outsized "quantization tax," where compressing the model leads to noticeable regression in problem solving.

Cohere reduced this by only measuring the MoE experts at 4-bit, while to keep critical attention mechanisms completely accurate, complemented by a technique called Quantization-Aware Distillation.

The result is a almost lossless compression allowing this massive model to run on a single NVIDIA Blackwell B200 GPU or just two NVIDIA H100 GPUs.

The speed benefits are equally notable. According to performance data released by the company, W4A4 quantization with low concurrency reaches 375 tokens per second (TOPS) with a Time-to-First-Token (TTFT) latency of only 113 milliseconds—representing a 63% increase in output speed and a 17% reduction in latency compared to the previous Guidance model.

In addition, Cohere has revised the model token. Tokens break text into chunks that are processed by AI models. The new token is designed for the most global business use, featuring native support for 48 languages.

More importantly, it greatly improves the efficiency of tokens in non-European languages, reducing the number of tokens needed to generate responses in Arabic by 20%, Japanese by 18%, and Korean by 16%. Because consideration costs are calculated per token, this directly translates into lower operational costs for global, multilingual or non-English deployments.

Agentic workflow and top benchmarks in statistics, specialty fields

While speed and size dictate use, the model’s utility is defined by its product capabilities. Command A+ is specially designed "the agency" tasks – workflows where AI works automatically or automatically, uses external tools, queries information, and integrates information in multiple steps.

The benchmark jump from the previous generation is strong.

In 𝜏²-Bench Telecom, which tests complex reasoning, the model jumped from a 37% result to 85%. In Terminal-Bench Hard, which measures the performance of agent code, it increased from 3% to 25%. In complex math, it scored 90% in AIME 25, up from 57%.

The Command A+ punches above its weight class (25B active parameters) in pure logic and math, directly competing with much larger models like the DeepSeek V4 Pro in math benchmarks. However, in the deep coding of the agent and the general index of broad-level intelligence, it is currently behind the latest generations from Chinese open source competitors such as DeepSeek, Z.ai (GLM), and MiniMax.

That said, comparing them doesn’t directly address Cohere’s value proposition: hardware efficiency.

Beyond benchmarks, Command A+ introduces deep integration of enterprise trust and authentication. The model supports the use of a chat tool with standard chat templates, allowing developers to easily connect it to internal APIs, search engines, or SQL databases.

Importantly, Command A+ features native quote generation. When Command A+ receives information from an external device, it doesn’t automatically compile the response; produces a graphic "grounding spans." Using special tags embedded in the output, i the model directly links all the factual claims it makes to a particular source document or database line it released information.

In highly business-controlled industries such as finance, healthcare, or law, this traceability is the difference between an interesting prototype and a production-ready application. When a user requests a daily sales report, the model will output the total sales number and clearly state the query result of the database that provided that number, reducing the risk of undetected fraud.

Additionally, Command A+ is fully multimodal, capable of processing both text and images natively within its large 128K input window, making it highly efficient for processing complex documents, such as analyzing scanned invoices, charts, or technical manuals.

The first Apache 2.0 model with the first Cohere AI license

In the current state of AI, "open source" it has become a difficult word. Many leading AI companies release their models under restricted commercial licenses or acceptable use policies that expressly prohibit large enterprises from using the models for commercial purposes, or prohibit the models from being used to train competing AI systems.

Indeed, Cohere’s previous models, including the Command R and Command R+, were released under a CC-BY-NC 4.0 (Creative Commons NonCommerce) license. Although their model weights were open to researchers and developers to download, think about, and test them, they were strictly prohibited from being used for commercial purposes without purchasing a separate business license from Cohere or by using an application program interface (API), similar to the arrangement used by many businesses to access AI models from OpenAI, Anthropic, Google and other leading labs.

Cohere changed its approach by releasing Command A+ under the Apache 2.0 license. This is an important distinction for the engineering community. Apache 2.0 is a true, open source license approved by the OSI. It allows anyone—from independent developers to Fortune 500 companies—to use, modify, distribute, and create a commercial model without paying licensing fees or complying with restrictive non-compete clauses.

As Gomez writes in X, this decision was inspired by Cohere’s co-founder Nick Frosst, who posted a two-minute long overview calling it. "the best model we have ever released."

In business, this license means complete independence for the seller. A company can download Command A+ workloads, fine tune them to highly segregated internal data, and run them on their own private servers or air-gapped networks. You are not tied to Cohere infrastructure, price changes, or API downtime. The ultimate realization of autonomous AI.

The release was met with immediate traction across the AI developer space, driven largely by its same-day integration with major open source indexing frameworks like Hugging Face and vLLM.

What’s next?

The release of Command A+ marks the growth of the open-source AI ecosystem. By combining boundary-level thinking, robust agent tooling, and multimodal capabilities with an architecture designed for hardware efficiency, Cohere is changing the equation for enterprise AI adoption.

The need for large, centralized computer clusters has long been an obstacle for companies that prioritize data privacy and cost control. By democratizing access to a model of this nature under a true open source license, Cohere has given the enterprise market exactly what it has been asking for: the power of the cloud, able to run securely in a server room down the hall.

Mosegas 3 hours ago

0 0 6 minutes read