Alibaba’s Qwen3.7-Max can work for 35 hours automatically and supports external harnesses such as Anthropic’s Claude Code.

Mosegas 2 hours ago

0 0 6 minutes read

Alibaba’s Qwen3.7-Max can work for 35 hours automatically and supports external harnesses such as Anthropic’s Claude Code.

The AI industry is in full swing "agent time," a paradigm where AI models do much more than generate text – they now actively plan, execute, and fix complex tasks in days rather than seconds.

Therefore, it is perhaps not surprising to see the Chinese e-commerce giant Alibaba’s famous Qwen Team of AI researchers release a model that can perform autonomous AI tasks for days: that model comes in the form of Qwen3.7-Max that the company reports on the acquired blog. "~ 35 hours of continuous automation" – however, in a proprietary format, not open source, as was previously the Qwen Team.

This is also expected – it is what many analysts and industry experts feared after the departure of several key leaders of the Qwen Team earlier this year. But it makes sense for Alibaba financially, at least in the short term: training AI models, especially powerful ones like Qwen3.7-Max, is expensive, and giving them away for free, like open source models, does not immediately help to recover any costs.

In that sense, Alibaba is simply aligning its efforts with American AI giants such as OpenAI and Google by offering the latest and greatest models only with paid APIs and subscriptions or paid web application bundles, and those that work less with open source.

However, the arrival of Qwen3.7-Max offers more options for businesses and individual users, as well as more competition for American AI labs – rarely a bad thing for consumers at all budget levels. However, the fact that the model is only accessible in China-based areas means that it may be limited in its marketing to American and European businesses that want to increase compliance with security shipments when filling government contracts, or even trying to comply with all the laws of the country, local, and relevant national data.

It’s marathon AI time

To understand why Qwen3.7-Max is a departure from previous models, one must look at how it was trained and how it works in practice.

Linguistic models often falter when forced to maintain a single train of thought over thousands of conversational turns; they forget instructions, see variables, or simply get stuck in logical loops. Qwen3.7-Max was specially designed as a "the basis of a multitasking agent" what he knows "long horizon thinking" to overcome this direct barrier.

The most prominent manifestation of this capability is the independent engineering work described by the Qwen team. The model was given access to an isolated server equipped with a T-Head ZW-M890 PPU—a hardware architecture that the model had never encountered during its training. Its function was to develop the attention kernel.

For 35 straight hours, Qwen3.7-Max was running automatically. It used 1,158 separate tool calls, performed 432 character tests, identified compilation failures, and iteratively optimized the code to achieve a 10.0x geometry definition speedup.

By comparison, competing Chinese models like z.ai’s GLM-5.1 and Moonshot’s Kimi K2.6 clocked out at 7.3x and 5.0x speedups respectively, often voluntarily shutting down their sessions when they failed to make progress. However, both are available as open source.

This resilience is achieved through what Alibaba calls it "measuring the environment". Just as classic LLMs grew smarter by importing various documents, Qwen3.7-Max was trained on a large, scaled dynamic agent environment.

It is able to simulate the one-year life cycle of a startup in "YC bench" evaluation, navigating through hundreds of decision-making rounds that include human resource management and contract evaluation. In this simulation, the model was able to generate $2.08 million in virtual revenue, which is almost double the performance of the previous generation, the Qwen3.6-Plus.

In addition, the model has built-in self-monitoring for reward hacking, automatically detecting when it tries to cheat the training environment and adding heuristic rules to adjust its behavior.

The brain of any scaffold

From a product perspective, Qwen3.7-Max is designed to be an information engine for modern software development and business automation.

The model offers a 1 million large token context window and a maximum output limit of 64K, which provides a large processing head for coding or long technical documents.

One of its most compelling features "cross-harness generalization". Rather than being hard-coded to work best within a specific proprietary interface, Qwen3.7-Max is designed to serve as a pull-down intelligence layer for various agent frameworks. It supports the Anthropic API protocol natively, allowing developers to plug directly into existing tools like Claude Code or OpenClaw.

Benchmark data provided by Alibaba shows that this standard approach has paid huge dividends.

On the Apex Math Reasoning benchmarkQwen3.7-Max scored 44.5, surpassing Claude Opus-4.6 Max’s score of 34.5 again DeepSeek V4-Pro Max’s 38.3. It sent again outstanding scores on the Final Human Test (41.4) and the MCP-Atlas code agent benchmark (76.4).

This translates into tangible benefits for end users. With the integrated open source Model Context Protocol (MCP), the model can act as an independent office assistant, able to read university formatting information and automatically reformat a dirty Word document with command line tools without human intervention.

Driving this level of intelligence comes at a distinct cost. Developers accessing the API through Alibaba Cloud Model Studio will pay $2.50 for 1 million input tokens and $7.50 for 1 million output tokens. The platform also features transparent cache creation and learning rates, as well as a $10 fee per 1,000 calls for integrated web searches, although the code interpreter tools remain free for a limited time.

Qwen3.7-Max takes strategic center stage in the current API economy. While it requires a significant premium over its more aggressively priced domestic rivals—it costs nearly twice as much as the DeepSeek V4 Pro ($5.22) and Z.ai’s GLM-5.1 ($5.80)—it undercuts the giants of the Western frontier in almost every measure.

For context, running an agent-heavy workflow with OpenAI’s GPT-5.4 or Anthropic’s Claude Opus 4.7 will run developers $17.50 and $30.00 per million tokens, respectively. See the VentureBeat price chart below:

VentureBeat Frontier AI Model API pricing summary

Model	Input	Output	Total Cost	The source
MiMo-V2.5 Flash	$0.10	$0.30	$0.40	Xiaomi MiMo
MiniMax M2.7	$0.30	$1.20	$1.50	MiniMax
Gemini 3.1 Flash-Lite	$0.25	$1.50	$1.75	Google
MiMo-V2.5	$0.40	$2.00	$2.40	Xiaomi MiMo
For me K2.6	$0.95	$4.00	$4.95	Moon shot/Kimi
GLM-5	$1.00	$3.20	$4.20	Z.ai
Grok 4.3 (low core)	$1.25	$2.50	$3.75	xAI
DeepSeek V4 Pro	$1.74	$3.48	$5.22	DeepSeek
GLM-5.1	$1.40	$4.40	$5.80	Z.ai
Claude Haiku 4.5	$1.00	$5.00	$6.00	Anthropic
Grok 4.3 (high core)	$2.50	$5.00	$7.50	xAI
Qwen3.7-Dimensions	$2.50	$7.50	$10.00	Alibaba Cloud
Gemini 3.5 Flash	$1.50	$9.00	$10.50	Google
Gemini 3.1 Pro Preview (≤200K)	$2.00	$12.00	$14.00	Google
GPT-5.4	$2.50	$15.00	$17.50	OpenAI
Gemini 3.1 Pro Preview (>200K)	$4.00	$18.00	$22.00	Google
Claude Opus 4.7	$5.00	$25.00	$30.00	Anthropic
GPT-5.5	$5.00	$30.00	$35.00	OpenAI

By placing the Qwen3.7-Max just below Google’s Gemini 3.5 Flash ($10.50) but above budget-class models, Alibaba is showing that this isn’t a product release; it’s a cutting-edge thinking engine priced to lure serious business work away from the more expensive offerings of Silicon Valley.

Licensing remains proprietary at this time

For all its technical brilliance, the most controversial part of Qwen3.7-Max is how it is distributed. Qwen pays the discharge as a "proprietary model". API only.

Historically, Qwen from Alibaba has been a hero in the open source and local LLM communities. Previous iterations, such as Qwen 2.5 and Qwen 3.6, have released their weights publicly. Open weights allow developers, researchers, and businesses to download a model, run it on their own hardware, and fine-tune it for use in very specific or sensitive data situations without sending proprietary information to a third-party server.

By locking Qwen3.7-Max behind the API, Alibaba focuses on the commercial playbook used by OpenAI (with GPT-4) and Anthropic (with Claude). For business users, this means running Qwen3.7-Max requires trusting Alibaba Cloud for their data streams and completely relying on an Internet connection to run their agent workflows. In the open source community, it means losing access to what is currently one of the most capable models in the world.

Public reaction was divided between surprise and disappointment

The reaction from the engineering community has been swift, characterized by a combination of deep respect for engineering success and frustration with the licensing model.

Prominent AI analyst Sudo su (@sudoingX) captured the sentiment on X (formerly Twitter). "qwen is not true," they wrote. "they just dropped 3.7 max and it beats the opus 4.6 max in most benchmarks they ran".

The technical metrics, especially the endurance of the model, have left many in the industry in awe. "high number of figures, 44.5 compared to opus 34.5, that is not a small gap," Sudo su commented. "35 straight hours of kernel development work with 1000+ tool calls is the part I keep relearning. that’s an agent period thing that happens, not a slide".

The speed of Alibaba’s iteration is also drawing notice. Since Qwen 3.6 was released last month, the jump to 3.7-Max highlights the endless improvements. As Sudo su observed, "no one else walks like this".

However, acculturation is greatly hindered by the transition to a closed ecosystem. The loss of model weights is seen as a setback for the local AI movement, which relies on high-end open models to push the boundaries of what can be done on consumer hardware or private company clusters.

"one thing though, please open source this one too," Sudo su pleaded in their post. "3.6 density makes the whole local llm ecosystem better. only max tier going api will close the door we kept open. give us the weights at the end".

Qwen3.7-Max proves that the era of the independent agent is no longer a guess; it is a reality that is now able to perform complex engineering tasks while people sleep. The only question now is whether this new frontier of AI will be a democratic utility that you can download to your laptop, or a strictly monitored intelligence service in the cloud. Currently, with Qwen3.7-Max, it is undoubtedly the latest.

Mosegas 2 hours ago

0 0 6 minutes read