Alibaba’s Qwen3.7-Plus supports text, video and image input at a low price of $0.4/$1.6 per 1M token — but it’s proprietary

Alibaba this week released Qwen3.7-Plus, the latest AI large-language model (LLM) in its globally popular and expanding Qwen family, which boasts versatile capabilities and a 60% lower cost than the previous, text-only model Qwen3.7-Max released a few weeks ago.
However, since its predecessor Qwen3.7-Plus is only available under a "it is closed" commercial license with proprietary application programming interfaces (API) and Qwen Chat.
That marks a departure from Qwen’s strategic discourse thus far, which has been largely focused on releasing powerful, high-quality open source models. Those businesses and users who rely on Qwen’s open source models – among them, US giants such as Airbnb – will undoubtedly be disappointed to see that Alibaba will be closed with its new release.
Nevertheless, the model is worth looking at because of its low cost and high performance in multimodal tasks such as creating business-level visualizations or video analysis, screenshots, which Qwen3.7-Max cannot do (text only). It’s among the cheapest AI-powered models available now, coming in at just above the limited-time discount price of its new Chinese competitor the MiniMax-M3.
VentureBeat Frontier AI Model API pricing summary
Model | Input | Output | Total Cost | The source |
MiMo-V2.5 Flash | $0.10 | $0.30 | $0.40 | Xiaomi MiMo |
deepseek-v4-flash | $0.14 | $0.28 | $0.42 | DeepSeek |
deepseek-v4-pro | $0.435 | $0.87 | $1,305 | DeepSeek |
MiniMax-M3 | $0.30 | $1.20 | $1.50 | MiniMax |
Qwen3.7-Plus | $0.40 | $1.60 | $2.00 | Alibaba Cloud |
Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $1.75 | |
MiMo-V2.5 | $0.40 | $2.00 | $2.40 | Xiaomi MiMo |
Grok 4.3 low core | $1.25 | $2.50 | $3.75 | xAI |
GLM-5 | $1.00 | $3.20 | $4.20 | Z.ai |
For me K2.6 | $0.95 | $4.00 | $4.95 | Moon shot/Kimi |
GLM-5.1 | $1.40 | $4.40 | $5.80 | Z.ai |
Grok 4.3 is an advanced core | $2.50 | $5.00 | $7.50 | xAI |
Qwen3.7-Dimensions | $2.50 | $7.50 | $10.00 | Alibaba Cloud |
Gemini 3.5 Flash | $1.50 | $9.00 | $10.50 | |
Gemini 3.1 Pro preview ≤200K | $2.00 | $12.00 | $14.00 | |
GPT-5.4 | $2.50 | $15.00 | $17.50 | OpenAI |
Gemini 3.1 Pro Preview >200K | $4.00 | $18.00 | $22.00 | |
Claude Opus 4.8 | $5.00 | $25.00 | $30.00 | Anthropic |
GPT-5.5 | $5.00 | $30.00 | $35.00 | OpenAI |
Maintaining continuity during the creation of a complex tool
For technical decision makers using autonomous agents, the main bottleneck is rarely the initial intelligence of the model. Rather, it is the decay of the state-the tendency of the agent framework to lose its analytical track over multi-step, long-horizon activities.
Qwen3.7-Plus addresses this architectural vulnerability by using an integrated approach to context management and stateful logic.
The model goes through a 1 million token context window and provides up to 256K tokens exclusively for internal thread processing. To make this capability basic, think of an automated cloud migration agent: it can import an entire codebase, map dependencies, and spend thousands of tokens silently testing edge cases before running a single line of bash script.
Importantly, the API exposes a parameter called ‘preserve_thinking.’ Across Alibaba’s ecosystem, the power serves as a fixed architectural bridge rather than a tiered perk. Alibaba introduced this feature during the previous generation of Qwen 3.6, combining it with the open weight Qwen3.6-27B and proprietary Max models.
At its core, a parameter works at the API and template level to store internals <think> Blocks in ongoing conversations.
This structural continuity solves an important bottleneck for engineers of long-horizon projects. By keeping these internal logic loops tight, the feature prevents the model from discarding its context or unnecessarily restoring its stored history during operation.
When the model performs complex, multi-step code operations, this retention allows the system to capture its original train of thought without losing structure or forgetting the underlying understanding of its previous actions.
Alibaba is far from alone in recognizing this need for technology, as the underlying concept now governs the design of almost all major artificial intelligence laboratories.
Anthropic uses this ability directly under the moniker "Extended Thinking" with its advanced models, including the Claude Opus 4.8. This framework requires developers to feed unchanged logic blocks directly back to the API in the next iteration to maintain an unbroken chain of logic.
OpenAI addresses a similar challenge by using a reverse logic approach embedded in models such as GPT-5.5. Within the OpenAI ecosystem, developers must return specific logic generated in conjunction with previous function calls, ensuring that the model clearly remembers the reason for using its tools.
Finally, preserve_thinking it simply stands for the Alibaba name in what has quickly become the undisputed table stakes of modernist diversity thinking.
Benchmarks show a competitive, but high-quality model
In unfiltered energy metrics, this thoughtful design translates into structural gains across multimodal and agency benchmarks. However, it still falls short of many of the best and previous generations of US proprietary models such as Anthropic’s Claude Opus 4.6 and OpenAI’s GPT-5.4.
Opened Terminal Bench 2.0-Terminalwhich measures the model’s ability to run real terminal-level code safely and reproducibly, the Qwen3.7-Plus score 70.3the best performers are DeepSeek-V4-Pro Max (67.9) and Gemini-3.1 Pro (63.5).
For computer vision benchmarks that require an understanding of spatial interactions, e.g ScreenSpot Prothe model is struck 79.0which far exceeds legacy industry highlights like GPT-5.4 (xhigh) at 67.4 and Claude-Opus-4.6 at 49.5. Agent Evaluation Metrics (Selected Benchmarks)
What should businesses consider Qwen3.7-Plus?
For a business builder, the key question when analyzing Qwen3.7-Plus is clear: What is this replacing our current technology stack?
The model is designed to be a direct replacement for premier frontier models (such as GPT-5-tier or Claude-Max-tier models) within high-frequency engineer workflows, robotic process automation (RPA), and data engineering pipelines.
Rather than using an expensive, general-purpose model to handle repetitive system tasks, technical teams can move these tasks to Qwen3.7-Plus. It handles interface interpretation, command execution, and code generation simultaneously.
Alibaba has structured its API delivery to align with existing open source frameworks and corporate ownership. Endpoints are fully OpenAI compatible, meaning that changing existing dependencies requires minimal infrastructure modification. For teams using standalone terminal frameworks, integration is natively supported in many environments.
Developers can use Qwen3.7-Plus directly with their local terminal setup by changing the default environment target.
From a pure cost perspective, using an agent framework that constantly references large code repositories or virtual architecture histories can quickly become cost prohibitive.
Alibaba addresses this by revealing temporary storage price points.
Standard input processing remains at $0.40 per million tokens, but if the agent reads from an explicitly created cache (eg, a large base cache or a standard business UI kit that stays static in hundreds of automated loops), the cost drops significantly to $0.04 for each 1M tokens to read next.
This phase makes high-frequency, multi-turn agent iterations economically viable at enterprise scale.
No open source license or open weights raise the question of compliance for companies
When evaluating any model in the Qwen ecosystem, the primary concern for legal and security teams is the licensing framework and operational boundaries of the data pipeline.
While the previous iterations of the Qwen family have achieved a significant business trend with full open source weight availability under Apache 2.0 or customized open use licenses, Qwen3.7-Plus is delivered strictly as a managed, commercial cloud API through Alibaba Cloud Model Studio. In business risk management, this distinction carries certain implications:
No Local Weight Shipping: Organizations cannot download, sandbox, or host Qwen3.7-Plus workloads within their internal data centers that are completely airtight. All data validation, virtual processing, and callbacks must go through Alibaba Cloud’s international locations (eg, the Singapore instance highlighted in the developer documentation).
Compliance with Sovereignty: Since the model requires cloud-based considerations, companies operating under strict private data restrictions (such as healthcare companies subject to HIPAA/GDPR restrictions or defense contractors) should clearly check whether the external API route is compatible with their specific residential data obligations.
Managed Risk Reduction: In contrast, the managed API architecture removes the internal infrastructure burden of provisioning, scaling, and maintaining multiple GPU clusters (such as dedicated Nvidia H100 arrays) just to host the internal agent network.
Nevertheless, Qwen3.7-Plus offers high intelligence in all modes at a low cost
Early adoption from the engineering and venture capital communities highlights the dynamic economics of agent deployment.
Prominent industry voice and Web3 venture capitalist @Boxmining highlighted the cost benefit of the strategy, saying:
"Qwen 3.7 Plus is 40% cheaper than Max changes the conversation. If the output is close enough for most coding and more robust for visual workflows, do you really need Max every day or only for heavy terminal tasks?"
This idea is in line with the current trend of improving the budget of business operations: from raw, unrestricted computing to automated machines for directed work.
Dunjie Lu, a researcher working at Alibaba Qwen, commented:
"It shows clear advantages over Qwen3.6-Plus in computing power, with powerful integration that goes beyond standard desktop tasks into professional workflows such as data engineering and scientific research."
Finally, for business buyers deciding on their next infrastructure path, Qwen3.7-Plus presents a viable alternative. If your organization’s primary goal is to build powerful, virtualized software loops that interact directly with developer environments and cloud computing—without draining your imagination budget—the model provides a compelling reason to move away from expensive borderline alternatives.



