Alibaba AI ships four releases in a row, sweeping the top spots on global open-source rankings

This week, Alibaba's Tongyi team released four products in succession: the Qwen3-235B non-thinking model, the Qwen3-Coder programming model, the Qwen3-235B-A22B-Thinking-2507 reasoning model, and the WebSailor AI agent framework. Together they swept the open-source rankings for foundation models, programming models, reasoning models, and agents.

Alibaba's Tongyi team made a strong showing with four back-to-back releases, sweeping the GitHub open-source rankings.

Between July 22 and 25, Alibaba launched the Qwen3-235B non-thinking model, the Qwen3-Coder programming model, the Qwen3-235B-A22B-Thinking-2507 reasoning model, and the WebSailor AI agent framework. The four products topped the open-source rankings in their respective categories: foundation models, programming models, reasoning models, and agents.

Artificial Analysis, a widely cited benchmarking organization, went so far as to comment:

Tongyi Qianwen 3 is the most intelligent non-thinking foundational model in the world.

Non-thinking models can also achieve "explosive" performance

According to Hard AI, in the early hours of Tuesday, the Alibaba Tongyi Qianwen team launched the latest non-thinking model, named Qwen3-235B-A22B-Instruct-2507-FP8.

This non-thinking model performed exceptionally well on several key benchmarks. It surpassed top open-source models such as Kimi-K2 across the board and even outperformed top closed-source models such as Claude Opus 4 in non-thinking mode.

The updated Qwen3 model stands out in agent capabilities in particular, performing strongly on BFCL, the Berkeley Function-Calling Leaderboard, which evaluates tool use. The result suggests gains in understanding complex instructions, planning autonomously, and using tools to complete tasks. A focus on agents is expected to be a core competitive advantage for future AI applications.

Programming model ignites excitement in the community

Qwen3-Coder, released on July 23, caused a stir in the global developer community.

As Wall Street Insights previously reported, this programming model is built on a MoE architecture with 480 billion total parameters, of which 35 billion are active. It natively supports a 256K context, which can be extended to 1 million tokens.

On SWE-bench Verified, the benchmark developers watch most closely, Qwen3-Coder achieved the best performance among open-source models.

The model was trained on 7.5 trillion tokens, 70% of which is code. Long-horizon reinforcement learning and large-scale training across 20,000 virtual environments gave it strong capabilities in real-world, multi-turn interactive tasks.
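The parameter and data figures above imply two simple ratios. The following back-of-the-envelope Python sketch derives them using only the numbers quoted in this article; the constant and function names are illustrative and are not part of any Qwen tooling.

```python
# Figures quoted in the article for Qwen3-Coder (not independently verified here).
TOTAL_PARAMS = 480_000_000_000        # 480 billion total parameters
ACTIVE_PARAMS = 35_000_000_000        # 35 billion active parameters
TRAINING_TOKENS = 7_500_000_000_000   # 7.5 trillion training tokens
CODE_SHARE_PERCENT = 70               # 70% of the training data is code


def active_fraction(total_params: int, active_params: int) -> float:
    """Share of the model's parameters that are active, per the quoted totals."""
    return active_params / total_params


def code_tokens(training_tokens: int, code_share_percent: int) -> int:
    """Number of training tokens that are code, using integer arithmetic."""
    return training_tokens * code_share_percent // 100


print(f"Active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
print(f"Code tokens: {code_tokens(TRAINING_TOKENS, CODE_SHARE_PERCENT) / 1e12:.2f} trillion")
```

By these figures, about 7.3% of the parameters are active, and about 5.25 trillion of the training tokens are code.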

Alibaba also launched a supporting command-line tool, Qwen Code, providing developers with a complete programming solution.

Tech industry leaders have praised Qwen3-Coder. Perplexity CEO Aravind Srinivas wrote:

The results are astonishing, and open source is winning.

Twitter co-founder Jack Dorsey noted that Qwen3-Coder pairs well with Goose, an AI agent framework developed by his company Block:

Goose combined with Qwen3-Coder equals wow

AI agent framework challenges the closed-source monopoly

Alibaba's Tongyi Laboratory also open-sourced the WebSailor AI agent framework, a direct competitor to OpenAI's Deep Research.

The framework significantly outperformed all open-source agents on the BrowseComp-en and BrowseComp-zh tests and rivals closed-source systems.

WebSailor employs a dual technical architecture of complex task generation and reinforcement learning modules. By constructing complex knowledge graphs and dynamic sampling strategies, the system can efficiently retrieve and reason through vast amounts of information.

Beyond its strong performance on complex tasks, WebSailor also does well on simple ones. On the SimpleQA benchmark, for example, it outperformed all other models and products.

The project has received more than 5,000 stars on GitHub and at one point ranked first on the daily trending list.

These two modules, complex task generation and reinforcement learning, work together to improve the performance of open-source agents on complex information retrieval tasks.

Open-sourcing the framework matters: it breaks the closed-source monopoly in information retrieval and gives developers worldwide an open-source alternative comparable to Deep Research.

Reasoning model tops global open-source rankings

Qwen3-235B-A22B-Thinking-2507, released on July 25, was the week's headline release.

On AIME25 (mathematics), the model scored 92.3 points.

On LiveCodeBench v6 (programming), it scored 74.1 points.

On WritingBench (writing), it scored 88.3 points.

On PolyMATH (multilingual mathematics), it scored 60.1 points.

In a more detailed comparison, the Qwen3 reasoning model also holds up well against other models. Apart from R1, all of the models it was compared with are top closed-source models.

The model uses a MoE architecture with 235 billion total parameters, 22 billion active parameters, 94 layers, and 128 experts, and it natively supports a context length of 262,144 tokens. It is built specifically for thinking mode: the default chat template automatically includes thinking tags, providing strong support for deep reasoning.

OpenRouter data shows that API call volume for Alibaba's Qwen models has surged over the past few days, exceeding 100 billion tokens, with Qwen models taking the top three spots among the most-called models. The data directly reflects the market's recognition of Alibaba's open-source models.
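Because the chat template inserts thinking tags automatically, an application that consumes the model's raw output usually has to separate the reasoning from the final answer. The Python sketch below shows one way to do that. It assumes the reasoning ends with a literal `</think>` tag and that the opening `<think>` tag may or may not appear in the output; the article does not spell out the tag text, and `split_reasoning` is a hypothetical helper, not part of any Qwen library.

```python
# Assumed tag literals; the article only says the template "includes thinking tags".
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_reasoning(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    If no closing tag is found, the whole text is treated as the answer.
    """
    reasoning, sep, answer = text.partition(THINK_CLOSE)
    if not sep:
        return "", text.strip()
    # The opening tag may already have been supplied by the chat template,
    # so it is stripped only if it is actually present.
    reasoning = reasoning.strip().removeprefix(THINK_OPEN).strip()
    return reasoning, answer.strip()


reasoning, answer = split_reasoning("<think>2 + 2 is 4</think>\nThe answer is 4.")
print(reasoning)
print(answer)
```

Splitting on the closing tag alone keeps the helper tolerant of outputs where the opening tag was emitted by the template rather than by the model.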

Users around the world have also been impressed by Tongyi's strongest reasoning model. Some put it bluntly:

China's open-source o4-mini.

AI Thinkers further commented:

China has just released a monster-level AI model.