February Surge! China's AI call volume surpasses the US for the first time. Four major models dominate the top five globally. Domestic computing power demand is experiencing exponential growth.

In February, China’s AI model usage exploded, surpassing the United States for the first time.

According to OpenRouter, the world’s largest AI model API aggregation platform, from February 9 to 15, Chinese models had a total call volume of 4.12 trillion tokens, surpassing the US models’ 2.94 trillion tokens for the first time during the same period.

From February 16 to 22, Chinese models’ weekly call volume further surged to 5.16 trillion tokens, a 127% increase over three weeks, while US models’ call volume dropped to 2.7 trillion tokens. Meanwhile, among the top five models globally by call volume, four are from Chinese vendors. This strong growth momentum is not driven by a single blockbuster product but by the collective rise of Chinese AI companies.

Tokens are the smallest units of text processed by AI models. Compared to user numbers, token usage more accurately reflects the intensity of AI model use, user engagement, and commercial value.
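As a rough illustration of how token counts map to usage and cost, the sketch below uses the common rule-of-thumb of about four characters per token for English text; real tokenizers vary by model, and the prompt and heuristic here are purely illustrative assumptions, not OpenRouter's accounting.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4-characters-per-token rule of
    thumb for English text; actual tokenizers differ by model."""
    return max(1, round(len(text) / chars_per_token))

def estimate_cost_usd(text: str, price_per_million_tokens: float) -> float:
    """Cost of processing `text` as input at a per-million-token price."""
    return estimate_tokens(text) / 1_000_000 * price_per_million_tokens

prompt = "Summarize the key findings of the attached quarterly report."
print(estimate_tokens(prompt))  # ~15 tokens for this 60-character prompt
```

Because billing is metered per token rather than per user, heavier workloads (long documents, code, agent loops) show up directly in figures like the weekly call volumes above.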

Chinese AI vendors are rapidly iterating and leveraging cost advantages to capture the global market, with domestic computing power demand experiencing exponential growth.

Ranking Shake-up: China’s Token Call Volume Surpasses the US, Four Major Models Lead the Charts

OpenRouter aggregates hundreds of large language models worldwide, with over 5 million developer users, making it the largest AI model API platform globally. Its API call data is considered the most authentic indicator of global AI application trends, as it directly reflects developer choices and the popularity and competitiveness of models in real-world applications.

Notably, the platform's users are mainly overseas developers: US users account for 47.17%, while Chinese developers make up only 6.01%. This means its rankings more objectively reflect the global appeal of Chinese AI models.

Daily Economic News reporters (hereafter “the reporters”) analyzed OpenRouter data and found that global large model token usage experienced astonishing explosive growth over the past year. In the week of March 3-9, 2025, the top ten models on the platform had a combined weekly call volume of only 1.24 trillion tokens. By mid-February 2026, this number had skyrocketed to 13.95 trillion tokens, more than tenfold in less than a year.

In 2025, US models were the main growth driver, accounting for nearly 70% of the top ten models’ weekly token calls, while Chinese models accounted for less than 20%. However, entering 2026, US model growth slowed, while Chinese models entered a “blazing” growth phase.

Data shows that in the first week of February 2026 (February 2-8), Chinese models’ weekly call volume reached 2.27 trillion tokens, signaling a strong push.

Just one week later, during February 9-15, Chinese models’ call volume surged to 4.12 trillion tokens, surpassing the US models’ 2.94 trillion tokens for the first time, achieving a historic overtaking.

This momentum continued, and in the week of February 16-22, Chinese models’ weekly call volume soared to 5.16 trillion tokens, a 127% increase over three weeks, further widening the lead. This powerful growth is driven by the collective rise of Chinese AI companies, not reliance on a single blockbuster.

The weekly rankings from February 16-22, 2026, show that four of the top five models are from Chinese vendors: MiniMax's M2.5, Moonshot AI's Kimi K2.5, Zhipu's GLM-5, and DeepSeek's V3.2. Together, these four models account for 85.7% of the top five's total call volume.

Specifically, MiniMax's M2.5, released on February 13, 2026, topped the weekly call volume chart within a week of launch. During the week of February 9-15, of the 3.21 trillion tokens processed by OpenRouter, M2.5 alone contributed an astonishing 1.44 trillion.

Moonshot AI's Kimi K2.5 model, released on January 27, 2026, leverages its native multimodal architecture and strong parallel agent capabilities to keep climbing the call volume rankings. It can coordinate up to 100 "agent clones" working in parallel, improving efficiency on complex tasks by 3 to 10 times. Media reports indicate that within a month of Kimi K2.5's launch, its total revenue exceeded Moonshot AI's entire 2025 income, driven by a surge in global paid users and API calls.

Zhipu's flagship model GLM-5, released on February 12, 2026, saw rapid user growth on the strength of its 200K-token context window and deep optimization for long-horizon agent tasks, with call volume reaching 0.8 trillion tokens in its first week.

Over the past year, Alibaba's Qwen (Tongyi Qianwen), though rarely topping individual weekly charts, has accumulated a total call volume of 5.59 trillion tokens, second globally only to DeepSeek's 14.37 trillion, according to a joint report by a16z and OpenRouter.

Frost & Sullivan reports that in China’s B2B large model market, in the second half of 2025, the Qwen series models accounted for 32.1% of daily token calls, nearly doubling from 17.7% in the first half of the year, widening the lead over ByteDance’s Doubao (21.3%) and DeepSeek (18.4%).

Regarding the landscape of Chinese AI large models, Hu Yanping, a distinguished professor at Shanghai University of Finance and Economics, described it as the “AI China Group.”

He believes that higher industry concentration is not necessarily better: multiple leading companies forming a broad technological ecosystem, rather than a few oligopolies, is healthier for competition, innovation, and talent development, and helps China build cluster advantages in the US-China AI race.

Martin Casado, partner at Andreessen Horowitz (a16z), observed that 80% of AI startup roadshows seeking funding in Silicon Valley now use Chinese open-source models as their core.

Cost Advantage: Why Are Chinese Tokens So Cheap Compared to US AI?

China’s models have rapidly gained global developer favor not only because of performance comparable to or surpassing top international models but also due to their highly competitive costs.

For example, on OpenRouter, the cost for input processing (Input) for MiniMax M2.5 and Zhipu's GLM-5 is $0.3 per million tokens. In contrast, the mainstream overseas model Claude Opus 4.6 costs as much as $5 per million tokens—about 16.7 times higher.

For output generation (Output), the cost gap is even more significant. MiniMax M2.5 costs $1.1 per million tokens, Zhipu's GLM-5 $2.55, while Claude Opus 4.6 skyrockets to $25 per million tokens—roughly 22.7 and 9.8 times higher, respectively.
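The price multiples above follow directly from the per-million-token figures the article quotes, as this small sketch verifies (prices are the article's snapshot, not live OpenRouter rates):

```python
# Per-million-token prices quoted in the article (USD); a point-in-time snapshot.
PRICES = {
    "MiniMax M2.5":    {"input": 0.30, "output": 1.10},
    "Zhipu GLM-5":     {"input": 0.30, "output": 2.55},
    "Claude Opus 4.6": {"input": 5.00, "output": 25.00},
}

def multiple(reference: str, model: str, direction: str) -> float:
    """How many times more expensive `reference` is than `model`."""
    return round(PRICES[reference][direction] / PRICES[model][direction], 1)

print(multiple("Claude Opus 4.6", "MiniMax M2.5", "input"))   # 16.7
print(multiple("Claude Opus 4.6", "MiniMax M2.5", "output"))  # 22.7
print(multiple("Claude Opus 4.6", "Zhipu GLM-5", "output"))   # 9.8
```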

This huge cost difference directly influences developers’ economic considerations when choosing APIs.

The primary reason for this cost gap is architectural innovation at the algorithm level.

Li Qing, director of Frost & Sullivan China, explained that the “Mixture-of-Experts (MoE)” architecture is a key technology enabling Chinese models to significantly reduce inference costs. Models like DeepSeek and Alibaba’s Tongyi Qianwen 3.5-Plus widely adopt MoE architecture.

MoE works by splitting a large model into multiple smaller “expert networks” and a gating network. Although the total parameters may be in the hundreds of billions, the gating network intelligently activates only a subset of relevant experts for each task, reducing computation and hardware demands. This “on-demand activation” mode can cut inference memory usage by 60% and increase throughput by up to 19 times, fundamentally lowering costs.
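The "on-demand activation" idea can be shown with a toy sketch: a gate scores every expert, but only the top-k actually run. Everything here—the stand-in expert functions, the fake deterministic gate scores—is illustrative, not how any production MoE model is implemented.

```python
# Toy mixture-of-experts: the gate scores all experts, but only the
# top-k are evaluated, so most "parameters" stay idle on each call.

def make_experts(n):
    # Each "expert" stands in for a sub-network specialized for some inputs.
    return [lambda x, i=i: x * (i + 1) for i in range(n)]

def gate_scores(x, n):
    # A real gate is a learned network; these are fake deterministic scores.
    return [((x + i) * 31) % 17 for i in range(n)]

def moe_forward(x, experts, top_k=2):
    scores = gate_scores(x, len(experts))
    # Select the top-k experts; only these run, the rest cost nothing.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    # Weighted sum of the activated experts' outputs.
    return sum(scores[i] / total * experts[i](x) for i in chosen), chosen

experts = make_experts(8)  # 8 experts, only 2 active per token
output, active = moe_forward(x=3, experts=experts, top_k=2)
print(f"active experts: {active} of {len(experts)}")  # 2 of 8 run
```

With 8 experts and top-2 routing, only a quarter of the expert computation is paid per token, which is the mechanism behind the memory and throughput gains described above.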

Beyond architectural innovation, Chinese AI companies are actively exploring “vertical integration” to further reduce token costs. This involves deep, integrated design and optimization of model algorithms, cloud infrastructure, and AI chips, solving hardware-software compatibility issues and maximizing computing power.

Li Qing cited Alibaba’s “Tongyi-Cloud-Chip” system as an example, which through top-tier resource scheduling, maximizes hardware utilization and drastically cuts infrastructure costs behind AI services. Such system-level optimization further reduces token generation costs.

JPMorgan’s research predicts that from 2025 to 2030, China’s token consumption will grow at a compound annual rate of 330%, reaching 370 times its current level in just five years.

Value Transformation: Tokens Are Becoming the “Fuel” of the AI Era

The exponential rise in token consumption reflects not only user growth and longer usage times but also a fundamental shift in AI usage patterns. AI is evolving from a simple Q&A tool for information and casual conversation into a productivity tool capable of deeply participating in workflows and handling complex tasks.

Recently, Guolian Minsheng Securities introduced the concept of "token inflation": token consumption per user per unit of time is rising structurally—not that token prices are rising. Three core trends drive this phenomenon:

First, user needs are shifting from shallow “Q&A” to deep “work,” such as code rewriting, document generation, and testing, which naturally consume many tokens.

Second, the rise of AI Agent technology amplifies token consumption. Agents plan, retrieve, execute, and reflect, repeatedly calling models, increasing token usage step by step.

Third, inference intensity is rising. More complex reasoning and longer chains of thought significantly increase token consumption for outputs and intermediate steps. For developers, this often means higher success rates and less rework, as users are willing to “invest more tokens for efficiency.”
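The agent amplification in the second trend is essentially multiplicative accounting: one user request fans out into repeated model calls. The per-step token figures below are made-up round numbers for illustration, not measurements.

```python
# Hypothetical per-step token budgets for one loop of an agent workflow.
STEP_TOKENS = {"plan": 800, "retrieve": 1500, "execute": 2500, "reflect": 600}

def agent_run(rounds: int) -> int:
    """Total tokens for an agent looping plan -> retrieve -> execute -> reflect."""
    return rounds * sum(STEP_TOKENS.values())

single_answer = 900                 # a plain one-shot Q&A reply, for comparison
agent_total = agent_run(rounds=3)
print(agent_total)                  # 16200 tokens over 3 rounds
print(agent_total / single_answer)  # 18x a simple answer
```

Even with these modest assumed budgets, three agent rounds consume an order of magnitude more tokens than a one-shot reply, which is the consumption pattern the "fuel" framing below describes.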

These shifts mean tokens are no longer the near-zero marginal cost “traffic” of the internet era but essential “fuel” for executing production tasks.

This trend aligns with the views of top global chip manufacturers. Nvidia CEO Jensen Huang emphasized on February 26 that “computing equals revenue” and “inference equals revenue.” Without computing power, tokens cannot be generated; without tokens, revenue growth is impossible. In the AI era, inference performance directly determines revenue capacity, and the core of inference is the efficient generation of monetizable tokens. As data center power constraints intensify worldwide, “performance per watt” has become a key metric for AI service efficiency and revenue potential.

Li Qing told the reporters that the business model of AI services is evolving from simple “pay-as-you-go” to a hybrid of “fuel + results.” While token prices will continue to decline with technological progress and scale, companies will increasingly pay for direct “results,” leading to more subscription-based business models.

She also predicted that future AI service pricing will inevitably become highly customized and flexible. The arrival of the Agent era means task complexity varies greatly, and a single pricing model cannot meet all needs. Factors like computing consumption, call frequency, and whether tasks involve multi-step reasoning or planning will influence pricing, leading to a multi-dimensional, dynamic pricing system becoming mainstream.

(Article source: Daily Economic News)
