Guolian Minsheng Securities: Token Demand "Inflation"; in the Short Term, Watch Large Model Vendors' Price Hikes and Demand-Driven Marginal Improvements


CryptoTimes Finance APP has learned that Guolian Minsheng Securities released a research report stating that cloud computing is gradually moving beyond “selling resources,” with large model vendors transforming toward “selling tokens + results.” The price increase for Zhipu's (02513) GLM Coding Plan reflects a change in the industry's pricing logic: when inference consumption becomes a production factor, model vendors have the opportunity to convert “computing power scarcity” into gross profit and cash flow through tiered pricing and subscription products.

In the short term, observe the marginal improvements brought by price hikes and demand (Token “inflation”); in the medium term, track enterprise seat growth and subscription retention for renewal and expansion; in the long term, the widespread adoption of governance tools will bring new markets for “AI firewalls.”

Guolian Minsheng Securities’ main points are as follows:

Event: On February 12, Zhipu announced via official channels that the subscription price of GLM Coding Plan would be increased by “at least 30%.”
Earlier this month, overseas cloud providers also raised prices: Google Cloud raised North American prices by 100% and also lifted prices in Europe and Asia, while AWS raised prices by about 15%. Overall, “inflation” in token demand not only benefits cloud computing power but also gives model vendors bargaining power.

Disrupting the traditional free internet model

The typical path of traditional internet software is to first use free offerings to gain user scale, leveraging “user numbers and duration” for bargaining power, then monetize through advertising, memberships, value-added services, and transaction commissions. Free works because marginal cost is extremely low: the cost of each additional user or click is diluted by bandwidth and storage scale effects, approaching zero.

In the era of cloud computing, a similar “free/low-cost expansion” approach appeared, but cloud billing units quickly shifted to CPU, storage, bandwidth, and request counts, and customers became accustomed to “pay-as-you-go.” Cloud services can charge because they deliver explicit resources and SLAs (service-level agreements) between providers and customers. Even though the industry remains in a “model price war,” Zhipu's price-increase signal indicates that in the large model era the “measurement unit” has shifted from traffic (DAU/duration) to tokens (inference consumption), which are increasingly a necessity in many scenarios.
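The shift in measurement unit can be made concrete with a toy billing comparison: classic cloud bills meter explicit resources, while model vendors meter inference consumption directly. All rates and usage figures below are hypothetical illustrations, not actual prices from any provider named in the report:

```python
# Sketch of the billing-unit shift described above.
# Classic cloud meters resources; large model vendors meter tokens.
# Every rate here is a made-up illustrative number.

def cloud_bill(cpu_hours, gb_storage, gb_egress, requests):
    """Pay-as-you-go: explicit resources, each with a per-unit rate."""
    return (cpu_hours * 0.05        # hypothetical $/CPU-hour
            + gb_storage * 0.02     # hypothetical $/GB-month
            + gb_egress * 0.08      # hypothetical $/GB egress
            + requests / 1e6 * 0.40)  # hypothetical $/million requests

def model_bill(input_tokens, output_tokens):
    """Token-metered: the billing unit is inference consumption itself."""
    return (input_tokens / 1e6 * 0.50     # hypothetical $/million input tokens
            + output_tokens / 1e6 * 1.50)  # hypothetical $/million output tokens

print(cloud_bill(1000, 500, 200, 5_000_000))
print(model_bill(20_000_000, 5_000_000))
```

The point of the contrast: in the left column the customer buys capacity whether or not it produces anything; in the right column every unit billed corresponds to work the model actually performed, which is what makes tokens behave like a production factor.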

Changes in the large model era: Tokens become “measurable production materials,” no longer “free traffic”

Large models turn “dialogue/code writing/content generation,” which seem like services provided by software vendors, into online inference services that are heavily dependent on computing power. For model vendors, each response consumes GPU, memory, bandwidth, and electricity; for users, each time they “ask the model to think longer, write longer code, or run more complex tasks,” more tokens are consumed, making tokens a natural new measurement unit. Previously, Zhipu limited the Coding Plan due to phased GPU resource constraints, which produced a typical supply-demand chain: demand surged in the short term → resources became rigidly constrained (leading to rate limiting/throttling) → prices increased.

When peak congestion and resource shortages occur, price hikes serve as a mechanism for model vendors to filter demand, better protecting user experience than indiscriminate rate limiting. Moreover, the cost side of model vendors remains strongly related to GPU supply, utilization, and inference optimization. Tiered pricing and price increases can help model vendors escape the “bigger scale, more losses” trap, improving gross margins and cash flow quality.
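The demand-filtering logic can be sketched with a toy model. Capacity, prices, and willingness-to-pay figures below are all hypothetical assumptions for illustration, not figures from the report:

```python
# Toy model: with fixed GPU capacity, raising the price filters out
# low-value demand instead of throttling all users indiscriminately.
# All numbers are hypothetical.

CAPACITY = 700  # billion tokens servable per month (assumed)

# Hypothetical user segments: (tokens demanded in billions,
#                              willingness to pay per million tokens)
users = [(400, 5.0), (400, 2.0), (400, 0.5)]

def serve(price):
    """Serve only users whose willingness to pay meets the price,
    highest-value first, up to capacity; return (tokens served, revenue)."""
    served, revenue = 0, 0.0
    for demand, wtp in sorted(users, key=lambda u: -u[1]):
        if wtp < price:
            continue  # this segment drops out at the new price
        take = min(demand, CAPACITY - served)
        served += take
        revenue += take * price
        if served == CAPACITY:
            break
    return served, revenue

print(serve(price=1.0))  # cheap: capacity exhausted, everyone throttled
print(serve(price=3.0))  # after the hike: fewer tokens, more revenue, headroom left
```

At the low price, capacity is fully consumed and every user experiences congestion; at the higher price, only the high-value segment remains, revenue rises, and spare capacity protects its experience, which is exactly the filtering effect the paragraph describes.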

Token demand in “inflation”

“Token inflation” does not mean tokens themselves become more expensive, but rather that token consumption per unit time and per user is rising structurally. The reasons for rising token demand include:

From “Q&A” to “doing work”:
As models develop, users are no longer satisfied with simple answers; they want models to refactor code, rewrite files, generate documents, and run tests. Programming scenarios are naturally characterized by “long context, multi-turn iterations, large outputs,” which consume large volumes of tokens. Zhipu's statements also confirm that developers rely on its models for coding support, driving rapid growth in token consumption.

From “single-turn” to “multi-turn agent”:
Zhipu positions GLM-5 as a new-generation model for coding and agent scenarios; on February 12, MiniMax-WP (00100) also launched its latest flagship programming model, M2.5, billed as the world's first production-grade model natively designed for agent scenarios, with its coding and agentic capabilities benchmarked directly against Claude Opus 4.6. Agents actively plan, retrieve, execute, and reflect, calling the model multiple times, so token consumption naturally accumulates step by step.

Increased inference intensity:
More “deep thinking” and “long chain reasoning” significantly increase token consumption for outputs and intermediate processes. For developers, this often results in higher success rates and less rework, and users are willing to “burn more tokens for efficiency.”

This means tokens are no longer the near-zero marginal cost “traffic” of the traditional internet era but are essential “fuel” for production tasks.
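The step-by-step accumulation behind these drivers can be sketched with hypothetical numbers (per-phase token counts and iteration counts are illustrative assumptions, not Zhipu or MiniMax figures):

```python
# Illustrative sketch: why a multi-turn agent workflow consumes far more
# tokens than a single Q&A exchange. All token counts are hypothetical.

def qa_turn(prompt_tokens=200, answer_tokens=500):
    """A single question-and-answer exchange: one prompt, one reply."""
    return prompt_tokens + answer_tokens

def agent_task(iterations=4):
    """An agent loop: each iteration plans, retrieves context, executes
    (generates code/output), and reflects. The growing context is re-sent
    on every model call, so consumption compounds rather than adds."""
    context = 200  # initial task description (assumed)
    total = 0
    for _ in range(iterations):
        # assumed output sizes for plan / retrieve / execute / reflect
        for phase_output in (150, 800, 1200, 300):
            total += context + phase_output  # input context + generated tokens
            context += phase_output          # output is appended to context
    return total

print(qa_turn())     # one simple answer
print(agent_task())  # multi-turn agent: orders of magnitude more tokens
```

Because the context re-sent at each call grows with every phase output, total consumption grows roughly quadratically in the number of steps, which is why the shift from “single-turn” to “multi-turn agent” inflates token demand so sharply.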

Investment recommendations

Cloud computing is gradually moving beyond “selling resources,” while large model vendors are transforming toward “selling token fuel + results.” The price increase of Zhipu's GLM Coding Plan reflects a change in industry pricing logic: when inference consumption becomes a production factor, model vendors can convert “computing power scarcity” into gross profit and cash flow through tiered pricing and subscription products. Continued focus should be on:

Cloud providers and infrastructure:
AI-driven IT spending and infrastructure investment remain in an upcycle, with cloud providers benefiting from sustained growth in GPU computing, storage, and network I/O “accompanying consumption.”

Large model vendors:
When they can maintain subscription retention and enterprise seat expansion in high-ROI scenarios like coding, agents, and enterprise workflows, and convert “token usage” into delivery value that “saves manpower, time, and rework,” they will have the ability to withstand open-source and price wars.

Security governance and runtime protection tools:
As enterprises embed AI into workflows, risks such as data leakage and agent overreach will make “AI security/governance platforms” essential.

In the short term, monitor the marginal improvements from price hikes and demand (“token inflation”); in the medium term, track enterprise seat growth and subscription retention for renewal and expansion; in the long term, the widespread adoption of governance tools will create new markets for “AI firewalls.”

Risk warnings

Technological route changes are uncertain; industry competition may intensify.
