Inference Computing Power Demand Surges; Industry Chain Companies Accelerate Their Positioning

Securities Daily Online Reporter Wang Jingru

As generative artificial intelligence moves from “model training” toward large-scale commercial deployment, the center of gravity of computing power consumption is shifting from training to the continuous computing demands of inference. On March 17, NVIDIA CEO Jensen Huang said at the GTC conference that the inflection point of the AI inference market has arrived: AI is moving fully from training into the inference and execution phase, and demand for inference computing power will grow exponentially.

“With the expansion of generative AI applications, demand for inference computing power may well grow much faster than demand for training. On one hand, application demand is exploding: generative AI and intelligent-agent applications are being deployed at an accelerating pace, and high-frequency user interactions generate inference requests at exponential rates. On the other hand, breakthroughs in specialized inference chips, liquid cooling, and optical interconnect technologies are significantly improving computing efficiency and concurrency, laying the foundation for large-scale deployment,” Zhang Pengyuan, a researcher at Qianhai PaiPaiNet Fund Sales Co., Ltd., told Securities Daily.

Industry forecasts point to the continued rise of inference computing power. International Data Corporation (IDC) predicts that by 2027 inference workloads will account for more than 70% of China's total computing power. Huang Chao, founder and CEO of China IDC Circle, said that by 2026 intelligent agents will enter a stage of flourishing growth across industries, computing power applications will shift from “training-led” to “inference-driven,” and the explosion in demand for inference computing power is about to arrive in full.

In response to the rapid growth in inference computing demand, upstream and downstream companies in China are accelerating technology R&D and product deployment. At the chip level, a number of manufacturers have launched chips optimized for inference scenarios. Compared with traditional training chips, inference chips place greater emphasis on power control, cost efficiency, and deployment flexibility, giving them broad application prospects in both cloud and edge environments.

Take Shenzhen Yuntian Lifei Technology Co., Ltd. (Intellifusion; hereinafter “Yuntian Lifei”) as an example. The company has built its large-scale cloud inference chips around an NPU core, following a GPNPU technology route with deep optimization of the matrix and vector units, the memory hierarchy, and effective bandwidth utilization. The goal is to drive down token costs by orders of magnitude and accelerate the large-scale, broadly affordable deployment of large models.
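The emphasis on effective bandwidth utilization reflects a basic property of large-model decoding: generating each token typically requires streaming the model weights through memory, so throughput, and hence cost per token, is bounded by usable bandwidth. The back-of-the-envelope Python sketch below illustrates this; the model size, peak bandwidth, and utilization figures are illustrative assumptions, not Yuntian Lifei's specifications.

```python
# Illustrative roofline estimate for memory-bound LLM decoding.
# All figures are assumptions for illustration, not vendor data.

def decode_tokens_per_second(weight_bytes: float,
                             peak_bandwidth_gbs: float,
                             utilization: float) -> float:
    """Each generated token streams the weights once, so
    throughput ~= effective bandwidth / bytes per token."""
    effective_bw = peak_bandwidth_gbs * 1e9 * utilization
    return effective_bw / weight_bytes

# Hypothetical 70B-parameter model quantized to 8 bits (~70 GB of weights)
WEIGHT_BYTES = 70e9
for util in (0.3, 0.6, 0.9):
    tps = decode_tokens_per_second(WEIGHT_BYTES, peak_bandwidth_gbs=3000,
                                   utilization=util)
    print(f"bandwidth utilization {util:.0%}: ~{tps:.0f} tokens/s per chip")
```

On these assumed numbers, raising effective utilization from 30% to 90% triples token throughput on the same hardware, which is why bandwidth optimization translates directly into lower token costs.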

In 2025, Yuntian Lifei recorded revenue of 1.308 billion yuan, a year-on-year increase of 42.57%. A company executive told Securities Daily: “For enterprises, as industry competition shifts from training scale to inference efficiency, delivery cost, and system-level profitability, whoever integrates hardware, storage, and software earlier will have the better chance of taking the lead in the inference era.”

At the server and system level, leading manufacturers are likewise rolling out inference-optimized computing platforms. Inspur Electronic Information Industry Co., Ltd., for example, launched the YuanNao R1 inference server, which supports 16 standard double-width PCIe cards in a single machine and can deploy the DeepSeek-671B model; it also introduced YuanNao CPU inference servers for the rapid deployment and efficient operation of next-generation inference models such as DeepSeek-R1 32B and QwQ-32B.
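A rough sizing check shows why 16 cards in one chassis is a meaningful threshold for a 671-billion-parameter model. The sketch below computes the per-card memory implied at several weight precisions; the precisions are generic assumptions, not Inspur's published configuration.

```python
# Rough memory sizing for a 671B-parameter model spread across 16 cards.
# Precisions are generic assumptions, not Inspur's published configuration.

PARAMS = 671e9   # parameter count of the DeepSeek-671B model
CARDS = 16       # double-width PCIe cards in a single chassis

for bits, name in ((16, "FP16"), (8, "FP8/INT8"), (4, "INT4")):
    total_gb = PARAMS * bits / 8 / 1e9
    per_card_gb = total_gb / CARDS
    print(f"{name}: ~{total_gb:,.0f} GB of weights, ~{per_card_gb:.0f} GB per card "
          f"(before KV cache and activations)")
```

At 16-bit precision the weights alone approach 84 GB per card, which is why lower-precision formats and the KV-cache headroom they free up matter for single-machine deployment.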

Meanwhile, the build-out of computing power infrastructure is accelerating. In the past, many domestic intelligent computing centers adopted an integrated training-and-inference model. On March 12, Yuntian Lifei won the bid for an AI-enabled new quality productive forces infrastructure project in Zhanjiang, Guangdong, which is positioned as a dedicated AI inference cluster serving applications across industries and providing a reference case for the AI transformation of traditional domestic industries.

He Li, general manager of Beijing Zhi Yu Zhi Shan Investment Management Co., Ltd., believes that in this transition, high-performance inference chips, HBM, and full-stack software will be the first to benefit from the computing power dividend. Inference scenarios demand extremely low latency, high throughput, and high energy efficiency, so dedicated architectures such as LPUs and ASICs are accelerating their replacement of general-purpose computing units, and storage technologies such as HBM4 will be key to breaking bandwidth bottlenecks. In addition, as computing power spreads from data centers to the edge, demand for high-density inference racks and advanced cooling technologies is rising. Coupled with model quantization, parameter compression, and other compiler-level optimizations, this will push the industry from hardware stacking toward hardware-software co-design.
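Model quantization, one of the compression techniques mentioned above, can be illustrated with a minimal sketch: symmetric per-tensor INT8 quantization maps floating-point weights onto 8-bit integers, cutting weight memory (and the bandwidth needed to stream it) by 4x versus FP32 at the cost of a small rounding error. This is a generic textbook example, not any particular vendor's toolchain.

```python
import numpy as np

# Minimal sketch of symmetric per-tensor INT8 weight quantization.
# Generic textbook example, not a specific vendor's toolchain.

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0                  # map max |w| to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)  # toy weight matrix
q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).mean()
print(f"memory: {w.nbytes / 1e6:.1f} MB -> {q.nbytes / 1e6:.1f} MB, "
      f"mean abs error {err:.5f}")
```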
