The Inference Era Is Accelerating as NVIDIA Pushes the LPU: Which Listed Companies Stand to Benefit from Million-Unit Shipments?

At 2 a.m. Beijing time on March 17, NVIDIA CEO Jensen Huang officially unveiled the Groq 3 LPU (Language Processing Unit) inference chip during the GTC 2026 keynote and announced its integration into the next-generation Vera Rubin AI platform.

This marks the first time since NVIDIA’s licensing agreement with Groq late last year that an LPU has been launched as a mass-produced product. Huang stated that Samsung is fully accelerating production of this chip, with the Groq 3-based LPX rack expected to be available in the second half of this year.

Just before GTC opened, TF International Securities analyst Ming-Chi Kuo published a report saying that, following NVIDIA's investment in Groq, the LPU shipment forecast has been raised significantly. He expects total LPU shipments of 4 to 5 million units across 2026 and 2027, with racks based on the new architecture entering mass production in Q4 this year. For 2026 he estimates shipments of around 300,000 to 500,000 LPUs, with rack shipments reaching 15,000–20,000 units in 2027.

While NVIDIA pushes the LPU into the spotlight, a number of China-listed companies are also positioning themselves along the same technology route.

What does the LPU do?

In over two hours of this year’s GTC keynote, the word “inference” appeared nearly 40 times.

“A strong signal from this year’s GTC is that the inference era is accelerating,” a source at Yuntian Lifei (688343.SH) told Cailian Press.

He explained that as Agentic AI moves from “conversational” to “action-oriented,” large models are increasingly embedded into workflows. AI is shifting from a dialogue tool to a workforce capable of task decomposition, tool invocation, and process execution. Once AI enters production, industry focus shifts from model strength to computational power.
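
As a rough illustration of why “action-oriented” agents drive up inference demand, here is a minimal sketch of a generic agent loop (task decomposition, tool invocation, step-by-step execution) in Python. The completion function, tool names, and reply format are hypothetical stand-ins, not any vendor's actual API.

```python
# Minimal sketch of a generic agentic loop. `llm_complete`, the tool table,
# and the reply format are hypothetical illustrations, not a real API.

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to any large-model inference endpoint."""
    return "FINAL: done"  # stub so the sketch runs; a real model call goes here

TOOLS = {
    "search": lambda query: f"top results for {query!r}",  # stand-in tool
}

def run_agent(task: str, max_steps: int = 8) -> str:
    """Decompose a task, invoke tools, and iterate until the model answers.

    Every step is another full model inference, which is why action-oriented
    agents consume far more tokens than a single conversational turn.
    """
    context = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = llm_complete(context + "Next action (tool: args, or FINAL: answer)?")
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        tool, _, args = reply.partition(":")
        observation = TOOLS.get(tool.strip(), lambda a: "unknown tool")(args.strip())
        context += f"{reply}\n-> observation: {observation}\n"
    return "stopped: step limit reached"

print(run_agent("summarize today's GPU shipment news"))
```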

A core concept Huang repeatedly emphasized is the “token factory”: under fixed power, space, and cost constraints, enabling data centers to produce more, faster, and more commercially valuable tokens. He stated that tokens are the hard currency of the AI era, and computing power equals enterprise revenue.

The LPU is designed to improve the efficiency of this “factory.”

The Yuntian Lifei source explained the technical logic: the large-model inference process divides into two stages, prefill and decode. Prefill requires high parallel computing power, large memory capacity, and high throughput; decode requires low latency, low jitter, and fast response. The Vera Rubin GPU and the Groq 3 LPU launched at GTC target these two stages respectively.

He further clarified that the LPU does not take over the entire decode stage, only the token-generation portion, where it excels; the Rubin GPU continues to handle the attention computation during both prefill and decode.
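
For readers unfamiliar with the two stages, the toy sketch below shows the difference in plain Python: prefill handles the full prompt in one parallel pass and builds the KV cache, while decode generates tokens one at a time. The “model” and sampler are hypothetical stand-ins, and the GPU/LPU division of labor in the comments simply restates the description above rather than an official reference implementation.

```python
import random

# Toy illustration of the two inference stages; the "model" and sampler are
# hypothetical stand-ins, not a real framework API.
VOCAB_SIZE = 32

def model_forward(tokens, kv_cache):
    """Pretend transformer step: extend the cache and return fake logits."""
    kv_cache.extend(tokens)                   # a real model would cache K/V tensors
    return [random.random() for _ in range(VOCAB_SIZE)]

def prefill(prompt_tokens):
    """Prefill: process the whole prompt in one parallel, compute-heavy pass.

    Per the article, this stage (and attention in general) stays on the
    Vera Rubin GPU.
    """
    kv_cache = []
    logits = model_forward(prompt_tokens, kv_cache)
    return kv_cache, logits

def decode(kv_cache, logits, max_new_tokens=16):
    """Decode: emit tokens one at a time, reusing the KV cache.

    Sequential and latency-bound; per the article, the token-generation part
    of this stage is what the LPU is meant to accelerate.
    """
    output = []
    for _ in range(max_new_tokens):
        next_token = max(range(VOCAB_SIZE), key=logits.__getitem__)  # greedy pick
        output.append(next_token)
        logits = model_forward([next_token], kv_cache)               # one small step per token
    return output

kv, first_logits = prefill([1, 2, 3, 4])
print(decode(kv, first_logits))
```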

Huang also gave a specific deployment recommendation: about 25% of data centers should deploy Groq, with the remaining 75% using Vera Rubin. He noted that the gains from adding Groq are more significant for users whose workloads are dominated by high-value token-generation tasks such as coding.

On performance, Zhiyi Intelligence (001339.SZ) said at an investor briefing in March that, based on test data presented by Groq's CEO at ISSCC 2024 (the International Solid-State Circuits Conference), the LPU generates tokens at six times the speed of NVIDIA's H100 GPU, at about a quarter of the H100's per-token cost and roughly one-third of its inference power consumption.

The speed advantage of the LPU comes from its architectural design.

Unlike the general-purpose parallel computing architecture used by GPUs, the LPU adopts a deterministic data flow processor architecture, where the compiler completes all scheduling during compilation, eliminating the need for dynamic arbitration at runtime. Additionally, the LPU is equipped with large on-chip SRAM, with data integrated directly on the chip, resulting in much lower access latency compared to GPUs reading data from external memory.
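
One rough way to see why the large on-chip SRAM helps: single-stream token generation is commonly modeled as memory-bound, since each new token requires streaming roughly all of the model's weights to the compute units. The back-of-envelope script below uses purely illustrative model-size and bandwidth figures (assumptions, not NVIDIA or Groq specifications) to show how that bound shifts when weights sit in on-chip SRAM instead of external memory.

```python
# Back-of-envelope decode speed under a memory-bound model: each generated
# token streams roughly all weights once, so tokens/s <= bandwidth / model size.
# All figures are illustrative assumptions, not NVIDIA or Groq specifications.

def max_tokens_per_second(model_size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Upper bound on single-stream decode speed when weight reads dominate."""
    return bandwidth_gb_per_s / model_size_gb

MODEL_SIZE_GB = 140            # assumed: a ~70B-parameter model at 16-bit weights
EXTERNAL_MEMORY_GB_S = 3_000   # assumed: external HBM bandwidth of one accelerator
ON_CHIP_SRAM_GB_S = 60_000     # assumed: aggregate SRAM bandwidth across the many
                               # chips needed to hold the whole model on-chip

for label, bandwidth in [("external memory", EXTERNAL_MEMORY_GB_S),
                         ("on-chip SRAM", ON_CHIP_SRAM_GB_S)]:
    limit = max_tokens_per_second(MODEL_SIZE_GB, bandwidth)
    print(f"{label:>15}: at most ~{limit:.0f} tokens/s per stream")
```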

Zhiyi Intelligence illustrated this difference with an analogy: the static compile scheduling of the LPU is like a high-speed rail timetable, with all schedules predetermined and congestion extremely unlikely; whereas GPU dynamic scheduling is like driving freely on a highway, where individual randomness inevitably leads to systemic congestion.

NVIDIA disclosed that, after deploying Vera Rubin and Groq 3 LPU together, token generation efficiency per megawatt can be increased by 35 times. Currently, Groq 3 LPU is manufactured by Samsung, with LP30 chips in mass production. A single LPX rack can hold 256 LPUs, with shipments expected in Q3 this year.

The Yuntian Lifei source believes GTC 2026 not only signals NVIDIA's product direction but also reflects a growing industry consensus: in the inference era, what matters is no longer peak specifications alone, but matching hardware to the computational characteristics of each task so that every bit of computing power is spent where it is needed.

Who is following domestically?

As the LPU moves from concept toward mass production, domestic listed companies have begun to position themselves.

In chip design, Zhiyi Intelligence recently invested in Hangzhou Yuanchuan Micro Technology Co., Ltd. through Yaoteng Investment. Yuanchuan Micro is a domestic chip company built around the LPU architecture, developing a hardware dataflow architecture and a full-resource compiler, with two main product lines, Mountain (computing power) and River (Agent), targeting large-model and edge applications.

Zhiyi Intelligence said that partnering with upstream chip makers will strengthen its positioning from training through inference and enhance its product capabilities across AI servers, embodied intelligence, edge, and terminal devices. The company also expects inference to account for 90% of total AI computing workload in the future, with training at just 10%, and believes the LPU will dominate the inference market.

Additionally, Xingchen Technology (301536.SZ) has made multiple rounds of investment in Yuanchuan Micro.

Yuntian Lifei is also pursuing similar technical routes from the chip architecture level. The company has publicly proposed the GPNPU (General-Purpose Programmable Neural Network Processor) architecture, planning P chips and D chips for large model inference scenarios, optimized for prefill and decode stages respectively, and using 3D stacked memory to alleviate bandwidth bottlenecks in inference chains.

The Yuntian Lifei source said that if NVIDIA's Rubin-plus-LPX pairing is the global showcase of “heterogeneous inference,” domestic companies are advancing in the same direction through inference-architecture innovations such as prefill-decode (PD) separation and compute-storage collaboration.
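
To make the “PD separation” idea concrete, the sketch below shows the generic disaggregated-serving pattern: a throughput-oriented prefill pool builds the KV cache and hands it to a latency-oriented decode pool that streams tokens. It is a minimal illustration under assumed names (the worker classes and hand-off are hypothetical), not Yuntian Lifei's or NVIDIA's actual scheduler.

```python
from dataclasses import dataclass, field

# Minimal sketch of prefill/decode (PD) separation in an inference cluster.
# The worker classes and hand-off below are hypothetical illustrations of the
# general pattern, not any specific vendor's implementation.

@dataclass
class KVCache:
    """Stand-in for the key/value tensors produced during prefill."""
    prompt_len: int
    entries: list = field(default_factory=list)

class PrefillWorker:
    """Throughput-oriented node (a "P chip" in the article's terms)."""
    def prefill(self, prompt_tokens: list) -> KVCache:
        cache = KVCache(prompt_len=len(prompt_tokens))
        cache.entries.extend(prompt_tokens)        # a real system stores K/V tensors
        return cache

class DecodeWorker:
    """Latency-oriented node (a "D chip" in the article's terms)."""
    def decode(self, cache: KVCache, max_new_tokens: int = 8) -> list:
        out = []
        for step in range(max_new_tokens):
            token = (cache.prompt_len + step) % 100  # toy "next token"
            cache.entries.append(token)
            out.append(token)
        return out

def serve(prompt_tokens: list) -> list:
    """Route one request: prefill on one pool, ship the cache, decode on another."""
    cache = PrefillWorker().prefill(prompt_tokens)   # stage 1: parallel, compute-bound
    return DecodeWorker().decode(cache)              # stage 2: sequential, latency-bound

print(serve([5, 6, 7]))
```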

Wantong Zhikong (300643.SZ) is also active in the LPU field. The company has invested in Shenming Aosi (a 5.66% stake), securing exclusive global manufacturing and sales rights for Shenming Aosi's LPU boards for embodied-intelligence domain controllers. Shenming Aosi's LPU chip, Fellow 1, was taped out in Q1 this year and entered sample testing in Q2.

Upstream in the supply chain, large-scale adoption of the LPU is also expected to bring growth to the PCB (printed circuit board) industry.

Everbright Securities noted in a recent report that, because a single LPU's on-chip SRAM capacity is limited, serving large models requires cascading hundreds of LPUs, which substantially increases PCB and substrate area compared with pure-GPU solutions. The LPU also places higher demands on PCB materials, likely requiring 52-layer boards built on M9-grade copper-clad laminate. Together, these changes drive up PCB area requirements and manufacturing complexity.

Companies such as WUS Printed Circuit (002463.SZ), Shenghong Technology (300476.SZ), and Shennan Circuit (002916.SZ) are currently engaged in high-end PCB business.

(Source: Cailian Press)
