Amazon (AMZN.US) Launches AI Cost Revolution! In-House AI ASICs Target Large Model Training; Nvidia's Computing Power Monopoly Faces Its Strongest Challenge


TechNews APP reports that the leading U.S. e-commerce and cloud computing giant Amazon (AMZN.US) will experiment extensively with clusters of its self-developed AI ASICs, Trainium and Inferentia, to develop and iterate its own large AI models at significantly lower cost. For the AI GPU computing systems dominated by NVIDIA and AMD, the move could translate into medium- to long-term margin pressure and a weakening of monopoly premiums. Amid the wave of AI inference and the trend of embedding large AI models into enterprise operations through "micro-training," the more cost-effective AI ASIC route may pose the strongest challenge yet to NVIDIA's nearly 90% share of the AI chip market.

From the perspective of the AI computing industry chain and chip engineering, AWS, Amazon's cloud platform, using self-developed AI chips to train large AI models, rather than focusing primarily on AI inference as before, is a significant milestone for Amazon's self-developed AI ASIC computing route. It is not, however, the starting milestone for AI ASICs as such; Google's TPU, which belongs to the same AI ASIC technology route, demonstrated that earlier. What AWS is doing now is upgrading its self-developed AI ASIC clusters from infrastructure that merely participates in AI workloads to the core computing system directly supporting its cutting-edge large AI models, a shift of major significance across the industry chain for hyperscalers such as Amazon, Google, and Microsoft.
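To make the shift more concrete, the sketch below shows what targeting a Trainium device can look like at the code level. It is a minimal illustration assuming the AWS Neuron SDK's PyTorch/XLA integration (the torch_xla device API); the model, data, and hyperparameters are placeholders, not Amazon's actual training stack.

```python
# Minimal sketch: a PyTorch training loop targeting a Trainium device via the
# XLA backend. Assumes the AWS Neuron SDK's torch-xla integration is installed;
# the model and data below are placeholders for illustration only.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()              # resolves to an accelerator core under XLA,
                                      # instead of torch.device("cuda") on a GPU

model = nn.Sequential(                # toy stand-in for a large model
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
).to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for step in range(10):                # toy loop; real training shards data and model
    x = torch.randn(8, 1024).to(device)
    y = torch.randn(8, 1024).to(device)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    xm.optimizer_step(optimizer)      # XLA-aware step: syncs gradients and triggers
                                      # compilation and execution on the device
```

The training loop itself changes little when the backend changes; the real engineering effort sits in the compiler, interconnect, and cluster software, which is precisely where ASIC routes have to close the gap with NVIDIA's ecosystem.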

Market concerns about NVIDIA’s prospects are justified

Amazon’s new head of AI infrastructure, Peter DeSantis, said in a media interview on Friday morning: “If we can build models on our own self-developed AI chips, we can do so at just a small fraction of the cost of pure AI large model providers.”

DeSantis also added: “Building ultra-large AI data centers does involve certain costs. If we ultimately want AI to change everything, costs must be different.”

The market generally believes that the "AI chip super-giant" NVIDIA (NVDA.US) still holds the vast majority of market share in the most critical area of AI infrastructure: artificial intelligence chips. The chip giant, led by Jensen Huang, recently announced quarterly earnings and FY2026 guidance that far exceeded expectations, yet its stock still fell sharply, by about 5%, on Thursday, mainly on growing market concern over recent announcements from hyperscalers about rolling out more cost-effective self-developed AI ASICs. The trend increasingly signals risk to NVIDIA's long-standing dominance of the core of global AI infrastructure, the AI chip itself.

With Amazon now announcing that it will experiment with Trainium and Inferentia to develop large AI models, those concerns are undoubtedly justified.

Earlier this month, Amazon’s management stated that capital expenditure in 2026 would reach approximately $200 billion, well above Wall Street expectations. Amazon CEO Andy Jassy said that part of this spending would be used to develop and upgrade its self-developed AI ASIC computing infrastructure.

Jassy stated: “Given the strong demand for our existing e-commerce services, traditional cloud computing, and AI computing power, as well as groundbreaking growth opportunities in AI large models, humanoid robots, and low Earth orbit satellites, we expect Amazon to invest about $200 billion in capital expenditure in 2026, which will generate strong long-term investment returns.”

The wave of AI inference is coming; NVIDIA may no longer be the “biggest winner” in AI

The real novelty in Amazon's latest plan is not whether self-developed AI ASICs can train large models at all, but its intention to move self-developed AI chips from an optional source of cloud AI computing power onto the core path for developing its own foundation models.

AI training, which NVIDIA's GPUs nearly monopolize, demands more powerful and versatile computing clusters and the ability to iterate rapidly across the entire computing system. In AI inference, by contrast, once cutting-edge models have been scaled up, the focus shifts to unit token cost, latency, energy efficiency, and hardware-software co-optimization. Google, for example, explicitly positions Ironwood as a generation of TPU built for AI inference, emphasizing performance, energy efficiency, and scale-out cost-effectiveness. Amazon's latest moves, however, suggest that AI ASICs may also have strong potential for training large models.

AI ASIC computing systems will continue to erode NVIDIA's monopoly premium and some of its market share over the medium to long term, but they will not replace GPU systems in a linear fashion. The fundamental reason is that, in the inference era, the core competition is no longer just peak computing power but token cost, power consumption, memory bandwidth utilization, interconnect efficiency, and total cost of ownership after hardware-software integration. On these metrics, ASICs tailored to specific workloads, with optimized data flow, compilers, and interconnects, are inherently more cost-effective than general-purpose GPUs; a rough cost-per-token comparison is sketched below.
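A back-of-envelope comparison shows how these metrics, rather than peak computing power, combine into a cost per token. Every figure below is a hypothetical placeholder, not published AWS or NVIDIA pricing; the sketch only illustrates the arithmetic of hourly instance cost, power draw, and sustained throughput.

```python
# Hypothetical back-of-envelope: cost per million inference tokens.
# All figures are made-up placeholders for illustration only.

def cost_per_million_tokens(hourly_rate_usd, power_kw, power_price_usd_per_kwh,
                            tokens_per_second):
    """Hourly cost (instance rental plus electricity) divided by the number
    of millions of tokens produced in that hour."""
    hourly_cost = hourly_rate_usd + power_kw * power_price_usd_per_kwh
    tokens_per_hour_millions = tokens_per_second * 3600 / 1e6
    return hourly_cost / tokens_per_hour_millions

# Hypothetical general-purpose GPU instance
gpu = cost_per_million_tokens(hourly_rate_usd=40.0, power_kw=10.0,
                              power_price_usd_per_kwh=0.10, tokens_per_second=20_000)

# Hypothetical workload-tuned ASIC instance: cheaper per hour, slightly lower
# throughput, but better cost per token overall
asic = cost_per_million_tokens(hourly_rate_usd=25.0, power_kw=8.0,
                               power_price_usd_per_kwh=0.10, tokens_per_second=18_000)

print(f"GPU : ${gpu:.3f} per million tokens")   # ~ $0.569
print(f"ASIC: ${asic:.3f} per million tokens")  # ~ $0.398, roughly 30% cheaper
```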

For NVIDIA and AMD, however, this mostly means that the margin pressure is real but is likely to show up as weaker bargaining power, divided market share, and compressed valuation premiums rather than a collapse in demand. AI ASICs will keep chipping away at NVIDIA's GPU dominance through the AI inference superwave, but the impact is more about reshaping industry profit pools and customer procurement structures than invalidating the logic of GPU expansion.

AWS explicitly positions Trainium and Inferentia as dedicated accelerators for generative AI training and inference, with Trainium 2 offering about 30%–40% better price-performance compared to its AI GPU cloud instances. Google also recently announced that Gemini 2.0’s training and inference are 100% run on TPUs. This indicates that “large cloud providers using self-developed ASICs for core model training/inference” is no longer just a concept but is entering a reproducible industrial phase.
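As a rough sense of scale, if price-performance is read as work delivered per dollar, a 30%–40% improvement implies roughly a 23%–29% lower bill for the same job, since the relative cost is 1 divided by (1 plus the gain). The snippet below is a hypothetical illustration of that arithmetic, not AWS's published methodology.

```python
# Hypothetical illustration: what "X% better price-performance" implies for the
# cost of a fixed training or inference job (not AWS's methodology).

def relative_cost(price_perf_gain):
    """If price-performance (work per dollar) improves by `price_perf_gain`,
    the same job costs 1 / (1 + gain) of the baseline."""
    return 1.0 / (1.0 + price_perf_gain)

for gain in (0.30, 0.35, 0.40):
    print(f"{gain:.0%} better price-performance -> "
          f"~{1 - relative_cost(gain):.0%} lower cost for the same job")
# 30% -> ~23% lower, 35% -> ~26% lower, 40% -> ~29% lower
```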

Claiming that GPU systems will be quickly overwhelmed, however, is an overstatement. NVIDIA's true moat lies not only in its chips but in CUDA, its development tools, the breadth of model adaptation, and ecosystem inertia. Bloomberg analysts pointed out last year that over four million developers worldwide rely on CUDA, which means that much cutting-edge training, many complex hybrid workloads, and rapidly iterating new models remain better suited to GPUs in the short term. Even as AWS promotes its self-developed AI chips, it continues to incorporate GPU architectures into its plans for future chips and to offer AI infrastructure built on NVIDIA's computing power. This shows that the real strategy of the hyperscalers is not "de-GPU-ization" but keeping GPUs at the high-end training layer while raising the ASIC share in large-scale inference and their own model stacks. From an engineering perspective, therefore, the future is more likely to be a layered coexistence of GPUs and ASICs than a single route prevailing.
