The Next Big Shakeup in AI: Why the Real Danger Isn't SaaS Killers, but the Computing Power Revolution

Written by: Bruce

Recently, the entire tech and investment world has been fixated on the same thing: how AI applications are “killing” traditional SaaS. Ever since @AnthropicAI’s Claude Cowork demonstrated how easily it can write your emails, build your slide decks, and analyze your Excel spreadsheets, a “software is dead” panic has been spreading. It is alarming, to be sure, but if your attention stops there, you may be missing the real seismic shift.

It’s as if we were all watching drone dogfights overhead while nobody noticed the entire continent shifting beneath our feet. The real storm is hidden below the surface, in a corner most people overlook: the computing foundation that supports the whole AI world is undergoing a silent revolution.

And this revolution may force the AI chip seller, Nvidia (@nvidia), to end its grand party far earlier than anyone expected.

Two converging paths of revolution

This revolution isn’t a single event; it is woven from two seemingly independent technological trajectories, like two armies advancing from opposite flanks to close a pincer around Nvidia’s GPU dominance.

The first path is the algorithmic slimming revolution.

Have you ever wondered whether a super-brain really needs to fire every neuron when it thinks? Obviously not. DeepSeek has built on exactly this insight with its MoE (Mixture of Experts) architecture.

You can think of it as a company with hundreds of specialists in different fields. But when solving a problem, you only need to call in the two or three most relevant experts, rather than brainstorming with everyone. That’s the cleverness of MoE: it allows a massive model to activate only a small subset of “experts” during each computation, greatly saving computing power.
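
For the technically curious, here is a minimal sketch of that routing trick in Python. Everything in it is a toy: the dimensions, the random “experts,” and the router are invented for illustration, and real MoE layers (DeepSeek’s included) live inside transformer blocks with learned weights. The core move is the same, though: score every expert, then run only the top few.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 64, 8, 2   # toy sizes, not DeepSeek's real configuration

# Each "expert" here is a small weight matrix standing in for a feed-forward block.
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router_w = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)

def moe_forward(x):
    """Route one token through only its top-k experts."""
    scores = x @ router_w                            # one relevance score per expert
    top = np.argsort(scores)[-TOP_K:]                # indices of the k best-scoring experts
    gates = np.exp(scores[top] - scores[top].max())
    gates /= gates.sum()                             # softmax over the chosen experts only
    # Only TOP_K of the N_EXPERTS matrices are ever multiplied: ~k/N of the FLOPs.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(D)
out = moe_forward(token)
print(f"activated {TOP_K}/{N_EXPERTS} experts = {TOP_K / N_EXPERTS:.0%} of expert compute")
```

The line to notice is the return statement: the other six expert matrices are never touched, and that untouched compute is exactly where the savings come from.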

What’s the result? DeepSeek-V2 has 236 billion parameters in total but activates only about 21 billion per token, less than 10% of the whole. Yet its performance can go toe-to-toe with GPT-4. What does that mean? AI capability is decoupling from the compute it consumes!
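
A back-of-envelope calculation makes the decoupling concrete. It uses the common rule of thumb of roughly 2 FLOPs per active parameter per generated token, which is an approximation, not a published DeepSeek figure:

```python
# Per-token inference compute scales with *active* parameters, not total size.
# The ~2 FLOPs-per-active-parameter-per-token rule is a rough approximation.
total_params  = 236e9   # DeepSeek-V2 total parameters
active_params = 21e9    # parameters actually activated per token

dense_flops  = 2 * total_params    # a hypothetical dense model of the same size
sparse_flops = 2 * active_params   # MoE: only the routed experts run

print(f"active fraction : {active_params / total_params:.1%}")
print(f"per-token FLOPs : {sparse_flops:.2e} (MoE) vs {dense_flops:.2e} (dense)")
print(f"compute avoided : {1 - sparse_flops / dense_flops:.0%}")
```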

In the past, everyone assumed that a stronger AI simply meant more GPUs. Now DeepSeek has shown that smart algorithms can deliver the same results at roughly a tenth of the cost, which strikes directly at the core of Nvidia’s value proposition.

The second path is a hardware lane-change revolution.

AI workloads split into training and inference. Training is like going to school, reading thousands of books at once, and that is where the GPU’s massive parallelism earns its keep. Inference is the everyday use of AI, where response speed matters most.

GPUs carry an inherent handicap in inference: their memory (HBM) sits off the compute die, so data shuttles back and forth with every step. It’s like a chef whose ingredients live in a fridge in the next room; however fast he runs, the trips add up. Companies like Cerebras and Groq took a different route, designing dedicated inference chips with SRAM built directly into the silicon, keeping the ingredients on the counter for near-zero-latency access.
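
A toy calculation shows why the location of memory matters so much. To generate each token, the chip has to stream every active weight past its compute units at least once, so memory bandwidth sets a hard ceiling on tokens per second. The bandwidth figures below are illustrative ballparks assumed for the comparison, not vendor specifications:

```python
# Why inference is memory-bound: each generated token requires reading all
# active weights at least once, so time per token >= weight bytes / bandwidth.
# Bandwidth figures are illustrative ballparks, not vendor specs.
active_params = 21e9
bytes_per_param = 2                      # fp16 / bf16 weights
model_bytes = active_params * bytes_per_param

chips = {
    "GPU with off-chip HBM": 3e12,       # ~3 TB/s class, assumed
    "on-chip SRAM design":   100e12,     # orders of magnitude higher, assumed
}

for name, bw in chips.items():
    t = model_bytes / bw                 # seconds just to stream the weights once
    print(f"{name:>22}: {t * 1e3:7.2f} ms/token -> ~{1 / t:,.0f} tokens/s ceiling")
```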

The market has already voted with real money. OpenAI has complained about the economics of GPU inference and then, by recent reports, signed a $10 billion deal with Cerebras for inference capacity. Nvidia itself is not sitting still: it is reportedly paying $20 billion for Groq to avoid falling behind in this new race.

When these two paths converge: a cost avalanche

Now, put these two developments together: a DeepSeek model slimmed down by its algorithm, running on a Cerebras chip with near-zero memory latency.

What happens?

A cost avalanche.

First, the slimmed-down model’s active working set is small enough to live in the chip’s on-chip memory. Second, with the external-memory bottleneck gone, response times become startlingly fast. The end result: training costs fall by around 90% thanks to the MoE architecture, and inference costs fall by an order of magnitude thanks to dedicated hardware plus sparse computation. The total cost of owning and operating a world-class AI could land at just 10-15% of a traditional GPU-based stack.
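
To see how the two savings compound, here is the arithmetic as a tiny script. The 90% and order-of-magnitude ratios are this article’s claims, not measured data, and the training/inference spend split is a purely hypothetical assumption:

```python
# How the two savings compound. The cost ratios are the claims above, not
# measured data; the spend mix is a hypothetical assumption for illustration.
train_ratio = 0.10                    # MoE training: ~90% cheaper
infer_ratio = 0.10                    # dedicated inference silicon: ~10x cheaper

train_share, infer_share = 0.3, 0.7   # assumed split of total AI spend

blended = train_share * train_ratio + infer_share * infer_ratio
print(f"blended cost vs GPU baseline: {blended:.0%}")  # lands in the claimed 10-15% band
```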

This isn’t just an improvement; it’s a paradigm shift.

Nvidia’s throne is quietly being pulled out from under it

Now you should understand why this is more deadly than the “Cowork panic.”

Nvidia’s multi-trillion-dollar market cap rests on a simple story: AI is the future, and that future runs on Nvidia GPUs. The foundation of that story is now being shaken.

In the training market, even if Nvidia keeps its monopoly, customers who can do the same work with a tenth of the cards shrink the addressable market dramatically.

In inference, a market ten times larger than training, Nvidia not only lacks an absolute advantage but is being encircled by Google (with its TPUs), Cerebras, Groq, and others. Even its biggest customer, OpenAI, is starting to defect.

Once Wall Street realizes that Nvidia’s “shovel” is no longer the only, or even the best, one on sale, what happens to a valuation priced for perpetual monopoly? Everyone knows the answer.

So the biggest black swan of the next six months may not be one AI app knocking out another, but a seemingly insignificant piece of tech news: a new paper on MoE efficiency, or a report of dedicated inference chips surging in market share, quietly marking a new phase of the compute war.

When the shovel seller no longer sells the only shovel in town, his golden age may be coming to an end.
