Everyone's Token-Maxxing: An Arms Race No One Dares to Stop
Author: Meng Xing, Partner at Wuyuan Capital; Source: LatePost
We took a trip to Silicon Valley and found that even the wave-makers are almost drowned by the waves.
On the morning of March 24, 2026, I was sitting in the audience at YC W26 Batch Demo Day, and when the fifth company took the stage for their pitch, I decided to stop taking notes.
Not because it wasn’t important, but because I realized that what I recorded might be outdated by next month.
Among this batch of over a hundred companies, the work they’re doing is actually highly concentrated: about 80% are vertical agents, such as helping lawyers organize documents, assisting customer service with ticket distribution, or helping HR screen resumes.
If I had seen these projects last October, I would probably have thought them “quite innovative.” But in these five months, the world has changed.
Claude Code has shifted from a tool aimed mainly at developers to an interface almost anyone can use directly. After Opus 4.6 was released, the barrier to vibe coding dropped through the floor.
Those vertical agents, before they can build any business moat, can be replicated by an ordinary engineer (even me) over a weekend. They have already lost their investment value.
YC’s project cycle is three months. This batch started in December; add the initial screening, and these are “good companies” selected five months ago. At the current speed of AI iteration, five months is enough for several paradigm shifts.
When I first started my own business in 2012 and received a Fly Out (onsite interview invitation) from YC, YC was almost the sole leader in the accelerator track, and the companies it selected often represented “the next direction.” But the competitive landscape has changed. Over the past few years, YC has felt more like a lagging indicator.
YC’s batch system— from application, screening, admission, polishing, to pitching— has operated successfully for over a decade in the mobile internet era. But this rhythm was designed for a slower world.
Reflecting on the past year and a half in venture capital, I’ve been to Silicon Valley roughly once every quarter. The last time was October last year. Previously, I always felt change was rapid, but mostly perceptible month by month.
This time, it’s “weekly.”
One evening at dinner, a friend working on post-training casually said:
“I’ve realized that Silicon Valley itself is starting to fall behind itself.”
All-in token-maxxing: an unstoppable arms race
Half a year ago, if someone told me that Meta’s tens of thousands of engineers were all coding with competitors’ products, I would have thought they were joking.
But it’s true. The entire Meta is using Claude Code. This isn’t a startup, not an experimental team, but a trillion-dollar company.
Code security is out the window, token budgets are exploding, leaderboards are heating up; all of Silicon Valley is throwing money at AI without regard for cost. But once the money is spent, what then?
First, code security. Half a year ago this was unimaginable, because code is a company’s core asset: how could you let an external API touch it? Meta initially thought so too, and built an internal tool called myclaw to address it. A Meta friend told me about that coding product: “it’s not user-friendly, nobody uses it.” Once it was clear no one used it, the company had to loosen the restrictions: as long as customer data isn’t involved, anyone can use Claude Code.
Then various departments started holding internal meetings on “how to become AI-native organizations,” doing training, and setting assessments. Code security, operational safety—these once sacred red lines—were all pushed to the back burner. The priority was to catch up on efficiency first.
For security reasons, Google prohibits most employees from using competitors’ tools such as Claude Code and Codex, but DeepMind is an exception: several teams responsible for the Gemini model and internal applications are using Claude Code.
Google has also made efforts: they launched an internal coding tool called Antigravity, and in February this year, claimed that about 50% of their new code was AI-generated.
But even so, DeepMind’s people are still using Claude Code. One key reason is that Anthropic has deployed it privately for them—since Anthropic’s inference and training are primarily on Google Cloud’s TPU, there’s a trust basis. But Meta and other tech giants don’t have this relationship; they’ve really thrown code security out the window. Everyone is betting on one thing: pushing speed to the max.
Code security is just the first flag to fall. The second is token budget.
At several AI-native startups I talked to in Palo Alto, a single engineer’s annual token budget is around $200,000. The number itself isn’t the surprise; what’s shocking is that it approaches a top engineer’s salary. It looks like companies are using AI to cut hiring costs, but total cost may not have fallen at all: human cost has simply been swapped for token cost.
Meta is the most extreme here. They created an internal token consumption leaderboard: those who use the most tokens get on the list, and the bottom might be laid off. Meta employees even compete for an unofficial title called “token legend.”
Meanwhile, Meta has laid off over ten thousand people in two rounds this year. On one hand, everyone is using Claude Code to push token volume; on the other, they’re massively cutting staff.
These two aren’t contradictory; they are two sides of the same coin.
I visited a Series C company whose CTO showed me a Slack full of running agents: dozens of Cursor agents in parallel, with another window scheduling Claude Code. The most common anxiety among programmers now is going to bed without knowing what all those agents are doing.
But has productivity really increased that much? Since late last year, many CTOs from top inference-engine and database companies have excitedly told me about “hundredfold engineers” and “tenfold efficiency gains”: tasks that used to take 60 people a year can now be done by 2 people plus Claude Code in a week.
I initially shared their excitement, but then I calmed down and asked: OK, efficiency is up 100x, but is the company’s revenue up 100x? Has the product line expanded 100x? A genuine 100x improvement shouldn’t just show up as trimmed headcount.
I never got a positive answer. In fact, a 100x efficiency boost shows up in revenue as roughly 50%, at most 100%, growth.
Where’s the gap? No one can clearly say yet.
“Using so many tokens, the company should have undergone a genetic mutation. But I don’t know what it has become.”
A founder with a B2B sales background told me his 16-person team, with just two salespeople, went from zero to $30 million ARR in 12 months, all built on AI coding. Such cases are rare but they do exist. More often, though, I see startups building more and more things that have no product-market fit (PMF).
With vibe coding, Silicon Valley now prefers to run a hundred experiments and see what sticks rather than carefully trying ten. But who can grasp the next trend? That remains deeply uncertain.
One of the most striking counterexamples comes from inside Anthropic. I asked a friend there: what’s the most painful scenario for your agents? He said it’s oncall (real-time response).
A typical oncall scenario: if Claude’s API suddenly slows down, a model inference node crashes, or a prompt output is abnormal, oncall engineers need to quickly locate the root cause—whether it’s a bug, compute resource issue, or model anomaly—and decide how to fix it.
Anthropic is the world’s strongest coding agent company, and this scenario is very close to their core capability. Yet, their internal oncall agents are still not very usable.
This is the real state in April 2026: the steam engine has been invented, but sometimes it’s still slower than a horse-drawn carriage. The key is, everyone knows the steam engine will eventually run faster, so they’re frantically pouring money into it: code security is ignored, token budgets explode, leaderboards heat up. But when will the steam engine truly surpass the horse? No one knows, but no one dares to stop and wait for that day.
Because the cost of stopping might be greater than burning the wrong tokens.
And token consumption probably won’t grow linearly. This reminds me of my experience with autonomous driving: in 2021, in Shanghai, we achieved continuous 5-hour autonomous driving without intervention for the first time. At that time, I thought it was a major breakthrough. Before, the test fleet was slowly increasing from 10, 15, to 20 cars; but after that inflection point, it quickly reached 100, then 1,000. Today’s coding agents are at a similar stage.
Photo: In 2021 in Shanghai, Didi’s autonomous driving achieved 5 hours of continuous, intervention-free driving for the first time, a milestone for autonomous driving in China. Pictured: Didi Autonomous Driving COO Meng Xing (the author) in conversation with Sebastian Thrun, “the father of Google’s self-driving car,” in 2021.
METR is a California-based research institute specializing in evaluating AI coding ability. Last year it proposed an index: the length of task (measured by how long a human expert would need) that an AI agent can complete with a 50% success rate. When the index was first released in March 2025, Claude 3.7 Sonnet scored 50 minutes; by the end of 2025, Claude Opus 4.6 reached 14.5 hours. Over the past two years, the doubling cycle of this metric has shrunk from 7 months to 4 months. Once agent reliability advances another step, token consumption won’t just grow 50% a year; it could jump an order of magnitude overnight.
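For intuition, here is a minimal back-of-the-envelope sketch of what that doubling implies. The 14.5-hour starting point and the 4-month doubling period are the article’s figures; the clean exponential extrapolation is purely illustrative:

```python
# Illustrative projection of the METR 50%-success time horizon.
# Assumes clean exponential growth: horizon(t) = h0 * 2**(t / doubling_months).
h0_hours = 14.5        # end-of-2025 figure cited in the text
doubling_months = 4.0  # current doubling cycle cited in the text

for months_ahead in (4, 8, 12, 24):
    horizon = h0_hours * 2 ** (months_ahead / doubling_months)
    print(f"+{months_ahead:>2} months: ~{horizon:,.0f} hours")
# +12 months gives ~116 hours (8x); a full 10x takes about
# 4 * log2(10) = ~13 months, which is the "overnight" order-of-magnitude
# jump the paragraph is pointing at.
```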
A widely accepted prediction among friends: by the end of this year, many companies (including tech giants) will only need 20% of their staff.
xAI’s avalanche, and the start of model-building by rocket engineers
In a steakhouse in Mountain View, around 9 pm, a friend who worked with Elon Musk for a long time sat across from me. We talked for over three hours. Looking back, it seemed he didn’t say a single good word about Musk the whole time.
A detail: I asked him, after working at xAI for three years, what’s your daily rhythm? He said he’s basically lived at the company for three years, so his home is barely decorated, not even a bed. He sleeps in a sleeping pod at the office, similar to a hostel. I told him, now that he has huge equity, he should at least buy a bed. He just smiled.
The workload at xAI is notoriously intense even by Silicon Valley standards, yet about 90% of the early team has now left. There is a leavers’ chat group, and it keeps adding new members.
The trigger was Tony Wu’s dismissal, which caused a chain reaction. An insider told me, “Other companies might need half a year to see senior management leave, but at xAI, it only took a month.” Some sensed Musk’s dissatisfaction as early as October last year, but no one expected such a swift purge.
Now Musk is pulling people from SpaceX and Tesla to take over xAI—“rocket builders are starting to build models.”
Musk’s dissatisfaction stems from having poured enormous funding and compute into xAI while Grok has never reached the front rank. Why? I put that question to everyone who has come out of xAI. The answer is simpler than I expected: the team is extremely capable and works brutally hard, but a manufacturing-style management approach may not suit a large-model company.
Having done autonomous driving for eight years, I have some personal insights. Musk’s past work with SpaceX and Tesla is fundamentally about systems engineering: long chains involving software, hardware, supply chains, each with room for innovation, but ultimately an end-to-end engineering problem.
He’s good at identifying the key leverage points in these long chains and compressing timelines to solve them. Clustering rocket engines, reusable landings: these are products of that thinking.
But at xAI, he’s not doing systems engineering. He’s doing three things: first, pouring money into a massive GPU cluster (so large that people joke xAI is no longer a neo lab but a neo cloud, supplying compute to Cursor); second, setting pulse-style deadlines for the team; third, personally designing certain product features. That is attacking individual points, not making a complete plan.
People in autonomous driving know that in the late stage, “who leads whom” among the software, infrastructure, and hardware teams becomes the core conflict. All three directions require CTO-level decisions, but no one understands all three domains at once. The best arrangement is a founder who knows how to balance resources and sequence priorities (software first, then infrastructure, say) under a global plan.
xAI’s problem is the absence of that global plan; there is only sprinting. If the pressure weren’t so extreme, smart people could self-correct and, given time, find their own rhythm of collaboration. But Musk’s ultra-high-pressure management combined with insufficient overall planning causes disintegration: each leader defends their own priorities, and no one watches the big picture.
An often-overlooked reason SpaceX and Tesla are so successful: in those industries, Musk has rarely faced competitors of comparable scale; he competes against himself. AI is different. It’s a brutal melee in which even OpenAI gets its talent poached by Anthropic.
A cofounder of xAI told me last year: two things he didn’t expect— first, how brutal the competition is; second, how few application innovations there are in the AI era, as most are eaten by the models.
Anthropic’s rise is the most dramatic reversal in the AI industry over the past year, and it has completely shifted the battlefield: a year ago, everyone was competing on consumer user numbers and video generation; now the decisive battleground, at least at this stage, is B2B and coding.
Of course, xAI’s story is also one of money arriving too fast and in too great a volume: a story of what happens when wealth floods in.
I think friends leaving xAI today won’t regret their decision to join. xAI has become Silicon Valley’s fastest wealth-creation myth. From its first round of hundreds of millions of dollars in funding to merging with SpaceX and becoming a $250 billion giant, it took just one year. Nearly every one of the 9 cofounders became a billionaire; core engineers have tens of millions to over a hundred million dollars. There’s just too much money in Silicon Valley. If they start a new venture now, they have enough confidence to pursue their interests rather than chasing quick profits.
Anxious engineers, even more anxious researchers
Talking with engineers, there’s a strange tacit understanding now: everyone admits they barely write code anymore, but everyone pretends it’s no big deal, because the AI-armed will eliminate those who aren’t.
Today, 80% of software engineers’ core skills have been replaced by models. The reason they still remain is that models sometimes make mistakes, and humans need to supervise. But even that supervision might soon become unnecessary.
A more radical way to put it: today’s so-called “AI-native organizations” sound very sexy, each department streamlining workflows, digitizing whatever can be AI-automated, turning human skills into machine skills. But essentially it’s human distillation: your capabilities are distilled into machine skills so the company “inherits” them and AI-ifies itself. Whether layoffs follow is then merely a moral question. Today, Meta is doing exactly this.
Although everyone is competing for token-maxxing, you can still feel a pervasive underlying anxiety across Silicon Valley.
What surprised me even more is that this anxiety is spreading to researchers.
Researchers are the top-tier talent—not just “researchers” in general, but those responsible for training models and innovating algorithms at large model companies (OpenAI, Anthropic, DeepMind, etc.). Their difference from engineers: engineers “build things,” write code, deploy, optimize; researchers are more upstream—“think about what to build”: proposing new training methods, designing architectures, running experiments to verify hypotheses.
Now even researchers’ work is being automated. That’s what friends at DeepMind are doing: using models to train models, part of this year’s hot trend of AI self-evolution. The elimination is hitting engineers this year; by year’s end, researchers will start being replaced too.
This isn’t a new concept. Andrej Karpathy’s auto-research (automated scientific research) pioneered this—today, various AI scientist tools and harness frameworks are heading in this direction. But most current closed loops only reach the “publish paper” stage—AI helps run experiments and write papers, but humans still make judgments.
Companies like OpenAI, Anthropic, and Google aim to go further: they want the closed loop to directly upgrade the model itself—not just fine-tune details, but let AI find the next paradigm-breaking breakthrough. If successful, this would truly replace researchers. Over a year ago, Google DeepMind was experimenting internally: letting models decide what experiments to run next, evaluate which paths are more promising, and follow that route—training the next generation of models through self-directed research.
Moreover, there is a strong incentive to lay researchers off, because they’re expensive. Top researchers worldwide earn millions a year, sometimes tens of millions, even hundreds of millions.
“The future might be that 10 people do the work of 100, earning 20% of the pay, while 90 people become unemployed.”
And the layoffs are actually larger than the surface numbers suggest. The first cuts many companies make aren’t on their financial statements but on outsourcing vendors. This means India, the Philippines, and other countries that once provided customer service, data labeling, and back-office finance for Western companies might be hit first. The “service industry ladder” that many developing countries relied on for economic upgrading could be being pulled out from under them by AI.
The entire Silicon Valley is watching Meta. If their experiment succeeds— revenue stays stable, efficiency truly improves— other giants will follow quickly. Layoffs will become the norm rather than exceptions. And layoffs tend to accelerate themselves: initially, no one dares to cut, fearing morale loss; but once it becomes routine, cuts happen faster and more ruthlessly.
But while old roles are cut, new ones are emerging.
Many startups are hiring for a new role called “AI builder,” a hybrid of product manager, front-end, and back-end engineer. Others merge data scientist and machine-learning engineer into one role, or hire content operators who handle writing, advertising, and operations in one.
Demand for these new roles in Silicon Valley is very high, but the core challenge is that no one knows how to recruit for them. You can’t screen by resume, because the roles didn’t exist before and the relevant ability is often buried in personal projects. You can’t run an on-the-spot coding test, because the core skill is a combination of taste and AI fluency. So some startups automatically generate a simulated environment from the employer’s needs, then watch candidates work with AI tools in real time: a new kind of coding test for a whole new skill set.
When AI can do everything, human value shifts from “what you can do” to “what’s worth doing and what’s not.”
Two valuations per funding round: Nvidia’s chips are always in play
Having discussed so many replaced roles (engineers, researchers, finance professionals), there is one player that not only remains untouched but is increasingly acting as the behind-the-scenes boss of this reshuffle.
This seemingly distributed innovation world is actually highly centralized at the bottom.
And that center is Nvidia.
I used to think the scarcity of Nvidia’s chips had eased over the past year. It did, for a while. Around mid-2025, some Nvidia-backed neo clouds (new cloud providers that rose in the AI wave to sell GPU compute) struggled to raise money; some saw sluggish growth, and a few even sold themselves. But now the scarcity is back, and more absurdly than before.
A concrete signal: if today you can reliably serve an API, say Claude’s, with solid 99th-percentile stability, you can charge two to three times the official API price.
As demand for Anthropic has surged, API outages have become more frequent, a real problem for the many agent products built on Claude.
Routing services used to win traffic by being “cheaper than official.” Now the logic is reversed: stability itself has become the scarce resource. Several startups are profiting from this, and mini versions of CoreWeave and Nebius are sprouting across Silicon Valley.
And this time, the compute bottleneck isn’t just GPU allocation. Elad Gil recently wrote a very insightful piece: upstream memory manufacturers (Hynix, Samsung, Micron) need at least two more years to expand capacity. This means, until 2028, no AI company can significantly outpace others just by stacking compute. The physical constraints are reinforcing the oligopoly in large models—not because of lack of effort, but because of the slow manufacturing cycle in the physical world.
The underlying power structure is clear: whoever has chips is powerful; Nvidia decides who gets chips. Today, publicly listed companies like CoreWeave, Lambda, and Nebius are backed by Nvidia.
Nvidia’s strategy is deeper than I previously understood. An investor in Reflection mentioned that this neo lab initially focused on coding, until Nvidia founder Jensen Huang told them: “Stop doing coding. Come build America’s DeepSeek, make open-source models for the US, and I’ll fund you and give you chips.” Reflection pivoted 180 degrees.
This has also produced some unusual structures in the US capital market: within the same funding round, two valuation tiers are assigned. Well-connected early investors get in at a lower valuation; cash-rich Nvidia and late arrivals are pushed into a higher one. The structure has recently begun appearing in China as well.
But no matter how Nvidia tries to control distribution, it can’t control what doesn’t exist.
Resistance to data centers across US society is escalating. About 100 data center projects nationwide currently face opposition, and 40 have failed outright. Maine recently passed a law banning data center construction altogether. One town approved a $6 billion data center project, only for half of its council members to be recalled overnight, replaced by new officials whose sole mandate was to revoke that approval.
The shortage of compute isn’t due to poor products or insufficient users—it’s because the physical world can’t keep up with the digital world’s appetite.
This is another level of “falling behind.”
Silicon Valley’s valuation system is being rewritten
First, look at a number.
The US GDP is about $30 trillion. OpenAI and Anthropic each currently have an annualized revenue run rate of around $30 billion, meaning each already accounts for about 0.1% of US GDP. If both reach $100 billion by year’s end, plus cloud services and other AI revenues, AI will account for roughly 1% of US GDP. Going from nearly zero to 1% in just a few years.
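A quick sanity check of that arithmetic (the revenue figures are the article’s round numbers; the extra ~$100 billion of cloud and other AI revenue is an assumption used to illustrate the 1% scenario):

```python
# Sanity-check of the GDP-share arithmetic, using the article's round numbers.
us_gdp = 30e12     # ~$30 trillion US GDP

each_today = 30e9  # OpenAI and Anthropic annualized run rates today
print(f"each lab today: {each_today / us_gdp:.1%} of GDP")          # -> 0.1%

# Year-end scenario: both labs at $100B, plus an assumed ~$100B
# of cloud and other AI revenue on top.
year_end_total = 2 * 100e9 + 100e9
print(f"year-end scenario: {year_end_total / us_gdp:.1%} of GDP")   # -> 1.0%
```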
This speed is unprecedented. But strangely, the faster the growth, the more investors are unsure how to price it— the valuation framework in Silicon Valley is collapsing under this rapid expansion.
I’ve had several deep conversations with friends in secondary markets. A recurring term is “re-rationalization”— the return to rational valuation.
In recent years, investing in AI was based on future cash flow: it didn’t matter if you lost money now, because I bet on your ARR in three or five years. But now, this framework is breaking.
The problem lies in the most basic valuation model: DCF (discounted cash flow). Normally, you forecast cash flows for 10 years, then add a terminal value—assuming the company will operate stably afterward, bundling the remaining value. Usually, terminal value accounts for 70-80% of the total valuation.
But now two things are changing at once: first, you might only be able to forecast 3 years instead of 10, because after 3 years (sometimes just 1) the industry could look entirely different; second, the terminal value becomes even harder to compute, because its premise, that the company will eventually stabilize, no longer holds when AI can disrupt everything at any moment. “Stability” becomes an invalid assumption.
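To see why this breaks the model, here is a minimal DCF sketch. All cash-flow numbers are invented; the only claims carried over from the text are that terminal value normally dominates (roughly 70-80% of the total) and that a 3-year window with no terminal value guts the result:

```python
# Minimal DCF sketch with invented numbers, for illustration only.
def dcf(cash_flows, r, terminal_growth=None):
    """PV of explicit cash flows, plus a Gordon-growth terminal value if given."""
    pv = sum(cf / (1 + r) ** t for t, cf in enumerate(cash_flows, start=1))
    tv_pv = 0.0
    if terminal_growth is not None:
        n = len(cash_flows)
        tv = cash_flows[-1] * (1 + terminal_growth) / (r - terminal_growth)
        tv_pv = tv / (1 + r) ** n
    return pv, tv_pv

flows = [100 * 1.15 ** t for t in range(10)]   # 10 years of 15% growth
pv, tv = dcf(flows, r=0.10, terminal_growth=0.05)
total = pv + tv
print(f"10y + terminal: {total:.0f} (terminal share {tv / total:.0%})")  # ~72%

pv3, _ = dcf(flows[:3], r=0.10)                # 3-year window, no terminal value
print(f"3y, no terminal: {pv3:.0f} ({pv3 / total:.0%} of the above)")    # ~7%
```

With these toy numbers the terminal value is about 72% of the total, consistent with the 70-80% range the text cites; shrink the window to 3 years and drop the terminal value, and more than 90% of the valuation evaporates.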
I used a metaphor with a friend in secondary investment: companies not in the AI main track are now more like waiting for a “nuclear bomb”— you know it will be disrupted, just not when. So, the valuation shouldn’t focus on “what if not disrupted,” but on “how fast can they respond when disrupted.” That’s a completely different valuation logic.
SaaS was the first to be re-priced by Wall Street. In 2023, Snowflake’s free cash flow valuation implied nearly 100 years to break even; now, its valuation has halved. ServiceNow, Workday— similar trends. This is just the beginning.
In fact, only the top large model companies might still be suitable for DCF valuation, because their future seems more stable and upward trending—they won’t be “blown up,” but rather, their growth boundary can be expanded.
In the past, startups justified lower wages with “offering options that could be worth a lot in the future.” But that premise assumes the company will still be valuable in 15-20 years. If that’s no longer true, the most rational response from employees is: “Don’t give me options, just pay me in cash now.”
This, in turn, changes the company’s cost structure and financing logic.
Venture capital is also suffering. Over the past 3-6 months, nearly every fund in Silicon Valley invested in at least one neo lab— those researchers from famous AI labs raising hundreds of millions based on their ideas. But now, everyone feels it was somewhat impulsive and expensive. Why did they still invest? Because if that company really succeeds, its growth will be so fast that the initial valuation seems cheap.
An investor friend put it bluntly: “It’s either zero to 100 or zero to zero. Rather than pay an expensive Series A valuation for a company earning hard money, better to bet on a neo lab with unlimited upside.”
Previously, everyone treated a dollar of ARR as a dollar of ARR, regardless of whether it came from models, applications, or infrastructure. Now that equivalence is broken.
Vertical agents command the lowest multiples (around 5x ARR), general agents higher (around 10x), and models the highest (20-30x ARR; Anthropic, at $30B ARR and an $800B valuation, sits at 26.7x). A year ago I thought a single uniform multiple on ARR was enough; today that’s completely wrong.
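Spelled out as arithmetic (the multiple tiers and the Anthropic figures are the article’s; the $100M-ARR company is hypothetical):

```python
# Implied valuation = ARR x multiple, using the article's rough tiers.
tiers = {
    "vertical agent": 5,    # ~5x ARR
    "general agent": 10,    # ~10x ARR
    "frontier model": 25,   # midpoint of the 20-30x range
}
arr = 100e6  # a hypothetical company at $100M ARR
for kind, multiple in tiers.items():
    print(f"{kind:>14}: ${arr * multiple / 1e9:.1f}B implied valuation")

# Cross-check the Anthropic data point cited in the text:
print(f"Anthropic: {800e9 / 30e9:.1f}x")  # $800B / $30B ARR -> ~26.7x
```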
Orange Trees and the AI Assassination List
Silicon Valley is experiencing a deep crisis of confidence.
On this trip, I repeatedly heard friends seriously discussing the same thing: buying Bitcoin, building bunkers, installing bulletproof glass at home—they weren’t joking.
Recently, “orange trees” have become popular in Silicon Valley because their branches are covered with 4-inch thorns: anyone trying to climb over pays a price.
The Wall Street Journal even reported a $15 million “fortress mansion”: concrete planters with orange trees, behind which is a moat, then laser intrusion detection, a front door with 3-inch solid steel plates and 13 locks, and inside, a safe room with a 2,000-pound door. Even the landscape is designed as a defensive fortification.
Companies providing residential security for CEOs are seeing their fastest growth since 2003, a trend that accelerated sharply after the CEO of UnitedHealth (UNH) was shot dead on a Manhattan street.
Then, the gunfire reached the homes of AI bigwigs.
On April 11, at 4 am, a 20-year-old wearing a Champion hoodie flew from Texas to California, carrying a kerosene can, and stood in front of Sam Alt