The small-town youths who label data for large AI models

BlockBeatNews · 2026-04-07T04:35:35+00:00

Text | Sleepy.md

Datong, in Shanxi Province, a city whose coal once propped up half the country, is now shaking off the coal dust clinging to it. With a sharper pickaxe in hand, it is striking hard at another, invisible mine.

Inside the office buildings at the Jinyin International Center in Pingcheng District, there are no hoisting shafts and no coal-hauling trucks. In their place are thousands of tightly packed computer workstations. The Shanghai Runxun Cloud Zhong Shenggu Data Intelligence Service Base occupies several full floors. Thousands of young employees in headsets stare at their screens, clicking, dragging, and drawing selection boxes.

According to official data, as of November 2025, Datong had put 745k servers into operation, brought in 69 call-center and data-labeling enterprises, and created more than 30k local jobs, with a total output value of 750 million yuan. In this digital mine, 94% of the workers are registered local residents.

It is not just Datong. The first batch of data-labeling bases designated by the National Data Administration includes county-level towns in central and western China such as Yonghe County in Shanxi, Bijie in Guizhou, and Mengzi in Yunnan. At Yonghe County’s data-labeling base, 80% of the employees are women. Most are rural stay-at-home mothers or young people who returned home because they could not find suitable work.

A hundred years ago, the textile mills of Manchester, England, were packed with farmers who had lost their land. Today, the computer screens in these remote county towns are crowded with young people who cannot find a place in the real economy.

They are doing piece-rate work that feels intensely futuristic yet is extremely primitive: producing the data feed that the artificial intelligence giants in Beijing, Shenzhen, and Silicon Valley need to train large models.

Nobody thinks there’s anything wrong with that.

New production lines on the Loess Plateau

The essence of data labeling is to teach machines to recognize the world.

Autonomous driving needs to identify traffic lights and pedestrians; large models need to tell a cat from a dog. Machines have no common sense of their own. Humans must first draw a box on an image and tell the machine, “This is a pedestrian.” Only after it has consumed tens of millions of such images will it learn to recognize them on its own.

This job doesn’t require a high education level—only patience, and a single finger that can keep clicking nonstop.

In the golden age of 2017, a simple 2D box could fetch more than a tenth of a yuan, and some companies even offered as much as 0.5 yuan. Fast labelers working ten-plus hours a day could earn five or six hundred yuan. In a county town, that was unquestionably a well-paid, respectable job.

But as large models evolved, the brutal side of this assembly line started to show.

By 2023, the unit price for simple image labeling had been beaten down to 3 to 4 fen, a drop of more than 90%. The harder 3D point-cloud images fare little better. These images are made of dense points that must be magnified many times before the edges become clear, and labelers have to draw a 3D box with length, width, height, and a rotation angle so that it tightly encloses each vehicle or pedestrian. Even that complex 3D box is worth only 5 fen.
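The size of the price collapse depends on which 2017 price is taken as the baseline. The sketch below is a rough check using only the figures quoted above; treating 10 fen as the floor for "more than a tenth of a yuan" is an assumption, and the function name is made up for illustration.

```python
from fractions import Fraction


def percent_drop(old_fen: int, new_fen: int) -> float:
    """Return the percentage decline from old_fen to new_fen (prices in fen)."""
    return float(Fraction(old_fen - new_fen, old_fen) * 100)


# Figures quoted in the article (1 yuan = 100 fen).
PEAK_2017_FEN = 50        # the high price of 0.5 yuan per box
ORDINARY_2017_FEN = 10    # assumed floor for "more than a tenth of a yuan"
PRICES_2023_FEN = (3, 4)  # 3 to 4 fen per box

for new_price in PRICES_2023_FEN:
    from_peak = percent_drop(PEAK_2017_FEN, new_price)
    from_ordinary = percent_drop(ORDINARY_2017_FEN, new_price)
    print(f"{new_price} fen: {from_peak:.0f}% below the peak, "
          f"{from_ordinary:.0f}% below the ordinary rate")
```

Under these assumptions, a drop of more than 90% holds only when measured against the 0.5 yuan peak; against the ordinary rate the decline is closer to 60% to 70%.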

The direct consequence of the unit price collapse is a surge in labor intensity. To hold on to a monthly base pay of just two or three thousand yuan, labelers have to keep pushing their hand speed higher, without letup.

This is by no means a relaxed white-collar job. In many labeling bases, management is suffocatingly strict. Taking calls during work is not allowed, and phones must be locked in a storage locker. The system precisely records each employee’s mouse trajectory and dwell time. Stop for more than three minutes, and warnings from the back end crack down like a whip.

Even more unbearable is the narrow margin for error. The industry’s passing line for accuracy is usually above 95%, and some companies require 98% to 99%. That means that if you draw 100 boxes and get just 2 wrong, the entire image can be rejected and sent back for rework.
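How little slack these thresholds leave can be seen in a small check. This is a minimal sketch that assumes accuracy is scored per image as correct boxes divided by total boxes, and that meeting the threshold exactly counts as a pass; the article does not describe the real quality-control rules, and the `passes` helper is hypothetical.

```python
from fractions import Fraction


def passes(total_boxes: int, wrong_boxes: int, threshold_pct: int) -> bool:
    """True if the share of correct boxes meets the threshold (inclusive)."""
    accuracy = Fraction(total_boxes - wrong_boxes, total_boxes)
    return accuracy >= Fraction(threshold_pct, 100)


# The article's example: 100 boxes drawn, 2 of them wrong.
for threshold in (95, 98, 99):
    verdict = "accepted" if passes(100, 2, threshold) else "rejected"
    print(f"threshold {threshold}%: {verdict}")
```

Under these assumptions, two wrong boxes in a hundred sit exactly on a 98% line and fail a 99% line, so a single further mistake is enough to send the image back.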

In video, consecutive frames are linked. When a vehicle changes lanes it may be occluded, and labelers have to infer its position and track it down frame by frame. In 3D point-cloud images, any object made up of more than 10 points must get a box. On a complex parking-space project, a line drawn too long or a missed box is enough for quality control to find fault, and reworking the same image four or five times is commonplace. When it is all added up, an hour of work may bring in only a few dozen fen.

A labeler in Hunan posted her settlement slip on a social platform. After a day’s work, she drew more than 700 boxes. Her unit price was 4 fen, and her total income was 30.2 yuan.
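The settlement slip can be checked with integer arithmetic in fen. The sketch below works backward from the stated income and unit price to the implied number of boxes, then estimates an hourly rate. The 10-hour shift is an assumption (the slip does not state hours worked), and the function name is illustrative.

```python
def boxes_for_income(income_fen: int, price_fen: int) -> int:
    """Number of paid boxes needed to earn income_fen at price_fen per box."""
    boxes, remainder = divmod(income_fen, price_fen)
    if remainder:
        raise ValueError("income is not a whole multiple of the unit price")
    return boxes


INCOME_FEN = 3020   # 30.2 yuan, from the settlement slip
PRICE_FEN = 4       # 4 fen per box, from the settlement slip
ASSUMED_HOURS = 10  # hypothetical shift length, not stated on the slip

boxes = boxes_for_income(INCOME_FEN, PRICE_FEN)
hourly_yuan = INCOME_FEN / 100 / ASSUMED_HOURS
print(f"{boxes} boxes, about {hourly_yuan:.2f} yuan per hour")
```

The figures are consistent with one another: 30.2 yuan at 4 fen per box implies 755 paid boxes, which matches "more than 700", and works out to roughly 3 yuan an hour over an assumed 10-hour day.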

It’s an extremely split picture.

On one side are tech tycoons dazzling the crowd at press conferences, talking about how AGI will liberate humanity. On the other side, in county towns on the Loess Plateau and in the southwestern mountains, young people stare at screens for eight to ten hours a day, mechanically drawing thousands, even tens of thousands of boxes, until their fingers trace lane lines in midair in their dreams at night.

Someone once said that AI’s appearance is a roaring luxury car—but when you open the door, you find that inside, a hundred people are riding bicycles, clenching their teeth and pedaling like crazy.

Nobody thinks there’s anything wrong with that.

Piece-rate labor to teach machines “how to love”

Once the bottleneck of image recognition is broken through, large models enter an even deeper stage of evolution—they need to learn to think, to converse, and even to show “empathy,” like humans.

This gives rise to the most central, and most expensive, part of large-model training: RLHF (reinforcement learning from human feedback).

Put simply, it means letting real people rate AI-generated answers, telling it which answer is better and more in line with human values and emotional preferences.

ChatGPT seems “human” because countless RLHF labelers behind it are teaching it lessons.

On crowdsourcing platforms, these labeling tasks are often priced openly: 3 to 7 yuan per item. Labelers need to give extremely subjective emotional ratings to the AI’s responses—to judge whether a response is “warm,” whether it “shows empathy,” and whether it “takes care of the user’s feelings.”

A bottom-level worker earning a couple of thousand yuan a month, worn out from scrambling to survive in the mud of real life and perhaps with no time to tend to their own emotions, has to serve inside the system as the AI’s emotional mentor and arbiter of values.

They have to forcibly tear apart these extremely complex and subtle human emotions—warmth and empathy—and then quantify them into cold scores from 1 to 5. If their ratings don’t match the system’s preset “correct answers,” they will be judged as not meeting the accuracy standard, and their piece-rate pay—already meager—will be deducted.

It is a hollowing-out of cognition. Complex and delicate human emotions, morality, and compassion are forced into the algorithm’s funnel and, through cold quantized scores and standardized scales, squeezed dry of their last bit of warmth. While you marvel that the cyber beast on the screen has learned to write poetry and music, to ask after people’s wellbeing, and even to put on a skin of melancholy, the once-vivid humans outside the screen are degrading into emotionless rating machines, caught in endless mechanical judgments day after day.

This is the most hidden side of the entire industry chain. It never appears in any financing news or in technical white papers.

Nobody thinks there’s anything wrong with that.

Master’s degrees from 985 universities and kids from small towns

As bottom-level box-drawing work is crushed under AI’s treads, this cyber assembly line is spreading upward and starting to devour higher-level mental labor.

The appetite of large models has changed. They are no longer satisfied with chewing on simple common sense; they need to consume human professional knowledge and high-level logic.

On major recruitment platforms, a special kind of part-time work has begun to appear frequently, with titles such as “large-model logic reasoning labeling” and “AI humanities training instructor.” The bar for these gigs is very high. They often require “a 985/211 master’s degree or above” and cover professional fields including law, medicine, philosophy, and literature.

Many graduate students from prestigious schools are drawn in and flow into the outsourced work groups of major companies. They quickly realize that this is no casual mental workout. It is psychological torture.

Before taking on paid tasks, they must read documents dozens of pages long covering rating dimensions and evaluation standards, and complete two to three rounds of trial labeling. Once they pass and begin formal labeling, anyone whose accuracy falls below the group average loses eligibility and is kicked out of the group chat.

Most suffocating of all is that these standards aren’t even fixed. When faced with similar questions and answers, grading with the same way of thinking can still produce completely opposite results. It’s like taking an exam you can never finish and that has no standard answers in the first place. You can’t improve your accuracy through self-effort or learning—you can only keep circling in place, burning mental and physical energy.

This is the new form of exploitation in the era of large models—class collapse.

Knowledge, once regarded as the golden ladder for breaking barriers and climbing upward, has been reduced to digital fodder for algorithms, fodder that is simply harder to chew. Under the absolute power of algorithms and systems, the 985 master’s degree holders from the ivory tower and the kids from small towns on the Loess Plateau have arrived, by different routes, at the same bizarre destination.

They fall together into this bottomless cyber mine, their halos stripped away and their differences leveled, all of them turned into cheap cogs on the line that can be replaced at any time.

It is the same abroad. In 2024, Apple cut an entire 121-person AI voice labeling team in San Diego. These employees had been responsible for improving Siri’s multilingual processing. They once thought they were standing at the edge of a major company’s core business, until they were abruptly plunged into unemployment.

In the eyes of the technology giants, the middle-aged woman drawing boxes in a county town and the logic training instructor who graduated from a top university are, in essence, the same thing: “consumables” that can be replaced at any moment.

Nobody thinks there’s anything wrong with that.

A trillion-dollar Tower of Babel, paved with blood and sweat worth a few cents

According to data released by the China Academy of Information and Communications Technology, China’s data-labeling market reached 6.08 billion yuan in 2023 and is expected to reach 20 to 30 billion yuan in 2025. Forecasts further suggest that by 2030, global sales of data labeling and related services will soar to 117.1 billion yuan.

Behind these numbers is a valuation frenzy around tech giants such as OpenAI, Microsoft, and ByteDance, whose valuations routinely run from tens of billions of dollars to trillions of yuan.

But none of this staggering wealth flows to the people who truly “feed” AI.

China’s data-labeling industry shows a typical inverted-pyramid outsourcing structure. At the very top are tech giants tightly holding the core algorithms. The second layer is large data service providers. The third layer is data-labeling bases spread across the country and small-to-mid-sized outsourcing companies. Only at the very bottom are the labeling “clods” who get paid piece-rate.

Each layer of outsourcing skims off a thick cut for itself. When a big company sets the unit price at 0.5 yuan, what finally reaches the hands of county-town labelers after layer upon layer of extraction may be less than 5 fen.
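The gap between 0.5 yuan at the top and 5 fen at the bottom implies a steep cut at each hand-off. The sketch below is a back-of-the-envelope estimate that assumes two intermediaries (the large data service provider and the labeling base or outsourcing company) each keeping the same share. The article gives no per-layer figures, so the uniform split and the function name are purely illustrative.

```python
def uniform_cut(top_price: float, bottom_price: float, intermediaries: int) -> float:
    """Share each intermediary keeps if every layer takes the same fraction."""
    if intermediaries < 1:
        raise ValueError("need at least one intermediary")
    # Fraction of the price passed down at each hand-off.
    passed_down = (bottom_price / top_price) ** (1 / intermediaries)
    return 1 - passed_down


TOP_PRICE_YUAN = 0.5      # unit price set by the big company (from the article)
BOTTOM_PRICE_YUAN = 0.05  # upper bound on what the labeler receives (from the article)
ASSUMED_LAYERS = 2        # service provider + labeling base; an assumption

cut = uniform_cut(TOP_PRICE_YUAN, BOTTOM_PRICE_YUAN, ASSUMED_LAYERS)
print(f"Each of {ASSUMED_LAYERS} layers would keep about {cut:.0%}")
```

Under this equal-split assumption, each intermediary would keep roughly two thirds of what it receives, and the labeler ends up with at most a tenth of the original price.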

In his book “Technofeudalism,” Yanis Varoufakis, the former Greek finance minister, put forward a penetrating view: today’s tech giants are no longer capitalists in the traditional sense, but “cloud lords” (Cloudalists).

What they hold is not factories and machines but algorithms, platforms, and computing power, the digital territories of the cyber age. In this new feudal system, users are not consumers; they are digital tenants. Every like, comment, and page view on social media supplies data to the cloud lords for free.

And those data labelers distributed in lower-tier markets are the lowest-level digital serfs in this system. They not only have to produce data, but also clean, categorize, and score massive amounts of raw data, turning it into high-quality feed that large models can digest.

This is a covert enclosure movement in the realm of cognition. Just as the enclosure movement in 19th-century England drove farmers into textile mills, today’s AI wave has driven young people who cannot find a place in the real economy to the front of screens.

AI has not erased the class divide. Instead, it has built a “data and sweat conveyor belt” running from county towns in China’s central and western regions straight to the headquarters of tech giants in Beijing, Shanghai, Guangzhou, Shenzhen, and beyond. The narrative of technological revolutions is always grand and dazzling, but its underlying color is always the mass consumption of cheap labor.

Nobody thinks there’s anything wrong with that.

A tomorrow that no longer needs humans

The most brutal ending is approaching faster and faster.

As large-model capabilities leap forward, labeling tasks that once required humans to work day and night are being taken over by AI itself.

In April 2023, Li Xiang, founder of Li Auto, revealed figures at a forum: in the past, the company needed to manually label roughly 10 million frames of autonomous driving images each year, at an outsourcing cost of close to 100 million yuan. Once it switched to large models for automated labeling, what used to take a year could be completed in about three hours.

That is 1,000 times the efficiency of human labelers, and that was back in 2023. This past March, Li Auto also released a new-generation automatic labeling engine, MindVLA-o1.
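The "one year versus three hours" comparison can be set against the quoted 1,000-fold figure with simple division. The sketch below is only an order-of-magnitude check: the source does not say how the year of manual work was counted, so both the calendar-hours and the working-hours baselines are assumptions.

```python
def speedup(manual_hours: float, automated_hours: float) -> float:
    """How many times faster the automated process is than the manual one."""
    return manual_hours / automated_hours


AUTOMATED_HOURS = 3            # "about three hours", from the article
CALENDAR_YEAR_HOURS = 365 * 24  # a year counted around the clock (assumption)
WORKING_YEAR_HOURS = 250 * 8    # 250 eight-hour working days (assumption)

for label, hours in (("calendar year", CALENDAR_YEAR_HOURS),
                     ("working year", WORKING_YEAR_HOURS)):
    print(f"{label}: about {speedup(hours, AUTOMATED_HOURS):.0f}x")
```

The two baselines bracket the quoted figure, which suggests that "1,000 times" is best read as a rounded order of magnitude, not a precise measurement.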

In the industry, there is a self-mocking line that rings painfully true: “However much intelligence there is, there is that much human labor behind it.” But now, big companies’ spending on data-labeling outsourcing has fallen off a cliff, dropping 40% to 50%.

Those small-town youths who sat in front of computers for countless days and nights, staring until their eyes were bloodshot, fed a giant beast with their own hands. Now the beast is turning around and smashing their rice bowls to pieces.

As night falls, the office buildings in Datong’s Pingcheng District remain lit as bright as day. Young people changing shifts pass one another in silence in the elevator lobby, one exhausted body replacing another. In this folded space, sealed tight by countless polygonal boxes, nobody cares what epic leap the Transformer architecture has made across the ocean, and nobody understands the roar of the computing power behind hundreds of billions of parameters.

Their attention is welded to the red-and-green progress bar in the back end that marks the “passing line,” as they calculate whether piece-rate earnings of a few fen at a time can be cobbled into a decent living by the end of the month.

On one side are the ringing of the Nasdaq bell and endless coverage from tech media, as the giants raise their glasses to the arrival of AGI. On the other side, the digital serfs who raised AI by feeding it mouthful by mouthful with their own flesh and blood can only toss in uneasy sleep, waiting for the beast they raised to casually kick over their rice bowls on some seemingly ordinary morning.

Nobody thinks there’s anything wrong with that.
