Anthropic's viral Claude "emotions" research found to overlap with a Chinese team's earlier work; Anthropic has acknowledged the prior work and added a citation.

BlockBeatNews

According to 1M AI News monitoring, on April 2 Anthropic released a new paper studying Claude's internal "emotion mechanisms," identifying 171 "emotion vectors" in Sonnet 4.5. These vectors activate in emotionally relevant contexts, and their organization resembles human psychological structure and emotional space.

MBZUAI master's student Chenxi Wang pointed out that her team's paper, published in October 2025 ("Do LLMs 'Feel' Emotions? Discovery and Control of Emotion Loops"), was in fact the first to systematically study the internal mechanisms by which large language models generate emotions. When she read Anthropic's paper, her first reaction was, "Isn't this what we did last year?" The key distinction is this: most prior research focused on models recognizing emotions in text (emotion perception), whereas both teams studied how the model itself generates emotions (emotion generation, i.e., the internal mechanism). Anthropic's corresponding author Jack Lindsey initially took the two works to overlap with existing research, but after Chenxi Wang went through the papers one by one and pointed out the differences, he accepted the distinction. Anthropic has since updated its paper blog, explicitly adding a citation to the team's work in the "Related Work" section, and the matter was resolved in a relatively friendly way.

The Chinese team’s paper mentioned three key findings:

First, large models do contain stable emotion representations that are independent of specific semantics. Different emotions form clear groupings even in the shallow layers of the network, for example anger near disgust and sadness near fear, consistent with human intuition.
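The clustering claim can be illustrated with a minimal sketch. Synthetic vectors stand in for real residual-stream activations here (the variable names, dimensions, and shared-component construction are illustrative assumptions, not the paper's method): related emotions share a latent component, so their average directions end up closer in cosine distance than unrelated ones.

```python
# Toy sketch of emotion-representation clustering. Synthetic vectors stand in
# for per-emotion mean activations collected from a model's shallow layers.
import numpy as np

rng = np.random.default_rng(0)
DIM = 64

# Assumption: related emotions (anger/disgust, sadness/fear) share a latent
# component, mimicking the grouping the paper reports.
base_neg_high = rng.normal(size=DIM)  # shared "high-arousal negative" part
base_neg_low = rng.normal(size=DIM)   # shared "low-arousal negative" part
directions = {
    "anger":   base_neg_high + 0.3 * rng.normal(size=DIM),
    "disgust": base_neg_high + 0.3 * rng.normal(size=DIM),
    "sadness": base_neg_low + 0.3 * rng.normal(size=DIM),
    "fear":    base_neg_low + 0.3 * rng.normal(size=DIM),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Within-cluster similarity should exceed across-cluster similarity.
within = cosine(directions["anger"], directions["disgust"])
across = cosine(directions["anger"], directions["sadness"])
print(f"anger~disgust: {within:.2f}, anger~sadness: {across:.2f}")
```

On real models this comparison would be run over hidden states gathered from emotion-labeled prompts rather than synthetic vectors.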

Second, these emotion mechanisms are dominated by a small number of core neurons and attention heads. In ablation experiments, disabling just 2–4 neurons or 1–2 attention heads significantly degraded the model's ability to express emotion.
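The ablation logic can be sketched in a few lines. This is a toy stand-in, not the paper's setup: a linear "emotion readout" whose signal is concentrated in three hand-picked units, so zeroing those units (the standard ablation operation) collapses the score.

```python
# Hypothetical ablation sketch: zero out a handful of units and measure how
# much an "emotion readout" degrades. The toy readout and unit indices are
# illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(1)
HIDDEN = 32

core_units = [4, 11, 27]                  # the few units carrying the signal
readout = np.zeros(HIDDEN)
readout[core_units] = 5.0                 # core units dominate the readout
readout += 0.1 * rng.normal(size=HIDDEN)  # everything else contributes little

def emotion_score(hidden, ablate=()):
    h = hidden.copy()
    h[list(ablate)] = 0.0                 # ablation = clamp activations to zero
    return float(readout @ h)

hidden = rng.normal(size=HIDDEN) + 1.0
full = emotion_score(hidden)
ablated = emotion_score(hidden, ablate=core_units)
print(f"full={full:.2f}, after ablating {len(core_units)} units={ablated:.2f}")
```

In practice such ablations are done with forward hooks on specific neurons or attention heads inside the transformer, but the measurement idea is the same: compare the behavior with and without the suspected core components.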

Third, the team assembled these core components into a cross-layer "emotion loop." Directly adjusting this loop steers the model to generate a specified emotion with 99.65% accuracy, far surpassing prompt-based guidance and vector-steering methods. Even "surprise," previously the hardest emotion to control, reached 100% accurate expression.
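The general idea behind this kind of intervention, steering activations toward a target emotion's direction, can be sketched as follows. This illustrates generic activation steering under toy assumptions (random directions, a nearest-direction readout), not the paper's specific loop or its accuracy numbers.

```python
# Hypothetical steering sketch: add a scaled "emotion direction" to a hidden
# state and check which emotion a nearest-direction readout then reports.
import numpy as np

rng = np.random.default_rng(2)
DIM = 64
emotions = ["anger", "sadness", "surprise", "joy"]
directions = {e: rng.normal(size=DIM) for e in emotions}  # toy directions

def readout(hidden):
    # Classify by highest dot product with a known emotion direction.
    return max(emotions, key=lambda e: float(directions[e] @ hidden))

def steer(hidden, emotion, strength=8.0):
    d = directions[emotion]
    return hidden + strength * d / np.linalg.norm(d)

hidden = rng.normal(size=DIM)             # "neutral" state
steered = steer(hidden, "surprise")
print(readout(hidden), "->", readout(steered))
```

In a real model the addition would be applied to the residual stream at the relevant layers during generation; the strength parameter here is an arbitrary choice for the toy example.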

The mechanism has been validated on multiple model families, including LLaMA and Qwen, suggesting it is a general property of large language models.
