Anthropic's viral Claude research found to duplicate a Chinese team's earlier work; the company has acknowledged the overlap and issued a formal apology.

According to 1M AI News monitoring, on April 2 Anthropic released a new paper studying Claude's internal "emotion mechanisms," identifying 171 "emotion vectors" in Sonnet 4.5. These vectors activate in the contexts associated with their emotions and organize into a structure resembling human psychological categories and emotional space.

MBZUAI master's student Chenxi Wang noticed that her team's paper, published in October 2025 ("Do LLMs 'Feel' Emotions? Discovery and Control of Emotion Loops"), was in fact the first work to systematically study the internal mechanisms by which large language models generate emotions. On reading Anthropic's paper, her first reaction was, "Isn't this what we did last year?" The key distinction is this: most prior research focused on models recognizing emotions in text (emotion perception), whereas both of these papers study how the model itself generates emotions (emotion generation, i.e., the internal mechanism). Anthropic's corresponding author Jack Lindsey initially argued that the Chinese team's work overlapped with existing research, but after Chenxi Wang went through those works one by one and pointed out the differences, he accepted the distinction. Anthropic has since updated the paper's blog post, explicitly adding a citation to this work in the "Related Work" section, and the matter was resolved in a relatively friendly way.

The Chinese team’s paper mentioned three key findings:

First, large models do contain stable emotion representations that are independent of specific semantics. Different emotions begin to form clear clusters even in the shallow layers of the network (for example, anger sits close to disgust, and sadness close to fear), consistent with human intuition.
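A minimal sketch of how such emotion representations can be probed: average the hidden states collected from emotion-laden prompts and compare the resulting per-emotion vectors by cosine similarity. The data below is synthetic toy data standing in for real layer activations; the function names and setup are illustrative, not the paper's actual code.

```python
import numpy as np

def mean_emotion_vectors(hidden_states):
    """Average the hidden states collected for each emotion label."""
    return {emo: np.mean(vecs, axis=0) for emo, vecs in hidden_states.items()}

def cosine(u, v):
    """Cosine similarity between two emotion vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-in for activations gathered at a shallow layer; in real probing
# these would come from running emotion-laden prompts through the model.
rng = np.random.default_rng(0)
base = rng.normal(size=64)
hidden_states = {
    "anger":   base + rng.normal(scale=0.1, size=(10, 64)),
    "disgust": base + rng.normal(scale=0.1, size=(10, 64)),
    "sadness": -base + rng.normal(scale=0.1, size=(10, 64)),
}

means = mean_emotion_vectors(hidden_states)
near = cosine(means["anger"], means["disgust"])   # related emotions: high similarity
far = cosine(means["anger"], means["sadness"])    # unrelated emotions: low similarity
```

Clustering the pairwise similarities (e.g., with hierarchical clustering) would then reveal the emotion groupings the paper describes.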

Second, these emotion mechanisms are dominated by a small number of core neurons and attention heads. Through ablation experiments, the team found that shutting down just 2–4 neurons or 1–2 attention heads significantly degrades the model's ability to express emotions.
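The ablation idea can be sketched as masking specific hidden units to zero and comparing model behavior before and after. This toy example only shows the masking mechanics on a made-up hidden state; the neuron indices are arbitrary, not the paper's identified core neurons.

```python
import numpy as np

def ablate_neurons(activations, neuron_idx):
    """Zero out specific hidden units, mimicking a neuron-ablation experiment.

    In practice this mask would be applied inside the model's forward pass
    (e.g., via a forward hook on the target layer).
    """
    mask = np.ones(activations.shape[-1])
    mask[list(neuron_idx)] = 0.0
    return activations * mask

# Pretend hidden state of width 8; ablate three "core" units.
h = np.arange(8, dtype=float)
h_ablated = ablate_neurons(h, [1, 3, 5])
# → array([0., 0., 2., 0., 4., 0., 6., 7.])
```

Measuring the drop in emotion-expression accuracy after such an intervention is what localizes the mechanism to those few components.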

Third, the team integrated these core components into a cross-layer "emotion loop." Directly adjusting this loop lets the model generate a specified emotion with 99.65% accuracy, far surpassing traditional prompt-guidance and vector-manipulation methods. Even "surprise," previously the hardest emotion to control, reached 100% accurate expression.
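Directly adjusting internal representations to induce a target emotion is a form of activation steering, which can be sketched as adding a scaled emotion direction to a hidden state. The direction and scale below are placeholders, assuming a learned per-emotion vector; this is an illustration of the general technique, not the paper's loop-editing method.

```python
import numpy as np

def steer(hidden, direction, alpha=4.0):
    """Shift a hidden state along a normalized 'emotion direction'.

    alpha controls the strength of the induced emotion; the direction would
    be learned from the model (here it is a random placeholder).
    """
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

rng = np.random.default_rng(1)
h = rng.normal(size=32)          # toy hidden state
joy_dir = rng.normal(size=32)    # placeholder for a learned emotion vector
h_steered = steer(h, joy_dir)
shift = h_steered - h            # the applied shift has norm alpha
```

The paper's reported gains come from editing the identified cross-layer loop rather than a single layer, but the principle (intervening on internal activations instead of prompting) is the same.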

The mechanism has been validated across multiple models, including LLaMA and Qwen, suggesting it is a general property of large language models.
