Luo Fuli: Large Models Enter Post-Training Era, Top Teams Achieve 1:1 Compute Ratio for Pre-Training and Post-Training

According to monitoring by Dongcha Beating, Luo Fuli, head of Xiaomi's large model team, said that competition in large models has shifted from the pre-training-dominated Chat era to the post-training-dominated Agent era, and that the core competitive question is now "how to effectively scale reinforcement learning (RL) on Agents."

This paradigm shift has directly reshaped how compute is allocated. Luo noted that in the Chat era, the compute ratio across research, pre-training, and post-training was roughly 3:5:1; in today's Agent era, a reasonable allocation is closer to 3:1:1, meaning pre-training and post-training now receive roughly equal compute, and top model teams have already reached a 1:1 ratio between the two.

The requirements on system architecture have also changed significantly. RL infrastructure used to be built around a "model inference engine" handling pure text computation; it must now be built around the "Agent," supporting heterogeneous cluster scheduling and tolerating the uncertainty of Agents being interrupted mid-workflow by various uncontrollable factors.
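To make the last point concrete, below is a minimal, hypothetical sketch (not Xiaomi's actual stack; all names such as `collect_rollout` and `RolloutStatus` are assumptions) of an Agent-centric rollout collector that tolerates interruptions from tool or environment failures instead of crashing the training job, so the trainer can decide whether to keep or drop partial trajectories.

```python
# Illustrative sketch only: an interruption-tolerant rollout collector for
# Agent RL. Names and structure are assumptions for illustration.
import random
from dataclasses import dataclass, field
from enum import Enum


class RolloutStatus(Enum):
    COMPLETED = "completed"      # agent finished the task normally
    INTERRUPTED = "interrupted"  # tool/environment failed mid-trajectory


@dataclass
class Rollout:
    steps: list = field(default_factory=list)
    status: RolloutStatus = RolloutStatus.COMPLETED


def call_tool(step: int) -> str:
    """Stand-in for an external tool/environment call that can fail."""
    if random.random() < 0.2:  # simulate uncontrollable failures
        raise TimeoutError(f"tool call timed out at step {step}")
    return f"observation_{step}"


def collect_rollout(max_steps: int = 8) -> Rollout:
    """Run one agent trajectory; on interruption, return a partial rollout
    instead of raising, so the RL trainer can keep or discard it."""
    rollout = Rollout()
    for step in range(max_steps):
        try:
            obs = call_tool(step)
        except TimeoutError:
            rollout.status = RolloutStatus.INTERRUPTED
            break
        rollout.steps.append(obs)
    return rollout


if __name__ == "__main__":
    batch = [collect_rollout() for _ in range(4)]
    usable = [r for r in batch if r.status is RolloutStatus.COMPLETED]
    print(f"collected {len(batch)} rollouts, {len(usable)} usable for RL updates")
```

The design choice sketched here is that interruption is treated as an expected outcome recorded in the rollout's status, rather than an exception that aborts the whole batch, which is one way an Agent-centric infrastructure can "tolerate" uncontrollable failures in complex workflows.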
