23
2025/12
From 0.5B to 72B: Unpacking the Compute, Data, and Model-Scale Trade-offs in RL Post-Training
Paper title: Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empir
...