2025/12/23

From 0.5B to 72B: Unpacking the Trade-offs Among Compute, Data, and Model Scale in RL Post-Training

Paper title: Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empir ...

3 months ago
