10
2025/12
The Interplay of Pre-Training, Mid-Training, and Reinforcement Learning in Reasoning Models
Paper title: On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Lan
...