04
2026/01
On KL Regularization in Reinforcement Learning Training of Large Language Models
Paper title: A COMEDY OF ESTIMATORS: ON KL REGULARIZATION IN RL TRAINING OF LLMS
论
...