09
2025/12
Differential Smoothing: Mitigating Distribution Collapse in RL Fine-Tuning and Improving LLM Reasoning
Paper title: Differential Smoothing Mitigates Sharpening and Improves LLM Reasoning
...
Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space (NLAC)
Paper title: Natural Language Actor-Critic: Scalable Off-Policy Learning in Language
...