09

2025/12

Differential Smoothing: Mitigating Distribution Collapse in RL Fine-Tuning and Improving LLM Reasoning

Paper title: Differential Smoothing Mitigates Sharpening and Improves LLM Reasoning ...

3 months ago


Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space (NLAC)

Paper title: Natural Language Actor-Critic: SCALABLE OFF-POLICY LEARNING IN LANGUAGE ...

3 months ago
