2025/11/21
New work from Danqi Chen's team, Retaining by Doing: why RL mitigates catastrophic forgetting better than SFT
Paper title: Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting
...