29
2025/11
The Qwen team introduces SAPO, reported to be more stable and better-performing than GRPO and GSPO
Paper title: Soft Adaptive Policy Optimization
Paper link: https://arxiv.org/pdf/2511.203
...