2026/02/27
Meituan proposes DynaMO: a dynamic rollout allocation and advantage modulation strategy for RLVR
Paper title: How to Allocate, How to Learn? Dynamic Rollout Allocation and Advantage
...