15

2026/02

Tencent Hunyuan Proposes G-OPD: Generalized On-Policy Distillation and Reward Extrapolation That Go Beyond the Teacher Model

Paper title: Learning beyond Teacher: Generalized On-Policy Distillation with Reward ...

7 hours ago
