15
2026/02
Tencent Hunyuan Proposes G-OPD: Generalized On-Policy Distillation with Reward Extrapolation That Surpasses the Teacher Model
Paper title: Learning beyond Teacher: Generalized On-Policy Distillation with Reward
...