17
2025/11
Meta FAIR introduces HERO: combining sparse and dense rewards in LLM reinforcement learning
Paper title: Hybrid Reinforcement: When Reward Is Sparse, It’s Better to Be Dense
...