2026/02/04
A New Paradigm for Agentic RL: Fusing Rules, Model-Based Rewards, and Natural-Language Critiques in the Reinforcement Learning Loop
Paper title: Exploring Reasoning Reward Model for Agents
Paper link: https://arxiv.org/pd
...