31

2026/01

New work from Alec Radford, the father of GPT: token-level data filtering yields refusal boundaries more robust than RLHF

Paper title: Shaping capabilities with token-level data filtering Paper link: https://arx ...

1 day ago

15 0

Meituan LongCat technical report explained: at high sparsity, scaling embeddings outperforms scaling experts

Paper title: Scaling Embeddings Outperforms Scaling Experts in Language Models Paper link ...

2 days ago

32 0

Alibaba Cloud proposes DASD, distribution-aligned sequence distillation, for better long chain-of-thought reasoning

Paper title: Distribution-Aligned Sequence Distillation for Superior Long-CoT Reason ...

3 days ago

30 0

DeepSeek-OCR 2: Visual Causal Flow technical report explained

Paper title: DeepSeek-OCR 2: Visual Causal Flow Paper link: https://github.com/deepseek-a ...

4 days ago

75 0

Meituan LongCat-Flash-Thinking-2601 technical report explained

Paper title: LongCat-Flash-Thinking-2601 Technical Report Paper link: https://arxiv.org/p ...

6 days ago

74 0

Survey of Agentic Reasoning for Large Language Models: foundations, evolution, and collaboration

Paper title: Agentic Reasoning for Large Language Models: Foundations, Evolution, Co ...

1 week ago

122 0

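JudgeRLVR: judge first, generate second, breaking the efficiency paradox of "long chains of thought" in reasoning models

Paper title: JudgeRLVR: Judge First, Generate Second for Efficient Reasoning Paper link: ...

1 week ago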

86 0

Your GRPO advantage estimate is biased: statistical pitfalls in GRPO and the HA-DW correction

Paper title: Your Group-Relative Advantage Is Biased Paper link: https://arxiv.org/pdf/26 ...

2 weeks ago

97 0

Meta proposes Dr. Zero: a self-evolving search agent trained with zero data

Paper title: Dr. Zero: Self-Evolving Search Agents without Training Data Paper link: http ...

2 weeks ago

114 0

A deep dive into Ministral 3: a cascade-distillation training methodology for parameter-efficient dense models

Paper title: Ministral 3 Paper link: https://arxiv.org/pdf/2601.08584 TL;DR Mistral AI ...

2 weeks ago

95 0