31
2026/01

New work from Alec Radford, the "father of GPT": token-level data filtering yields refusal boundaries more robust than RLHF

Paper title: Shaping capabilities with token-level data filtering Paper link: https://arx ...

Meituan LongCat technical report explained: at high sparsity, scaling embeddings beats scaling experts

Paper title: Scaling Embeddings Outperforms Scaling Experts in Language Models Paper link ...

Alibaba Cloud proposes DASD, distribution-aligned sequence distillation, for stronger long chain-of-thought reasoning

Paper title: Distribution-Aligned Sequence Distillation for Superior Long-CoT Reason ...

DeepSeek-OCR 2: Visual Causal Flow, technical report explained

Paper title: DeepSeek-OCR 2: Visual Causal Flow Paper link: https://github.com/deepseek-a ...

Meituan LongCat-Flash-Thinking-2601 technical report explained

Paper title: LongCat-Flash-Thinking-2601 Technical Report Paper link: https://arxiv.org/p ...

Agentic Reasoning for Large Language Models, a survey: foundations, evolution, and collaboration

Paper title: Agentic Reasoning for Large Language Models: Foundations, Evolution, Co ...

JudgeRLVR: judge first, then generate, breaking the efficiency paradox of long chain-of-thought in reasoning models

Paper title: JudgeRLVR: Judge First, Generate Second for Efficient Reasoning Paper link: ...

Your GRPO advantage estimate is biased: statistical pitfalls in GRPO and the HA-DW correction

Paper title: Your Group-Relative Advantage Is Biased Paper link: https://arxiv.org/pdf/26 ...

Meta proposes Dr. Zero: a self-evolving search agent trained with zero data

Paper title: Dr. Zero: Self-Evolving Search Agents without Training Data Paper link: http ...

Ministral 3 in depth: a training methodology for parameter-efficient dense models based on cascade distillation

Paper title: Ministral 3 Paper link: https://arxiv.org/pdf/2601.08584 TL;DR Mistral AI ...