31
2026/01
New work from Alec Radford, the "father of GPT": token-level data filtering for refusal boundaries more robust than RLHF
Paper title: Shaping capabilities with token-level data filtering
Paper link: https://arx
...
Meituan LongCat technical report explained: at high sparsity, scaling embeddings beats scaling experts
Paper title: Scaling Embeddings Outperforms Scaling Experts in Language Models
Paper link
...
Alibaba Cloud proposes DASD, distribution-aligned sequence distillation, for better long chain-of-thought reasoning
Paper title: Distribution-Aligned Sequence Distillation for Superior Long-CoT Reason
...
DeepSeek-OCR 2: Visual Causal Flow technical report explained
Paper title: DeepSeek-OCR 2: Visual Causal Flow
Paper link: https://github.com/deepseek-a
...
Meituan LongCat-Flash-Thinking-2601 technical report explained
Paper title: LongCat-Flash-Thinking-2601 Technical Report
Paper link: https://arxiv.org/p
...
Survey of Agentic Reasoning for Large Language Models: foundations, evolution, and collaboration
Paper title: Agentic Reasoning for Large Language Models: Foundations, Evolution, Co
...
JudgeRLVR: judge first, generate second, breaking the efficiency paradox of "long chain-of-thought" in reasoning models
Paper title: JudgeRLVR: Judge First, Generate Second for Efficient Reasoning
Paper link:
...
Your GRPO advantage estimate is biased: statistical pitfalls in GRPO and the HA-DW correction
Paper title: Your Group-Relative Advantage Is Biased
Paper link: https://arxiv.org/pdf/26
...
Meta proposes Dr.Zero: a self-evolving search agent trained with zero data
Paper title: Dr. Zero: Self-Evolving Search Agents without Training Data
Paper link: http
...
Deep dive into Ministral 3: a parameter-efficient dense-model training methodology based on cascade distillation
Paper title: Ministral 3
Paper link: https://arxiv.org/pdf/2601.08584
TL;DR
Mistral AI
...