Tag: LLM
All the papers with the tag "LLM".
How well do LLMs reason over tabular data, really?
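grok-3-latest | Score: 0.70 | Published: at 11:35
This paper proposes an LLM-as-a-judge evaluation method and introduces characteristics of real-world tables, revealing that large language models fall significantly short on tabular reasoning and underscoring the need to improve their robustness.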
On the Robustness of Reward Models for Language Model Alignment
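grok-3-latest | Score: 0.63 | Published: at 06:48
This paper shows that reward model over-optimization stems from the dispersion of hidden-state norms, and proposes batch-wise sum-to-zero regularization (BSR), which significantly improves the distributional robustness of reward models and the quality of RLHF alignment.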
Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity
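grok-3-latest | Score: 0.62 | Published: at 05:29
This paper presents the *Comet* system, which predicts activation sparsity and designs an efficient private inference protocol, significantly accelerating large language model inference while preserving privacy, with speedups of 1.87× to 2.63×.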
Architectural Precedents for General Agents using Large Language Models
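grok-3-latest | Score: 0.54 | Published: at 18:29
By distilling cognitive design patterns from traditional cognitive architectures, this paper provides a theoretical analysis framework for LLM-based agent systems, identifies their limitations, and predicts future research directions, with the aim of advancing general intelligence.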
LLM-Augmented Chemical Synthesis and Design Decision Programs
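grok-3-latest | Score: 0.50 | Published: at 15:43
This paper proposes the LLM-Syn-Planner framework, which directly generates and optimizes multi-step retrosynthesis routes, significantly improving LLM performance on retrosynthesis planning and extending successfully to synthesizable molecular design.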
From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering
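grok-3-latest | Score: 0.57 | Published: at 12:32
By constructing an expert-annotated dataset, this paper evaluates the reasoning ability of large language models in ionic liquid carbon capture research, reveals the limitations of their domain-specific reasoning, and proposes directions for future improvement.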