Tag: Reasoning
All the papers with the tag "Reasoning".
How well do LLMs reason over tabular data, really?
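grok-3-latest | Score: 0.70 | Published: at 11:35
This paper proposes an LLM-as-a-judge evaluation method and introduces characteristics of real-world tables, revealing that large language models perform markedly poorly on tabular reasoning and underscoring the need to improve their robustness.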
Architectural Precedents for General Agents using Large Language Models
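grok-3-latest | Score: 0.54 | Published: at 18:29
This paper distills cognitive design patterns from traditional cognitive architectures to provide a theoretical framework for analyzing agent systems built on large language models, identifying their limitations and predicting future research directions in order to advance general intelligence.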
From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering
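grok-3-latest | Score: 0.57 | Published: at 12:32
This paper builds an expert-annotated dataset to evaluate the reasoning ability of large language models in research on carbon capture with ionic liquids, revealing the limits of their domain-specific reasoning and proposing directions for future improvement.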
Crosslingual Reasoning through Test-Time Scaling
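grok-3-latest | Score: 0.89 | Published: at 16:50
By scaling test-time compute, this paper reveals the potential of English-centric reasoning models for multilingual mathematical reasoning and analyzes their language-mixing patterns and the limits of their cross-domain generalization, providing an important benchmark for multilingual reasoning research.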
Scalable Chain of Thoughts via Elastic Reasoning
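grok-3-latest | Score: 0.69 | Published: at 15:01
This paper proposes the Elastic Reasoning framework, which splits reasoning into a thinking phase and a solution phase and combines this with budget-constrained training, so that large reasoning models can still reason efficiently under strict resource limits while training costs fall and generalization improves.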
ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning
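grok-3-latest | Score: 0.70 | Published: at 01:40
This paper proposes the ConCISE framework, whose confidence-guided compression method significantly reduces redundancy in the reasoning chains of large reasoning models while maintaining high accuracy, offering a new approach to efficient reasoning.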