Tag: Reasoning

All the papers with the tag "Reasoning".

How well do LLMs reason over tabular data, really?
grok-3-latest
Score: 0.70
Published: May 12, 2025 at 11:35
#LLM, #Tabular Data, #Reasoning, #Evaluation Metrics, #Robustness
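By proposing an LLM-as-a-judge evaluation method and introducing real-world table characteristics, this paper reveals significant performance shortfalls of large language models on tabular reasoning and stresses the need to improve robustness.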
Architectural Precedents for General Agents using Large Language Models
grok-3-latest
Score: 0.54
Published: May 11, 2025 at 18:29
#LLM, #Cognitive Architecture, #Agentic Systems, #Reasoning, #Memory Design
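This paper distills cognitive design patterns from traditional cognitive architectures into a theoretical framework for analyzing LLM-based agentic systems, identifying their limitations and predicting future research directions in order to advance general intelligence.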
From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering
grok-3-latest
Score: 0.57
Published: May 11, 2025 at 12:32
#LLM, #Reasoning, #Domain Knowledge, #Benchmarking, #Carbon Capture
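This paper builds an expert-annotated dataset to evaluate the reasoning ability of large language models in ionic-liquid carbon capture research, revealing the limits of their domain-specific reasoning and proposing directions for future improvement.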
Crosslingual Reasoning through Test-Time Scaling
grok-3-latest
Score: 0.89
Published: May 8, 2025 at 16:50
#LLM, #Reasoning, #Test Time Scaling, #Multilingual, #Chain of Thought
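Through test-time compute scaling, this paper reveals the potential of English-centric reasoning models for multilingual mathematical reasoning and analyzes language-mixing patterns and the limits of cross-domain generalization, providing an important benchmark for multilingual reasoning research.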
Scalable Chain of Thoughts via Elastic Reasoning
grok-3-latest
Score: 0.69
Published: May 8, 2025 at 15:01
#LLM, #Reasoning, #Test Time Scaling, #Post-Training, #RLHF
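This paper proposes the Elastic Reasoning framework, which splits reasoning into a thinking phase and a solution phase and combines this with budget-constrained training, so that large reasoning models can still reason efficiently under strict resource limits while lowering training cost and improving generalization.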
ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning
grok-3-latest
Score: 0.70
Published: May 8, 2025 at 01:40
#LLM, #Reasoning, #Compression, #Confidence Guidance
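This paper proposes the ConCISE framework, a confidence-guided reasoning compression method that significantly reduces redundancy in the reasoning chains of large reasoning models while maintaining high accuracy, offering a new route to efficient reasoning.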
