Daily Paper Machine

Tag: Test Time Scaling

All the papers with the tag "Test Time Scaling".

Crosslingual Reasoning through Test-Time Scaling
grok-3-latest
Score: 0.89
Published: May 8, 2025 at 16:50
#LLM, #Reasoning, #Test Time Scaling, #Multilingual, #Chain of Thought
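This paper uses test-time compute scaling to reveal the potential of English-centric reasoning models for multilingual mathematical reasoning, and analyzes language-mixing patterns and the limits of cross-domain generalization, providing an important benchmark for multilingual reasoning research.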
Scalable Chain of Thoughts via Elastic Reasoning
grok-3-latest
Score: 0.69
Published: May 8, 2025 at 15:01
#LLM, #Reasoning, #Test Time Scaling, #Post-Training, #RLHF
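This paper proposes the Elastic Reasoning framework, which splits reasoning into a thinking phase and a solution phase and combines this split with budget-constrained training, so that large reasoning models can still reason efficiently under strict resource limits while lowering training cost and improving generalization.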
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
grok-3-latest
Score: 0.55
Published: May 5, 2025 at 15:37
#LLM, #Formal Reasoning, #Benchmarking, #Autoformalization, #Test Time Scaling
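This paper introduces FormalMATH, a Lean4 benchmark of 5560 formalized mathematical problems built through an efficient human-in-the-loop autoformalization pipeline, and reveals significant limitations of current large language models in formal reasoning.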