Tag: Mathematical Reasoning
All the papers with the tag "Mathematical Reasoning".
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
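grok-3-latest · Score: 0.69 · Published: at 17:23
Using the ZeroTIR framework, this paper identifies an Agent RL Scaling Law. It shows that base LLMs can spontaneously learn, through reinforcement learning, to use a code-execution tool, which significantly improves their mathematical reasoning.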
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
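grok-3-latest · Score: 0.81 · Published: at 07:38
This paper systematically rewrites pre-training data to build the SwallowCode and SwallowMath datasets. The rewritten data significantly improves large language models on code generation and mathematical reasoning. The paper also proposes a new "transform-and-retain" paradigm for data processing.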
DeepCritic: Deliberate Critique with Large Language Models
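grok-3-latest · Score: 0.72 · Published: at 17:03
This paper proposes the DeepCritic framework. A two-stage training procedure (supervised fine-tuning followed by reinforcement learning) substantially improves the critique ability of large language models on mathematical reasoning tasks. This paves the way for automated supervision and model self-improvement.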