Tag: Formal Reasoning
All the papers with the tag "Formal Reasoning".
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
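grok-3-latest | Score: 0.55 | Published: at 15:37
This paper presents FormalMATH, a Lean4 benchmark of 5,560 formalized mathematical problems, built with an efficient human-in-the-loop autoformalization pipeline, and reveals significant limitations of current large language models in formal reasoning.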