Daily Paper Machine

Tag: Robustness

All the papers with the tag "Robustness".

How well do LLMs reason over tabular data, really?
grok-3-latest
Score: 0.70
Published: May 12, 2025 at 11:35
#LLM, #Tabular Data, #Reasoning, #Evaluation Metrics, #Robustness
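This paper proposes an LLM-as-a-judge evaluation method and introduces real-world table characteristics into the benchmark. It shows that large language models fall significantly short when reasoning over tabular data, and it stresses the need to improve their robustness.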