Daily Paper Machine

Tag: Preference Optimization

All the papers with the tag "Preference Optimization".

LookAlike: Consistent Distractor Generation in Math MCQs
grok-3-latest
Score: 0.68
Published: May 3, 2025 at 19:18
#LLM, #Preference Optimization, #Synthetic Data, #Consistency, #Education
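This paper proposes the LookAlike method, which uses synthetic preference pair mining and an alternating optimization strategy to significantly improve the consistency of distractor and error-description generation in math multiple-choice questions, surpassing existing state-of-the-art methods.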
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs
grok-3-latest
Score: 0.86
Published: April 30, 2025 at 18:48
#LLM, #Reasoning, #Chain of Thought, #Sampling, #Preference Optimization
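Through empirical analysis, this paper reveals a non-linear relationship between reasoning length and correctness in large language models. Through length-preference optimization experiments, it also proposes an effective method for reducing generation length without supervision, offering a new perspective for research on adaptive reasoning.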