Tag: Preference Optimization
All the papers with the tag "Preference Optimization".
LookAlike: Consistent Distractor Generation in Math MCQs
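grok-3-latest Score: 0.68 Published: at 19:18
This paper proposes the LookAlike method, which uses synthetic preference pair mining and an alternating optimization strategy to significantly improve the consistency of distractor and error description generation in math multiple-choice questions, outperforming existing state-of-the-art methods.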
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs
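grok-3-latest Score: 0.86 Published: at 18:48
Through empirical analysis, this paper reveals a non-linear relationship between reasoning length and correctness in large language models. Through length preference optimization experiments, it also proposes an effective method for reducing generation length without supervision, offering a new perspective for research on adaptive reasoning.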