Tag: RLHF
All the papers with the tag "RLHF".
On the Robustness of Reward Models for Language Model Alignment
grok-3-latest · Score: 0.63 · Published at 06:48
This paper traces reward model over-optimization to the dispersion of hidden-state norms and proposes Batch Sum-to-Zero Regularization (BSR), which markedly improves reward models' distributional robustness and downstream RLHF alignment.
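To make the BSR idea concrete, here is a minimal PyTorch sketch, assuming BSR adds a penalty on the squared batch sum of predicted rewards to the standard Bradley-Terry preference loss; the weight `lam` and the exact penalty form are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def bt_loss_with_bsr(r_chosen: torch.Tensor, r_rejected: torch.Tensor, lam: float = 0.01):
    """Bradley-Terry preference loss plus a batch sum-to-zero penalty (assumed form).

    r_chosen, r_rejected: shape (B,) reward scores for the preferred and
    dispreferred responses in a batch of B preference pairs.
    """
    # Standard pairwise preference loss: -log sigmoid(r_chosen - r_rejected)
    bt = -F.logsigmoid(r_chosen - r_rejected).mean()
    # Sum-to-zero regularizer: push the batch sum of all rewards toward zero,
    # discouraging unbounded reward magnitudes (a proxy for norm dispersion).
    all_r = torch.cat([r_chosen, r_rejected])
    bsr = all_r.sum().pow(2) / all_r.numel()
    return bt + lam * bsr
```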
Scalable Chain of Thoughts via Elastic Reasoning
grok-3-latest · Score: 0.69 · Published at 15:01
This paper proposes the Elastic Reasoning framework, which splits reasoning into separate thinking and solution phases and trains under budget constraints, enabling large reasoning models to reason efficiently under strict resource limits while cutting training cost and improving generalization.
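The two-phase, separately-budgeted decoding idea can be sketched as below; this is a hypothetical illustration using a Hugging Face-style `generate` API, with the `<think>`/`</think>` tags and the budget split assumed rather than taken from the paper.

```python
def elastic_generate(model, tok, prompt: str, think_budget: int = 256, solve_budget: int = 128):
    """Budget-constrained two-phase decoding: cap the thinking segment,
    then force the solution segment under its own budget (illustrative)."""
    # Phase 1: generate the thinking trace, hard-capped at think_budget tokens;
    # stop early if the model closes its own </think> tag.
    ids = tok(prompt + "<think>", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=think_budget,
                         eos_token_id=tok.convert_tokens_to_ids("</think>"))
    text = tok.decode(out[0])
    # If the budget ran out mid-thought, append the closing tag to force
    # the transition into the solution phase.
    if not text.endswith("</think>"):
        text += "</think>"
    # Phase 2: generate the final solution under the remaining budget.
    ids = tok(text, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=solve_budget)
    return tok.decode(out[0])
```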
RM-R1: Reward Modeling as Reasoning
grok-3-latest · Score: 0.83 · Published at 06:11
This paper introduces the RM-R1 model family, which recasts reward modeling as a reasoning task and uses distillation plus reinforcement learning to substantially improve reward models' interpretability and performance, surpassing much larger open-source and commercial models.
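A hedged sketch of the "reward modeling as reasoning" setup: the judge model is prompted to write out an evaluation rubric and its comparison before emitting a verdict, which is then parsed into a preference. The prompt template and `[[A]]`/`[[B]]` verdict format here are hypothetical, not RM-R1's actual prompts.

```python
def judge_pair(model, tok, question: str, answer_a: str, answer_b: str,
               max_new_tokens: int = 512) -> str:
    """Generative reward model: reason first, then verdict (illustrative)."""
    prompt = (
        f"Question: {question}\n"
        f"Response A: {answer_a}\n"
        f"Response B: {answer_b}\n"
        "First write an evaluation rubric, then compare both responses "
        "step by step, and end with a verdict: [[A]] or [[B]].\n"
    )
    ids = tok(prompt, return_tensors="pt").input_ids
    out = tok.decode(model.generate(ids, max_new_tokens=max_new_tokens)[0])
    # Parse whichever verdict marker appears last; the reasoning trace that
    # precedes it is what makes the judgment interpretable.
    return "A" if out.rfind("[[A]]") > out.rfind("[[B]]") else "B"
```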