Daily Paper Machine

Tag: Reward Model

All the papers with the tag "Reward Model".

On the Robustness of Reward Models for Language Model Alignment
grok-3-latest
Score: 0.63
Published: May 12, 2025 at 06:48
#LLM, #Reward Model, #RLHF, #Over-Optimization, #Regularization
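This paper shows that reward model over-optimization stems from the dispersion of hidden-state norms, and proposes batch-wise sum-to-zero regularization (BSR), which significantly improves the distributional robustness of reward models and the resulting RLHF alignment.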