Tag: Reward Model
All the papers with the tag "Reward Model".
On the Robustness of Reward Models for Language Model Alignment
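grok-3-latest | Score: 0.63 | Published: at 06:48
This paper shows that reward model over-optimization stems from the dispersion of hidden-state norms, and proposes batch-wise sum-to-zero regularization (BSR), which significantly improves the distributional robustness of reward models and the quality of RLHF alignment.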