Tag: RLHF
All the papers with the tag "RLHF".
On the Robustness of Reward Models for Language Model Alignment
grok-3-latest · Score: 0.63 · Published at 06:48
This paper traces reward model over-optimization to the dispersion of hidden-state norms and proposes Batch Sum-to-Zero Regularization (BSR), which markedly improves reward models' distributional robustness and downstream RLHF alignment.
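To make the BSR idea concrete, here is a minimal PyTorch sketch, assuming BSR adds a penalty on the squared batch sum of predicted rewards to the standard Bradley-Terry preference loss; the weight `lam` and the exact penalty form are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def bt_loss_with_bsr(r_chosen: torch.Tensor, r_rejected: torch.Tensor, lam: float = 0.01):
    """Bradley-Terry preference loss plus a batch sum-to-zero penalty (assumed form).

    r_chosen, r_rejected: shape (B,) reward scores for the preferred and
    dispreferred responses in a batch of B preference pairs.
    """
    # Standard pairwise preference loss: -log sigmoid(r_chosen - r_rejected)
    bt = -F.logsigmoid(r_chosen - r_rejected).mean()
    # Sum-to-zero regularizer: push the batch sum of all rewards toward zero,
    # discouraging unbounded reward magnitudes (a proxy for norm dispersion).
    all_r = torch.cat([r_chosen, r_rejected])
    bsr = all_r.sum().pow(2) / all_r.numel()
    return bt + lam * bsr
```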
Scalable Chain of Thoughts via Elastic Reasoning
grok-3-latest · Score: 0.69 · Published at 15:01
This paper proposes the Elastic Reasoning framework, which splits reasoning into separate thinking and solution phases and trains under budget constraints, enabling large reasoning models to reason efficiently under strict resource limits while cutting training cost and improving generalization.
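The two-phase, separately-budgeted decoding idea can be sketched as below; this is a hypothetical illustration using a Hugging Face-style `generate` API, with the `<think>`/`</think>` tags and the budget split assumed rather than taken from the paper.

```python
def elastic_generate(model, tok, prompt: str, think_budget: int = 256, solve_budget: int = 128):
    """Budget-constrained two-phase decoding: cap the thinking segment,
    then force the solution segment under its own budget (illustrative)."""
    # Phase 1: generate the thinking trace, hard-capped at think_budget tokens;
    # stop early if the model closes its own </think> tag.
    ids = tok(prompt + "<think>", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=think_budget,
                         eos_token_id=tok.convert_tokens_to_ids("</think>"))
    text = tok.decode(out[0])
    # If the budget ran out mid-thought, append the closing tag to force
    # the transition into the solution phase.
    if not text.endswith("</think>"):
        text += "</think>"
    # Phase 2: generate the final solution under the remaining budget.
    ids = tok(text, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=solve_budget)
    return tok.decode(out[0])
```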
RM-R1: Reward Modeling as Reasoning
grok-3-latest · Score: 0.83 · Published at 06:11
This paper introduces the RM-R1 model family, which recasts reward modeling as a reasoning task and uses distillation plus reinforcement learning to substantially improve reward models' interpretability and performance, surpassing much larger open-source and commercial models.
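A hedged sketch of the "reward modeling as reasoning" setup: the judge model is prompted to write out an evaluation rubric and its comparison before emitting a verdict, which is then parsed into a preference. The prompt template and `[[A]]`/`[[B]]` verdict format here are hypothetical, not RM-R1's actual prompts.

```python
def judge_pair(model, tok, question: str, answer_a: str, answer_b: str,
               max_new_tokens: int = 512) -> str:
    """Generative reward model: reason first, then verdict (illustrative)."""
    prompt = (
        f"Question: {question}\n"
        f"Response A: {answer_a}\n"
        f"Response B: {answer_b}\n"
        "First write an evaluation rubric, then compare both responses "
        "step by step, and end with a verdict: [[A]] or [[B]].\n"
    )
    ids = tok(prompt, return_tensors="pt").input_ids
    out = tok.decode(model.generate(ids, max_new_tokens=max_new_tokens)[0])
    # Parse whichever verdict marker appears last; the reasoning trace that
    # precedes it is what makes the judgment interpretable.
    return "A" if out.rfind("[[A]]") > out.rfind("[[B]]") else "B"
```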