Tag: Reward Design
All the papers with the tag "Reward Design".
Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization
grok-3-latest · Score: 0.66 · Published at 17:18 — This paper proposes the Reward Neutralization framework, which trains the model to produce minimal-information refusals so as to neutralize the reward signal exploited by malicious RL fine-tuning, significantly improving the safety of open-source models under attack.
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
grok-3-latest · Score: 0.62 · Published at 03:14 — This paper systematically surveys the application of reinforcement learning (RL) to reasoning in multimodal large language models (MLLMs), analyzing algorithm design, reward mechanisms, and application scenarios; it identifies current limitations and future directions, providing a structured guide for multimodal reasoning research.