Tag: Reinforcement Learning
All the papers with the tag "Reinforcement Learning".
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
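grok-3-latest | Score: 0.69 | Published: at 17:23
Using the ZeroTIR framework, this paper reveals an Agent RL Scaling Law and verifies that a base LLM can spontaneously learn to use a code execution tool through reinforcement learning, significantly improving its mathematical reasoning ability.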
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
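grok-3-latest | Score: 0.60 | Published: at 17:59
This paper proposes the EchoInk-R1 framework, which uses Group Relative Policy Optimization reinforcement learning to significantly improve the performance of multimodal large language models on audio-visual reasoning tasks, achieving unified open-world reasoning across audio, visual, and text modalities for the first time.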
Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization
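grok-3-latest | Score: 0.66 | Published: at 17:18
This paper proposes the Reward Neutralization framework, which trains models to generate minimal-information refusals that neutralize the reward signal of malicious RL fine-tuning, significantly improving the safety of open-source models under attack.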
Frog Soup: Zero-Shot, In-Context, and Sample-Efficient Frogger Agents
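grok-3-latest | Score: 0.61 | Published: at 19:51
This paper demonstrates the potential of reasoning LLMs in zero-shot play of the Atari game Frogger, and uses LLM demonstration data to improve the sample efficiency of a traditional DQN agent by 35.3%.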
The Steganographic Potentials of Language Models
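grok-3-latest | Score: 0.76 | Published: at 11:25
Through reinforcement learning fine-tuning and prompting experiments, this paper presents the first systematic quantification of the steganographic potential of large language models, revealing their ability to hide non-trivial payloads in specific scenarios and pointing out potential risks for AI alignment and monitoring.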
Procedural Memory Is Not All You Need: Bridging Cognitive Gaps in LLM-Based Agents
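grok-3-latest | Score: 0.77 | Published: at 11:18
This paper proposes a modular architecture that separates procedural, semantic, and associative functions to compensate for the cognitive deficits of large language models in complex "wicked" learning environments, providing a theoretical foundation for building adaptive AI agents.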