Tag: Reinforcement Learning

All the papers with the tag "Reinforcement Learning".

Prompt-responsive Object Retrieval with Memory-augmented Student-Teacher Learning
grok-3-latest
Score: 0.55
Published: May 4, 2025 at 19:51
#Foundation Model, #Reinforcement Learning, #Prompt Response, #Student-Teacher Learning, #Memory Augmentation
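This paper proposes a memory-augmented student-teacher learning framework that combines the prompt responsiveness of foundation models with the dexterous control of reinforcement learning, enabling prompt-based picking of target objects in cluttered scenes.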
DeepCritic: Deliberate Critique with Large Language Models
grok-3-latest
Score: 0.72
Published: May 1, 2025 at 17:03
#LLM, #Critique Model, #Mathematical Reasoning, #Supervised Fine-Tuning, #Reinforcement Learning
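This paper proposes the DeepCritic framework, which uses two-stage training (supervised fine-tuning and reinforcement learning) to significantly improve the critique ability of large language models on mathematical reasoning tasks, paving the way for automated supervision and model self-improvement.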
MULE: Multi-terrain and Unknown Load Adaptation for Effective Quadrupedal Locomotion
grok-3-latest
Score: 0.28
Published: May 1, 2025 at 12:41
#Reinforcement Learning, #Adaptive Control, #Quadrupedal Locomotion, #Payload Adaptation, #Terrain Adaptation
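This paper proposes a reinforcement learning-based adaptive control framework in which a nominal policy and an adaptive policy work together, enabling quadrupedal robots to achieve robust locomotion under unknown payloads and across diverse terrains; its advantages are validated in both simulation and hardware experiments.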
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition
grok-3-latest
Score: 0.57
Published: April 30, 2025 at 16:57
#LLM, #Formal Proof, #Subgoal Decomposition, #Reinforcement Learning, #Reasoning
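This paper proposes a training framework based on subgoal decomposition and reinforcement learning that significantly improves the performance of large language models on formal theorem proving and achieves state-of-the-art results on multiple benchmark datasets.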
Q-function Decomposition with Intervention Semantics with Factored Action Spaces
grok-3-latest
Score: 0.43
Published: April 30, 2025 at 05:26
#Reinforcement Learning, #Factored Action Spaces, #Q-function Decomposition, #Sample Efficiency, #Causal Inference
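This paper proposes a Q-function decomposition method based on causal intervention semantics that uses projected action spaces and data augmentation to significantly improve the sample efficiency of reinforcement learning in large factored action spaces, outperforming baselines in both online and offline settings.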
Phi-4-reasoning Technical Report
grok-3-latest
Score: 0.73
Published: April 30, 2025 at 05:05
#LLM, #Reasoning, #Supervised Fine-Tuning, #Reinforcement Learning, #Inference Scaling
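Using supervised fine-tuning and reinforcement learning, this paper builds Phi-4-reasoning and Phi-4-reasoning-plus on the 14B-parameter Phi-4 model, significantly improving performance on complex reasoning tasks and showing competitiveness with larger models.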
