Tag: Reasoning

All the papers with the tag "Reasoning".

GroverGPT-2: Simulating Grover's Algorithm via Chain-of-Thought Reasoning and Quantum-Native Tokenization
grok-3-latest
Score: 0.52
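Published: May 8, 2025 at 01:38
#LLM, #Quantum Simulation, #Tokenization, #Reasoning, #Scalability
This paper proposes GroverGPT-2, which uses a large language model to efficiently simulate Grover's quantum search algorithm through quantum-native tokenization and chain-of-thought reasoning. It demonstrates the potential for classical machines to internalize quantum logic and provides a new tool for exploring the boundary between classical and quantum computation.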
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
grok-3-latest
Score: 0.48
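Published: May 6, 2025 at 21:08
#LLM, #Reasoning, #Multimodal, #Post-Training, #Generalization
This paper proposes X-REASONER, a two-stage post-training strategy (SFT + RL) that uses only general-domain text yet generalizes reasoning ability across modalities and domains, outperforming the existing SOTA on multiple general and medical benchmarks.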
ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant
grok-3-latest
Score: 0.56
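Published: May 6, 2025 at 16:00
#LLM, #Multimodal Learning, #Knowledge Graph, #Personalization, #Reasoning
This paper proposes the ReGraP-LLaVA model, which uses knowledge graphs and chain-of-thought question-answering data to strengthen the relational reasoning of personalized multimodal large language models, significantly improving contextual understanding and performance on complex tasks.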
AI-Driven Scholarly Peer Review via Persistent Workflow Prompting, Meta-Prompting, and Meta-Reasoning
grok-3-latest
Score: 0.48
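Published: May 6, 2025 at 09:06
#LLM, #Prompt Engineering, #Workflow Design, #Reasoning, #Bias Mitigation
This paper proposes Persistent Workflow Prompting (PWP), which uses a structured prompt library and meta-prompting techniques to guide large language models through the complex task of scholarly peer review, with initial success in suppressing input bias.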
RM-R1: Reward Modeling as Reasoning
grok-3-latest
Score: 0.83
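Published: May 5, 2025 at 06:11
#LLM, #Reward Modeling, #Reasoning, #Distillation, #RLHF
This paper proposes the RM-R1 model family, which reframes reward modeling as a reasoning task and uses distillation and reinforcement learning to significantly improve the interpretability and performance of reward models, surpassing larger open-source and commercial models.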
RM-R1: Reward Modeling as Reasoning
grok-3-latest
Score: 0.83
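Published: May 5, 2025 at 06:11
#LLM, #Reward Modeling, #Reasoning, #Distillation, #Reinforcement Learning
This paper proposes a paradigm that treats reward modeling as a reasoning task and trains RM-R1 models through reasoning-chain distillation and reinforcement learning, significantly improving the interpretability and performance of reward models and surpassing larger open-source and closed-source models.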
