Tag: Reasoning

All the papers with the tag "Reasoning".

Structured Prompting and Feedback-Guided Reasoning with LLMs for Data Interpretation
grok-3-latest
Score: 0.65
Published: May 3, 2025 at 00:05
#LLM, #Structured Data, #Prompt Engineering, #Feedback Loop, #Reasoning
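This paper proposes the STROT framework, which uses structured prompting and a feedback-driven reasoning mechanism to significantly improve the reliability, interpretability, and stability of large language models in structured data analysis.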
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
grok-3-mini-latest
Score: 0.78
Published: May 1, 2025 at 10:25
#LLM, #Proxy Model, #Distillation, #Sampling, #Reasoning
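This paper proposes the WebThinker framework, which enhances the web search and report generation capabilities of LRMs through a Deep Web Explorer and an Autonomous Think-Search-and-Draft strategy, and uses RL-based training to optimize tool interaction, achieving significant performance gains on complex tasks.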
XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs
grok-3-mini-latest
Score: 0.79
Published: May 1, 2025 at 10:25
#LLM, #Explainable AI, #Jailbreaking, #Sampling, #Reasoning
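This paper proposes the XBreaking method, which uses Explainable AI to analyze the internal patterns of censored and uncensored LLMs, identifies key layers, and bypasses safety mechanisms through noise injection while preserving model functionality.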
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs
grok-3-latest
Score: 0.86
Published: April 30, 2025 at 18:48
#LLM, #Reasoning, #Chain of Thought, #Sampling, #Preference Optimization
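Through empirical analysis, this paper reveals a non-linear relationship between reasoning length and correctness in large language models. Through length preference optimization experiments, it also proposes an effective method for reducing generation length without supervision, offering a new perspective for research on adaptive reasoning.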
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition
grok-3-latest
Score: 0.57
Published: April 30, 2025 at 16:57
#LLM, #Formal Proof, #Subgoal Decomposition, #Reinforcement Learning, #Reasoning
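This paper proposes a training framework based on subgoal decomposition and reinforcement learning that significantly improves the performance of large language models in formal theorem proving and achieves state-of-the-art results on multiple benchmark datasets.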
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
grok-3-latest
Score: 0.74
Published: April 30, 2025 at 14:01
#LLM, #Reasoning, #Efficiency Optimization, #Model Merging, #Preference Training
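This paper proposes the AdaR1 framework, which achieves adaptive reasoning through model merging and bi-level preference training, significantly reducing the inference cost of large language models (average length reduced by more than 50%) while maintaining high performance.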
