Tag: Sampling

All the papers with the tag "Sampling".

Soft Best-of-n Sampling for Model Alignment
grok-3-latest
Score: 0.71
Published: May 6, 2025 at 04:03
#LLM, #Model Alignment, #Sampling, #Reward Optimization, #KL Divergence
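This paper proposes Soft Best-of-n sampling, which uses a temperature parameter λ to smoothly trade off reward optimization against similarity to the base distribution. It proves that the method approximates the optimal tilted distribution at an O(1/n) rate, making it an efficient and flexible inference-time strategy for aligning large language models.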
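The selection step can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes that n candidates have already been drawn from the base model and scored by a reward model, and that one candidate is then picked with probability proportional to exp(reward / λ). Under that assumption, λ → 0 recovers ordinary best-of-n (always take the highest reward), and a large λ approaches uniform selection among the n samples, i.e. plain sampling from the base model. The function names `selection_probs` and `soft_best_of_n` and the example rewards are hypothetical.

```python
import math
import random
from typing import List, Sequence


def selection_probs(rewards: Sequence[float], lam: float) -> List[float]:
    """Softmax of rewards at temperature lam: the chance of picking each candidate.

    Assumed selection rule (hypothetical sketch): P(i) ∝ exp(rewards[i] / lam).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    m = max(rewards)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp((r - m) / lam) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]


def soft_best_of_n(
    candidates: Sequence[str], rewards: Sequence[float], lam: float, rng: random.Random
) -> str:
    """Return one of the n base-model samples, chosen by the softened reward rule."""
    probs = selection_probs(rewards, lam)
    return rng.choices(list(candidates), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    cands = ["a", "b", "c", "d"]    # stand-ins for n samples drawn from the base model
    rewards = [0.1, 0.9, 0.4, 0.2]  # hypothetical reward-model scores
    for lam in (0.01, 1.0, 100.0):
        # small lam concentrates on "b"; large lam is close to uniform
        print(lam, [round(p, 3) for p in selection_probs(rewards, lam)])
    print(soft_best_of_n(cands, rewards, 1.0, rng))
```

Sweeping λ in the demo shows the trade-off the summary describes. A small λ puts nearly all probability on the top-reward candidate, while a large λ leaves the choice close to uniform and so stays near the base distribution.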
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
grok-3-latest
Score: 0.58
Published: May 5, 2025 at 06:26
#LLM, #Chain of Thought, #Sampling, #Variance Minimization, #Reinforcement Learning
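This paper proposes GVM-RAFT, which uses a dynamic sample-allocation strategy to minimize gradient variance, significantly improving the training efficiency and performance of large language models on chain-of-thought reasoning tasks.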
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
grok-3-mini-latest
Score: 0.78
Published: May 1, 2025 at 10:25
#LLM, #Proxy Model, #Distillation, #Sampling, #Reasoning
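This paper proposes the WebThinker framework, which strengthens the web-search and report-generation abilities of LRMs through a Deep Web Explorer and an Autonomous Think-Search-and-Draft strategy, and uses RL-based training to optimize tool interaction, yielding substantial performance gains on complex tasks.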
XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs
grok-3-mini-latest
Score: 0.79
Published: May 1, 2025 at 10:25
#LLM, #Explainable AI, #Jailbreaking, #Sampling, #Reasoning
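This paper proposes XBreaking, which uses Explainable AI to analyze the internal patterns of censored and uncensored LLMs, identifies the critical layers, and bypasses safety mechanisms by injecting noise into them while preserving the model's functionality.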
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs
grok-3-latest
Score: 0.86
Published: April 30, 2025 at 18:48
#LLM, #Reasoning, #Chain of Thought, #Sampling, #Preference Optimization
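Through empirical analysis, this paper reveals a non-linear relationship between reasoning length and correctness in large language models. Using length-preference optimization experiments, it proposes an effective way to reduce generation length without supervision, offering a new perspective for research on adaptive reasoning.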