Tag: Sampling
All the papers with the tag "Sampling".
Soft Best-of-n Sampling for Model Alignment
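grok-3-latest | Score: 0.71 | Published: at 04:03
This paper proposes Soft Best-of-n sampling, which uses a temperature parameter λ to trade off smoothly between reward optimization and distributional similarity, and proves that it approaches the optimal tilted distribution at a rate of O(1/n), giving an efficient and flexible inference-time strategy for aligning large language models.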
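A rough illustration of the selection step summarized above. It assumes the n candidates have already been drawn from a reference model and scored by a reward model, and that the winner is picked with probability proportional to exp(reward / λ). That selection rule, along with the names `selection_probs` and `soft_best_of_n`, is an assumption made for this sketch, not code from the paper.

```python
import math
import random


def selection_probs(rewards, lam):
    """Softmax over rewards with temperature lam (assumed selection rule)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    top = max(rewards)
    # Subtract the max reward before exponentiating for numerical stability.
    weights = [math.exp((r - top) / lam) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]


def soft_best_of_n(candidates, rewards, lam, rng=random):
    """Pick one candidate at random, weighted by the softmax of its reward.

    Small lam concentrates mass on the highest-reward candidate (close to
    plain best-of-n); large lam approaches a uniform pick among candidates.
    """
    probs = selection_probs(rewards, lam)
    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical candidates and reward scores, for illustration only.
    candidates = ["a", "b", "c"]
    rewards = [0.1, 0.9, 0.5]
    for lam in (0.01, 1.0, 100.0):
        print(lam, selection_probs(rewards, lam))
```

In this sketch λ is the only knob: lowering it pushes the pick toward the top-scoring candidate, and raising it keeps the output closer to an unweighted draw from the candidates.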
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
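grok-3-latest | Score: 0.58 | Published: at 06:26
This paper proposes GVM-RAFT, which minimizes gradient variance through a dynamic sample allocation strategy and significantly improves the training efficiency and performance of large language models on chain-of-thought reasoning tasks.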
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
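grok-3-mini-latest | Score: 0.78 | Published: at 10:25
This paper proposes the WebThinker framework, which strengthens the web search and report generation capabilities of LRMs through a Deep Web Explorer and an Autonomous Think-Search-and-Draft strategy, and uses RL-based training to optimize tool interaction, yielding significant performance gains on complex tasks.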
XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs
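grok-3-mini-latest | Score: 0.79 | Published: at 10:25
This paper proposes XBreaking, which uses Explainable AI to analyze the internal patterns of censored and uncensored LLMs, identifies key layers, and bypasses safety mechanisms through noise injection while preserving model functionality.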
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs
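grok-3-latest | Score: 0.86 | Published: at 18:48
Through empirical analysis, this paper reveals a non-linear relationship between reasoning length and correctness in large language models. Its length preference optimization experiments also yield an effective method for reducing generation length without supervision, offering a new perspective for research on adaptive reasoning.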