Tag: Distillation
All the papers with the tag "Distillation".
OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
grok-3-latest · Score: 0.74 · Published at 13:51
OBLIVIATE proposes a robust and practical unlearning framework for LLMs that combines masking, distillation, and a world-fact loss with context-aware forgetting, effectively removing target data while preserving model performance and fluency.
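The summary names three loss terms; below is a minimal sketch of how such a combined unlearning objective could look, assuming HuggingFace-style causal LMs. The function name, loss weights, and exact term definitions are illustrative assumptions, not the paper's formulation.

```python
# Sketch of an OBLIVIATE-style combined unlearning objective (illustrative).
import torch
import torch.nn.functional as F

def unlearning_loss(model, ref_model, forget_ids, fact_ids,
                    mask_w=1.0, distill_w=1.0, fact_w=1.0):
    # Masking term: push probability mass away from the forget-set
    # targets by ascending (negating) their language-modeling loss.
    out = model(forget_ids, labels=forget_ids)
    mask_loss = -out.loss

    # Distillation term: keep the unlearned model close to a frozen
    # reference model on the same inputs, preserving fluency.
    with torch.no_grad():
        ref_logits = ref_model(forget_ids).logits
    distill_loss = F.kl_div(
        F.log_softmax(out.logits, dim=-1),
        F.softmax(ref_logits, dim=-1),
        reduction="batchmean",
    )

    # World-fact term: standard LM loss on general factual data so
    # unrelated knowledge and capabilities are retained.
    fact_loss = model(fact_ids, labels=fact_ids).loss

    return mask_w * mask_loss + distill_w * distill_loss + fact_w * fact_loss
```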
RM-R1: Reward Modeling as Reasoning
grok-3-latest · Score: 0.83 · Published at 06:11
This paper introduces the RM-R1 model family, which recasts reward modeling as a reasoning task and uses reasoning-chain distillation plus reinforcement learning to markedly improve the interpretability and performance of reward models, outperforming much larger open-source and closed-source models.
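A minimal sketch of the "reward modeling as reasoning" idea: the judge model first writes out a critique, then emits a verdict, instead of directly regressing a scalar score. The prompt template, the `generate_text` callable, and the verdict-parsing convention here are assumptions for illustration.

```python
# Sketch of a generative, reasoning-based reward judge (illustrative).
import re

JUDGE_TEMPLATE = """Question: {question}

Response A: {answer_a}

Response B: {answer_b}

First reason step by step about which response is better, then end with
a line of the form "Verdict: A" or "Verdict: B"."""

def judge(generate_text, question, answer_a, answer_b):
    """Return ('A' or 'B', reasoning_chain) from a generative judge."""
    output = generate_text(JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b))
    match = re.search(r"Verdict:\s*([AB])", output)
    verdict = match.group(1) if match else None
    return verdict, output
```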
FineScope: Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
grok-3-latest · Score: 0.63 · Published at 16:05
FineScope proposes a framework that optimizes large language models via SAE-guided automated dataset cultivation and domain-aware pruning, significantly improving performance and efficiency on domain-specific tasks.
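One way SAE-guided dataset cultivation could work is to score candidate samples by how strongly they activate domain-associated sparse-autoencoder features and keep the top-k. The `sae_encode` helper and the feature-index set below are hypothetical, not FineScope's actual pipeline.

```python
# Sketch of SAE-guided data cultivation (illustrative assumptions).
import torch

def cultivate_dataset(samples, hidden_states, sae_encode,
                      domain_feature_ids, k):
    """Select the k samples whose SAE activations best match the domain."""
    scores = []
    for h in hidden_states:            # one pooled hidden state per sample
        feats = sae_encode(h)          # sparse feature activations
        scores.append(feats[domain_feature_ids].sum().item())
    top = torch.tensor(scores).topk(k).indices
    return [samples[i] for i in top.tolist()]
```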
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
grok-3-mini-latest · Score: 0.78 · Published at 10:25
This paper proposes the WebThinker framework, which strengthens LRMs' web search and report generation through a Deep Web Explorer and an Autonomous Think-Search-and-Draft strategy, and uses RL-based training to optimize tool interaction, achieving significant performance gains on complex tasks.
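A think-search-and-draft loop can be pictured as the model alternating between reasoning, issuing web searches, and drafting report sections until it decides to stop. The `llm` and `search` callables and the SEARCH/DRAFT/DONE action convention below are assumptions for illustration, not WebThinker's actual interface.

```python
# Sketch of a think-search-and-draft agent loop (illustrative).
def think_search_draft(llm, search, task, max_steps=10):
    context, report = [f"Task: {task}"], []
    for _ in range(max_steps):
        step = llm("\n".join(context) + "\nNext action (SEARCH/DRAFT/DONE):")
        if step.startswith("SEARCH:"):
            # Explore the web and fold the results back into the context.
            query = step[len("SEARCH:"):].strip()
            context.append(f"Results for {query!r}: {search(query)}")
        elif step.startswith("DRAFT:"):
            # Append a drafted report section and remember it in context.
            report.append(step[len("DRAFT:"):].strip())
            context.append(step)
        else:  # DONE or unrecognized action: stop and return the draft
            break
    return "\n\n".join(report)
```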