Tag: Distillation
All the papers with the tag "Distillation".
OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
grok-3-latest · Score: 0.74 · Published at 13:51
OBLIVIATE proposes a robust and practical unlearning framework for LLMs that combines masking, distillation, and a world-fact loss with context-aware forgetting, effectively removing target data while preserving model performance and fluency.
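The summary names three loss terms; below is a minimal sketch of how such a combined unlearning objective could look, assuming HuggingFace-style causal LMs. The function name, loss weights, and exact term definitions are illustrative assumptions, not the paper's formulation.

```python
# Sketch of an OBLIVIATE-style combined unlearning objective (illustrative).
import torch
import torch.nn.functional as F

def unlearning_loss(model, ref_model, forget_ids, fact_ids,
                    mask_w=1.0, distill_w=1.0, fact_w=1.0):
    # Masking term: push probability mass away from the forget-set
    # targets by ascending (negating) their language-modeling loss.
    out = model(forget_ids, labels=forget_ids)
    mask_loss = -out.loss

    # Distillation term: keep the unlearned model close to a frozen
    # reference model on the same inputs, preserving fluency.
    with torch.no_grad():
        ref_logits = ref_model(forget_ids).logits
    distill_loss = F.kl_div(
        F.log_softmax(out.logits, dim=-1),
        F.softmax(ref_logits, dim=-1),
        reduction="batchmean",
    )

    # World-fact term: standard LM loss on general factual data so
    # unrelated knowledge and capabilities are retained.
    fact_loss = model(fact_ids, labels=fact_ids).loss

    return mask_w * mask_loss + distill_w * distill_loss + fact_w * fact_loss
```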
RM-R1: Reward Modeling as Reasoning
grok-3-latest · Score: 0.83 · Published at 06:11
This paper introduces the RM-R1 model family, which recasts reward modeling as a reasoning task and uses reasoning-chain distillation plus reinforcement learning to markedly improve the interpretability and performance of reward models, outperforming much larger open-source and closed-source models.
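A minimal sketch of the "reward modeling as reasoning" idea: the judge model first writes out a critique, then emits a verdict, instead of directly regressing a scalar score. The prompt template, the `generate_text` callable, and the verdict-parsing convention here are assumptions for illustration.

```python
# Sketch of a generative, reasoning-based reward judge (illustrative).
import re

JUDGE_TEMPLATE = """Question: {question}

Response A: {answer_a}

Response B: {answer_b}

First reason step by step about which response is better, then end with
a line of the form "Verdict: A" or "Verdict: B"."""

def judge(generate_text, question, answer_a, answer_b):
    """Return ('A' or 'B', reasoning_chain) from a generative judge."""
    output = generate_text(JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b))
    match = re.search(r"Verdict:\s*([AB])", output)
    verdict = match.group(1) if match else None
    return verdict, output
```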
FineScope: Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
grok-3-latest · Score: 0.63 · Published at 16:05
FineScope proposes a framework that optimizes large language models via SAE-guided automated dataset cultivation and domain-aware pruning, significantly improving performance and efficiency on domain-specific tasks.
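One way SAE-guided dataset cultivation could work is to score candidate samples by how strongly they activate domain-associated sparse-autoencoder features and keep the top-k. The `sae_encode` helper and the feature-index set below are hypothetical, not FineScope's actual pipeline.

```python
# Sketch of SAE-guided data cultivation (illustrative assumptions).
import torch

def cultivate_dataset(samples, hidden_states, sae_encode,
                      domain_feature_ids, k):
    """Select the k samples whose SAE activations best match the domain."""
    scores = []
    for h in hidden_states:            # one pooled hidden state per sample
        feats = sae_encode(h)          # sparse feature activations
        scores.append(feats[domain_feature_ids].sum().item())
    top = torch.tensor(scores).topk(k).indices
    return [samples[i] for i in top.tolist()]
```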
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
grok-3-mini-latest · Score: 0.78 · Published at 10:25
This paper proposes the WebThinker framework, which strengthens LRMs' web search and report generation through a Deep Web Explorer and an Autonomous Think-Search-and-Draft strategy, and uses RL-based training to optimize tool interaction, achieving significant performance gains on complex tasks.
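A think-search-and-draft loop can be pictured as the model alternating between reasoning, issuing web searches, and drafting report sections until it decides to stop. The `llm` and `search` callables and the SEARCH/DRAFT/DONE action convention below are assumptions for illustration, not WebThinker's actual interface.

```python
# Sketch of a think-search-and-draft agent loop (illustrative).
def think_search_draft(llm, search, task, max_steps=10):
    context, report = [f"Task: {task}"], []
    for _ in range(max_steps):
        step = llm("\n".join(context) + "\nNext action (SEARCH/DRAFT/DONE):")
        if step.startswith("SEARCH:"):
            # Explore the web and fold the results back into the context.
            query = step[len("SEARCH:"):].strip()
            context.append(f"Results for {query!r}: {search(query)}")
        elif step.startswith("DRAFT:"):
            # Append a drafted report section and remember it in context.
            report.append(step[len("DRAFT:"):].strip())
            context.append(step)
        else:  # DONE or unrecognized action: stop and return the draft
            break
    return "\n\n".join(report)
```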