Daily Paper Machine

Tag: Supervised Fine-Tuning

All the papers with the tag "Supervised Fine-Tuning".

DeepCritic: Deliberate Critique with Large Language Models
grok-3-latest
Score: 0.72
Published: May 1, 2025 at 17:03
#LLM, #Critique Model, #Mathematical Reasoning, #Supervised Fine-Tuning, #Reinforcement Learning
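This paper proposes the DeepCritic framework, which uses two-stage training (supervised fine-tuning and reinforcement learning) to significantly improve the critique ability of large language models on mathematical reasoning tasks, paving the way for automated supervision and model self-improvement.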
Phi-4-reasoning Technical Report
grok-3-latest
Score: 0.73
Published: April 30, 2025 at 05:05
#LLM, #Reasoning, #Supervised Fine-Tuning, #Reinforcement Learning, #Inference Scaling
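Using supervised fine-tuning and reinforcement learning, this paper develops Phi-4-reasoning and Phi-4-reasoning-plus from the 14B-parameter Phi-4 model, significantly improving performance on complex reasoning tasks and showing competitiveness with larger models.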