Tag: Supervised Fine-Tuning
All the papers with the tag "Supervised Fine-Tuning".
DeepCritic: Deliberate Critique with Large Language Models
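grok-3-latest | Score: 0.72 | Published: at 17:03
This paper proposes the DeepCritic framework, which uses two-stage training (supervised fine-tuning and reinforcement learning) to significantly improve the critique ability of large language models on mathematical reasoning tasks, paving the way for automated supervision and model self-improvement.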
Phi-4-reasoning Technical Report
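grok-3-latest | Score: 0.73 | Published: at 05:05
Using supervised fine-tuning and reinforcement learning, this paper develops Phi-4-reasoning and Phi-4-reasoning-plus from the 14B-parameter Phi-4 model, significantly improving performance on complex reasoning tasks and showing competitiveness with larger models.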