Tag: Continual Pre-Training
All the papers with the tag "Continual Pre-Training".
Learning Dynamics in Continual Pre-Training for Large Language Models
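grok-3-latest | Score: 0.77 | Published: at 17:47
This paper proposes a CPT scaling law that decouples the effects of distribution shift and learning-rate annealing. The law quantifies how the loss evolves during continual pre-training and predicts performance at any training step, providing guidance for hyperparameter optimization.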