Tag: Preference Training
All the papers with the tag "Preference Training".
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
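grok-3-latest · Score: 0.74 · Published: at 14:01
This paper proposes the AdaR1 framework, which achieves adaptive reasoning through model merging and bi-level preference training. It substantially reduces the reasoning cost of large language models (average length cut by more than 50%) while maintaining high performance.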