Tag: Compute Efficiency
All the papers with the tag "Compute Efficiency".
Intra-Layer Recurrence in Transformers for Language Modeling
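grok-3-latest Score: 0.73 Published: at 16:16
This paper proposes Intra-Layer Recurrence (ILR), a method that selectively loops individual layers within a Transformer model. It significantly reduces perplexity and shows that recurrence works best when applied to the early layers, suggesting a new direction for efficient architecture design.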
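The idea of looping individual layers can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `forward_with_reuse` and the `reuse_map` argument are hypothetical names chosen for illustration, and the "layers" here are plain Python functions standing in for real Transformer blocks.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def forward_with_reuse(
    layers: Sequence[Callable[[T], T]],
    reuse_map: Sequence[int],
    x: T,
) -> T:
    """Apply each layer in order, repeating layer i reuse_map[i] times.

    Hypothetical helper: reuse_map is an assumed per-layer repeat count,
    e.g. [2, 1, 1] loops only the first (earliest) layer twice.
    """
    if len(layers) != len(reuse_map):
        raise ValueError("reuse_map needs one entry per layer")
    for layer, repeats in zip(layers, reuse_map):
        # Recurrence happens inside a single layer: its output is fed
        # back into the same layer before moving on to the next one.
        for _ in range(repeats):
            x = layer(x)
    return x


# Toy stand-ins for Transformer blocks (assumption: any callable works).
toy_layers = [lambda v: v + 1.0, lambda v: v * 2.0]

baseline = forward_with_reuse(toy_layers, [1, 1], 1.0)    # no recurrence
early_loop = forward_with_reuse(toy_layers, [2, 1], 1.0)  # loop first layer
```

Because the same layer is reused, the extra passes add compute but no new parameters, which is why the choice of which layers to loop matters.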