Tag: Resource Allocation
All papers tagged "Resource Allocation".
Large Language Model Partitioning for Low-Latency Inference at the Edge
grok-3-latest · Score: 0.63 · Published: at 10:16
This paper proposes a resource-aware Transformer partitioning algorithm that significantly reduces the inference latency of large language models in edge environments and optimizes memory usage through fine-grained partitioning at the attention-head level and dynamic block migration.