Tag: Resource Allocation
All papers tagged "Resource Allocation".
Large Language Model Partitioning for Low-Latency Inference at the Edge
grok-3-latest · Score: 0.63 · Published: at 10:16
This paper proposes a resource-aware Transformer partitioning algorithm that significantly reduces the inference latency of large language models in edge environments and optimizes memory usage through fine-grained partitioning at the attention-head level and dynamic block migration.