Tag: Quantization

All the papers with the tag "Quantization".

QuantX: A Framework for Hardware-Aware Quantization of Generative AI Workloads
grok-3-latest
Score: 0.58
Published: May 12, 2025 at 13:13
#LLM, #Quantization, #Hardware Awareness, #Post-Training, #Compression
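QuantX proposes a hardware-aware quantization framework that designs multiple quantization strategies around differences in weight distributions and hardware constraints. It quantizes large language models and vision-language models down to 3 bits while keeping the performance loss within 6%, significantly outperforming existing methods.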
Lightweight Clinical Decision Support System using QLoRA-Fine-Tuned LLMs and Retrieval-Augmented Generation
grok-3-latest
Score: 0.52
Published: May 6, 2025 at 10:31
#LLM, #Retrieval Augmented Generation, #Fine-Tuning, #Quantization, #Clinical Decision Support
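This paper proposes a lightweight clinical decision support system that combines retrieval-augmented generation (RAG) with Quantized Low-Rank Adaptation (QLoRA) fine-tuning, improving the accuracy and efficiency of large language models on medical tasks while reducing compute requirements.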
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques
grok-3-latest
Score: 0.70
Published: May 5, 2025 at 01:27
#LLM, #Model Compression, #Knowledge Distillation, #Quantization, #Pruning, #Edge Deployment, #Efficiency
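This paper surveys compression techniques for large language models (LLMs) in resource-constrained environments, including knowledge distillation, model quantization, and model pruning. It systematically analyzes their principles, variants, and practical results, and discusses future research directions, providing an important reference for deploying LLMs on edge devices.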
Optimizing Deep Neural Networks using Safety-Guided Self Compression
grok-3-latest
Score: 0.50
Published: May 1, 2025 at 06:50
#Model Compression, #Quantization, #Generalization, #Safety-Driven Optimization
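This paper proposes a safety-driven self-compression framework that uses a preservation set and a differentiable quantization mechanism to balance model size and performance when compressing deep neural networks, significantly improving deployability in resource-constrained environments.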
Optimizing Deep Neural Networks using Safety-Guided Self Compression
grok-3-latest
Score: 0.50
Published: May 1, 2025 at 06:50
#Deep Learning, #Model Compression, #Quantization, #Generalization, #Safety-Driven
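This paper proposes a safety-driven quantization framework that uses a preservation set to guide the self-compression of deep neural networks, substantially reducing model size while improving performance and generalization, and offering a reliable optimization strategy for deployment in resource-constrained environments.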
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
grok-3-latest
Score: 0.52
Published: May 1, 2025 at 06:47
#LLM, #Optimization, #Quantization, #Memory Efficiency, #Training
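This paper proposes the SOLO framework, which applies logarithmic quantization and momentum adjustment tailored to the characteristics of EMA updates. It reduces optimizer state precision to 2 or 3 bits while maintaining training performance close to full precision, offering a practical option for AI research in resource-constrained environments.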
