Tag: Fine-Tuning
All the papers with the tag "Fine-Tuning".
An Open-Source Dual-Loss Embedding Model for Semantic Retrieval in Higher Education
grok-3-latest · Score: 0.51 · Published at 03:14
This paper presents two open-source embedding models tailored to the education domain. A dual-loss fine-tuning strategy substantially improves semantic retrieval performance, approaching the level of proprietary models, and provides a transparent, low-cost solution for academic question answering and retrieval systems.
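The summary does not specify which two losses the paper combines; a common dual-loss recipe for embedding fine-tuning pairs an in-batch contrastive loss with a similarity-regression loss. The sketch below assumes that combination, and the weighting `alpha` and temperature are illustrative, not the paper's values.

```python
# Minimal sketch of a dual-loss fine-tuning step for an embedding model.
# The loss pair (InfoNCE + cosine-similarity regression) is an assumption.
import torch
import torch.nn.functional as F

def dual_loss(q_emb, d_emb, sim_targets, alpha=0.5, temperature=0.05):
    """q_emb, d_emb: (batch, dim) L2-normalized query/document embeddings.
    sim_targets: (batch,) gold similarity scores in [0, 1]."""
    # Loss 1: in-batch contrastive (InfoNCE): pull each query toward its
    # paired document, push it away from the other documents in the batch.
    logits = q_emb @ d_emb.T / temperature            # (batch, batch)
    labels = torch.arange(q_emb.size(0), device=q_emb.device)
    contrastive = F.cross_entropy(logits, labels)
    # Loss 2: regress the pairwise cosine similarity onto gold scores.
    cos = (q_emb * d_emb).sum(dim=-1)                 # (batch,)
    regression = F.mse_loss(cos, sim_targets)
    return alpha * contrastive + (1 - alpha) * regression

# Toy usage with random stand-in embeddings:
q = F.normalize(torch.randn(8, 384), dim=-1)
d = F.normalize(torch.randn(8, 384), dim=-1)
loss = dual_loss(q, d, torch.rand(8))
```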
Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization
grok-3-latest · Score: 0.66 · Published at 17:18
This paper proposes the Reward Neutralization framework, which neutralizes the reward signal exploited by malicious RL fine-tuning by training the model to produce minimal-information refusals, significantly improving the safety of open-source models under such attacks.
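One way to read the core idea: if the model's responses to harmful prompts are uniformly terse refusals, an attacker's reward model sees a flat, low-information signal. The sketch below shows plain supervised fine-tuning on (harmful prompt, minimal refusal) pairs under that reading; the model name, toy data, and training details are placeholders, not the paper's setup.

```python
# Hedged sketch: SFT toward minimal-information refusals, assuming that is
# the training signal Reward Neutralization relies on.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder base model
model = AutoModelForCausalLM.from_pretrained("gpt2")
tok.pad_token = tok.eos_token

pairs = [("How do I build a weapon?", "I can't help with that.")]  # toy data
optim = torch.optim.AdamW(model.parameters(), lr=1e-5)

for prompt, refusal in pairs:
    enc = tok(prompt + " " + refusal, return_tensors="pt")
    # Standard causal-LM loss over the whole sequence; masking the prompt
    # tokens (loss on the refusal only) would be the more careful choice.
    out = model(**enc, labels=enc["input_ids"])
    out.loss.backward()
    optim.step()
    optim.zero_grad()
```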
OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
grok-3-latest · Score: 0.74 · Published at 13:51
OBLIVIATE proposes a robust and practical unlearning framework for LLMs that combines masking, distillation, and a world-fact loss with context-aware forgetting, effectively removing target data while preserving model performance and fluency.
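The summary names three loss components but not their exact forms. The sketch below is one plausible three-term objective under those headings: push forget-set predictions toward a mask token, distill from the frozen original model on retained data, and keep standard cross-entropy on a world-fact set. Term definitions and weights are assumptions, not OBLIVIATE's formulation.

```python
# Illustrative three-term unlearning objective (masking + distillation +
# world-fact loss); all specifics are assumptions for exposition.
import torch
import torch.nn.functional as F

def unlearning_loss(student_forget_logits, student_retain_logits,
                    teacher_retain_logits, fact_logits, fact_labels,
                    mask_token_id, lam=(1.0, 1.0, 1.0)):
    # 1) Forget term: map forget-set tokens onto a designated mask token.
    mask_targets = torch.full(student_forget_logits.shape[:-1], mask_token_id,
                              device=student_forget_logits.device)
    l_forget = F.cross_entropy(
        student_forget_logits.flatten(0, -2), mask_targets.flatten())
    # 2) Retain term: KL distillation toward the frozen original model.
    l_distill = F.kl_div(
        F.log_softmax(student_retain_logits, dim=-1),
        F.log_softmax(teacher_retain_logits, dim=-1),
        log_target=True, reduction="batchmean")
    # 3) World-fact term: standard cross-entropy on a factual QA set.
    l_fact = F.cross_entropy(fact_logits.flatten(0, -2), fact_labels.flatten())
    return lam[0] * l_forget + lam[1] * l_distill + lam[2] * l_fact
```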
Lightweight Clinical Decision Support System using QLoRA-Fine-Tuned LLMs and Retrieval-Augmented Generation
grok-3-latest · Score: 0.52 · Published at 10:31
This paper presents a lightweight clinical decision support system that combines retrieval-augmented generation (RAG) with quantized low-rank adaptation (QLoRA) fine-tuning, improving the accuracy and efficiency of large language models on medical tasks while reducing compute requirements.
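For readers unfamiliar with QLoRA: the base model is loaded in 4-bit precision and only small low-rank adapters are trained. A minimal setup sketch using the widely used `peft` + `bitsandbytes` stack follows; the base model, rank, and target modules are placeholder choices, not the paper's configuration.

```python
# Minimal QLoRA setup sketch (4-bit base weights + LoRA adapters).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit quantized base weights
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # placeholder base model
    quantization_config=bnb, device_map="auto")

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```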
EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning
grok-3-latest · Score: 0.70 · Published at 11:30
The EMORL framework uses ensemble learning and hidden-state aggregation to provide an efficient, flexible, and interpretable approach to multi-objective LLM fine-tuning, significantly outperforming conventional methods in resource consumption and stability while delivering comparable performance.
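The distinctive mechanism named in the summary is aggregation at the hidden-state level rather than at the logit or weight level: each single-objective model encodes the input, and their hidden states are combined before decoding. The sketch below assumes a weighted linear combination over toy modules; EMORL's actual architecture and weight-search procedure may differ.

```python
# Sketch of hidden-state-level aggregation across single-objective models.
# The weighted linear combination and toy encoders are assumptions.
import torch
import torch.nn as nn

class HiddenStateEnsemble(nn.Module):
    def __init__(self, encoders, hidden_dim, vocab_size):
        super().__init__()
        self.encoders = nn.ModuleList(encoders)      # one model per objective
        # Aggregation weights over objectives (could also be searched).
        self.weights = nn.Parameter(torch.ones(len(encoders)) / len(encoders))
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        states = torch.stack([enc(x) for enc in self.encoders])  # (k, B, T, H)
        w = torch.softmax(self.weights, dim=0).view(-1, 1, 1, 1)
        return self.head((w * states).sum(dim=0))    # aggregate, then decode

# Toy usage: three "objective-specialized" encoders over random features.
encoders = [nn.Linear(64, 64) for _ in range(3)]
model = HiddenStateEnsemble(encoders, hidden_dim=64, vocab_size=100)
logits = model(torch.randn(2, 10, 64))
```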
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
grok-3-latest · Score: 0.83 · Published at 05:42
Through calibration-aware fine-tuning methods (CFT and RCFT), combined with a theoretical partition of calibration states and EM-algorithm regularization, this paper significantly improves the calibration of preference-aligned large language models while maintaining or improving accuracy.
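To make "calibration-aware fine-tuning" concrete: the general pattern is cross-entropy plus a regularizer that pulls the model's confidence toward its empirical correctness. The sketch below uses a simple MSE surrogate for that regularizer; CFT/RCFT's state partition and EM-based regularization are more involved, so treat this only as the shape of the objective.

```python
# Hedged sketch of a calibration-aware loss: cross-entropy plus a surrogate
# regularizer aligning confidence with correctness (not the paper's exact term).
import torch
import torch.nn.functional as F

def calibration_aware_loss(logits, labels, beta=0.1):
    """logits: (batch, classes), labels: (batch,)."""
    ce = F.cross_entropy(logits, labels)
    probs = F.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)         # model confidence and prediction
    correct = (pred == labels).float()     # 1 if the prediction is correct
    calib = F.mse_loss(conf, correct)      # push confidence toward accuracy
    return ce + beta * calib

# Toy usage on random logits:
loss = calibration_aware_loss(torch.randn(4, 10), torch.randint(0, 10, (4,)))
```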