Daily Paper Machine

Tag: Real-time Interaction

All the papers with the tag "Real-time Interaction".

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
grok-3-latest
Score: 0.57
Published: May 6, 2025 at 17:59
#LLM, #Cross-Modal Mapping, #Speech Generation, #Token Prediction, #Real-Time Interaction
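Using a lightweight MCTP module and a four-stage training strategy, VITA-Audio achieves zero-delay generation of the first audio token in an end-to-end speech model, significantly speeds up inference, reaches state-of-the-art performance among open-source models on ASR, TTS, and SQA tasks, and sets a new standard for real-time speech interaction.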
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
grok-3-latest
Score: 0.73
Published: May 5, 2025 at 12:53
#LLM, #Speech Synthesis, #Streaming Generation, #Modular Design, #Real-time Interaction
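Through modular design and autoregressive streaming speech generation, LLaMA-Omni 2 delivers high-quality end-to-end speech interaction at relatively low cost, significantly outperforming baseline models that rely on large-scale data.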
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
grok-3-latest
Score: 0.77
Published: May 5, 2025 at 12:53
#LLM, #Speech Synthesis, #Streaming Generation, #Modular Design, #Real-Time Interaction
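Through modular design and autoregressive streaming speech generation, LLaMA-Omni 2 significantly improves the intelligence, naturalness, and latency of real-time speech interaction, surpassing existing SpeechLM models.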