Tag: Real-time Interaction
All the papers with the tag "Real-time Interaction".
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
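grok-3-latest | Score: 0.57 | Published: at 17:59
VITA-Audio uses a lightweight MCTP module and a four-stage training strategy to achieve zero-latency generation of the first audio token in an end-to-end speech model. It significantly speeds up inference and reaches the best performance among open-source models on ASR, TTS, and SQA tasks, setting a new standard for real-time speech interaction.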
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
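grok-3-latest | Score: 0.73 | Published: at 12:53
LLaMA-Omni 2 uses a modular design and autoregressive streaming speech generation to deliver high-quality end-to-end speech interaction at relatively low cost, significantly outperforming baseline models that rely on large-scale data.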
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
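grok-3-latest | Score: 0.77 | Published: at 12:53
LLaMA-Omni 2 uses a modular design and autoregressive streaming speech generation to significantly improve the intelligence, naturalness, and latency of real-time speech interaction, outperforming existing SpeechLM models.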