Tag: Cross-Modal Mapping
All the papers with the tag "Cross-Modal Mapping".
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
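grok-3-latest · Score: 0.57 · Published: at 17:59
VITA-Audio uses a lightweight MCTP module and a four-stage training strategy to generate the first audio token with zero delay in an end-to-end speech model. This significantly speeds up inference. It also achieves the best performance among open-source models on ASR, TTS, and SQA tasks, setting a new standard for real-time speech interaction.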