Tag: Audio-Visual Reasoning
All the papers with the tag "Audio-Visual Reasoning".
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
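grok-3-latest | Score: 0.60 | Published: at 17:59
This paper proposes the EchoInk-R1 framework, which uses Group Relative Policy Optimization (GRPO), a reinforcement learning method, to significantly improve the performance of multimodal large language models on audio-visual reasoning tasks. It is presented as the first to achieve unified open-world reasoning across the audio, visual, and text modalities.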
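The summary names GRPO without describing it. The following is a minimal sketch, assuming the standard GRPO formulation with one scalar outcome reward per sampled response. It shows only the core step, the group-relative advantage. For each prompt, a group of responses is sampled, and each response's reward is normalized by the mean and standard deviation of its group's rewards, so no learned value function is needed. The function name, the epsilon, and the 0/1 correctness reward are hypothetical illustrations, not EchoInk-R1's actual implementation. The full GRPO objective (clipped policy ratio plus KL penalty) is omitted.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical helper: normalize each reward against its group as (r_i - mean) / (std + eps)."""
    n = len(rewards)
    if n == 0:
        return []
    mean = sum(rewards) / n
    # Population standard deviation over the group of sampled responses.
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps avoids division by zero when every response earns the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical example: four sampled answers to one audio-visual question,
# rewarded 1.0 if correct and 0.0 otherwise (an assumed reward scheme).
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, -1.0, 1.0]
```

Correct answers receive positive advantages and incorrect ones receive negative advantages relative to their own group. This relative signal is what the policy update then reinforces.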