Computational Audiovisual Scene Analysis in Online Adaptation of Audio-Motor Maps

IEEE Transactions on Autonomous Mental Development, Pub Date: 2013-12-01, DOI: 10.1109/TAMD.2013.2257766

Rujiao Yan, Tobias Rodemann, B. Wrede

Volume 5, pages 273-287

Citations: 3

Abstract

For sound localization, the binaural auditory system of a robot needs audio-motor maps, which represent the relationship between certain audio features and the position of the sound source. This mapping is normally learned during an offline calibration in controlled environments, but we show that using computational audiovisual scene analysis (CAVSA), it can be adapted online in free interaction with a number of a priori unknown speakers. CAVSA enables a robot to understand dynamic dialog scenarios, such as the number and position of speakers, as well as who is the current speaker. Our system does not require specific robot motions and thus can work during other tasks. The performance of online-adapted maps is continuously monitored by computing the difference between online-adapted and offline-calibrated maps and also comparing sound localization results with ground truth data (if available). We show that our approach is more robust in multiperson scenarios than the state of the art in terms of learning progress. We also show that our system is able to bootstrap with a randomized audio-motor map and adapt to hardware modifications that induce a change in audio-motor maps.
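The abstract does not say which audio features, map representation, or update rule the authors use, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a single feature (interaural time difference, ITD), a table with one stored ITD per azimuth bin, and an exponential-moving-average update; `AudioMotorMap`, `simple_itd`, the microphone distance, and the learning rate are all hypothetical names and values chosen for illustration. The speaker azimuth passed to `adapt` stands in for the position that audiovisual scene analysis would supply, and `mean_abs_difference` illustrates monitoring an online-adapted map against an offline-calibrated one.

```python
import math


def simple_itd(azimuth_deg, mic_distance=0.15, speed_of_sound=343.0):
    """Idealized ITD in seconds for a far-field source (illustrative assumption)."""
    return mic_distance * math.sin(math.radians(azimuth_deg)) / speed_of_sound


class AudioMotorMap:
    """Toy audio-motor map: one expected ITD per azimuth bin (hypothetical design)."""

    def __init__(self, azimuths_deg, itds):
        self.azimuths_deg = list(azimuths_deg)
        self.itds = list(itds)

    def localize(self, measured_itd):
        """Return the azimuth whose stored ITD is closest to the measurement."""
        best = min(range(len(self.itds)),
                   key=lambda i: abs(self.itds[i] - measured_itd))
        return self.azimuths_deg[best]

    def adapt(self, speaker_azimuth_deg, measured_itd, rate=0.1):
        """Nudge the bin nearest the visually located speaker toward the measured ITD."""
        i = min(range(len(self.azimuths_deg)),
                key=lambda k: abs(self.azimuths_deg[k] - speaker_azimuth_deg))
        self.itds[i] += rate * (measured_itd - self.itds[i])

    def mean_abs_difference(self, other):
        """Mean absolute ITD difference to another map over the same bins."""
        diffs = [abs(a - b) for a, b in zip(self.itds, other.itds)]
        return sum(diffs) / len(diffs)


azimuths = list(range(-90, 91, 10))

# Reference map, standing in for an offline calibration.
offline = AudioMotorMap(azimuths, [simple_itd(a) for a in azimuths])

# Online map starts uninformative (all zeros) and is adapted from observations.
online = AudioMotorMap(azimuths, [0.0] * len(azimuths))
before = online.mean_abs_difference(offline)

for _ in range(50):
    for a in azimuths:
        # In a real system the azimuth would come from vision and the ITD from
        # the microphones; here both are simulated with the same idealized model.
        online.adapt(a, simple_itd(a))

after = online.mean_abs_difference(offline)
```

In this sketch, `after` is smaller than `before` because each update moves a bin a fixed fraction of the way toward the observed ITD. A real system would also have to decide which visible person is currently speaking before trusting a position label, which is the part the abstract attributes to CAVSA.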


Source journal

IEEE Transactions on Autonomous Mental Development (categories: Computer Science, Artificial Intelligence; Robotics)

Self-citation rate

0.00%

Articles published

Review time

3 months