Connecting concepts from vision and speech processing

Proceedings Integration of Speech and Image Understanding. Pub Date: 1999-09-21. DOI: 10.1109/ISIU.1999.824829

S. Wachsmuth, G. Sagerer

Citations: 6

Abstract

This paper addresses the problem of how to establish referential links between interpretations of speech and visual data. In order to get rid of erroneous, vague, or incomplete conceptual descriptions, we propose a probabilistic interaction scheme. The modelling of dependencies and the calculation of inferences are realized by using Bayesian networks. This interaction scheme provides a basis for disambiguation and error recovery. We implemented an interaction component in an assembly task environment. A robot constructor can be instructed by speech and pointing gestures in order to connect primitive component parts of a wooden toy construction kit. The system is evaluated on a test data set which consists of 448 spoken utterances from 16 speakers who name objects on 10 images from different scenes. First results show the effectiveness and robustness of the probabilistic approach.
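The abstract does not give the structure of the Bayesian networks, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes a hidden "intended object" variable, a noisy vision classifier, and noisy naming by the speaker; the attribute values, the helper names (`confusion_table`, `attribute_likelihood`, `posterior_over_objects`), and all probabilities are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Optional

Table = Dict[str, Dict[str, float]]


def confusion_table(values: List[str], p_same: float) -> Table:
    """table[a][b] = p_same if a == b, else the remaining mass spread evenly."""
    p_other = (1.0 - p_same) / (len(values) - 1)
    return {a: {b: (p_same if a == b else p_other) for b in values} for a in values}


# Hypothetical attribute domains and probabilities (illustrative values only).
TYPES = ["bolt", "bar", "cube"]
COLORS = ["red", "blue"]
P_TRUE_GIVEN_SEEN_TYPE = confusion_table(TYPES, 0.9)     # [seen][true]
P_WORD_GIVEN_TRUE_TYPE = confusion_table(TYPES, 0.8)     # [true][word]
P_TRUE_GIVEN_SEEN_COLOR = confusion_table(COLORS, 0.85)  # [seen][true]
P_WORD_GIVEN_TRUE_COLOR = confusion_table(COLORS, 0.9)   # [true][word]


def attribute_likelihood(seen: str, word: Optional[str],
                         true_given_seen: Table, word_given_true: Table) -> float:
    """P(word | visual label), marginalising over the unknown true value."""
    if word is None:
        # An attribute the speaker did not mention contributes no evidence.
        return 1.0
    return sum(p_true * word_given_true[true].get(word, 0.0)
               for true, p_true in true_given_seen[seen].items())


def posterior_over_objects(scene: List[Dict[str, str]],
                           type_word: Optional[str],
                           color_word: Optional[str]) -> List[float]:
    """Posterior P(intended object | utterance), uniform prior over objects."""
    scores = [
        attribute_likelihood(obj["type"], type_word,
                             P_TRUE_GIVEN_SEEN_TYPE, P_WORD_GIVEN_TRUE_TYPE)
        * attribute_likelihood(obj["color"], color_word,
                               P_TRUE_GIVEN_SEEN_COLOR, P_WORD_GIVEN_TRUE_COLOR)
        for obj in scene
    ]
    total = sum(scores)
    if total == 0.0:
        # No object explains the utterance: fall back to the uniform prior.
        return [1.0 / len(scene)] * len(scene)
    return [s / total for s in scores]


# Visual interpretation of a scene with three detected objects.
scene = [
    {"type": "bolt", "color": "red"},
    {"type": "bolt", "color": "blue"},
    {"type": "cube", "color": "red"},
]

full = posterior_over_objects(scene, "bolt", "red")   # "the red bolt"
partial = posterior_over_objects(scene, "bolt", None)  # "the bolt"
```

Under these assumed tables, "the red bolt" favours the first object while still leaving some probability on the others, which is what allows recovery when the vision or speech result is wrong. For the incomplete description "the bolt", the two bolts receive equal posterior mass, so the ambiguity stays explicit and could be resolved by further evidence such as a pointing gesture.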
