Vision-based speaker detection using Bayesian networks

Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149). Pub Date: 1999-06-23. DOI: 10.1109/CVPR.1999.784617

James M. Rehg, Kevin P. Murphy, P. Fieguth

Citations: 79

Abstract

The development of user interfaces based on vision and speech requires the solution of a challenging statistical inference problem: The intentions and actions of multiple individuals must be inferred from noisy and ambiguous data. We argue that Bayesian network models are an attractive statistical framework for cue fusion in these applications. Bayes nets combine a natural mechanism for expressing contextual information with efficient algorithms for learning and inference. We illustrate these points through the development of a Bayes net model for detecting when a user is speaking. The model combines four simple vision sensors: face detection, skin color, skin texture, and mouth motion. We present some promising experimental results.
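To make the cue-fusion idea concrete, here is a minimal sketch of how four binary sensor outputs could be combined into a posterior probability that a user is speaking. It assumes the simplest possible structure (a naive Bayes model in which the sensors are conditionally independent given the speaking state), which is not necessarily the network used in the paper. The prior, the conditional probabilities, and the names `PRIOR_SPEAKING`, `SENSOR_MODELS`, and `posterior_speaking` are all hypothetical and chosen only for illustration.

```python
# Hypothetical prior probability that the user is speaking (not from the paper).
PRIOR_SPEAKING = 0.3

# Hypothetical sensor models, one per vision cue named in the abstract:
# (P(sensor fires | speaking), P(sensor fires | not speaking)).
SENSOR_MODELS = {
    "face_detection": (0.95, 0.60),
    "skin_color": (0.90, 0.50),
    "skin_texture": (0.85, 0.45),
    "mouth_motion": (0.80, 0.10),
}


def posterior_speaking(observations):
    """Return P(speaking | observations) under a naive Bayes assumption.

    `observations` maps a sensor name to True (fired) or False (did not fire).
    Sensors missing from the mapping are treated as unobserved and skipped.
    """
    joint_speaking = PRIOR_SPEAKING
    joint_not_speaking = 1.0 - PRIOR_SPEAKING
    for name, fired in observations.items():
        p_fire_speaking, p_fire_not_speaking = SENSOR_MODELS[name]
        # Multiply in the likelihood of this sensor reading under each hypothesis.
        joint_speaking *= p_fire_speaking if fired else 1.0 - p_fire_speaking
        joint_not_speaking *= (
            p_fire_not_speaking if fired else 1.0 - p_fire_not_speaking
        )
    # Bayes' rule: normalize over the two hypotheses.
    return joint_speaking / (joint_speaking + joint_not_speaking)


if __name__ == "__main__":
    all_fired = {name: True for name in SENSOR_MODELS}
    print(posterior_speaking(all_fired))
```

With these made-up numbers, mouth motion is the most discriminative cue, so switching it off lowers the posterior far more than switching off any other single sensor. A full Bayes net could also encode dependencies between cues (for example, skin evidence being relevant only when a face is present), which this flat sketch ignores.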


