
Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580): Latest Publications

Exploiting speech/gesture co-occurrence for improving continuous gesture recognition in weather narration
Rajeev Sharma, Jiongyu Cai, Srivatsan Chakravarthy, Indrajit Poddar, Y. Sethi
In order to incorporate naturalness in the design of human-computer interfaces (HCI), it is desirable to develop recognition techniques capable of handling continuous natural gesture and speech inputs. Though many researchers have reported high recognition rates for gesture recognition using hidden Markov models (HMM), the gestures used are mostly pre-defined and bound by syntactical and grammatical constraints. Natural gestures, however, do not string together under such syntactical bindings, and strict classification of natural gestures is not feasible. We have examined hand gestures made in a very natural domain: that of a weather person narrating in front of a weather map. The gestures made by the weather person are embedded in a narration. This provides us with abundant data from an uncontrolled environment for studying the interaction between speech and gesture in the context of a display. We hypothesize that this domain is very similar to that of a natural human-computer interface. We present an HMM architecture for continuous gesture recognition and keyword spotting. To explore the relation between gesture and speech, we conducted a statistical co-occurrence analysis of different gestures with a selected set of spoken keywords. We then demonstrate how this co-occurrence analysis can be exploited to improve the performance of continuous gesture recognition.
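The co-occurrence idea lends itself to a compact illustration. The sketch below (Python; the gesture intervals, keyword time-stamps, and window size are invented stand-ins, not the paper's data) counts how often each spotted keyword falls near each gesture and turns the counts into conditional probabilities that a recognizer could use to re-weight competing gesture hypotheses.

```python
from collections import defaultdict

# Hypothetical annotations: (start, end, label) for gestures and
# (time, word) for spotted keywords; not the paper's data format.
gestures = [(0.0, 1.2, "point"), (1.5, 2.8, "contour"), (3.0, 4.1, "point")]
keywords = [(0.5, "here"), (2.0, "region"), (3.4, "here")]

WINDOW = 0.3  # seconds of slack around each gesture (assumed value)

cooc = defaultdict(lambda: defaultdict(int))
totals = defaultdict(int)
for start, end, g in gestures:
    totals[g] += 1
    for t, w in keywords:
        if start - WINDOW <= t <= end + WINDOW:
            cooc[g][w] += 1

# P(keyword | gesture): a recognizer can boost gesture hypotheses whose
# typical keywords were spotted in the same time window.
p_kw_given_g = {g: {w: c / totals[g] for w, c in kws.items()}
                for g, kws in cooc.items()}
print(p_kw_given_g)  # {'point': {'here': 1.0}, 'contour': {'region': 1.0}}
```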
{"title":"Exploiting speech/gesture co-occurrence for improving continuous gesture recognition in weather narration","authors":"Rajeev Sharma, Jiongyu Cai, Srivatsan Chakravarthy, Indrajit Poddar, Y. Sethi","doi":"10.1109/AFGR.2000.840669","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840669","url":null,"abstract":"In order to incorporate naturalness in the design of human computer interfaces (HCI), it is desirable to develop recognition techniques capable of handling continuous natural gesture and speech inputs. Though many different researchers have reported high recognition rates for gesture recognition using hidden Markov models (HMM), the gestures used are mostly pre-defined and are bound with syntactical and grammatical constraints. But natural gestures do not string together in syntactical bindings. Moreover, strict classification of natural gestures is not feasible. We have examined hand gestures made in a very natural domain, that of a weather person narrating in front of a weather map. The gestures made by the weather person are embedded in a narration. This provides us with abundant data from an uncontrolled environment to study the interaction between speech and gesture in the context of a display. We hypothesize that this domain is very similar to that of a natural human-computer interface. We present an HMM architecture for continuous gesture recognition framework and keyword spotting. To explore the relation between gesture and speech, we conducted a statistical co-occurrence analysis of different gestures with a selected set of spoken keywords. We then demonstrate how this co-occurrence analysis can be exploited to improve the performance of continuous gesture recognition.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"345 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115290260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 33
Segmenting hands of arbitrary color
Xiaojin Zhu, Jie Yang, A. Waibel
Hand segmentation is a prerequisite for many gesture recognition tasks. Color has been widely used for hand segmentation. However, many approaches rely on predefined skin color models. It is very difficult to predefine a color model in a mobile application where lighting conditions may change dramatically over time. We propose a novel statistical approach to hand segmentation based on Bayes decision theory. The proposed method requires no predefined skin color model. Instead, it generates a hand color model and a background color model for a given image, and uses these models to classify each pixel in the image as either a hand pixel or a background pixel. Models are generated using a Gaussian mixture model with the restricted EM algorithm. Our method is capable of segmenting hands of arbitrary color in a complex scene. It performs well even when there is a significant overlap between hand and background colors, or when the user wears gloves. We show that the Bayes decision method is superior to a commonly used method by comparing their upper-bound performance. Experimental results demonstrate the feasibility of the proposed method.
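As a rough sketch of the pixel-classification step, the fragment below trains one Gaussian mixture per class and applies the Bayes decision rule with equal priors. It uses scikit-learn's standard EM rather than the restricted EM variant the paper describes, and the training pixels are synthetic stand-ins rather than pixels drawn from a real image.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in training pixels in some colour space, one sample per row;
# in the paper these are estimated from the given image itself.
hand_pixels = rng.normal([180, 120, 100], 15, size=(500, 3))
bg_pixels = rng.normal([60, 90, 60], 25, size=(500, 3))

hand_gmm = GaussianMixture(n_components=3, random_state=0).fit(hand_pixels)
bg_gmm = GaussianMixture(n_components=3, random_state=0).fit(bg_pixels)

# Bayes decision: label a pixel "hand" when its likelihood under the
# hand model exceeds that under the background model (equal priors,
# so this equals comparing posteriors).
pixels = rng.normal([120, 105, 80], 40, size=(1000, 3))
is_hand = hand_gmm.score_samples(pixels) > bg_gmm.score_samples(pixels)
print(f"{is_hand.mean():.1%} of pixels classified as hand")
```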
{"title":"Segmenting hands of arbitrary color","authors":"Xiaojin Zhu, Jie Yang, A. Waibel","doi":"10.1109/AFGR.2000.840673","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840673","url":null,"abstract":"Hand segmentation is a prerequisite for many gesture recognition tasks. Color has been widely used for hand segmentation. However, many approaches rely on predefined skin color models. It is very difficult to predefine a color model in a mobile application where the light condition may change dramatically over time. We propose a novel statistical approach to hand segmentation based on Bayes decision theory. The proposed method requires no predefined skin color model. Instead it generates a hand color model and a background color model for a given image, and uses these models to classify each pixel in the image as either a hand pixel or a background pixel. Models are generated using a Gaussian mixture model with the restricted EM algorithm. Our method is capable of segmenting hands of arbitrary color in a complex scene. It performs well even when there is a significant overlap between hand and background colors, or when the user wears gloves. We show that the Bayes decision method is superior to a commonly used method by comparing their upper bound performance. Experimental results demonstrate the feasibility of the proposed method.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115545730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 172
Dual-state parametric eye tracking
Ying-li Tian, T. Kanade, J. Cohn
Most eye trackers work well for open eyes. However, blinking is a physiological necessity for humans. Moreover, for applications such as facial expression analysis and driver awareness systems, we need to do more than track the locations of a person's eyes; we must obtain a detailed description of them. We need to recover the state of the eyes (i.e., whether they are open or closed) and the parameters of an eye model (e.g., the location and radius of the iris, and the corners and height of the eye opening). We develop a dual-state model-based system for tracking eye features that uses convergent tracking techniques, and we show how it can be used to detect whether the eyes are open or closed and to recover the parameters of the eye model. Processing speed on a Pentium II 400 MHz PC is approximately 3 frames/second. In experimental tests on 500 image sequences from child and adult subjects with varying skin and eye colors, accurate tracking results are obtained in 98% of image sequences.
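The dual-state decision itself can be reduced to comparing how well the two parametric templates fit. A minimal sketch, assuming hypothetical fit residuals from an open-eye model (iris circle plus lid curves) and a closed-eye model (a single lid line); the margin value is an assumption:

```python
def eye_state(open_residual: float, closed_residual: float,
              margin: float = 0.1) -> str:
    """Pick the eye state whose parametric template fits best.

    open_residual / closed_residual are mean squared errors of the
    open-eye model and the closed-eye model against the observed
    eye region (hypothetical values here).
    """
    if open_residual + margin < closed_residual:
        return "open"
    return "closed"

# Hypothetical residuals from fitting both templates to one frame.
print(eye_state(open_residual=0.04, closed_residual=0.21))  # -> open
```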
{"title":"Dual-state parametric eye tracking","authors":"Ying-li Tian, T. Kanade, J. Cohn","doi":"10.1109/AFGR.2000.840620","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840620","url":null,"abstract":"Most eye trackers work well for open eyes. However blinking is a physiological necessity for humans. More over, for applications such as facial expression analysis and driver awareness systems, we need to do more than tracking of the locations of the person's eyes but obtain their detailed description. We need to recover the state of the eyes (i.e., whether they are open or closed), and the parameters of an eye model (e.g., the location and radius of the iris, and the corners and height of the eye opening). We develop a dual-state model-based system for tracking eye features that uses convergent tracking techniques and show how it can be used to detect whether the eyes are open or closed, and to recover the parameters of the eye model. Processing speed on a Pentium II 400 MHz PC is approximately 3 frames/second. In experimental tests on 500 image sequences from child and adult subjects with varying colors of skin and eye, accurate tracking results are obtained in 98% of image sequences.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129889515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 189
Wide-range, person- and illumination-insensitive head orientation estimation
Ying Wu, K. Toyama
We present an algorithm for estimation of head orientation, given cropped images of a subject's head from any viewpoint. Our algorithm handles dramatic changes in illumination, applies to many people without per-user initialization, and covers a wider range (e.g., side and back) of head orientations than previous algorithms. The algorithm builds an ellipsoidal model of the head, where points on the model maintain probabilistic information about surface edge density. To collect data for each point on the model, edge-density features are extracted from hand-annotated training images and projected into the model. Each model point learns a probability density function from the training observations. During pose estimation, features are extracted from input images; then, the maximum a posteriori pose is sought, given the current observation.
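The MAP pose search can be sketched as follows, assuming (as a simplification) that each model point's edge-density feature follows an independent Gaussian per discretized pose bin; the means, deviations, and bin count below are invented stand-ins for the learned model.

```python
import numpy as np

rng = np.random.default_rng(1)
N_POINTS, N_POSES = 64, 36   # model points; discretised pose bins

# Stand-in learned model: per pose bin and model point, the mean and
# std of the edge-density feature observed in training.
mu = rng.uniform(0.2, 0.8, size=(N_POSES, N_POINTS))
sigma = np.full((N_POSES, N_POINTS), 0.15)
log_prior = np.log(np.full(N_POSES, 1.0 / N_POSES))  # uniform prior

def map_pose(features: np.ndarray) -> int:
    """Return the maximum a posteriori pose bin for one observation."""
    # Independent Gaussian log-likelihood per model point, summed.
    log_lik = -0.5 * (((features - mu) / sigma) ** 2
                      + np.log(2 * np.pi * sigma ** 2)).sum(axis=1)
    return int(np.argmax(log_lik + log_prior))

observation = rng.uniform(0.2, 0.8, size=N_POINTS)
print("MAP pose bin:", map_pose(observation))
```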
{"title":"Wide-range, person- and illumination-insensitive head orientation estimation","authors":"Ying Wu, K. Toyama","doi":"10.1109/AFGR.2000.840632","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840632","url":null,"abstract":"We present an algorithm for estimation of head orientation, given cropped images of a subject's head from any viewpoint. Our algorithm handles dramatic changes in illumination, applies to many people without per-user initialization, and covers a wider range (e.g., side and back) of head orientations than previous algorithms. The algorithm builds an ellipsoidal model of the head, where points on the model maintain probabilistic information about surface edge density. To collect data for each point on the model, edge-density features are extracted from hand-annotated training images and projected into the model. Each model point learns a probability density function from the training observations. During pose estimation, features are extracted from input images; then, the maximum a posteriori pose is sought, given the current observation.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"97 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128827498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 85
Tracking interacting people
S. McKenna, S. Jabri, Zoran Duric, H. Wechsler
A computer vision system for tracking multiple people in relatively unconstrained environments is described. Tracking is performed at three levels of abstraction: regions, people and groups. A novel, adaptive background subtraction method that combines colour and gradient information is used to cope with shadows and unreliable colour cues. People are tracked through mutual occlusions as they form groups and part from one another. Strong use is made of colour information to disambiguate occlusions and to provide qualitative estimates of depth ordering and position during occlusion. Some simple interactions with objects can also be detected. The system is tested using indoor and outdoor sequences. It is robust and should provide a useful mechanism for bootstrapping and reinitialisation of tracking using more-specific but less-robust human models.
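A minimal sketch of background subtraction that fuses colour and gradient cues is given below; the threshold values, learning rate, and shadow-suppression logic are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

class ColourGradientBackground:
    """Running per-pixel background model over colour and gradient cues."""

    def __init__(self, first_frame: np.ndarray, alpha: float = 0.05):
        self.alpha = alpha                      # adaptation rate (assumed)
        self.colour_bg = first_frame.astype(float)
        self.grad_bg = self._gradient(self.colour_bg)

    @staticmethod
    def _gradient(frame):
        gy, gx = np.gradient(frame.mean(axis=2))  # grey-level gradients
        return np.hypot(gx, gy)

    def foreground(self, frame, colour_thr=30.0, grad_thr=10.0):
        frame = frame.astype(float)
        colour_diff = np.abs(frame - self.colour_bg).max(axis=2)
        grad_diff = np.abs(self._gradient(frame) - self.grad_bg)
        # Requiring a gradient change as well suppresses shadows: a cast
        # shadow darkens colour but leaves local structure largely intact.
        mask = (colour_diff > colour_thr) & (grad_diff > grad_thr)
        still_bg = ~mask                        # adapt only where background
        self.colour_bg[still_bg] += self.alpha * (frame - self.colour_bg)[still_bg]
        self.grad_bg[still_bg] += self.alpha * (self._gradient(frame) - self.grad_bg)[still_bg]
        return mask

# Hypothetical usage on random frames:
rng = np.random.default_rng(0)
frames = rng.uniform(0, 255, size=(4, 48, 64, 3))
model = ColourGradientBackground(frames[0])
for f in frames[1:]:
    print(f"foreground fraction: {model.foreground(f).mean():.2f}")
```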
{"title":"Tracking interacting people","authors":"S. McKenna, S. Jabri, Zoran Duric, H. Wechsler","doi":"10.1109/AFGR.2000.840658","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840658","url":null,"abstract":"A computer vision system for tracking multiple people in relatively unconstrained environments is described. Tracking is performed at three levels of abstraction: regions, people and groups. A novel, adaptive background subtraction method that combines colour and gradient information is used to cope with shadows and unreliable colour cues. People are tracked through mutual occlusions as they form groups and part from one another. Strong use is made of colour information to disambiguate occlusions and to provide qualitative estimates of depth ordering and position during occlusion. Some simple interactions with objects can also be detected. The system is tested using indoor and outdoor sequences. It is robust and should provide a useful mechanism for bootstrapping and reinitialisation of tracking using more-specific but less-robust human models.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125322560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 115
Toward real-time human-computer interaction with continuous dynamic hand gestures
Yuanxin Zhu, Haibing Ren, Guangyou Xu, X. Lin
This paper, aiming at real-time gesture-controlled interaction, describes visual modeling, analysis, and recognition of continuous dynamic hand gestures. By hierarchically integrating multiple cues, a spatio-temporal appearance model and novel approaches are proposed for the modeling and analysis of dynamic gestures, respectively. At the low level, fusion of flesh chrominance analysis and coarse image motion detection is employed to detect and segment hand gestures; at the high level, parameters of the spatio-temporal appearance model are recovered by combining robust parameterized image motion estimation with hand shape analysis. The approach therefore achieves real-time processing as well as high recognition rates. Without resorting to any special marks, twelve kinds of hand gestures can be recognized with average accuracy over 89%. A prototype system, a gesture-controlled panoramic map browser, is designed and implemented to demonstrate the usability of gesture-controlled interaction.
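The low-level cue fusion can be illustrated with a short sketch: a flesh-chrominance mask in normalised-rg space combined with a coarse frame-difference motion mask. The chrominance bounds and the motion threshold below are invented for illustration; a real system would train them from data.

```python
import numpy as np

def skin_chrominance_mask(frame: np.ndarray) -> np.ndarray:
    """Flesh detection in normalised-rg chrominance space."""
    total = frame.sum(axis=2) + 1e-6
    r, g = frame[..., 0] / total, frame[..., 1] / total
    # Assumed skin-cluster bounds; a real system trains these from data.
    return (0.35 < r) & (r < 0.55) & (0.25 < g) & (g < 0.40)

def coarse_motion_mask(prev: np.ndarray, curr: np.ndarray,
                       thr: float = 20.0) -> np.ndarray:
    """Coarse motion detection by thresholded grey-level differencing."""
    return np.abs(curr.mean(axis=2) - prev.mean(axis=2)) > thr

def hand_candidates(prev: np.ndarray, curr: np.ndarray) -> np.ndarray:
    # Fusing both cues keeps only moving, skin-coloured regions.
    return skin_chrominance_mask(curr) & coarse_motion_mask(prev, curr)

rng = np.random.default_rng(0)
prev, curr = rng.uniform(0, 255, size=(2, 48, 64, 3))
print(f"candidate pixels: {hand_candidates(prev, curr).sum()}")
```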
{"title":"Toward real-time human-computer interaction with continuous dynamic hand gestures","authors":"Yuanxin Zhu, Haibing Ren, Guangyou Xu, X. Lin","doi":"10.1109/AFGR.2000.840688","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840688","url":null,"abstract":"This paper, aiming at real-time gesture-controlled interaction, describes visual modeling, analysis, and recognition of continuous dynamic hand gestures. By hierarchically integrating multiple cues, a spatio-temporal appearance model and novel approaches are proposed for modeling and analysis of dynamic gestures respectively. At low level, fusion of flesh chrominance analysis and coarse image motion detection is employed to detect and segment hand gestures; at high level, parameters of the spatio-temporal appearance model are recovered by combining robust parameterized image motion estimation and hand shape analysis. The approach, therefore, fulfils real-time processing as well as high recognition rates. Without resorting to any special marks, twelve kinds of hand gestures can be recognized with average accuracy over 89%. A prototype system, gesture-controlled panoramic map browser is designed and implemented to demonstrate the usability of gesture-controlled interaction.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117111279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 33
Face analysis for the synthesis of photo-realistic talking heads
H. Graf, E. Cosatto, Tony Ezzat
This paper describes techniques for extracting bitmaps of facial parts from videos of a talking person. The goal is to synthesize photo-realistic talking heads of high quality that show picture-perfect appearance and realistic head movements with good lip-sound synchronization. For the synthesis of a talking head, bitmaps of facial parts are combined to form whole heads and then sequences of such images are integrated with audio from a text-to-speech synthesizer. For a seamless integration of facial parts into an animation, their shape and visual appearance must be known with high accuracy. The recognition system has to find not only the locations of facial features, but must also be able to determine the head's orientation and recognize the facial expressions. Our face recognition proceeds in multiple steps, each with an increased precision. Using motion, color and shape information, the head's position and the location of the main facial features are determined first. Then smaller areas are searched with matched filters, in order to identify specific facial features with high precision. From this information a head's 3D orientation is calculated. Facial parts are cut from the image and, using the head's orientation, are warped into bitmaps with 'normalized' orientation and scale.
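The final normalisation step amounts to an affine warp that maps detected landmarks onto canonical positions. A minimal sketch using OpenCV (the landmark choice, canonical coordinates, and output size are assumptions, not the paper's values):

```python
import numpy as np
import cv2

def normalise_part(image: np.ndarray, left_eye, right_eye, mouth,
                   out_size=(128, 128)) -> np.ndarray:
    """Warp a facial region to a canonical orientation and scale.

    The three landmarks (pixel coordinates, assumed to come from the
    feature-localisation stage) are mapped onto fixed canonical
    positions, which removes in-plane rotation and scale.
    """
    src = np.float32([left_eye, right_eye, mouth])
    dst = np.float32([[38, 48], [90, 48], [64, 100]])  # canonical layout
    M = cv2.getAffineTransform(src, dst)
    return cv2.warpAffine(image, M, out_size)

# Hypothetical usage on a synthetic frame:
frame = np.zeros((240, 320, 3), np.uint8)
patch = normalise_part(frame, (100, 120), (160, 118), (130, 170))
print(patch.shape)  # (128, 128, 3)
```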
{"title":"Face analysis for the synthesis of photo-realistic talking heads","authors":"H. Graf, E. Cosatto, Tony Ezzat","doi":"10.1109/AFGR.2000.840633","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840633","url":null,"abstract":"This paper describes techniques for extracting bitmaps of facial parts from videos of a talking person. The goal is to synthesize photo-realistic talking heads of high quality that show picture-perfect appearance and realistic head movements with good lip-sound synchronization. For the synthesis of a talking head, bitmaps of facial parts are combined to form whole heads and then sequences of such images are integrated with audio from a text-to-speech synthesizer. For a seamless integration of facial parts into an animation, their shape and visual appearance must be known with high accuracy. The recognition system has to find not only the locations of facial features, but must also be able to determine the head's orientation and recognize the facial expressions. Our face recognition proceeds in multiple steps, each with an increased precision. Using motion, color and shape information, the head's position and the location of the main facial features are determined first. Then smaller areas are searched with matched filters, in order to identify specific facial features with high precision. From this information a head's 3D orientation is calculated. Facial parts are cut from the image and, using the head's orientation, are warped into bitmaps with 'normalized' orientation and scale.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124452820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 47
Audio-visual speaker detection using dynamic Bayesian networks
A. Garg, V. Pavlovic, James M. Rehg
The development of human-computer interfaces poses a challenging problem: actions and intentions of different users have to be inferred from sequences of noisy and ambiguous sensory data. Temporal fusion of multiple sensors can be efficiently formulated using dynamic Bayesian networks (DBN). The DBN framework allows the power of statistical inference and learning to be combined with contextual knowledge of the problem. We demonstrate the use of DBN in tackling the problem of audio/visual speaker detection. "Off-the-shelf" visual and audio sensors (face, skin, texture, mouth motion, and silence detectors) are optimally fused along with contextual information in a DBN architecture that infers instances when an individual is speaking. Results obtained in the setup of an actual human-machine interaction system (Genie Casino Kiosk) demonstrate superiority of our approach over that of static, context-free fusion architecture.
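Stripped to its core, the temporal fusion is a forward filter over a two-state chain whose observation model combines per-sensor likelihoods. The sketch below assumes conditional independence between sensors (a naive-Bayes simplification of the paper's DBN) and uses invented transition and likelihood values.

```python
import numpy as np

# Two hidden states: 0 = not speaking, 1 = speaking.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])        # assumed transition probabilities
belief = np.array([0.5, 0.5])     # uniform initial belief

def fuse_step(belief, sensor_liks):
    """One forward-filtering step with naive-Bayes sensor fusion.

    sensor_liks holds one P(observation | state) pair per sensor,
    e.g. from the mouth-motion and silence detectors. A full DBN would
    also model dependencies between sensors and context variables;
    this sketch assumes conditional independence.
    """
    predicted = belief @ T                  # temporal prediction
    for lik in sensor_liks:
        predicted = predicted * lik         # multiply in each sensor
    return predicted / predicted.sum()      # renormalise

# Hypothetical detector outputs for three consecutive frames.
for frame_liks in ([(0.7, 0.3), (0.6, 0.4)],
                   [(0.2, 0.8), (0.3, 0.7)],
                   [(0.1, 0.9), (0.2, 0.8)]):
    belief = fuse_step(belief, [np.array(l) for l in frame_liks])
    print(f"P(speaking) = {belief[1]:.2f}")
```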
{"title":"Audio-visual speaker detection using dynamic Bayesian networks","authors":"A. Garg, V. Pavlovic, James M. Rehg","doi":"10.1109/AFGR.2000.840663","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840663","url":null,"abstract":"The development of human-computer interfaces poses a challenging problem: actions and intentions of different users have to be inferred from sequences of noisy and ambiguous sensory data. Temporal fusion of multiple sensors can be efficiently formulated using dynamic Bayesian networks (DBN). The DBN framework allows the power of statistical inference and learning to be combined with contextual knowledge of the problem. We demonstrate the use of DBN in tackling the problem of audio/visual speaker detection. \"Off-the-shelf\" visual and audio sensors (face, skin, texture, mouth motion, and silence detectors) are optimally fused along with contextual information in a DBN architecture that infers instances when an individual is speaking. Results obtained in the setup of an actual human-machine interaction system (Genie Casino Kiosk) demonstrate superiority of our approach over that of static, context-free fusion architecture.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134132259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 50
Relevant features for video-based continuous sign language recognition
Britta Bauer, Hermann Hienz
This paper describes the development of a video-based continuous sign language recognition system. The system is based on continuous density hidden Markov models (HMM), with one model for each sign. Feature vectors reflecting manual sign parameters serve as input for training and recognition. To reduce computational complexity during the recognition task, beam search is employed. The system aims for automatic signer-dependent recognition of sign language sentences, based on a lexicon of 97 signs of German Sign Language (GSL). A single colour video camera is used for image recording. Furthermore, the influence of different features reflecting different manual sign parameters on the recognition results is examined. Results are given for vocabularies of varying size. The system achieves an accuracy of 91.7% based on a lexicon of 97 signs.
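Beam search over the HMM state space can be sketched as a Viterbi pass that keeps only the best-scoring states at each frame. The transition and emission scores below are random stand-ins for the sign models' outputs, and the start-state convention is an assumption.

```python
import numpy as np

def viterbi_beam(log_trans, log_emit, beam_width=5):
    """Viterbi scoring with beam pruning over HMM states.

    log_trans: (S, S) log transition matrix; log_emit: (T, S) per-frame
    log emission scores. Keeping only the beam_width best states per
    frame trades a small risk of search error for a large cut in
    computation. Returns the best final score (no backtrace kept).
    """
    n_frames, n_states = log_emit.shape
    scores = np.full(n_states, -np.inf)
    scores[0] = log_emit[0, 0]          # assume decoding starts in state 0
    active = {0}
    for t in range(1, n_frames):
        new = np.full(n_states, -np.inf)
        for s in range(n_states):
            best = max((scores[p] + log_trans[p, s] for p in active),
                       default=-np.inf)
            new[s] = best + log_emit[t, s]
        scores = new
        active = set(np.argsort(scores)[-beam_width:])  # prune to beam
    return scores.max()

rng = np.random.default_rng(0)
S = 12
lt = np.log(rng.dirichlet(np.ones(S), size=S))   # stand-in transitions
le = np.log(rng.dirichlet(np.ones(S), size=40))  # stand-in emissions
print(viterbi_beam(lt, le))
```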
{"title":"Relevant features for video-based continuous sign language recognition","authors":"Britta Bauer, Hermann Hienz","doi":"10.1109/AFGR.2000.840672","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840672","url":null,"abstract":"This paper describes the development of a video-based continuous sign language recognition system. The system is based on continuous density hidden Markov models (HMM) with one model for each sign. Feature vectors reflecting manual sign parameters serve as input for training and recognition. To reduce computational complexity during the recognition task beam search is employed. The system aims for an automatic signer-dependent recognition of sign language sentences, based on a lexicon of 97 signs of German sign language (GSL). A further colour video camera is used for image recording. Furthermore the influence of different features reflecting different manual sign parameters on the recognition results are examined. Results are given for varying sized vocabulary. The system achieves an accuracy of 91.7% based on a lexicon of 97 signs.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132686351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 143
Face recognition by support vector machines
G. Guo, S. Li, K. Chan
Support vector machines (SVM) have recently been proposed as a new technique for pattern recognition. SVMs with a binary tree recognition strategy are used to tackle the face recognition problem. We illustrate the potential of SVMs on the Cambridge ORL face database, which consists of 400 images of 40 individuals, containing quite a high degree of variability in expression, pose, and facial details. We also present a recognition experiment on a larger face database of 1079 images of 137 individuals. We compare the SVM-based recognition with the standard eigenface approach using the nearest center classification (NCC) criterion.
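The binary-tree strategy can be read as a tournament of pairwise SVMs: classes meet in pairs and the winner of each comparison advances until one class remains. A sketch on synthetic stand-in features (not the ORL images; class count, dimensionality, and kernel are assumptions):

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in "face" features: 8 classes, 10 samples each, 64-D.
X = np.vstack([rng.normal(c, 1.0, size=(10, 64)) for c in range(8)])
y = np.repeat(np.arange(8), 10)

# Train one SVM per class pair.
pair_svms = {(a, b): SVC(kernel="linear").fit(
                 X[(y == a) | (y == b)], y[(y == a) | (y == b)])
             for a, b in combinations(range(8), 2)}

def tournament(x):
    """Binary-tree recognition: classes meet pairwise, winners advance."""
    contenders = list(range(8))
    while len(contenders) > 1:
        nxt = []
        for a, b in zip(contenders[::2], contenders[1::2]):
            nxt.append(int(pair_svms[(min(a, b), max(a, b))]
                           .predict(x.reshape(1, -1))[0]))
        if len(contenders) % 2:
            nxt.append(contenders[-1])   # odd one out gets a bye
        contenders = nxt
    return contenders[0]

print(tournament(X[25]), y[25])  # predicted vs true class
```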
{"title":"Face recognition by support vector machines","authors":"G. Guo, S. Li, K. Chan","doi":"10.1109/AFGR.2000.840634","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840634","url":null,"abstract":"Support vector machines (SVM) have been recently proposed as a new technique for pattern recognition. SVM with a binary tree recognition strategy are used to tackle the face recognition problem. We illustrate the potential of SVM on the Cambridge ORL face database, which consists of 400 images of 40 individuals, containing quite a high degree of variability in expression, pose, and facial details. We also present the recognition experiment on a larger face database of 1079 images of 137 individuals. We compare the SVM-based recognition with the standard eigenface approach using the nearest center classification (NCC) criterion.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131917429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 611