Speaker-Dependent Visual Speech Recognition Using Extended Curvature Gabor Filters

2013 IEEE International Conference on Consumer Electronics (ICCE), pp. 314-315. Pub Date: 2013-03-28. DOI: 10.1109/ICCE.2013.6486907

Jeongwoo Ju, Heechul Jung, Junmo Kim


Citations: 0

Abstract




The performance of a speech recognition system often degrades severely in low-SNR environments. To overcome this difficulty, the visual signal is increasingly used as an additional aid. In this paper, we address the speaker-dependent visual speech recognition problem using the Extended Curvature Gabor (ECG) wavelet. First, lip image sequences are filtered with the ECG, because the variation of the filter response represents lip movement well. Next, the distance between the filter output and the training data is calculated using Multi-Dimensional Dynamic Time Warping (MDDTW) with a new cost matrix. Finally, each lip sequence is classified as the corresponding utterance. In this process, the ECG parameters must be selected appropriately; we compare a simple greedy selection method with a selection scheme based on AdaBoost.
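The distance and classification steps can be illustrated with a minimal sketch. The abstract does not specify the paper's new cost matrix, so the code below substitutes the textbook DTW recurrence with a Euclidean per-frame cost, and it assumes classification is done by nearest training sequence. The names `frame_cost`, `mddtw_distance`, and `classify`, and the representation of each frame as a plain feature vector of filter responses, are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Optional, Sequence

Frame = Sequence[float]  # one feature vector per video frame (assumed layout)


def frame_cost(a: Frame, b: Frame) -> float:
    """Euclidean distance between two frames (an assumed local cost,
    standing in for the paper's unspecified cost matrix)."""
    if len(a) != len(b):
        raise ValueError("frames must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mddtw_distance(seq_a: Sequence[Frame], seq_b: Sequence[Frame]) -> float:
    """Standard DTW recurrence applied to sequences of multi-dimensional frames."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_cost(seq_a[i - 1], seq_b[j - 1])
            acc[i][j] = cost + min(
                acc[i - 1][j],      # repeat a frame of seq_b
                acc[i][j - 1],      # repeat a frame of seq_a
                acc[i - 1][j - 1],  # advance both sequences
            )
    return acc[n][m]


def classify(query: Sequence[Frame],
             templates: Dict[str, List[Sequence[Frame]]]) -> str:
    """Return the utterance label of the training sequence nearest to the query."""
    best_label: Optional[str] = None
    best_dist = float("inf")
    for label, sequences in templates.items():
        for seq in sequences:
            dist = mddtw_distance(query, seq)
            if dist < best_dist:
                best_label, best_dist = label, dist
    if best_label is None:
        raise ValueError("no training sequences supplied")
    return best_label
```

Because DTW lets one frame align with several frames of the other sequence, the same utterance spoken at different speeds can still yield a small distance, which is why it suits lip sequences of varying length.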
