Speaker dependent visual speech recognition using Extended Curvature Gabor filters
Authors: Jeongwoo Ju, Heechul Jung, Junmo Kim
Published in: 2013 IEEE International Conference on Consumer Electronics (ICCE), pp. 314-315, 2013-03-28
DOI: 10.1109/ICCE.2013.6486907 (https://doi.org/10.1109/ICCE.2013.6486907)
The performance of a speech recognition system often degrades severely in low-SNR environments. To overcome this difficulty, the visual signal is now commonly used as an additional cue. In this paper, we address the speaker-dependent visual speech recognition problem using the Extended Curvature Gabor (ECG) wavelet. First, lip image sequences are filtered with the ECG, because the variation of the filter response represents the lip movement well. Next, the distance between the filter output and the training data is calculated using Multi-Dimensional Dynamic Time Warping (MDDTW) with a new cost matrix. Finally, the lip sequences are classified into the corresponding utterances. In this process, the ECG parameters must be selected appropriately; we compare a simple greedy selection method with a selection scheme based on AdaBoost.
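The distance and classification steps described above can be illustrated with a minimal sketch. It assumes each lip sequence has already been reduced to a list of per-frame feature vectors (standing in for ECG filter responses), uses a plain Euclidean frame cost in place of the paper's new cost matrix, which the abstract does not specify, and classifies by nearest neighbor. The function names, labels, and toy data are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two equal-length feature vectors.

    Assumption: a stand-in for the paper's cost matrix, which is not
    described in the abstract.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def md_dtw(seq_a, seq_b):
    """Multi-dimensional DTW distance between two sequences of vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # acc[i][j] is the minimal accumulated cost of aligning
    # seq_a[:i] with seq_b[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(seq_a[i - 1], seq_b[j - 1])
            acc[i][j] = cost + min(
                acc[i - 1][j],      # advance in seq_a only
                acc[i][j - 1],      # advance in seq_b only
                acc[i - 1][j - 1],  # advance in both
            )
    return acc[n][m]


def classify(query, training_set):
    """Return the label of the training sequence closest to query."""
    label, _ = min(training_set, key=lambda item: md_dtw(query, item[1]))
    return label


# Hypothetical toy data: two utterances with 2-D features per frame.
training_set = [
    ("yes", [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]),
    ("no", [[2.0, 0.0], [1.0, 0.0], [0.0, 0.0]]),
]
query = [[0.0, 0.1], [0.0, 0.0], [1.0, 0.9], [2.0, 2.1]]
print(classify(query, training_set))
```

Because DTW lets one frame align with several frames of the other sequence, the query can be longer or shorter than the training sequences, which matters when the same utterance is spoken at different speeds.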