Performance Improvement of Audio-Visual Speech Recognition with Optimal Reliability Fusion
M. Tariquzzaman, Song-Min Gyu, Kim Jin Young, Na Seung You, M. A. Rashid
2011 International Conference on Internet Computing and Information Services, 17 September 2011. DOI: 10.1109/ICICIS.2011.58
Abstract
In state-of-the-art ASR technology, speech recognition based on audio and video (AV) information is one of the key challenges in coping with noise. AV fusion is a robust approach to ASR, and its main issues are where and how to integrate the information from the two modalities. To improve AV fusion performance, the authors of [1] proposed optimum reliability fusion (ORF) and applied it to AV speaker identification. In this paper we adopt ORF-based fusion for AV speech recognition and evaluate the resulting performance improvement in that domain. The main idea of ORF is to introduce weighting factors into the score-based reliability measure (SCRM) to solve the over- or under-estimation problem in the SCRM calculation. Our AV speech recognition system is implemented for Korean digit recognition using the SAMSUNG AV database. Experimental results show that ORF reduces the error rate by 42.8% relative to a baseline system that uses the previous AV fusion scheme [2].
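The abstract does not give the SCRM or ORF formulas, so the following is only a minimal sketch of reliability-weighted late fusion under stated assumptions, not the method of [1] or [2]. It assumes: (a) separate audio and video recognizers each return one log-likelihood score per digit class; (b) a hypothetical SCRM defined as the mean gap between the best score and the next N-best scores; and (c) hypothetical ORF-style weighting factors `alpha` and `beta` that rescale each stream's reliability before the audio stream weight is computed. All function names and parameters are illustrative.

```python
from typing import Sequence


def scrm_reliability(scores: Sequence[float], n_best: int = 3) -> float:
    """Hypothetical score-based reliability measure (SCRM), an assumption.

    Returns the mean gap between the best score and the next (n_best - 1)
    scores. A large gap means the recognizer clearly prefers one class;
    a flat score list (typical of noisy audio) yields a small value.
    """
    ranked = sorted(scores, reverse=True)
    n = min(n_best, len(ranked))
    if n < 2:
        return 0.0
    return sum(ranked[0] - s for s in ranked[1:n]) / (n - 1)


def audio_weight(audio: Sequence[float], video: Sequence[float],
                 alpha: float = 1.0, beta: float = 1.0,
                 n_best: int = 3) -> float:
    """Audio stream weight from reliabilities scaled by alpha and beta.

    alpha and beta are hypothetical weighting factors that rescale each
    stream's SCRM to compensate for over- or under-estimation.
    """
    r_a = alpha * scrm_reliability(audio, n_best)
    r_v = beta * scrm_reliability(video, n_best)
    total = r_a + r_v
    # Fall back to equal weights when neither stream is informative.
    return 0.5 if total == 0 else r_a / total


def recognize(audio: Sequence[float], video: Sequence[float],
              alpha: float = 1.0, beta: float = 1.0, n_best: int = 3) -> int:
    """Return the index of the class with the highest fused score."""
    if len(audio) != len(video):
        raise ValueError("audio and video must score the same classes")
    w_a = audio_weight(audio, video, alpha, beta, n_best)
    fused = [w_a * a + (1.0 - w_a) * v for a, v in zip(audio, video)]
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    audio = [-10.0, -10.5, -11.0, -12.0]  # flat scores: noisy audio
    video = [-20.0, -15.0, -25.0, -26.0]  # peaked scores: clear lip cues
    print(recognize(audio, video))          # 1 (video stream dominates)
    print(recognize(audio, video, beta=0))  # 0 (audio stream only)
```

In this sketch, raising `alpha` or `beta` increases how much one stream's reliability counts relative to the other, which is one plausible way to correct a measure that systematically over- or under-estimates a stream; how [1] actually defines and sets its weighting factors is not stated in the abstract.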