Audio-visual speech recognition in a Portuguese language based application
V. Pera, F. Sa, P. Afonso, R. Ferreira
IEEE International Conference on Industrial Technology, 2003 (published 2003-12-10). DOI: 10.1109/ICIT.2003.1290738
Citations: 9
Abstract
We present experimental results obtained with an automatic speech recogniser developed for a speaker-dependent, continuous-speech alphanumeric recognition application in European Portuguese. The system was designed and built following an audio-visual speech recognition approach. Besides the well-known complementarity between acoustic and visual information for speech recognition, visual features are immune to acoustic disturbances, which makes the system more robust in acoustically contaminated environments. The results clearly show that including a video stream, through a multi-stream decoding formalism, reduces the word error rate by approximately 56% (relative) over a wide range of acoustic signal-to-noise ratios.
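The abstract does not spell out the decoder, so the sketch below illustrates only one common multi-stream formulation and is an assumption, not necessarily the authors' exact method. In that formulation, the per-state log-likelihoods of the audio and video streams are combined with stream exponents (weights) that sum to one, and the audio weight would typically be lowered as the acoustic SNR drops. A helper also shows what a "relative" WER reduction means. The function names, the weight value and the WER figures in the usage lines are all hypothetical.

```python
def multistream_log_likelihood(log_b_audio: float, log_b_video: float,
                               audio_weight: float) -> float:
    """Combine per-stream HMM state log-likelihoods with stream exponents.

    Assumed formulation (hypothetical, not taken from the paper):
        log b(o) = w_a * log b_a(o_a) + w_v * log b_v(o_v),  with w_v = 1 - w_a.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    video_weight = 1.0 - audio_weight
    return audio_weight * log_b_audio + video_weight * log_b_video


def relative_wer_reduction(wer_baseline: float, wer_new: float) -> float:
    """Fraction by which the WER drops relative to the baseline (0.56 = 56%)."""
    if wer_baseline <= 0.0:
        raise ValueError("baseline WER must be positive")
    return (wer_baseline - wer_new) / wer_baseline


if __name__ == "__main__":
    # Hypothetical state scores: the noisy audio stream is trusted less (w_a = 0.3).
    print(multistream_log_likelihood(-42.0, -18.0, audio_weight=0.3))
    # Hypothetical WERs: 25% audio-only vs 11% audio-visual.
    print(round(relative_wer_reduction(0.25, 0.11), 2))
```

With these made-up figures, a drop from 25% to 11% WER is a 14-point absolute reduction but a 56% relative one. This shows why the abstract's figure is stated as relative.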