Recognition of Fricative Phoneme based Hindi Words in Speech-to-Text System using Wav2Vec2.0 Model
Authors: S. Gupta, S. V, S. Koolagudi
Published in: 2022 IEEE Global Conference on Computing, Power and Communication Technologies (GlobConPT), 2022-09-23
DOI: 10.1109/GlobConPT57482.2022.9938222 (https://doi.org/10.1109/GlobConPT57482.2022.9938222)
Citations: 0
Abstract
This work examines two shortcomings of Microsoft's state-of-the-art Speech-to-Text (STT) system: its recognition of Hindi words that start with the fricative phoneme (/ha/), and its recognition accuracy in the presence of background noise. To address the first issue, the Wav2Vec2.0 model is trained on the OpenSLR Hindi dataset so that the previously unrecognized fricative-initial words are identified correctly. The proposed model is evaluated using the Character Error Rate (CER). To test its performance, 20 fricative words are fed to the trained model in both clean and noisy conditions. The second issue, handling noisy speech samples, is addressed with an amplitude-based automatic noise detection method. In clean conditions, the proposed model, trained both with and without a language model, achieves a better CER than the state-of-the-art STT model.
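For readers unfamiliar with the metric, the following is a minimal sketch of how CER is commonly computed: the character-level Levenshtein (edit) distance between the reference and the hypothesis transcript, divided by the reference length. The abstract does not specify the exact CER variant or text normalization used, so the function names `edit_distance` and `cer`, and the choice to count Unicode code points as characters (which matters for Devanagari, where one visible syllable may span several code points), are assumptions made for illustration only.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r_ch in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h_ch in enumerate(hyp, start=1):
            cost = 0 if r_ch == h_ch else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical example pair, not taken from the paper's test set:
    # the initial /ha/ character is misrecognized as a vowel.
    print(cer("हम", "अम"))
```

Under this definition, a CER of 0 means the hypothesis matches the reference exactly, and values above 1 are possible when the hypothesis contains many spurious characters.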