Recognition of Fricative Phoneme based Hindi Words in Speech-to-Text System using Wav2Vec2.0 Model
S. Gupta, S. V, S. Koolagudi
2022 IEEE Global Conference on Computing, Power and Communication Technologies (GlobConPT), 23 September 2022
DOI: 10.1109/GlobConPT57482.2022.9938222
In this work, we discuss issues with Microsoft's state-of-the-art Speech-to-Text (STT) system. We identify two key issues. The first is the recognition of Hindi words that start with the fricative phoneme /ha/. The second is the system's recognition performance in the presence of background noise. To correctly identify the unrecognized Hindi fricative phoneme, we train the Wav2Vec2.0 model on the OpenSLR Hindi dataset. We evaluate the proposed model using the Character Error Rate (CER). To test its performance, we feed 20 fricative words, in both clean and noisy conditions, to the trained model. The second issue, handling noisy speech samples, is resolved using an amplitude-based automatic noise detection method. In clean conditions, the proposed model achieves a lower CER than the state-of-the-art STT system, both with and without a language model.
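Results are reported as Character Error Rate (CER). CER is the character-level Levenshtein (edit) distance between the reference transcript and the model's hypothesis, divided by the reference length: CER = (S + D + I) / N. Here S, D and I count substitutions, deletions and insertions, and N is the number of reference characters. Lower is better, and the value can exceed 1 when there are many insertions.

The following is a minimal Python sketch under one stated assumption: "characters" are Unicode code points, so a Devanagari consonant and its dependent vowel sign count separately. This is a common convention, but it may differ from the tokenization the authors used. The function names are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance: the minimum number of
    substitutions, deletions and insertions turning `ref` into `hyp`."""
    # prev[j] holds the distance between the first i-1 chars of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            )
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = (S + D + I) / N, with N the reference length in code points."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: the word-initial /ha/ (ह) is dropped by the recognizer.
    print(character_error_rate("हमारा", "मारा"))  # 0.2
```

In this hypothetical case, हमारा has five code points. Dropping the initial ह is a single deletion, which gives a CER of 1/5 = 0.2. This example is for illustration only and is not a result from the paper.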
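The abstract says noisy samples are handled with an amplitude-based automatic noise detection method but gives no details. The sketch below is one plausible amplitude-based heuristic, not the authors' method. It works in four steps:

1. Compute the frame-wise RMS amplitude.
2. Treat the quietest frames as the noise floor.
3. Treat the loudest frames as the speech level.
4. Flag the utterance as noisy when their ratio, an estimated SNR in dB, falls below a threshold.

The 16 kHz sample rate, the 25 ms / 10 ms framing, the 10th/90th percentiles, the 15 dB threshold and the function names are all hypothetical choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def frame_rms(signal: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Root-mean-square amplitude of each frame (defaults: 25 ms frames
    with a 10 ms hop, assuming 16 kHz audio)."""
    x = np.asarray(signal, dtype=np.float64)
    if x.ndim != 1 or len(x) < frame_len:
        raise ValueError("expected a 1-D signal at least one frame long")
    frames = sliding_window_view(x, frame_len)[::hop]  # one row per frame
    return np.sqrt(np.mean(frames ** 2, axis=1))


def detect_noise(signal: np.ndarray, threshold_db: float = 15.0,
                 frame_len: int = 400, hop: int = 160) -> tuple[bool, float]:
    """Hypothetical detector: flag an utterance as noisy when its estimated
    SNR is below `threshold_db`. The 10th-percentile frame RMS is taken as
    the noise floor and the 90th-percentile frame RMS as the speech level."""
    rms = frame_rms(signal, frame_len, hop)
    eps = 1e-10  # avoids division by zero on digitally silent input
    noise_floor = np.percentile(rms, 10) + eps
    speech_level = np.percentile(rms, 90) + eps
    snr_db = 20.0 * np.log10(speech_level / noise_floor)
    return bool(snr_db < threshold_db), float(snr_db)


if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    t = np.arange(sr) / sr
    # Synthetic stand-in for speech: 1 s of silence, then 1 s of a 440 Hz tone.
    utterance = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 440 * t)])
    print(detect_noise(utterance + rng.normal(0, 1e-4, 2 * sr)))  # clean
    print(detect_noise(utterance + rng.normal(0, 0.3, 2 * sr)))   # noisy
```

This heuristic relies on each recording containing some pauses. Without them, the quietest speech frames are mistaken for the noise floor and clean speech may be flagged as noisy.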