T. Kumar, Adithya Jayan, Shreenidhi Bhat, M. Anvith, A. V. Narasimhadhan
{"title":"Monophone and Triphone Acoustic Phonetic Model for Kannada Speech Recognition System","authors":"T. Kumar, Adithya Jayan, Shreenidhi Bhat, M. Anvith, A. V. Narasimhadhan","doi":"10.1109/wispnet54241.2022.9767115","DOIUrl":null,"url":null,"abstract":"The automatic Speech Recognition system (ASR) is the most widely used application in the speech domain. ASR systems generate text data from spoken utterances without manual intervention. In this work, we build an ASR system for the Kannada language. For building the proposed system, we extract Mel Frequency Cepstral Coefficients (MFCC) features from the audio data, and the Kannada language model is developed using corresponding labels. The dictionary generation and phonetic labelings are automated. Recognition performance is compared for both monophonic and triphone models. The word error rate of 15.73 % and the sentence error rate of 55.5 % are achieved for the triphone model. Comparatively, the triphone model gives a better performance than the monophonic model.","PeriodicalId":432794,"journal":{"name":"2022 International Conference on Wireless Communications Signal Processing and Networking (WiSPNET)","volume":"4 8","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Wireless Communications Signal Processing and Networking (WiSPNET)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/wispnet54241.2022.9767115","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
The Automatic Speech Recognition (ASR) system is the most widely used application in the speech domain. ASR systems generate text from spoken utterances without manual intervention. In this work, we build an ASR system for the Kannada language. To build the proposed system, we extract Mel Frequency Cepstral Coefficient (MFCC) features from the audio data, and the Kannada language model is developed from the corresponding labels. Dictionary generation and phonetic labeling are automated. Recognition performance is compared for the monophone and triphone acoustic models. A word error rate of 15.73% and a sentence error rate of 55.5% are achieved with the triphone model, which performs better than the monophone model.
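The front end described in the abstract is standard MFCC feature extraction. The paper does not specify the toolkit or frame settings, so the sketch below is a minimal illustration only, assuming librosa is available; the 25 ms / 10 ms framing, 16 kHz sampling rate, delta features, and file name are assumptions, not values taken from the paper.

```python
# Minimal MFCC extraction sketch (assumptions: librosa front end, 16 kHz audio,
# 13 cepstral coefficients, 25 ms window with 10 ms shift, delta + delta-delta).
import librosa
import numpy as np

def extract_mfcc(wav_path, n_mfcc=13, sr=16000):
    """Load one utterance and return frame-level features (frames x 3*n_mfcc)."""
    y, sr = librosa.load(wav_path, sr=sr)            # resample to mono 16 kHz
    mfcc = librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=n_mfcc,
        n_fft=400, hop_length=160                    # 25 ms window, 10 ms shift at 16 kHz
    )
    # Delta and delta-delta coefficients are commonly appended in ASR front ends.
    delta = librosa.feature.delta(mfcc)
    delta2 = librosa.feature.delta(mfcc, order=2)
    feats = np.vstack([mfcc, delta, delta2])         # (3 * n_mfcc) x frames
    return feats.T                                   # frames x (3 * n_mfcc)

# Hypothetical usage:
# feats = extract_mfcc("kannada_utterance_0001.wav")
# print(feats.shape)   # e.g. (num_frames, 39)
```

These frame-level features would then feed the monophone or triphone acoustic models compared in the paper; the exact training recipe is not given in the abstract.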