Mohamed O. M. Khelifa, M. Belkasmi, A. Yousfi, Y. Elhadj
2017 Ninth International Conference on Advanced Computational Intelligence (ICACI), published 2017-02-01
DOI: 10.1109/ICACI.2017.7974511
https://doi.org/10.1109/ICACI.2017.7974511
An accurate HSMM-based system for Arabic phonemes recognition
Most successful automatic speech recognition (ASR) systems model the speech signal probabilistically with hidden Markov models (HMMs). In a standard HMM, state duration probabilities decrease exponentially with time, which describes the temporal structure of speech poorly. Incorporating explicit state duration probability distribution functions (pdfs) into the HMM is a well-known remedy for this weakness; the resulting model is known as a hidden semi-Markov model (HSMM). Previous work has reported that replacing standard HMMs with HSMMs improves recognition accuracy in many languages. This paper addresses an important stage of our ongoing work, which aims to build an accurate Arabic recognizer for teaching and learning purposes. It presents an HSMM implementation whose principal goal is to improve the duration behavior of the classical HMM. In this implementation, state durations are modeled with a Gaussian distribution. Experiments were carried out on an Arabic speech corpus collected from recitations of the Holy Quran. Results show an increase in recognition accuracy of around 1%. These results confirm that the system outperforms the baseline HTK when the Gaussian distribution is integrated into the HTK recognizer back-end.
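The duration argument can be illustrated with a small sketch. It is not the implementation described in the paper: the function names, the self-transition probability of 0.8, and the Gaussian mean and standard deviation are all assumptions chosen for illustration. A standard HMM state with self-transition probability a implies the duration distribution P(d) = a^(d-1) (1 - a), which always peaks at d = 1, whereas an explicit Gaussian duration model can place its peak at a typical duration.

```python
import math


def geometric_duration_pmf(self_loop_prob, max_duration):
    """Duration pmf implied by a standard HMM state.

    With self-transition probability a, P(d) = a**(d - 1) * (1 - a)
    for d = 1, 2, ..., which decays exponentially in d.
    """
    a = self_loop_prob
    return [a ** (d - 1) * (1.0 - a) for d in range(1, max_duration + 1)]


def gaussian_duration_pmf(mean, std, max_duration):
    """Gaussian density sampled at durations 1..max_duration, renormalized to sum to 1."""
    weights = [
        math.exp(-0.5 * ((d - mean) / std) ** 2)
        for d in range(1, max_duration + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative values only (assumptions, not taken from the paper).
# a = 0.8 gives an expected duration of 1 / (1 - a) = 5 frames.
hmm_pmf = geometric_duration_pmf(self_loop_prob=0.8, max_duration=30)
hsmm_pmf = gaussian_duration_pmf(mean=5.0, std=1.5, max_duration=30)

# Most likely duration (in frames) under each model.
hmm_mode = 1 + hmm_pmf.index(max(hmm_pmf))
hsmm_mode = 1 + hsmm_pmf.index(max(hsmm_pmf))

print("most likely duration, geometric (HMM):", hmm_mode)
print("most likely duration, Gaussian (HSMM):", hsmm_mode)
```

With these assumed values, both models are centered on a duration of about 5 frames, but the geometric model still rates a single frame as the most likely duration, while the Gaussian model rates 5 frames as most likely. This mismatch is what an explicit duration distribution is meant to correct.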