Arabic speech recognition based on a CNN-BLSTM combination
Authors: Rafik Amari, Abdelkarim Mars, M. Zrigui
Venue: 2022 IEEE 9th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT)
Published: 2022-05-28
DOI: https://doi.org/10.1109/SETIT54465.2022.9875681
Despite advances in speech recognition technology, Arabic speech recognition remains a largely unsolved problem because of its many difficulties and challenges. The best existing Arabic recognizers perform much worse than those developed for English. Deep neural networks (DNNs) have shown excellent performance in acoustic modeling for speech recognition. This work proposes a new model for discontinuous (isolated-word) Arabic speech recognition that combines a deep convolutional neural network (CNN) with a bidirectional long short-term memory (BLSTM) network. The optimal network structure and training strategy for the model are examined. All experiments use the Arabic Speech Corpus of Isolated Words (ASDS) and the Spoken Arabic Digits (SAD) database. The results demonstrate the strength and benefits of the CNN-BLSTM method, which provides the best recognition accuracy.
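The abstract does not give the network's layer settings, so the following is only a minimal sketch of how tensor shapes typically flow through a CNN front end into a BLSTM. Every value here is an assumption made for illustration, not the authors' configuration: the `ConvBlock` settings, the 100 x 40 input feature map, the hidden size of 128, and the ten output classes (one per spoken digit). It uses plain Python for shape bookkeeping only and trains nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvBlock:
    """One 2-D convolution followed by max pooling (hypothetical settings)."""
    out_channels: int
    kernel: int = 3   # square kernel, stride 1
    padding: int = 1  # keeps the size unchanged for a 3x3 kernel
    pool: int = 2     # non-overlapping max pooling window


def conv_out(size: int, kernel: int, padding: int, stride: int = 1) -> int:
    """Standard output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def cnn_blstm_shapes(n_frames, n_features, blocks, lstm_hidden, n_classes):
    """Trace shapes from an input feature map to per-class scores."""
    channels, time, freq = 1, n_frames, n_features
    shapes = [("input", (channels, time, freq))]
    for i, block in enumerate(blocks, start=1):
        # Convolution, then pooling, applied to both the time and frequency axes.
        time = conv_out(time, block.kernel, block.padding) // block.pool
        freq = conv_out(freq, block.kernel, block.padding) // block.pool
        channels = block.out_channels
        shapes.append((f"conv{i}+pool", (channels, time, freq)))
    # Merge channels and frequency so each remaining time step is one vector.
    shapes.append(("sequence", (time, channels * freq)))
    # A bidirectional LSTM concatenates its forward and backward hidden states.
    shapes.append(("blstm", (time, 2 * lstm_hidden)))
    # After pooling over time, a linear layer gives one score per word class.
    shapes.append(("logits", (n_classes,)))
    return shapes


# Assumed example: 100 frames x 40 features, two conv blocks, 10 word classes.
for name, shape in cnn_blstm_shapes(100, 40, [ConvBlock(32), ConvBlock(64)], 128, 10):
    print(name, shape)
```

The point of the trace is the hand-off between the two parts: the convolutional blocks shrink the time and frequency axes, and what remains of the time axis becomes the sequence that the BLSTM reads in both directions.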