Hamzah A. Alsayadi, A. Abdelhamid, I. Hegazy, Z. Fayed
IET Signal Process., published 2021-06-02. DOI: 10.1049/SIL2.12057 (https://doi.org/10.1049/SIL2.12057)
Arabic speech recognition using end-to-end deep learning
Arabic automatic speech recognition (ASR) methods with diacritics can be integrated with other systems more readily than Arabic ASR methods without diacritics. In this work, the application of state-of-the-art end-to-end deep learning approaches is investigated to build a robust diacritised Arabic ASR. These approaches are based on the Mel-Frequency Cepstral Coefficients and the log Mel-Scale Filter Bank energies as acoustic features. To the best of our knowledge, end-to-end deep learning approaches have not been used in the task of diacritised Arabic automatic speech recognition. To fill this gap, this work presents a new CTC-based ASR, CNN-LSTM, and an attention-based end-to-end approach for improving diacritised Arabic ASR. In addition, a word-based language model is employed to achieve better results. The end-to-end approaches applied in this work are based on state-of-the-art frameworks, namely ESPnet and Espresso. Training and testing of these frameworks are performed on the Standard Arabic Single Speaker Corpus (SASSC), which contains 7 h of Modern Standard Arabic speech. Experimental results show that the CNN-LSTM
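
As an illustration of the CTC-based decoding that the abstract refers to, the following is a minimal sketch of greedy CTC decoding: take the best label per frame, merge consecutive repeats, then drop the blank symbol. It is not the authors' implementation and does not use ESPnet or Espresso. The function name `ctc_greedy_decode`, the helper `one_hot`, the blank index 0, and the toy vocabulary (which treats the fatha diacritic as a separate output token) are all assumptions made for illustration only.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_scores, id_to_char, blank=BLANK):
    """Greedy CTC decoding: best label per frame, merge repeats, drop blanks."""
    # Index of the highest score in each frame (first index wins on ties).
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # Collapse runs of identical labels into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blanks and map the remaining ids to characters.
    return "".join(id_to_char[i] for i in collapsed if i != blank)


def one_hot(index, size):
    """Hypothetical helper: build a toy score vector peaked at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy vocabulary (hypothetical): three letters plus the fatha diacritic.
vocab = {1: "\u0643", 2: "\u064E", 3: "\u062A", 4: "\u0628"}

# Frame-level best labels for a diacritised word, with repeats and blanks.
path = [1, 1, 2, 0, 3, 2, 0, 4, 4, 2]
scores = [one_hot(p, 5) for p in path]
decoded = ctc_greedy_decode(scores, vocab)
```

In this sketch the diacritics are emitted as ordinary output tokens, which is one simple way to produce diacritised text. Note that a blank between two identical labels keeps them as two separate characters, while adjacent identical labels without a blank are merged into one.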