Maryam Asadolahzade Kermanshahi, A. Akbari, B. Nasersharif
2021 26th International Computer Conference, Computer Society of Iran (CSICC), 2021-03-03. DOI: 10.1109/CSICC52343.2021.9420540
Transfer Learning for End-to-End ASR to Deal with Low-Resource Problem in Persian Language
End-to-end models are the state of the art in Automatic Speech Recognition (ASR). Despite their advantages, they suffer from a significant drawback: they require huge amounts of training data to achieve good performance. This is a serious challenge for low-resource languages such as Persian, so methods are needed to overcome it. One simple yet effective approach is transfer learning. We explore the effect of transfer learning on a speech recognition system for Persian. To this end, we first train the network on the 960-hour English LibriSpeech corpus. We then transfer the trained network and fine-tune it on only about 3.5 hours of training data from the Persian FarsDat corpus. Transfer learning achieves better performance while requiring less training time than a model trained from scratch. Experimental results on the FarsDat corpus indicate that transfer learning with a few hours of Persian training data achieves a 31.48% relative Phoneme Error Rate (PER) reduction compared to a model trained from scratch.
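To make the headline figure concrete, the relative PER reduction can be sketched as below. Note that the abstract reports only the relative figure; the absolute PER values used here are hypothetical, chosen solely so the arithmetic reproduces the reported 31.48%.

```python
def relative_per_reduction(per_scratch: float, per_transfer: float) -> float:
    """Relative Phoneme Error Rate (PER) reduction, in percent, of a
    transfer-learned model relative to a model trained from scratch."""
    return 100.0 * (per_scratch - per_transfer) / per_scratch

# Hypothetical absolute PERs (not reported in the abstract):
per_scratch = 27.0   # % PER, trained from scratch on ~3.5 h of FarsDat
per_transfer = 18.5  # % PER, pretrained on 960 h LibriSpeech, then fine-tuned

print(f"{relative_per_reduction(per_scratch, per_transfer):.2f}%")  # → 31.48%
```

The relative (rather than absolute) reduction is the natural metric here, since it normalizes away the baseline difficulty of the ~3.5-hour low-resource setup.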