Authors: Anis Sirwan, Kurniawan Adhie Thama, S. Suyanto
Published in: 2022 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom)
Publication date: 2022-06-16
DOI: 10.1109/CyberneticsCom55287.2022.9865253
URL: https://doi.org/10.1109/CyberneticsCom55287.2022.9865253
Indonesian Automatic Speech Recognition Based on End-to-end Deep Learning Model
Indonesian differs from English in its phonetics. Choosing appropriate methods and algorithms for Indonesian speech recognition from the many available machine learning and deep learning approaches is challenging. Much speech recognition research has targeted high-resource languages such as English, and the resulting models cannot be used directly for Indonesian. Building a strong speech recognition model requires a large, high-quality Indonesian speech dataset, but no such dataset is currently available. In this research, we therefore begin collecting one. The resulting dataset is then used to train an end-to-end deep learning-based speech recognition model. The evaluation shows that the developed model achieves a word error rate of 14.172%, better than two previous models: Mozilla DeepSpeech (23.10%) and Kaituoxu Speech-Transformer (22.00%).
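For readers unfamiliar with the metric, word error rate is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognized text and the reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition only. The abstract does not describe the authors' evaluation script or text normalization, so the function name `word_error_rate`, the whitespace tokenization, and the sample sentence are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    real evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the recognizer drops the word "ke".
wer = word_error_rate("saya pergi ke pasar", "saya pergi pasar")
print(f"WER = {wer:.2%}")
```

In this made-up example one of four reference words is deleted, so the rate is 25%. Because insertions count as errors, the value can exceed 100% when the hypothesis is much longer than the reference.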