Hoa Le Viet, Toai Tran Hoang Cong, Tuan Trinh Nguyen Bao, Duy Tran Ngoc Bao, N. H. Tuong
Published in: 2023 International Conference on System Science and Engineering (ICSSE), 2023-07-27. DOI: 10.1109/ICSSE58758.2023.10227159 (https://doi.org/10.1109/ICSSE58758.2023.10227159)
A Deep Learning-Based Strategy for Vietnamese Incorrect Pronunciation Detection
Despite growing interest in learning Vietnamese, pronunciation remains a significant challenge for many learners. This study explores deep learning techniques for automatically detecting incorrect pronunciation in Vietnamese. Our approach uses a multi-task setup that combines an Audio Encoder and a Phoneme Recognizer, enabling the model to learn the alignment between phonemes and acoustic features. The Incorrect Pronunciation Detector then uses this alignment information to identify mispronounced words. Notably, we propose a novel strategy for generating pronunciation features that “manually” groups the phonemes belonging to the same word, which facilitates the model’s learning process. To evaluate the proposed method, we build a small non-native (L2) Vietnamese speech dataset for training and testing. Compared to the baseline model, our final model improves accuracy by 5.2% and the $F_{1}$ score by 21.14%.
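The phoneme-grouping idea can be illustrated with a small sketch. The abstract does not say how the grouped phonemes are combined, so the mean pooling below is an assumption chosen for simplicity, not the authors' method. The function name `group_phonemes_by_word`, the toy feature vectors, and the `word_ids` alignment are all hypothetical.

```python
from typing import Dict, List, Sequence


def group_phonemes_by_word(
    phoneme_features: Sequence[Sequence[float]],
    word_ids: Sequence[int],
) -> List[List[float]]:
    """Pool phoneme-level feature vectors into one vector per word.

    Assumption: phonemes of the same word are averaged (mean pooling).
    The source only states that phonemes of the same word are grouped.
    """
    if len(phoneme_features) != len(word_ids):
        raise ValueError("each phoneme needs exactly one word id")

    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for vector, word_id in zip(phoneme_features, word_ids):
        if word_id not in sums:
            sums[word_id] = [0.0] * len(vector)
            counts[word_id] = 0
        # Accumulate the feature vector of this phoneme into its word.
        sums[word_id] = [s + v for s, v in zip(sums[word_id], vector)]
        counts[word_id] += 1

    # One averaged vector per word, ordered by word id.
    return [
        [s / counts[word_id] for s in sums[word_id]]
        for word_id in sorted(sums)
    ]


# Toy example: five phonemes with 2-dimensional features.
# The first three phonemes belong to word 0, the last two to word 1.
features = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0], [0.0, 1.0], [4.0, 5.0]]
word_ids = [0, 0, 0, 1, 1]
word_features = group_phonemes_by_word(features, word_ids)
print(word_features)
```

In a setup like the one described, each word-level vector would then be passed to a detector that labels the word as correctly or incorrectly pronounced. That classifier is omitted here because the abstract gives no detail about it.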