An End-to-End Uyghur Speech Recognition Based on Multi-task Learning

2022 International Conference on Automation, Robotics and Computer Engineering (ICARCE). Pub Date: 2022-12-16. DOI: 10.1109/ICARCE55724.2022.10046434

Chuang Liu, Qiong Li, Zhiwei You


Citations: 1

Abstract




With the emergence of deep learning, speech recognition research for widely spoken languages such as English and Chinese has become reasonably mature. Uyghur speech recognition research, however, has developed slowly: Uyghur is a low-resource language, so modeling quality remains subpar. To address this problem, we propose an end-to-end Uyghur speech recognition model based on multi-task learning with Branchformer. Branchformer captures both global and local context, allowing it to learn richer features from small amounts of data and improve model performance in resource-constrained settings. The multi-task learning model makes fuller use of the data through multiple related tasks, enhancing the model's generalizability under low-resource conditions. In this paper, the multi-task learning model is trained and decoded using modeling units including phoneme, sub-word, and word. The model is then evaluated on several datasets. The results show that the proposed model outperforms mainstream models in recognition performance: on the self-built database and the open-source corpus Thuyg-20, its word accuracy reaches 94.48% and 91.16%, respectively, a significant improvement over each benchmark model.
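The abstract reports word accuracy but does not define it. As a minimal sketch, assuming word accuracy means one minus the word error rate (edit distance over words divided by the number of reference words), it could be computed as below. The function names and the placeholder tokens are hypothetical and not taken from the paper, and the authors may use a different definition.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def word_accuracy(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed definition: 1 - (total word edits / total reference words)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    total = sum(len(r.split()) for r in refs)
    if total == 0:
        raise ValueError("reference transcripts contain no words")
    return 1.0 - errors / total


# Placeholder tokens stand in for real transcripts.
references = ["w1 w2 w3 w4"]
hypotheses = ["w1 wX w3 w4"]  # one substituted word out of four
accuracy = word_accuracy(references, hypotheses)
```

Under this definition, insertions count as errors, so the value can drop below zero for very poor hypotheses. Some toolkits instead report a "percent correct" figure that ignores insertions.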
