{"title":"Exploring task-diverse meta-learning on Tibetan multi-dialect speech recognition","authors":"Yigang Liu, Yue Zhao, Xiaona Xu, Liang Xu, Xubei Zhang, Qiang Ji","doi":"10.1186/s13636-024-00361-7","DOIUrl":null,"url":null,"abstract":"The disparities in phonetics and corpuses across the three major dialects of Tibetan exacerbate the difficulty of a single task model for one dialect to accommodate other different dialects. To address this issue, this paper proposes task-diverse meta-learning. Our model can acquire more comprehensive and robust features, facilitating its adaptation to the variations among different dialects. This study uses Tibetan dialect ID recognition and Tibetan speaker recognition as the source tasks for meta-learning, which aims to augment the ability of the model to discriminate variations and differences among different dialects. Consequently, the model’s performance in Tibetan multi-dialect speech recognition tasks is enhanced. The experimental results show that task-diverse meta-learning leads to improved performance in Tibetan multi-dialect speech recognition. This demonstrates the effectiveness and applicability of task-diverse meta-learning, thereby contributing to the advancement of speech recognition techniques in multi-dialect environments.","PeriodicalId":49202,"journal":{"name":"Eurasip Journal on Audio Speech and Music Processing","volume":"97 1","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Eurasip Journal on Audio Speech and Music Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1186/s13636-024-00361-7","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ACOUSTICS","Score":null,"Total":0}
引用次数: 0
Abstract
The disparities in phonetics and corpuses across the three major dialects of Tibetan exacerbate the difficulty of a single task model for one dialect to accommodate other different dialects. To address this issue, this paper proposes task-diverse meta-learning. Our model can acquire more comprehensive and robust features, facilitating its adaptation to the variations among different dialects. This study uses Tibetan dialect ID recognition and Tibetan speaker recognition as the source tasks for meta-learning, which aims to augment the ability of the model to discriminate variations and differences among different dialects. Consequently, the model’s performance in Tibetan multi-dialect speech recognition tasks is enhanced. The experimental results show that task-diverse meta-learning leads to improved performance in Tibetan multi-dialect speech recognition. This demonstrates the effectiveness and applicability of task-diverse meta-learning, thereby contributing to the advancement of speech recognition techniques in multi-dialect environments.
藏语三大方言在语音学和语料方面的差异加剧了一种方言的单一任务模型难以适应其他不同方言的问题。为解决这一问题,本文提出了任务多样化元学习(task-diverse meta-learning)。我们的模型可以获得更全面、更稳健的特征,便于适应不同方言之间的差异。本研究将藏语方言 ID 识别和藏语说话人识别作为元学习的源任务,旨在增强模型辨别不同方言之间差异的能力。因此,该模型在藏语多方言语音识别任务中的性能得到了提高。实验结果表明,任务多样化元学习提高了藏语多方言语音识别的性能。这证明了任务多样化元学习的有效性和适用性,从而推动了多方言环境下语音识别技术的发展。
期刊介绍:
The aim of “EURASIP Journal on Audio, Speech, and Music Processing” is to bring together researchers, scientists and engineers working on the theory and applications of the processing of various audio signals, with a specific focus on speech and music. EURASIP Journal on Audio, Speech, and Music Processing will be an interdisciplinary journal for the dissemination of all basic and applied aspects of speech communication and audio processes.