Task Arithmetic for Language Expansion in Speech Translation

Yao-Fei Cheng, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Wen Shen Teo, Siddhant Arora, Shinji Watanabe
{"title":"语音翻译中语言扩展的任务算术","authors":"Yao-Fei Cheng, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Wen Shen Teo, Siddhant Arora, Shinji Watanabe","doi":"arxiv-2409.11274","DOIUrl":null,"url":null,"abstract":"Recent advances in large language models (LLMs) have gained interest in\nspeech-text multimodal foundation models, achieving strong performance on\ninstruction-based speech translation (ST). However, expanding language pairs\nfrom an existing instruction-tuned ST system is costly due to the necessity of\nre-training on a combination of new and previous datasets. We propose to expand\nnew language pairs by merging the model trained on new language pairs and the\nexisting model, using task arithmetic. We find that the direct application of\ntask arithmetic for ST causes the merged model to fail to follow instructions;\nthus, generating translation in incorrect languages. To eliminate language\nconfusion, we propose an augmented task arithmetic method that merges an\nadditional language control model. It is trained to generate the correct target\nlanguage token following the instructions. Our experiments demonstrate that our\nproposed language control model can achieve language expansion by eliminating\nlanguage confusion. In our MuST-C and CoVoST-2 experiments, it shows up to 4.66\nand 4.92 BLEU scores improvement, respectively. In addition, we demonstrate the\nuse of our task arithmetic framework can expand to a language pair where\nneither paired ST training data nor a pre-trained ST model is available. We\nfirst synthesize the ST system from machine translation (MT) systems via task\nanalogy, then merge the synthesized ST system to the existing ST model.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Task Arithmetic for Language Expansion in Speech Translation\",\"authors\":\"Yao-Fei Cheng, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Wen Shen Teo, Siddhant Arora, Shinji Watanabe\",\"doi\":\"arxiv-2409.11274\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent advances in large language models (LLMs) have gained interest in\\nspeech-text multimodal foundation models, achieving strong performance on\\ninstruction-based speech translation (ST). However, expanding language pairs\\nfrom an existing instruction-tuned ST system is costly due to the necessity of\\nre-training on a combination of new and previous datasets. We propose to expand\\nnew language pairs by merging the model trained on new language pairs and the\\nexisting model, using task arithmetic. We find that the direct application of\\ntask arithmetic for ST causes the merged model to fail to follow instructions;\\nthus, generating translation in incorrect languages. To eliminate language\\nconfusion, we propose an augmented task arithmetic method that merges an\\nadditional language control model. It is trained to generate the correct target\\nlanguage token following the instructions. Our experiments demonstrate that our\\nproposed language control model can achieve language expansion by eliminating\\nlanguage confusion. In our MuST-C and CoVoST-2 experiments, it shows up to 4.66\\nand 4.92 BLEU scores improvement, respectively. 
In addition, we demonstrate the\\nuse of our task arithmetic framework can expand to a language pair where\\nneither paired ST training data nor a pre-trained ST model is available. We\\nfirst synthesize the ST system from machine translation (MT) systems via task\\nanalogy, then merge the synthesized ST system to the existing ST model.\",\"PeriodicalId\":501030,\"journal\":{\"name\":\"arXiv - CS - Computation and Language\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computation and Language\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11274\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computation and Language","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11274","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Recent advances in large language models (LLMs) have spurred interest in speech-text multimodal foundation models, which achieve strong performance on instruction-based speech translation (ST). However, expanding the language pairs of an existing instruction-tuned ST system is costly, because it requires re-training on a combination of the new and previous datasets. We propose to expand to new language pairs by merging a model trained on the new language pairs with the existing model, using task arithmetic. We find that directly applying task arithmetic to ST causes the merged model to fail to follow instructions, generating translations in incorrect languages. To eliminate this language confusion, we propose an augmented task arithmetic method that additionally merges a language control model, trained to generate the correct target-language token following the instructions. Our experiments demonstrate that the proposed language control model achieves language expansion by eliminating language confusion, improving BLEU scores by up to 4.66 on MuST-C and 4.92 on CoVoST-2. We further demonstrate that our task arithmetic framework can expand to a language pair for which neither paired ST training data nor a pre-trained ST model is available: we first synthesize an ST system from machine translation (MT) systems via task analogy, then merge the synthesized ST system into the existing ST model.
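
For readers unfamiliar with task arithmetic, the core operation behind the merging described above is addition and subtraction of "task vectors": the parameter-wise differences between fine-tuned checkpoints and a shared pre-trained initialization. In notation-agnostic form (the checkpoint names and the scaling coefficient lambda below are illustrative, not the authors' notation):

    tau_new      = theta_ST_new   - theta_pre          (task vector for the new language pair)
    theta_merged = theta_ST_exist + lambda * tau_new   (scaled merge into the existing ST model)

A minimal sketch of this merge over PyTorch-style state dicts follows; the function names, the uniform scaling coefficient, and the toy tensors are assumptions for illustration, not the authors' released code. The paper's augmented method would pass the language control model's task vector as one more entry to the same merge.

    import torch

    def task_vector(finetuned, pretrained):
        # Parameter-wise difference between a fine-tuned checkpoint
        # and the shared pre-trained checkpoint (a "task vector").
        return {k: finetuned[k] - pretrained[k] for k in pretrained}

    def merge(base, task_vectors, scale=0.5):
        # Add each scaled task vector to the base model's parameters.
        merged = {k: v.clone() for k, v in base.items()}
        for vec in task_vectors:
            for k in merged:
                merged[k] += scale * vec[k]
        return merged

    # Toy example with a single-parameter "model" (hypothetical values):
    pre = {"w": torch.zeros(3)}                      # shared pre-trained weights
    st_new = {"w": torch.tensor([1.0, 2.0, 3.0])}    # fine-tuned on the new language pair
    st_exist = {"w": torch.tensor([0.5, 0.5, 0.5])}  # existing ST model
    merged = merge(st_exist, [task_vector(st_new, pre)])

For the zero-resource setting, "task analogy" plausibly mirrors word-vector analogies applied to task vectors; one reading (an illustration, not necessarily the paper's exact formula) is

    tau_ST_new ≈ tau_ST_exist + tau_MT_new - tau_MT_exist

that is, an ST task vector for the new pair is synthesized from MT task vectors, and the resulting synthesized ST system is then merged into the existing ST model as above.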