Task Arithmetic for Language Expansion in Speech Translation

Yao-Fei Cheng, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Wen Shen Teo, Siddhant Arora, Shinji Watanabe
arXiv - CS - Computation and Language. Published 2024-09-17. DOI: arxiv-2409.11274 (https://doi.org/arxiv-2409.11274)

Abstract

Recent advances in large language models (LLMs) have spurred interest in speech-text multimodal foundation models, which achieve strong performance on instruction-based speech translation (ST). However, adding language pairs to an existing instruction-tuned ST system is costly because it requires re-training on a combination of the new and previous datasets. We propose to add new language pairs by using task arithmetic to merge a model trained on the new language pairs with the existing model. We find that applying task arithmetic directly to ST causes the merged model to fail to follow instructions and thus to generate translations in the wrong language. To eliminate this language confusion, we propose an augmented task arithmetic method that merges an additional language control model, which is trained to generate the correct target language token following the instructions. Our experiments demonstrate that the proposed language control model achieves language expansion by eliminating language confusion. In our MuST-C and CoVoST-2 experiments, it improves BLEU by up to 4.66 and 4.92 points, respectively. In addition, we demonstrate that our task arithmetic framework can expand to a language pair for which neither paired ST training data nor a pre-trained ST model is available: we first synthesize the ST system from machine translation (MT) systems via task analogy, then merge the synthesized ST system into the existing ST model.
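
The merging idea in the abstract can be illustrated with a minimal sketch. It assumes the commonly used task arithmetic formulation, in which a task vector is the difference between fine-tuned and pre-trained weights and merging adds weighted task vectors back to the pre-trained weights. The abstract does not state the formulas, so the function names, the merging weights, the toy parameter values, and the exact form of the analogy are all hypothetical and need not match the authors' implementation.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, pretrained: Params) -> Params:
    """tau = theta_finetuned - theta_pretrained, computed per parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def merge(pretrained: Params, vectors: List[Params], weights: List[float]) -> Params:
    """theta_merged = theta_pretrained + sum_i lambda_i * tau_i."""
    merged = {name: list(values) for name, values in pretrained.items()}
    for tau, lam in zip(vectors, weights):
        for name, delta in tau.items():
            merged[name] = [m + lam * d for m, d in zip(merged[name], delta)]
    return merged


def analogy(tau_c: Params, tau_b: Params, tau_a: Params) -> Params:
    """Task analogy 'A is to B as C is to D': tau_D ~ tau_C + (tau_B - tau_A).

    Assumed general form; how the paper maps MT and ST systems onto
    A, B, and C is not given in the abstract.
    """
    return {
        name: [c + b - a for c, b, a in zip(tau_c[name], tau_b[name], tau_a[name])]
        for name in tau_c
    }


# Hypothetical toy weights; real models hold millions of parameters.
base = {"w": [1.0, 2.0]}          # shared pre-trained model
st_existing = {"w": [1.5, 2.0]}   # existing ST model
st_new = {"w": [1.0, 3.0]}        # model fine-tuned on the new language pair
lang_ctrl = {"w": [1.25, 2.25]}   # language control model

vectors = [task_vector(m, base) for m in (st_existing, st_new, lang_ctrl)]
# Equal weights are an arbitrary choice for illustration.
expanded = merge(base, vectors, weights=[1.0, 1.0, 1.0])
```

All models must share the same pre-trained starting point for the subtraction to be meaningful. In practice the merging weights would be tuned on held-out data rather than fixed to 1.0.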