{"title":"Task Arithmetic for Language Expansion in Speech Translation","authors":"Yao-Fei Cheng, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Wen Shen Teo, Siddhant Arora, Shinji Watanabe","doi":"arxiv-2409.11274","DOIUrl":null,"url":null,"abstract":"Recent advances in large language models (LLMs) have gained interest in\nspeech-text multimodal foundation models, achieving strong performance on\ninstruction-based speech translation (ST). However, expanding language pairs\nfrom an existing instruction-tuned ST system is costly due to the necessity of\nre-training on a combination of new and previous datasets. We propose to expand\nnew language pairs by merging the model trained on new language pairs and the\nexisting model, using task arithmetic. We find that the direct application of\ntask arithmetic for ST causes the merged model to fail to follow instructions;\nthus, generating translation in incorrect languages. To eliminate language\nconfusion, we propose an augmented task arithmetic method that merges an\nadditional language control model. It is trained to generate the correct target\nlanguage token following the instructions. Our experiments demonstrate that our\nproposed language control model can achieve language expansion by eliminating\nlanguage confusion. In our MuST-C and CoVoST-2 experiments, it shows up to 4.66\nand 4.92 BLEU scores improvement, respectively. In addition, we demonstrate the\nuse of our task arithmetic framework can expand to a language pair where\nneither paired ST training data nor a pre-trained ST model is available. We\nfirst synthesize the ST system from machine translation (MT) systems via task\nanalogy, then merge the synthesized ST system to the existing ST model.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"30 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computation and Language","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11274","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Recent advances in large language models (LLMs) have spurred interest in speech-text multimodal foundation models, which achieve strong performance on instruction-based speech translation (ST). However, expanding the language pairs of an existing instruction-tuned ST system is costly, because it requires re-training on a combination of the new and previous datasets. We propose to expand to new language pairs by merging a model trained on the new pairs with the existing model, using task arithmetic.
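As a concrete illustration of this merging step, here is a minimal sketch of task arithmetic over model weights; the state-dict layout, helper names, and the scaling coefficient `lam` are assumptions for illustration, not the paper's exact recipe.

```python
import torch

def task_vector(finetuned: dict, base: dict) -> dict:
    """Task vector: element-wise difference between fine-tuned and base weights."""
    return {k: finetuned[k] - base[k] for k in base}

def merge(base: dict, task_vectors: list, lam: float = 0.5) -> dict:
    """Task arithmetic: add scaled task vectors to the base weights.

    `lam` is a hypothetical scaling coefficient; the paper tunes its own.
    """
    merged = {k: v.clone() for k, v in base.items()}
    for tv in task_vectors:
        for k in merged:
            merged[k] = merged[k] + lam * tv[k]
    return merged

# Usage sketch with toy state dicts (name -> tensor):
base        = {"w": torch.zeros(2)}
existing_st = {"w": torch.tensor([1.0, 0.0])}  # stands in for the existing ST model
new_pair_st = {"w": torch.tensor([0.0, 1.0])}  # stands in for the new-pair ST model
merged = merge(base, [task_vector(existing_st, base),
                      task_vector(new_pair_st, base)])
print(merged["w"])  # tensor([0.5000, 0.5000]) with lam=0.5
```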
We find that directly applying task arithmetic to ST causes the merged model to fail to follow instructions, generating translations in the wrong language. To eliminate this language confusion, we propose an augmented task arithmetic method that additionally merges a language control model, trained to generate the correct target-language token in accordance with the instruction.
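Read as an equation, one plausible form of the augmented merge simply adds a third task vector for the language control model; the coefficients $\lambda$ and this exact decomposition are our assumption, not the paper's stated formula:

```latex
\theta_{\mathrm{merged}}
  = \theta_{\mathrm{base}}
  + \lambda_{\mathrm{exist}}\,\tau_{\mathrm{exist}}
  + \lambda_{\mathrm{new}}\,\tau_{\mathrm{new}}
  + \lambda_{\mathrm{ctrl}}\,\tau_{\mathrm{ctrl}},
\qquad
\tau_i = \theta_i - \theta_{\mathrm{base}}
```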
Our experiments demonstrate that the proposed language control model achieves language expansion by eliminating language confusion, improving BLEU by up to 4.66 points on MuST-C and 4.92 points on CoVoST-2.
In addition, we demonstrate that our task arithmetic framework can extend to a language pair for which neither paired ST training data nor a pre-trained ST model is available: we first synthesize an ST system from machine translation (MT) systems via task analogy, and then merge the synthesized ST system into the existing ST model.
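For the zero-resource pair, one plausible reading of the task analogy is "new-pair ST ≈ existing-pair ST + (new-pair MT − existing-pair MT)" over task vectors. A hedged sketch under that assumption, reusing the hypothetical task_vector/merge helpers from the earlier block:

```python
def task_analogy(tv_st_exist: dict, tv_mt_exist: dict, tv_mt_new: dict) -> dict:
    """Synthesize an ST task vector for a new language pair from MT task
    vectors, by analogy with an existing pair that has both ST and MT:
        tv_st_new ~ tv_st_exist + (tv_mt_new - tv_mt_exist)
    """
    return {k: tv_st_exist[k] + (tv_mt_new[k] - tv_mt_exist[k])
            for k in tv_st_exist}

# The synthesized vector could then be folded into the existing ST model
# with the same `merge` helper sketched above.
```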