从SpokenVocab生成合成语音用于语音翻译

Findings (Sydney (N.S.W.) Pub Date : 2022-10-15 DOI:10.48550/arXiv.2210.08174

Jinming Zhao, Gholamreza Haffar, Ehsan Shareghi

{"title":"从SpokenVocab生成合成语音用于语音翻译","authors":"Jinming Zhao, Gholamreza Haffar, Ehsan Shareghi","doi":"10.48550/arXiv.2210.08174","DOIUrl":null,"url":null,"abstract":"Training end-to-end speech translation (ST) systems requires sufficiently large-scale data, which is unavailable for most language pairs and domains. One practical solution to the data scarcity issue is to convert text-based machine translation (MT) data to ST data via text-to-speech (TTS) systems.Yet, using TTS systems can be tedious and slow. In this work, we propose SpokenVocab, a simple, scalable and effective data augmentation technique to convert MT data to ST data on-the-fly. The idea is to retrieve and stitch audio snippets, corresponding to words in an MT sentence, from a spoken vocabulary bank. Our experiments on multiple language pairs show that stitched speech helps to improve translation quality by an average of 1.83 BLEU score, while performing equally well as TTS-generated speech in improving translation quality. We also showcase how SpokenVocab can be applied in code-switching ST for which often no TTS systems exit.","PeriodicalId":73025,"journal":{"name":"Findings (Sydney (N.S.W.)","volume":"1 1","pages":"1930-1936"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Generating Synthetic Speech from SpokenVocab for Speech Translation\",\"authors\":\"Jinming Zhao, Gholamreza Haffar, Ehsan Shareghi\",\"doi\":\"10.48550/arXiv.2210.08174\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Training end-to-end speech translation (ST) systems requires sufficiently large-scale data, which is unavailable for most language pairs and domains. One practical solution to the data scarcity issue is to convert text-based machine translation (MT) data to ST data via text-to-speech (TTS) systems.Yet, using TTS systems can be tedious and slow. In this work, we propose SpokenVocab, a simple, scalable and effective data augmentation technique to convert MT data to ST data on-the-fly. The idea is to retrieve and stitch audio snippets, corresponding to words in an MT sentence, from a spoken vocabulary bank. Our experiments on multiple language pairs show that stitched speech helps to improve translation quality by an average of 1.83 BLEU score, while performing equally well as TTS-generated speech in improving translation quality. We also showcase how SpokenVocab can be applied in code-switching ST for which often no TTS systems exit.\",\"PeriodicalId\":73025,\"journal\":{\"name\":\"Findings (Sydney (N.S.W.)\",\"volume\":\"1 1\",\"pages\":\"1930-1936\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Findings (Sydney (N.S.W.)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2210.08174\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Findings (Sydney (N.S.W.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2210.08174","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

训练端到端语音翻译（ST）系统需要足够大规模的数据，而这对于大多数语言对和领域来说是不可用的。数据稀缺问题的一个实际解决方案是通过文本到语音（TTS）系统将基于文本的机器翻译（MT）数据转换为ST数据。然而，使用TTS系统可能是乏味和缓慢的。在这项工作中，我们提出了SpokenVocab，这是一种简单、可扩展且有效的数据扩充技术，可以在飞行中将MT数据转换为ST数据。这个想法是从口语词汇库中检索并缝合与MT句子中的单词相对应的音频片段。我们在多语言对上的实验表明，拼接语音有助于将翻译质量平均提高1.83 BLEU分数，同时在提高翻译质量方面与TTS生成的语音表现相同。我们还展示了SpokenVocab如何应用于通常没有TTS系统退出的代码切换ST。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Generating Synthetic Speech from SpokenVocab for Speech Translation

Training end-to-end speech translation (ST) systems requires sufficiently large-scale data, which is unavailable for most language pairs and domains. One practical solution to the data scarcity issue is to convert text-based machine translation (MT) data to ST data via text-to-speech (TTS) systems.Yet, using TTS systems can be tedious and slow. In this work, we propose SpokenVocab, a simple, scalable and effective data augmentation technique to convert MT data to ST data on-the-fly. The idea is to retrieve and stitch audio snippets, corresponding to words in an MT sentence, from a spoken vocabulary bank. Our experiments on multiple language pairs show that stitched speech helps to improve translation quality by an average of 1.83 BLEU score, while performing equally well as TTS-generated speech in improving translation quality. We also showcase how SpokenVocab can be applied in code-switching ST for which often no TTS systems exit.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Findings (Sydney (N.S.W.)

自引率

0.00%

发文量

审稿时长

4 weeks