Chain-of-Thought Prompting for Speech Translation

Ke Hu, Zhehuai Chen, Chao-Han Huck Yang, Piotr Żelasko, Oleksii Hrinchuk, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg
{"title":"语音翻译的思维链提示","authors":"Ke Hu, Zhehuai Chen, Chao-Han Huck Yang, Piotr Żelasko, Oleksii Hrinchuk, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg","doi":"arxiv-2409.11538","DOIUrl":null,"url":null,"abstract":"Large language models (LLMs) have demonstrated remarkable advancements in\nlanguage understanding and generation. Building on the success of text-based\nLLMs, recent research has adapted these models to use speech embeddings for\nprompting, resulting in Speech-LLM models that exhibit strong performance in\nautomatic speech recognition (ASR) and automatic speech translation (AST). In\nthis work, we propose a novel approach to leverage ASR transcripts as prompts\nfor AST in a Speech-LLM built on an encoder-decoder text LLM. The Speech-LLM\nmodel consists of a speech encoder and an encoder-decoder structure\nMegatron-T5. By first decoding speech to generate ASR transcripts and\nsubsequently using these transcripts along with encoded speech for prompting,\nwe guide the speech translation in a two-step process like chain-of-thought\n(CoT) prompting. Low-rank adaptation (LoRA) is used for the T5 LLM for model\nadaptation and shows superior performance to full model fine-tuning.\nExperimental results show that the proposed CoT prompting significantly\nimproves AST performance, achieving an average increase of 2.4 BLEU points\nacross 6 En->X or X->En AST tasks compared to speech prompting alone.\nAdditionally, compared to a related CoT prediction method that predicts a\nconcatenated sequence of ASR and AST transcripts, our method performs better by\nan average of 2 BLEU points.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Chain-of-Thought Prompting for Speech Translation\",\"authors\":\"Ke Hu, Zhehuai Chen, Chao-Han Huck Yang, Piotr Żelasko, Oleksii Hrinchuk, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg\",\"doi\":\"arxiv-2409.11538\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large language models (LLMs) have demonstrated remarkable advancements in\\nlanguage understanding and generation. Building on the success of text-based\\nLLMs, recent research has adapted these models to use speech embeddings for\\nprompting, resulting in Speech-LLM models that exhibit strong performance in\\nautomatic speech recognition (ASR) and automatic speech translation (AST). In\\nthis work, we propose a novel approach to leverage ASR transcripts as prompts\\nfor AST in a Speech-LLM built on an encoder-decoder text LLM. The Speech-LLM\\nmodel consists of a speech encoder and an encoder-decoder structure\\nMegatron-T5. By first decoding speech to generate ASR transcripts and\\nsubsequently using these transcripts along with encoded speech for prompting,\\nwe guide the speech translation in a two-step process like chain-of-thought\\n(CoT) prompting. 
Low-rank adaptation (LoRA) is used for the T5 LLM for model\\nadaptation and shows superior performance to full model fine-tuning.\\nExperimental results show that the proposed CoT prompting significantly\\nimproves AST performance, achieving an average increase of 2.4 BLEU points\\nacross 6 En->X or X->En AST tasks compared to speech prompting alone.\\nAdditionally, compared to a related CoT prediction method that predicts a\\nconcatenated sequence of ASR and AST transcripts, our method performs better by\\nan average of 2 BLEU points.\",\"PeriodicalId\":501030,\"journal\":{\"name\":\"arXiv - CS - Computation and Language\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computation and Language\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11538\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computation and Language","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11538","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Large language models (LLMs) have demonstrated remarkable advancements in language understanding and generation. Building on the success of text-based LLMs, recent research has adapted these models to use speech embeddings for prompting, resulting in Speech-LLM models that exhibit strong performance in automatic speech recognition (ASR) and automatic speech translation (AST). In this work, we propose a novel approach to leverage ASR transcripts as prompts for AST in a Speech-LLM built on an encoder-decoder text LLM. The Speech-LLM model consists of a speech encoder and an encoder-decoder structure Megatron-T5. By first decoding speech to generate ASR transcripts and subsequently using these transcripts along with encoded speech for prompting, we guide the speech translation in a two-step process like chain-of-thought (CoT) prompting. Low-rank adaptation (LoRA) is used for the T5 LLM for model adaptation and shows superior performance to full model fine-tuning. Experimental results show that the proposed CoT prompting significantly improves AST performance, achieving an average increase of 2.4 BLEU points across 6 En->X or X->En AST tasks compared to speech prompting alone. Additionally, compared to a related CoT prediction method that predicts a concatenated sequence of ASR and AST transcripts, our method performs better by an average of 2 BLEU points.
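
To make the two-step prompting flow concrete, below is a minimal, self-contained PyTorch sketch of the idea, not the authors' implementation: `ToySpeechEncoder`, `ToyEncoderDecoderLM`, the vocabulary size, and the BOS/EOS token IDs are hypothetical stand-ins for the paper's speech encoder and Megatron-T5. The sketch only illustrates the order of operations: first decode an ASR transcript from the speech embeddings, then prompt the same LLM again with the transcript embeddings concatenated to the speech embeddings to produce the translation.

```python
# Minimal sketch of two-step chain-of-thought prompting for speech translation.
# All modules, sizes, and token IDs below are toy stand-ins, not the paper's models.
import torch
import torch.nn as nn

VOCAB_SIZE, HIDDEN = 100, 32
BOS, EOS = 1, 2


class ToySpeechEncoder(nn.Module):
    """Stand-in speech encoder: acoustic frames -> embeddings in the LLM's space."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(80, HIDDEN)   # e.g. 80-dim log-mel frames -> hidden size

    def forward(self, feats):               # feats: (T, 80)
        return self.proj(feats)             # (T, HIDDEN)


class ToyEncoderDecoderLM(nn.Module):
    """Stand-in encoder-decoder text LLM that accepts embedding sequences as prompts."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.encoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.decoder = nn.GRUCell(HIDDEN, HIDDEN)
        self.lm_head = nn.Linear(HIDDEN, VOCAB_SIZE)

    @torch.no_grad()
    def generate(self, prompt_embeds, max_len=20):
        """Greedy decoding conditioned on a (T, HIDDEN) prompt embedding sequence."""
        _, h = self.encoder(prompt_embeds.unsqueeze(0))   # encode the prompt
        state, tok, out = h[-1], BOS, []                  # state: (1, HIDDEN)
        for _ in range(max_len):
            state = self.decoder(self.embed(torch.tensor([tok])), state)
            tok = int(self.lm_head(state).argmax(dim=-1))
            if tok == EOS:
                break
            out.append(tok)
        return out


speech_encoder, llm = ToySpeechEncoder(), ToyEncoderDecoderLM()
speech_embeds = speech_encoder(torch.randn(50, 80))       # one encoded utterance

# Step 1 (ASR): prompt with speech embeddings only to decode a transcript.
asr_tokens = llm.generate(speech_embeds)

# Step 2 (AST): embed the ASR transcript and concatenate it with the speech
# embeddings, so the translation is conditioned on both -- the CoT-style prompt.
asr_embeds = llm.embed(torch.tensor(asr_tokens, dtype=torch.long))
cot_prompt = torch.cat([asr_embeds, speech_embeds], dim=0)
ast_tokens = llm.generate(cot_prompt)
print("ASR tokens:", asr_tokens, "| AST tokens:", ast_tokens)
```

Per the abstract, the T5 LLM itself is adapted with low-rank adaptation (LoRA) rather than full fine-tuning; in a sketch like the one above, that would amount to wrapping the LLM's linear projections with low-rank adapter layers while keeping the base weights frozen (for example via a library such as PEFT), which is omitted here for brevity.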