State Spaces Aren't Enough: Machine Translation Needs Attention

European Association for Machine Translation Conferences/Workshops. Pub Date: 2023-04-25. DOI: 10.48550/arXiv.2304.12776

Ali Vardasbi, Telmo Pires, Robin M. Schmidt, Stephan Peitz


Citations: 2

Abstract

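Structured State Spaces for Sequences (S4) is a recently proposed sequence model that has been applied successfully to a range of tasks, including vision, language modelling, and audio. Thanks to its mathematical formulation, it compresses its input into a single hidden state and can capture long-range dependencies without an attention mechanism. In this work, we apply S4 to Machine Translation (MT) and evaluate several encoder-decoder variants on WMT'14 and WMT'16. In contrast with its success in language modelling, we find that S4 lags behind the Transformer by approximately 4 BLEU points and that, counter-intuitively, it struggles with long sentences. Finally, we show that this gap is caused by S4's inability to summarize the full source sentence in a single hidden state, and that the gap can be closed by introducing an attention mechanism.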
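The contrast the abstract draws, a fixed-size hidden state versus attention over every source position, can be illustrated with a small sketch. This is not the authors' model or the S4 parameterization: it assumes a plain diagonal linear recurrence with hand-picked toy values for `a`, `b`, and `c`, and a single-query scaled dot-product attention. The function names `ssm_scan` and `attention` are hypothetical and chosen only for this illustration.

```python
import math


def ssm_scan(inputs, a, b, c):
    """Run a diagonal linear state space recurrence over a scalar sequence.

    x_k = a * x_{k-1} + b * u_k   (elementwise over the state dimensions)
    y_k = sum(c * x_k)
    Returns the per-step outputs and the final state.
    """
    state = [0.0] * len(a)
    outputs = []
    for u in inputs:
        state = [a_i * x_i + b_i * u for a_i, x_i, b_i in zip(a, state, b)]
        outputs.append(sum(c_i * x_i for c_i, x_i in zip(c, state)))
    return outputs, state


def attention(query, keys, values):
    """Scaled dot-product attention of one query over all source positions."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights


# Toy parameters (assumptions for illustration, not values from the paper).
a = [0.9, 0.5]
b = [1.0, 1.0]
c = [0.5, 0.5]

# The recurrent state has the same size for a 3-token and a 50-token input:
# everything the decoder could learn about the source must fit in it.
short_out, short_state = ssm_scan([1.0, 2.0, 3.0], a, b, c)
long_out, long_state = ssm_scan([1.0] * 50, a, b, c)

# Attention instead keeps one vector per source position and reweights them
# for each query, so its memory grows with the source length.
source_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attention([1.0, 0.0], source_vectors, source_vectors)
```

In this sketch the state after 50 inputs occupies exactly as many numbers as the state after 3, whereas the attention weights have one entry per source position. That is the bottleneck the abstract attributes the BLEU gap to, shown here only in miniature.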



