状态空间不够:机器翻译需要注意

Ali Vardasbi, Telmo Pires, Robin M. Schmidt, Stephan Peitz
{"title":"状态空间不够:机器翻译需要注意","authors":"Ali Vardasbi, Telmo Pires, Robin M. Schmidt, Stephan Peitz","doi":"10.48550/arXiv.2304.12776","DOIUrl":null,"url":null,"abstract":"Structured State Spaces for Sequences (S4) is a recently proposed sequence model with successful applications in various tasks, e.g. vision, language modelling, and audio. Thanks to its mathematical formulation, it compresses its input to a single hidden state, and is able to capture long range dependencies while avoiding the need for an attention mechanism. In this work, we apply S4 to Machine Translation (MT), and evaluate several encoder-decoder variants on WMT’14 and WMT’16. In contrast with the success in language modeling, we find that S4 lags behind the Transformer by approximately 4 BLEU points, and that it counter-intuitively struggles with long sentences. Finally, we show that this gap is caused by S4’s inability to summarize the full source sentence in a single hidden state, and show that we can close the gap by introducing an attention mechanism.","PeriodicalId":137211,"journal":{"name":"European Association for Machine Translation Conferences/Workshops","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"State Spaces Aren’t Enough: Machine Translation Needs Attention\",\"authors\":\"Ali Vardasbi, Telmo Pires, Robin M. Schmidt, Stephan Peitz\",\"doi\":\"10.48550/arXiv.2304.12776\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Structured State Spaces for Sequences (S4) is a recently proposed sequence model with successful applications in various tasks, e.g. vision, language modelling, and audio. Thanks to its mathematical formulation, it compresses its input to a single hidden state, and is able to capture long range dependencies while avoiding the need for an attention mechanism. In this work, we apply S4 to Machine Translation (MT), and evaluate several encoder-decoder variants on WMT’14 and WMT’16. In contrast with the success in language modeling, we find that S4 lags behind the Transformer by approximately 4 BLEU points, and that it counter-intuitively struggles with long sentences. Finally, we show that this gap is caused by S4’s inability to summarize the full source sentence in a single hidden state, and show that we can close the gap by introducing an attention mechanism.\",\"PeriodicalId\":137211,\"journal\":{\"name\":\"European Association for Machine Translation Conferences/Workshops\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-04-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"European Association for Machine Translation Conferences/Workshops\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2304.12776\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Association for Machine Translation Conferences/Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2304.12776","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

序列的结构化状态空间(S4)是最近提出的序列模型,已成功应用于各种任务,如视觉,语言建模和音频。由于其数学公式,它将其输入压缩为单个隐藏状态,并且能够捕获长期依赖关系,同时避免需要注意机制。在这项工作中,我们将S4应用于机器翻译(MT),并评估了WMT ' 14和WMT ' 16上的几个编码器-解码器变体。与语言建模的成功相比,我们发现S4落后于Transformer大约4个BLEU点,并且与直觉相反,它在处理长句子时遇到了困难。最后,我们证明了这一差距是由于S4无法在单一隐藏状态下总结完整的源句子造成的,并且表明我们可以通过引入注意机制来缩小这一差距。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
State Spaces Aren’t Enough: Machine Translation Needs Attention
Structured State Spaces for Sequences (S4) is a recently proposed sequence model with successful applications in various tasks, e.g. vision, language modelling, and audio. Thanks to its mathematical formulation, it compresses its input to a single hidden state, and is able to capture long range dependencies while avoiding the need for an attention mechanism. In this work, we apply S4 to Machine Translation (MT), and evaluate several encoder-decoder variants on WMT’14 and WMT’16. In contrast with the success in language modeling, we find that S4 lags behind the Transformer by approximately 4 BLEU points, and that it counter-intuitively struggles with long sentences. Finally, we show that this gap is caused by S4’s inability to summarize the full source sentence in a single hidden state, and show that we can close the gap by introducing an attention mechanism.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Incorporating Human Translator Style into English-Turkish Literary Machine Translation Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios Investigating Lexical Sharing in Multilingual Machine Translation for Indian Languages State Spaces Aren’t Enough: Machine Translation Needs Attention An Empirical Study of Leveraging Knowledge Distillation for Compressing Multilingual Neural Machine Translation Models
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1