Infomaxformer: Maximum Entropy Transformer for Long Time-Series Forecasting Problem

Adaptive Agents and Multi-Agent Systems. Pub Date: 2023-01-04. DOI: 10.48550/arXiv.2301.01772

Peiwang Tang, Xianchao Zhang


Citations: 2

Abstract

The Transformer architecture yields state-of-the-art results in many tasks such as natural language processing (NLP) and computer vision (CV), owing to its ability to efficiently capture precise long-range dependencies between input sequences. Despite this capability, its quadratic time complexity and high memory usage prevent the Transformer from handling the long time-series forecasting problem (LTFP). To address these difficulties: (i) we revisit the learned attention patterns of vanilla self-attention and redesign the calculation of self-attention based on the Maximum Entropy Principle; (ii) we propose a new method to sparsify self-attention, which prevents the loss of more important self-attention scores due to random sampling; (iii) we propose a Keys/Values Distilling method, motivated by the observation that a large number of features in the original self-attention map are redundant, which further reduces the time and space complexity and makes it possible to input longer time series. Finally, we propose a method that combines the encoder-decoder architecture with seasonal-trend decomposition, i.e., using the encoder-decoder architecture to capture more specific seasonal parts. Extensive experiments on several large-scale datasets show that Infomaxformer clearly outperforms existing methods. We expect this to open up a new approach for applying the Transformer to LTFP and for exploring the ability of the Transformer architecture to capture much longer temporal dependencies.
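The abstract does not say how the seasonal-trend decomposition is computed. As a rough illustration only, the sketch below assumes a common choice: a centered moving average as the trend, with the residual treated as the seasonal part. The function names `moving_average` and `decompose`, the edge-padding scheme, and the window size are hypothetical and are not taken from the paper.

```python
def moving_average(series, window):
    """Centered moving average; edges are padded by repeating the end values.

    Assumption: `window` is a positive odd integer so the average is centered.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    # Pad both ends so the output has the same length as the input.
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [sum(padded[i:i + window]) / window for i in range(len(series))]


def decompose(series, window=5):
    """Split a series into (seasonal, trend) with seasonal = series - trend."""
    trend = moving_average(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return seasonal, trend


# Toy input: a linear ramp plus an alternating +1/-1 component.
series = [float(t) + (1.0 if t % 2 == 0 else -1.0) for t in range(12)]
seasonal, trend = decompose(series, window=5)
```

By construction the two parts sum back to the input, so a model that handles the seasonal part separately, as the abstract describes for the encoder-decoder, loses no information from the split itself.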


