{"title":"The Difficulty of Learning Long-Term Dependencies with Gradient Flow in Recurrent Nets","authors":"N. Bynagari","doi":"10.18034/ei.v8i2.570","DOIUrl":null,"url":null,"abstract":"In theory, recurrent networks (RN) can leverage their feedback connections to store activations as representations of recent input events. The most extensively used methods for learning what to put in short-term memory, on the other hand, take far too long to be practicable or do not work at all, especially when the time lags between inputs and instructor signals are long. They do not provide significant practical advantages over, the backdrop in feedforward networks with limited time windows, despite being theoretically fascinating. The goal of this article is to have a succinct overview of this rapidly evolving topic, with a focus on recent advancements. Also, we examine the asymptotic behavior of error gradients as a function of time lags to provide a hypothetical treatment of this topic. The methodology adopted in the study was to review some scholarly research papers on the subject matter to address the difficulty of learning long-term dependencies with gradient flow in recurrent nets. RNNs are the most general and powerful sequence learning algorithm currently available. Unlike Hidden Markov Models (HMMs), which have proven to be the most successful technique in a variety of sequence processing applications, they are not limited to discrete internal states and can represent continuous, dispersed sequences. As a result, they can address problems that no other method can. Conventional RNNs, on the other hand, are difficult to train due to the problem of vanishing gradients.","PeriodicalId":49736,"journal":{"name":"Nuclear Engineering International","volume":"12 1","pages":""},"PeriodicalIF":0.6000,"publicationDate":"2020-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nuclear Engineering International","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.18034/ei.v8i2.570","RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Engineering","Score":null,"Total":0}
Citations: 21
Abstract
In theory, recurrent neural networks (RNNs) can leverage their feedback connections to store activations as representations of recent input events. The most widely used methods for learning what to put in short-term memory, however, either take far too long to be practicable or do not work at all, especially when the time lags between inputs and teacher signals are long. Although theoretically fascinating, they do not provide significant practical advantages over backpropagation in feedforward networks with limited time windows. The goal of this article is to provide a succinct overview of this rapidly evolving topic, with a focus on recent advancements. We also examine the asymptotic behavior of error gradients as a function of time lag to provide a theoretical treatment of the problem. The methodology adopted in the study was a review of scholarly research papers on the subject, addressing the difficulty of learning long-term dependencies with gradient flow in recurrent nets. RNNs are the most general and powerful sequence learning algorithms currently available. Unlike Hidden Markov Models (HMMs), which have proven to be the most successful technique in a variety of sequence processing applications, RNNs are not limited to discrete internal states and can represent continuous, distributed sequences of states. As a result, they can address problems that no other method can. Conventional RNNs, however, are difficult to train due to the problem of vanishing gradients.
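The vanishing-gradient behavior mentioned above can be made concrete with a short numerical sketch. The example below is not taken from the paper; the tanh recurrent network, the weight scale of 0.1, and the dimensions are illustrative assumptions. It shows that the norm of the Jacobian product dh_T/dh_(T-k), which carries the error signal back over a lag of k steps, typically shrinks exponentially as k grows, so inputs far in the past barely influence the weight updates.

```python
# Minimal sketch (not from the paper): how error gradients in a vanilla tanh
# RNN shrink as the time lag grows. All weights and dimensions are
# hypothetical; the recurrent weights are scaled so their spectral radius is
# below 1, the regime in which gradients vanish.
import numpy as np

rng = np.random.default_rng(0)
n_hidden, T = 32, 50

W_hh = rng.standard_normal((n_hidden, n_hidden)) * 0.1  # recurrent weights
W_xh = rng.standard_normal((n_hidden, 1)) * 0.1         # input weights
x = rng.standard_normal((T, 1))                          # a random input sequence

# Forward pass, storing hidden states for the backward Jacobians.
h = np.zeros((T + 1, n_hidden))
for t in range(T):
    h[t + 1] = np.tanh(W_hh @ h[t] + (W_xh @ x[t]).ravel())

# Backward: accumulate the product of step Jacobians dh_T / dh_(T-k) and
# report its norm as a function of the time lag k.
grad = np.eye(n_hidden)
for k in range(1, T + 1):
    t = T - k + 1                      # step whose Jacobian is multiplied in
    D = np.diag(1.0 - h[t] ** 2)       # derivative of tanh at that step
    grad = grad @ (D @ W_hh)
    if k % 10 == 0:
        print(f"time lag {k:3d}: ||dh_T/dh_(T-k)|| = {np.linalg.norm(grad):.3e}")
```

With recurrent weights of larger spectral norm, the same Jacobian product can instead grow without bound, which is the exploding-gradient counterpart of the decay shown here.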