Orthogonal Gated Recurrent Unit With Neumann-Cayley Transformation

Neural Computation, Vol. 36, No. 12, pp. 2651-2676 · Pub Date: 2024-11-19 · DOI: 10.1162/neco_a_01710 · IF 2.7 · Zone 4 (Computer Science) · JCR Q3 (Computer Science, Artificial Intelligence)

Vasily Zadorozhnyy, Edison Mucllari, Cole Pospisil, Duc Nguyen, Qiang Ye


Citations: 0

Abstract


In recent years, using orthogonal matrices has been shown to be a promising approach to improving recurrent neural networks (RNNs) with training, stability, and convergence, particularly to control gradients. While gated recurrent unit (GRU) and long short-term memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells, they are still prone to the exploding gradient problem. In this work, we analyze the gradients in GRU and propose the use of orthogonal matrices to prevent exploding gradient problems and enhance long-term memory. We study where to use orthogonal matrices and propose a Neumann series–based scaled Cayley transformation for training orthogonal matrices in GRU, which we call Neumann-Cayley orthogonal GRU (NC-GRU). We present detailed experiments of our model on several synthetic and real-world tasks, which show that NC-GRU significantly outperforms GRU and several other RNNs.
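The abstract names a Neumann series-based scaled Cayley transformation but does not give its formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the commonly used scaled Cayley form W = (I + A)^{-1}(I - A)D, with A skew-symmetric and D a diagonal matrix of ±1 entries, and it replaces the exact inverse with the truncated Neumann series (I + A)^{-1} ≈ I - A + A^2 - ..., which converges only when the spectral radius of A is below 1. All function names, the example matrix, and the truncation orders are hypothetical choices made for illustration.

```python
# Pure-Python sketch (no third-party libraries); matrices are lists of rows.

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y, sign=1.0):
    # Returns X + sign * Y, entry by entry.
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def neumann_inverse(A, order):
    """Approximate (I + A)^{-1} by sum_{k=0}^{order} (-A)^k.

    Assumption: the spectral radius of A is below 1, otherwise the
    series does not converge.
    """
    n = len(A)
    neg_A = [[-a for a in row] for row in A]
    term = identity(n)   # holds (-A)^k
    total = identity(n)  # running partial sum
    for _ in range(order):
        term = matmul(term, neg_A)
        total = add(total, term)
    return total

def neumann_cayley(A, d, order=12):
    """Assumed scaled Cayley form W ~ (I + A)^{-1} (I - A) D.

    A: skew-symmetric matrix; d: list of +1/-1 diagonal entries of D.
    """
    n = len(A)
    I = identity(n)
    D = [[d[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return matmul(matmul(neumann_inverse(A, order), add(I, A, -1.0)), D)

def orthogonality_error(W):
    # Largest absolute entry of W^T W - I; zero for an exactly orthogonal W.
    G = matmul(transpose(W), W)
    I = identity(len(W))
    return max(abs(G[i][j] - I[i][j])
               for i in range(len(W)) for j in range(len(W)))

# Hypothetical 3x3 skew-symmetric example (A^T = -A) with small entries.
A = [[0.0, 0.2, -0.1],
     [-0.2, 0.0, 0.3],
     [0.1, -0.3, 0.0]]
d = [1.0, -1.0, 1.0]

W = neumann_cayley(A, d, order=30)
err = orthogonality_error(W)
```

In this sketch the truncation order trades cost for accuracy: a low order yields a matrix that is only approximately orthogonal, while a higher order drives `orthogonality_error` toward floating-point precision as long as A stays small enough for the series to converge.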


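Source journal

Neural Computation (Engineering and Technology - Computer Science: Artificial Intelligence)

CiteScore: 6.30
Self-citation rate: 3.40%
Articles published: (not shown)
Review time: 3.0 months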

About the journal: Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.
