Orthogonal Gated Recurrent Unit With Neumann-Cayley Transformation

Neural Computation. Published 2024-09-23. DOI: 10.1162/neco_a_01710
Vasily Zadorozhnyy, Edison Mucllari, Cole Pospisil, Duc Nguyen, Qiang Ye

Abstract

In recent years, orthogonal matrices have been shown to be a promising way to improve the training, stability, and convergence of recurrent neural networks (RNNs), particularly by controlling gradients. While gated recurrent unit (GRU) and long short-term memory (LSTM) architectures address the vanishing gradient problem through a variety of gates and memory cells, they remain prone to exploding gradients. In this work, we analyze the gradients in GRU and propose using orthogonal matrices to prevent exploding gradients and enhance long-term memory. We study where orthogonal matrices should be used in GRU, propose a Neumann series-based scaled Cayley transformation for training them, and call the resulting model the Neumann-Cayley orthogonal GRU (NC-GRU). Detailed experiments on several synthetic and real-world tasks show that NC-GRU significantly outperforms GRU and several other RNNs.
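
The abstract names the Neumann series-based scaled Cayley transformation without writing it out. The NumPy sketch below is illustrative only. It assumes the standard scaled Cayley form W = (I + A)^{-1}(I - A)D, where A is skew-symmetric and D is diagonal with +1/-1 entries, and it assumes the Neumann series is used to approximate the inverse (I + A)^{-1}. The helper names (`skew_part`, `neumann_inverse`, `scaled_cayley`, `orthogonality_error`), the matrix size, the scale 0.2 and the choice of D are hypothetical, and the paper's actual training procedure may differ. The sketch shows that the exact transform is orthogonal, that the truncated series approaches orthogonality as terms are added, and that repeated multiplication by an orthogonal W neither grows nor shrinks a vector.

```python
from typing import Optional

import numpy as np


def skew_part(m: np.ndarray) -> np.ndarray:
    """Skew-symmetric part of a square matrix: A = (M - M^T) / 2, so A^T = -A."""
    return 0.5 * (m - m.T)


def neumann_inverse(a: np.ndarray, terms: int) -> np.ndarray:
    """Truncated Neumann series for (I + A)^{-1}: sum_{k=0}^{terms-1} (-A)^k.

    The series converges only if the spectral radius of A is below 1,
    so A must stay small for this approximation to be accurate.
    """
    n = a.shape[0]
    total = np.eye(n)
    power = np.eye(n)
    for _ in range(1, terms):
        power = power @ (-a)  # next term (-A)^k
        total = total + power
    return total


def scaled_cayley(a: np.ndarray, d: np.ndarray, terms: Optional[int] = None) -> np.ndarray:
    """Scaled Cayley transform W = (I + A)^{-1} (I - A) D.

    a: skew-symmetric matrix; d: vector of +1/-1 entries forming the diagonal of D.
    With terms=None the inverse is computed exactly; otherwise it is replaced
    by a truncated Neumann series with that many terms (hypothetical setup).
    """
    n = a.shape[0]
    eye = np.eye(n)
    inv = np.linalg.inv(eye + a) if terms is None else neumann_inverse(a, terms)
    return inv @ (eye - a) @ np.diag(d)


def orthogonality_error(w: np.ndarray) -> float:
    """Spectral-norm distance of W^T W from the identity (0 for an orthogonal W)."""
    return float(np.linalg.norm(w.T @ w - np.eye(w.shape[0]), 2))


rng = np.random.default_rng(0)
n = 6
s = skew_part(rng.standard_normal((n, n)))
a = 0.2 * s / np.linalg.norm(s, 2)  # rescale so ||A||_2 = 0.2 < 1
d = np.array([1.0, 1.0, 1.0, -1.0, -1.0, 1.0])  # illustrative choice of D

w_exact = scaled_cayley(a, d)
errors = {k: orthogonality_error(scaled_cayley(a, d, terms=k)) for k in (2, 4, 8)}
print(f"exact inverse: {orthogonality_error(w_exact):.2e}")
for k, err in errors.items():
    print(f"{k} Neumann terms: {err:.2e}")

# Repeated multiplication mimics the linear part of a recurrence over 100 steps:
# an orthogonal W keeps the state norm fixed, so it neither explodes nor vanishes.
v = rng.standard_normal(n)
h = v.copy()
for _ in range(100):
    h = w_exact @ h
print(f"norm ratio after 100 steps: {np.linalg.norm(h) / np.linalg.norm(v):.12f}")
```

Because the truncation error of the series scales like the k-th power of the norm of A, keeping A small lets a few matrix products stand in for an explicit inverse; that trade-off is the motivation this sketch assumes for a Neumann-based approximation.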
