Orthogonal Gated Recurrent Unit With Neumann-Cayley Transformation

IF 2.1 · Zone 4 (Computer Science) · Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Neural Computation · Pub Date: 2024-11-19 · DOI: 10.1162/neco_a_01710

Vasily Zadorozhnyy; Edison Mucllari; Cole Pospisil; Duc Nguyen; Qiang Ye

Citations: 0

Abstract

In recent years, orthogonal matrices have been shown to be a promising way to improve the training, stability, and convergence of recurrent neural networks (RNNs), particularly by controlling gradients. Gated recurrent unit (GRU) and long short-term memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells. However, they remain prone to exploding gradients. In this work, we analyze the gradients in GRU and propose using orthogonal matrices to prevent exploding gradients and enhance long-term memory. We study where to use orthogonal matrices in GRU. We also propose a Neumann series–based scaled Cayley transformation for training these orthogonal matrices, and we call the resulting model Neumann-Cayley orthogonal GRU (NC-GRU). We present detailed experiments with our model on several synthetic and real-world tasks. They show that NC-GRU significantly outperforms GRU and several other RNNs.
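The abstract names a Neumann series–based scaled Cayley transformation but does not spell it out. The sketch below is a rough illustration under stated assumptions. It assumes the scaled Cayley form used in scoRNN:

W = (I + A)^{-1}(I - A)D

Here A is skew-symmetric and D is a diagonal matrix with ±1 entries. The sketch replaces the inverse (I + A)^{-1} with the truncated Neumann series I - A + A^2 - ..., which converges when ||A||_2 < 1. The function names, the truncation order, the scaling of A, and the particular D are all hypothetical. How NC-GRU actually uses the Neumann series during training may differ.

```python
import numpy as np


def neumann_inverse(A, order):
    """Hypothetical helper: approximate (I + A)^{-1} by sum_{k=0}^{order} (-A)^k.

    The series converges when the spectral norm of A is below 1.
    """
    n = A.shape[0]
    result = np.eye(n)
    term = np.eye(n)
    for _ in range(order):
        term = term @ (-A)          # next power of (-A)
        result = result + term
    return result


def scaled_cayley(A, D, order=None):
    """Hypothetical helper: W = (I + A)^{-1} (I - A) D.

    With order=None the inverse is applied exactly via a linear solve;
    otherwise it is replaced by a truncated Neumann series of that order.
    """
    n = A.shape[0]
    I = np.eye(n)
    if order is None:
        cayley = np.linalg.solve(I + A, I - A)   # exact (I + A)^{-1}(I - A)
    else:
        cayley = neumann_inverse(A, order) @ (I - A)
    return cayley @ D


# Illustrative setup (assumed values, not taken from the paper).
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M - M.T                              # skew-symmetric: A.T == -A
A *= 0.1 / np.linalg.norm(A, 2)          # rescale so ||A||_2 = 0.1 < 1
D = np.diag([1.0, 1.0, -1.0, 1.0])       # hypothetical +/-1 scaling matrix

W_exact = scaled_cayley(A, D)
W_neumann = scaled_cayley(A, D, order=8)

# Orthogonality defect of each matrix and the truncation error between them.
print(np.linalg.norm(W_exact.T @ W_exact - np.eye(n)))
print(np.linalg.norm(W_neumann.T @ W_neumann - np.eye(n)))
print(np.linalg.norm(W_neumann - W_exact))
```

Because A is skew-symmetric, the exact Cayley factor (I + A)^{-1}(I - A) is orthogonal. Multiplying it by the ±1 diagonal D keeps it orthogonal. The truncated series avoids the matrix inverse at the cost of exact orthogonality. Its error shrinks geometrically with the truncation order, bounded by ||A||^{k+1}/(1 - ||A||) for the inverse when ||A||_2 < 1.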

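Source journal: Neural Computation (Engineering & Technology / Computer Science: Artificial Intelligence)

CiteScore: 6.30
Self-citation rate: 3.40%
Publication volume:
Review time: 3.0 months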

Journal description: Neural Computation is uniquely positioned at the crossroads of neuroscience and TMCS and welcomes original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.