Generalization and risk bounds for recurrent neural networks

Neurocomputing (IF 5.5; Zone 2, Computer Science; JCR Q1, Computer Science, Artificial Intelligence) Pub Date: 2024-11-19 DOI: 10.1016/j.neucom.2024.128825

Xuewei Cheng, Ke Huang, Shujie Ma

Neurocomputing, Volume 616, Article 128825. Published 2024-11-19. https://www.sciencedirect.com/science/article/pii/S0925231224015960

Citations: 0

Abstract

Recurrent Neural Networks (RNNs) have achieved great success in the prediction of sequential data. However, their theoretical studies are still lagging behind because of their complex interconnected structures. In this paper, we establish a new generalization error bound for vanilla RNNs, and provide a unified framework to calculate the Rademacher complexity that can be applied to a variety of loss functions. When the ramp loss is used, we show that our bound is tighter than the existing bounds based on the same assumptions on the Frobenius and spectral norms of the weight matrices and a few mild conditions. Our numerical results show that our new generalization bound is the tightest among all existing bounds in three public datasets. Our bound improves the second tightest one by an average percentage of 13.80% and 3.01% when the tanh and ReLU activation functions are used, respectively. Moreover, we derive a sharp estimation error bound for RNN-based estimators obtained through empirical risk minimization (ERM) in multi-class classification problems when the loss function satisfies a Bernstein condition.
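The abstract names the ramp loss but does not define it. As a rough illustration only, the sketch below uses the standard margin-based ramp loss for multi-class classification; the paper's exact parameterization may differ. The function names, the margin parameter `gamma`, and the toy scores are all assumptions made for this example, not taken from the paper.

```python
def ramp_loss(margin, gamma=1.0):
    """Standard ramp loss (assumed form): 1 for margin <= 0,
    0 for margin >= gamma, and linear in between."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return min(1.0, max(0.0, 1.0 - margin / gamma))


def multiclass_margin(scores, label):
    """Score of the true class minus the best score among the other classes."""
    best_other = max(s for k, s in enumerate(scores) if k != label)
    return scores[label] - best_other


def empirical_ramp_risk(score_rows, labels, gamma=1.0):
    """Average ramp loss over a sample of (score vector, label) pairs."""
    losses = [
        ramp_loss(multiclass_margin(s, y), gamma)
        for s, y in zip(score_rows, labels)
    ]
    return sum(losses) / len(losses)


# Hypothetical class scores, e.g. the final-step outputs of an RNN classifier.
scores = [[2.0, 0.5, 0.1], [0.2, 0.4, 0.3], [0.1, 0.9, 1.5]]
labels = [0, 1, 1]
print(empirical_ramp_risk(scores, labels, gamma=1.0))
```

In this form the ramp loss sits between the 0-1 loss and the indicator that the margin falls below `gamma`, and it is Lipschitz in the margin with constant 1/`gamma`, which is why it is a common choice in margin-based generalization bounds.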




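Source journal

Neurocomputing (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 13.10

Self-citation rate: 10.00%

Articles published: 1382

Review time: 70 days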

Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.