{"title":"Stability of accuracy for the training of DNNs via the uniform doubling condition","authors":"Yitzchak Shmalo","doi":"10.1007/s10472-023-09919-1","DOIUrl":null,"url":null,"abstract":"<div><p>We study the stability of accuracy during the training of deep neural networks (DNNs). In this context, the training of a DNN is performed via the minimization of a cross-entropy loss function, and the performance metric is accuracy (the proportion of objects that are classified correctly). While training results in a decrease of loss, the accuracy does not necessarily increase during the process and may sometimes even decrease. The goal of achieving stability of accuracy is to ensure that if accuracy is high at some initial time, it remains high throughout training. A recent result by Berlyand, Jabin, and Safsten introduces a doubling condition on the training data, which ensures the stability of accuracy during training for DNNs using the absolute value activation function. For training data in <span>\\(\\mathbb {R}^n\\)</span>, this doubling condition is formulated using slabs in <span>\\(\\mathbb {R}^n\\)</span> and depends on the choice of the slabs. The goal of this paper is twofold. First, to make the doubling condition uniform, that is, independent of the choice of slabs. This leads to sufficient conditions for stability in terms of training data only. In other words, for a training set <i>T</i> that satisfies the uniform doubling condition, there exists a family of DNNs such that a DNN from this family with high accuracy on the training set at some training time <span>\\(t_0\\)</span> will have high accuracy for all time <span>\\(t>t_0\\)</span>. Moreover, establishing uniformity is necessary for the numerical implementation of the doubling condition. We demonstrate how to numerically implement a simplified version of this uniform doubling condition on a dataset and apply it to achieve stability of accuracy using a few model examples. The second goal is to extend the original stability results from the absolute value activation function to a broader class of piecewise linear activation functions with finitely many critical points, such as the popular Leaky ReLU.</p></div>","PeriodicalId":7971,"journal":{"name":"Annals of Mathematics and Artificial Intelligence","volume":"92 2","pages":"439 - 483"},"PeriodicalIF":1.2000,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Mathematics and Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10472-023-09919-1","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
We study the stability of accuracy during the training of deep neural networks (DNNs). In this context, the training of a DNN is performed via the minimization of a cross-entropy loss function, and the performance metric is accuracy (the proportion of objects that are classified correctly). While training results in a decrease of loss, the accuracy does not necessarily increase during the process and may sometimes even decrease. The goal of achieving stability of accuracy is to ensure that if accuracy is high at some initial time, it remains high throughout training. A recent result by Berlyand, Jabin, and Safsten introduces a doubling condition on the training data, which ensures the stability of accuracy during training for DNNs using the absolute value activation function. For training data in \(\mathbb {R}^n\), this doubling condition is formulated using slabs in \(\mathbb {R}^n\) and depends on the choice of the slabs. The goal of this paper is twofold. First, to make the doubling condition uniform, that is, independent of the choice of slabs. This leads to sufficient conditions for stability in terms of training data only. In other words, for a training set T that satisfies the uniform doubling condition, there exists a family of DNNs such that a DNN from this family with high accuracy on the training set at some training time \(t_0\) will have high accuracy for all time \(t>t_0\). Moreover, establishing uniformity is necessary for the numerical implementation of the doubling condition. We demonstrate how to numerically implement a simplified version of this uniform doubling condition on a dataset and apply it to achieve stability of accuracy using a few model examples. The second goal is to extend the original stability results from the absolute value activation function to a broader class of piecewise linear activation functions with finitely many critical points, such as the popular Leaky ReLU.
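To make the slab-based condition more concrete, here is a minimal numerical sketch of one plausible simplified doubling check on a toy dataset: for each tested slab (points whose projection onto a chosen direction falls within a given width of a center), it verifies that the slab of doubled width contains at most D times as many training points as the original slab. The projection-based definition of a slab, the finite grid of directions and centers, and the constant D are illustrative assumptions for this sketch, not the exact construction used in the paper.

```python
import numpy as np

def points_in_slab(X, direction, center, width):
    """Count points whose projection onto `direction` lies within `width` of `center`."""
    proj = X @ direction / np.linalg.norm(direction)
    return np.sum(np.abs(proj - center) <= width)

def satisfies_simplified_doubling(X, directions, centers, width, D):
    """Check a simplified doubling condition over a finite family of slabs:
    every nonempty slab, when doubled in width, may contain at most D times
    as many training points as the original slab."""
    for d in directions:
        for c in centers:
            n_small = points_in_slab(X, d, c, width)
            n_big = points_in_slab(X, d, c, 2 * width)
            if n_small > 0 and n_big > D * n_small:
                return False
    return True

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2))                     # toy training data in R^2
    directions = [np.array([1.0, 0.0]),
                  np.array([0.0, 1.0]),
                  np.array([1.0, 1.0])]                # a few slab orientations
    centers = np.linspace(-2.0, 2.0, 9)                # slab centers along each direction
    print(satisfies_simplified_doubling(X, directions, centers, width=0.1, D=4.0))
```

In this sketch the condition is "uniform" only in the limited sense that it is checked over a fixed grid of slabs determined by the data alone; the paper's uniform doubling condition is formulated so that it is independent of the choice of slabs altogether.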
About the journal:
Annals of Mathematics and Artificial Intelligence presents a range of topics of concern to scholars applying quantitative, combinatorial, logical, algebraic and algorithmic methods to diverse areas of Artificial Intelligence, from decision support, automated deduction, and reasoning, to knowledge-based systems, machine learning, computer vision, robotics and planning.
The journal features collections of papers appearing either in volumes (400 pages) or in separate issues (100-300 pages), each focusing on a single topic under one or more guest editors.
Annals of Mathematics and Artificial Intelligence aims to foster the emergence of new areas of applied mathematics and to strengthen the scientific underpinnings of Artificial Intelligence.