{"title":"On Gradient Descent Training Under Data Augmentation with On-Line Noisy Copies","authors":"Katsuyuki HAGIWARA","doi":"10.1587/transinf.2023edp7008","DOIUrl":null,"url":null,"abstract":"In machine learning, data augmentation (DA) is a technique for improving the generalization performance of models. In this paper, we mainly consider gradient descent of linear regression under DA using noisy copies of datasets, in which noise is injected into inputs. We analyze the situation where noisy copies are newly generated and injected into inputs at each epoch, i.e., the case of using on-line noisy copies. Therefore, this article can also be viewed as an analysis on a method using noise injection into a training process by DA. We considered the training process under three training situations which are the full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We showed that, in all cases, training for DA with on-line copies is approximately equivalent to the ℓ2 regularization training for which variance of injected noise is important, whereas the number of copies is not. Moreover, we showed that DA with on-line copies apparently leads to an increase of learning rate in full-batch condition under the sum of squared errors and the mini-batch condition under the mean squared error. The apparent increase in learning rate and regularization effect can be attributed to the original input and additive noise in noisy copies, respectively. These results are confirmed in a numerical experiment in which we found that our result can be applied to usual off-line DA in an under-parameterization scenario and can not in an over-parametrization scenario. Moreover, we experimentally investigated the training process of neural networks under DA with off-line noisy copies and found that our analysis on linear regression can be qualitatively applied to neural networks.","PeriodicalId":55002,"journal":{"name":"IEICE Transactions on Information and Systems","volume":null,"pages":null},"PeriodicalIF":0.6000,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEICE Transactions on Information and Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1587/transinf.2023edp7008","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Abstract
In machine learning, data augmentation (DA) is a technique for improving the generalization performance of models. In this paper, we mainly consider gradient descent for linear regression under DA with noisy copies of the dataset, in which noise is injected into the inputs. We analyze the situation where noisy copies are newly generated and injected into the inputs at each epoch, i.e., the case of on-line noisy copies. This article can therefore also be viewed as an analysis of a method that injects noise into the training process through DA. We consider three training situations: full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We show that, in all cases, training under DA with on-line copies is approximately equivalent to ℓ2-regularized training, in which the variance of the injected noise matters whereas the number of copies does not. Moreover, we show that DA with on-line copies apparently increases the learning rate in the full-batch condition under the sum of squared errors and in the mini-batch condition under the mean squared error. The apparent increase in the learning rate and the regularization effect can be attributed to the original inputs and to the additive noise in the noisy copies, respectively. These results are confirmed in a numerical experiment, in which we find that our result applies to the usual off-line DA in an under-parameterization scenario but not in an over-parameterization scenario. Moreover, we experimentally investigate the training process of neural networks under DA with off-line noisy copies and find that our analysis of linear regression applies qualitatively to neural networks.
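To make the claimed equivalence concrete, below is a minimal numerical sketch (not code from the paper) of the full-batch mean-squared-error case: at every epoch, fresh noisy copies of the inputs are generated (on-line copies), and the resulting weights are compared with those from plain gradient descent on an ℓ2-regularized MSE. The synthetic data, the hyperparameters, and the choice of regularization strength λ = σ² (read off from the expected augmented loss) are illustrative assumptions, not values taken from the paper.

```python
# Sketch: full-batch gradient descent on linear regression under DA with
# on-line noisy input copies vs. gradient descent on an l2-regularized MSE.
# All hyperparameters (n_copies, noise_std, lr, epochs) are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic under-parameterized data: n samples, d features.
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

noise_std = 0.3      # std of the noise injected into the inputs
n_copies = 5         # number of on-line noisy copies per sample
lr, epochs = 0.05, 2000

# (1) DA with on-line noisy copies: regenerate the copies at every epoch.
w_da = np.zeros(d)
for _ in range(epochs):
    X_aug = np.tile(X, (n_copies, 1)) + noise_std * rng.normal(size=(n * n_copies, d))
    y_aug = np.tile(y, n_copies)
    grad = 2.0 / (n * n_copies) * X_aug.T @ (X_aug @ w_da - y_aug)  # MSE gradient
    w_da -= lr * grad

# (2) l2-regularized (ridge) training; lam = noise_std**2 follows from the
#     expected augmented loss E[(y - w^T(x + eps))^2] = (y - w^T x)^2 + sigma^2 ||w||^2.
lam = noise_std ** 2
w_l2 = np.zeros(d)
for _ in range(epochs):
    grad = 2.0 / n * X.T @ (X @ w_l2 - y) + 2.0 * lam * w_l2
    w_l2 -= lr * grad

# The two solutions should nearly coincide (small relative to ||w_l2||).
print("||w_da - w_l2|| =", np.linalg.norm(w_da - w_l2))
```

Under these assumptions the two weight vectors should nearly coincide, and changing n_copies while keeping noise_std fixed barely moves the DA solution, which is consistent with the abstract's claim that the noise variance matters but the number of copies does not.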
Journal Introduction:
Published by The Institute of Electronics, Information and Communication Engineers
Subject Area:
Mathematics
Physics
Biology, Life Sciences and Basic Medicine
General Medicine, Social Medicine, and Nursing Sciences
Clinical Medicine
Engineering in General
Nanosciences and Materials Sciences
Mechanical Engineering
Electrical and Electronic Engineering
Information Sciences
Economics, Business & Management
Psychology, Education