On Gradient Descent Training Under Data Augmentation with On-Line Noisy Copies

IEICE Transactions on Information and Systems · Pub Date: 2023-09-01 · DOI: 10.1587/transinf.2023edp7008 · IF 0.6 · JCR Q4, CAS Tier 4 (Computer Science, Information Systems)
Katsuyuki HAGIWARA
{"title":"On Gradient Descent Training Under Data Augmentation with On-Line Noisy Copies","authors":"Katsuyuki HAGIWARA","doi":"10.1587/transinf.2023edp7008","DOIUrl":null,"url":null,"abstract":"In machine learning, data augmentation (DA) is a technique for improving the generalization performance of models. In this paper, we mainly consider gradient descent of linear regression under DA using noisy copies of datasets, in which noise is injected into inputs. We analyze the situation where noisy copies are newly generated and injected into inputs at each epoch, i.e., the case of using on-line noisy copies. Therefore, this article can also be viewed as an analysis on a method using noise injection into a training process by DA. We considered the training process under three training situations which are the full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We showed that, in all cases, training for DA with on-line copies is approximately equivalent to the ℓ2 regularization training for which variance of injected noise is important, whereas the number of copies is not. Moreover, we showed that DA with on-line copies apparently leads to an increase of learning rate in full-batch condition under the sum of squared errors and the mini-batch condition under the mean squared error. The apparent increase in learning rate and regularization effect can be attributed to the original input and additive noise in noisy copies, respectively. These results are confirmed in a numerical experiment in which we found that our result can be applied to usual off-line DA in an under-parameterization scenario and can not in an over-parametrization scenario. Moreover, we experimentally investigated the training process of neural networks under DA with off-line noisy copies and found that our analysis on linear regression can be qualitatively applied to neural networks.","PeriodicalId":55002,"journal":{"name":"IEICE Transactions on Information and Systems","volume":null,"pages":null},"PeriodicalIF":0.6000,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEICE Transactions on Information and Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1587/transinf.2023edp7008","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

In machine learning, data augmentation (DA) is a technique for improving the generalization performance of models. In this paper, we mainly consider gradient descent training of linear regression under DA using noisy copies of datasets, in which noise is injected into the inputs. We analyze the situation where noisy copies are newly generated and injected at each epoch, i.e., the case of using on-line noisy copies. This article can therefore also be viewed as an analysis of a method that injects noise into the training process through DA. We consider the training process in three situations: full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We show that, in all cases, training under DA with on-line copies is approximately equivalent to ℓ2-regularized training, for which the variance of the injected noise is important whereas the number of copies is not. Moreover, we show that DA with on-line copies leads to an apparent increase in the learning rate in the full-batch condition under the sum of squared errors and in the mini-batch condition under the mean squared error. The apparent increase in the learning rate and the regularization effect can be attributed to the original input and to the additive noise in the noisy copies, respectively. These results are confirmed in a numerical experiment, in which we find that our result applies to the usual off-line DA in an under-parameterized scenario but not in an over-parameterized scenario. Moreover, we experimentally investigate the training process of neural networks under DA with off-line noisy copies and find that our analysis of linear regression can be qualitatively applied to neural networks.
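To make the claimed equivalence concrete, below is a minimal numerical sketch in Python (illustrative only; it is not code or settings from the paper). It relies on the classical identity that, for Gaussian input noise ε ~ N(0, σ²I), E_ε[(y − w⊤(x+ε))²] = (y − w⊤x)² + σ²‖w‖², so that full-batch gradient descent on the sum of squared errors over K on-line noisy copies behaves, in expectation, like gradient descent on the ℓ2-regularized clean objective with the learning rate scaled by K. The problem size, noise level σ, number of copies K, learning rate, and epoch count are assumptions chosen for the demonstration; the two weight vectors should end up close to each other and to the closed-form ridge solution, up to fluctuation from the finite noise draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy under-parameterized linear regression problem (illustrative; not the paper's setup).
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

sigma = 0.3       # standard deviation of the noise injected into the inputs
K = 4             # number of on-line noisy copies per example
lr = 1e-3         # base learning rate
epochs = 2000

# (a) Full-batch gradient descent on the sum of squared errors (SSE) with
#     on-line noisy copies: fresh input noise is drawn at every epoch.
w_da = np.zeros(d)
for _ in range(epochs):
    Xc = np.tile(X, (K, 1)) + sigma * rng.normal(size=(K * n, d))
    yc = np.tile(y, K)
    grad = Xc.T @ (Xc @ w_da - yc)        # (1/2) * gradient of the SSE over all copies
    w_da -= lr * grad

# (b) Full-batch gradient descent on the ridge-regularized SSE of the clean data,
#     with the learning rate apparently increased by the factor K.  The penalty
#     strength is set by the noise variance (times n for the SSE), not by K.
lam = n * sigma ** 2
w_ridge = np.zeros(d)
for _ in range(epochs):
    grad = X.T @ (X @ w_ridge - y) + lam * w_ridge   # same (1/2)-gradient convention
    w_ridge -= (K * lr) * grad

print("DA with on-line noisy copies :", np.round(w_da, 3))
print("ridge GD, lr scaled by K     :", np.round(w_ridge, 3))
print("closed-form ridge solution   :",
      np.round(np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y), 3))
```

In this sketch, increasing K mainly accelerates the noisy-copy run (the apparent learning-rate increase), while the implied regularization strength depends only on the noise variance σ², mirroring the abstract's claim that the variance matters but the number of copies does not.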
Source Journal
IEICE Transactions on Information and Systems (Engineering & Technology - Computer Science: Software Engineering)
CiteScore: 1.80 · Self-citation rate: 0.00% · Articles per year: 238 · Review time: 5.0 months
Journal description: Published by The Institute of Electronics, Information and Communication Engineers. Subject areas: Mathematics; Physics; Biology, Life Sciences and Basic Medicine; General Medicine, Social Medicine, and Nursing Sciences; Clinical Medicine; Engineering in General; Nanosciences and Materials Sciences; Mechanical Engineering; Electrical and Electronic Engineering; Information Sciences; Economics, Business & Management; Psychology, Education.