{"title":"An Adaptive Gradient Descent Optimization Algorithm Based on Stratified Sampling","authors":"Yajing Sun, Aixian Chen","doi":"10.1109/cvidliccea56201.2022.9825268","DOIUrl":null,"url":null,"abstract":"A necessary part of deep learning is the adjustment of hyperparameters, which is also one of the most expensive parts of deep learning. The current mainstream adaptive learning rate algorithms include AdaGrad, RMSProp, Adam, and AdamW. AdaGrad can adapt to different learning rates for different parameters. However, its adaptive learning rate is monotonically reduced, which will lead to a weak ability to update parameters in the later stage. Although RMSProp, Adam, and AdamW solved the problem of gradually decreasing the adaptive learning rate of AdaGrad, they all introduced a hyperparameter—the momentum coefficient. In this paper, a new optimization algorithm SAdam is proposed. SAdam uses the stratified sampling technique to combine the windows with fixed first-order and second-order gradient information, which not only solves the problem that the adaptive learning rate of AdaGrad is constantly decreasing but also does not introduce additional hyperparameters. Moreover, experiments show that the test accuracy of SAdam is no less than Adam.","PeriodicalId":23649,"journal":{"name":"Vision","volume":"30 1","pages":"1225-1231"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/cvidliccea56201.2022.9825268","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
A necessary part of deep learning is the tuning of hyperparameters, which is also one of its most expensive parts. The current mainstream adaptive learning rate algorithms include AdaGrad, RMSProp, Adam, and AdamW. AdaGrad adapts a separate learning rate to each parameter; however, its adaptive learning rate decreases monotonically, which weakens its ability to update parameters in the later stages of training. Although RMSProp, Adam, and AdamW address the problem of AdaGrad's ever-decreasing adaptive learning rate, they all introduce an additional hyperparameter, the momentum coefficient. In this paper, a new optimization algorithm, SAdam, is proposed. SAdam uses the stratified sampling technique together with fixed-size windows of first-order and second-order gradient information, which not only resolves the continual decrease of AdaGrad's adaptive learning rate but also introduces no additional hyperparameters. Moreover, experiments show that the test accuracy of SAdam is no lower than that of Adam.
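The abstract does not specify the exact update rule, so the following is only a minimal illustrative sketch of the general idea: an Adam-style update whose first- and second-moment estimates are computed by stratified sampling over a fixed-size window of recent gradients instead of exponential moving averages, thereby avoiding the momentum coefficients beta1/beta2. The window size, number of strata, and sampling scheme are assumptions for illustration, not the authors' actual SAdam algorithm.

```python
import numpy as np


def stratified_sample(window, num_strata):
    """Draw one gradient from each contiguous stratum of the window (assumed scheme)."""
    n = len(window)
    num_strata = min(num_strata, n)
    bounds = np.linspace(0, n, num_strata + 1, dtype=int)
    return [window[np.random.randint(lo, hi)]
            for lo, hi in zip(bounds[:-1], bounds[1:]) if hi > lo]


class SAdamSketch:
    """Hypothetical Adam-like optimizer: moment estimates come from a
    stratified sample of a fixed-size gradient window, so no momentum
    coefficients are needed and the effective step size does not shrink
    monotonically as in AdaGrad."""

    def __init__(self, lr=1e-3, window_size=32, num_strata=4, eps=1e-8):
        self.lr = lr
        self.window_size = window_size  # assumed fixed window length
        self.num_strata = num_strata    # assumed number of strata
        self.eps = eps
        self.window = []                # stores the most recent gradients

    def step(self, params, grad):
        # Keep a fixed-length window of the most recent gradients.
        self.window.append(grad)
        if len(self.window) > self.window_size:
            self.window.pop(0)

        # Estimate the first and second moments from a stratified sample
        # of the window rather than from exponential moving averages.
        sample = stratified_sample(self.window, self.num_strata)
        m = np.mean(sample, axis=0)                   # first-moment estimate
        v = np.mean([g * g for g in sample], axis=0)  # second-moment estimate

        # Adam-style parameter update.
        return params - self.lr * m / (np.sqrt(v) + self.eps)
```

Because the second-moment estimate is taken over a bounded window rather than an ever-growing accumulator, it does not grow without bound, which is one way to avoid AdaGrad's vanishing effective learning rate without introducing decay hyperparameters.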