Improved Adam Optimizer for Deep Neural Networks

2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS). Pub Date: 2018-06-01. DOI: 10.1109/IWQoS.2018.8624183

Zijun Zhang


Citations: 583

Abstract


Adaptive optimization algorithms, such as Adam and RMSprop, have achieved better optimization performance than stochastic gradient descent (SGD) in some scenarios. However, recent studies show that they often lead to worse generalization performance than SGD, especially when training deep neural networks (DNNs). In this work, we identify the reasons why Adam generalizes worse than SGD, and develop a variant of Adam to eliminate the generalization gap. The proposed method, normalized direction-preserving Adam (ND-Adam), enables more precise control of the direction and step size for updating weight vectors, leading to significantly improved generalization performance. Following a similar rationale, we further improve the generalization performance in classification tasks by regularizing the softmax logits. By bridging the gap between SGD and Adam, we also hope to shed light on why certain optimization algorithms generalize better than others.
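
The abstract does not state the update rule, so the following is only an illustrative sketch of what "direction-preserving" and "normalized" could mean in practice, not the paper's confirmed algorithm. It contrasts a standard Adam step, which rescales every coordinate separately, with a hypothetical variant that assumes (a) one scalar second-moment estimate per weight vector, so the step stays parallel to the momentum direction, and (b) weight vectors kept at unit norm. The function names `adam_step` and `nd_adam_like_step`, the gradient projection, and all default hyperparameters are assumptions made for this example.

```python
import numpy as np


def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Standard Adam step: the second moment v is tracked per coordinate."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)  # bias correction
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


def nd_adam_like_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Hypothetical direction-preserving step for one unit-norm weight vector.

    Assumptions (not taken from the abstract): v is a single scalar per
    weight vector, the gradient is projected onto the tangent space of the
    unit sphere at w, and w is renormalized after the update.
    """
    g = g - np.dot(g, w) * w               # drop the component along w
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * np.dot(g, g)   # scalar: squared gradient norm
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # step is parallel to m_hat
    return w / np.linalg.norm(w), m, v


# One step from the same starting point with a badly scaled gradient.
w0 = np.array([1.0, 0.0, 0.0])
g0 = np.array([0.5, 3.0, -0.004])

w_adam, _, _ = adam_step(w0, g0, np.zeros(3), np.zeros(3), t=1)
w_nd, _, _ = nd_adam_like_step(w0, g0, np.zeros(3), 0.0, t=1)

print("Adam step:   ", w0 - w_adam)
print("ND-like step:", w0 - w_nd)
print("ND-like norm:", np.linalg.norm(w_nd))
```

On the first step, the per-coordinate scaling in `adam_step` reduces the update to roughly `lr * sign(g)`, which discards the relative magnitudes of the gradient components; the scalar second moment in the sketch keeps those ratios intact, which is one plausible reading of the "more precise control of the direction and step size" described above.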
