Unifying and revisiting Sharpness-Aware Minimization with noise-injected micro-batch scheduler for efficiency improvement

IF 6.3 · CAS Tier 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence) · Neural Networks · Pub Date: 2025-05-01 (Epub: 2025-02-03) · DOI: 10.1016/j.neunet.2025.107205
Zheng Wei, Xingjun Zhang, Zhendong Tan
{"title":"Unifying and revisiting Sharpness-Aware Minimization with noise-injected micro-batch scheduler for efficiency improvement","authors":"Zheng Wei,&nbsp;Xingjun Zhang,&nbsp;Zhendong Tan","doi":"10.1016/j.neunet.2025.107205","DOIUrl":null,"url":null,"abstract":"<div><div>Sharpness-aware minimization (SAM) has been proposed to improve generalization by encouraging the model to converge to a flatter region. However, SAM’s two sequential gradient computations lead to 2<span><math><mo>×</mo></math></span> computation overhead compared to the base optimizer (e.g., SGD). Recent works improve SAM’s efficiency either by switching between SAM and base optimizer or by reducing data samples. In this paper, we first propose the micro-batch scheduler to unify the above two ideas and summarize that the commonality of them is adopting a smaller micro-batch to approximate the perturbation. However, its role is not fully explored. Thus, we revisit the effect of micro-batch approximated perturbation on accuracy and efficiency and empirically observe that a too-small micro-batch causes accuracy degradation as it leads to a sharper loss landscape. To alleviate it, we inject random noise into the micro-batch approximated gradient in SAM’s first ascent step, which implicitly leverages random perturbation before SAM’s second descent step. The visualization results confirm that it encourages the model to converge to a flatter region. Extensive experiments with various models (e.g., ResNet-18/50, WideResNet-28-10, PyramidNet-110, and ViT-B/16, etc.) evaluated on CIFAR-10 and ImageNet-1K show that the proposed method achieves competitive accuracy with higher efficiency when compared to several efficient SAM variants (e.g., ESAM, LooKSAM-5, AE-SAM, K-SAM, etc.).</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"Article 107205"},"PeriodicalIF":6.3000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S089360802500084X","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/3 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Sharpness-aware minimization (SAM) has been proposed to improve generalization by encouraging the model to converge to a flatter region. However, SAM’s two sequential gradient computations incur 2× the computation cost of the base optimizer (e.g., SGD). Recent works improve SAM’s efficiency either by switching between SAM and the base optimizer or by reducing the number of data samples. In this paper, we first propose the micro-batch scheduler to unify these two ideas and observe that their common feature is using a smaller micro-batch to approximate the perturbation. However, the role of this micro-batch approximation has not been fully explored. We therefore revisit the effect of the micro-batch approximated perturbation on accuracy and efficiency and empirically observe that a too-small micro-batch degrades accuracy because it leads to a sharper loss landscape. To alleviate this, we inject random noise into the micro-batch approximated gradient in SAM’s first (ascent) step, which implicitly applies a random perturbation before SAM’s second (descent) step. Visualization results confirm that this encourages the model to converge to a flatter region. Extensive experiments with various models (e.g., ResNet-18/50, WideResNet-28-10, PyramidNet-110, and ViT-B/16) evaluated on CIFAR-10 and ImageNet-1K show that the proposed method achieves competitive accuracy with higher efficiency compared to several efficient SAM variants (e.g., ESAM, LookSAM-5, AE-SAM, and K-SAM).
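The abstract describes the core mechanism: approximate SAM's ascent-step perturbation from a smaller micro-batch and inject random noise into that gradient before the full descent step. Below is a minimal PyTorch-style sketch of such an update, written only from the abstract's description; the hyper-parameters (rho, micro_frac, noise_std) and the uniform micro-batch sampling are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def sam_microbatch_step(model, loss_fn, inputs, targets, base_optimizer,
                        rho=0.05, micro_frac=0.25, noise_std=1e-3):
    # Sketch of one SAM update with a micro-batch approximated, noise-injected
    # ascent step. Hyper-parameter values here are assumptions for illustration.
    params = [p for p in model.parameters() if p.requires_grad]

    # --- first (ascent) step: approximate the perturbation on a micro-batch ---
    m = max(1, int(micro_frac * inputs.size(0)))
    idx = torch.randperm(inputs.size(0))[:m]          # sample a smaller micro-batch
    loss_fn(model(inputs[idx]), targets[idx]).backward()

    # inject random noise into the micro-batch approximated gradient
    noisy = [(p.grad if p.grad is not None else torch.zeros_like(p))
             + noise_std * torch.randn_like(p) for p in params]
    grad_norm = torch.norm(torch.stack([n.norm() for n in noisy]))

    eps = []
    with torch.no_grad():
        for p, n in zip(params, noisy):
            e = rho * n / (grad_norm + 1e-12)         # epsilon = rho * g / ||g||
            p.add_(e)                                  # move to the perturbed point w + eps
            eps.append(e)
    model.zero_grad()

    # --- second (descent) step: full-batch gradient at the perturbed point ---
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    with torch.no_grad():                              # restore the original weights
        for p, e in zip(params, eps):
            p.sub_(e)
    base_optimizer.step()                              # base optimizer (e.g., SGD) applies the SAM gradient
    base_optimizer.zero_grad()
    return loss.item()
```

Because the ascent gradient is computed on only a fraction of the batch, this step costs roughly micro_frac of a full backward pass, which is where the efficiency gain over vanilla SAM comes from; the added Gaussian noise stands in for the random-perturbation component the abstract attributes to the method.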
Source journal: Neural Networks (Engineering & Technology — Computer Science: Artificial Intelligence)
CiteScore: 13.90
Self-citation rate: 7.70%
Articles per year: 425
Review time: 67 days
Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.