Are Neural Architecture Search Benchmarks Well Designed? A Deeper Look Into Operation Importance

Vasco Lopes, Bruno Degardin, L. Alexandre
{"title":"Are Neural Architecture Search Benchmarks Well Designed? A Deeper Look Into Operation Importance","authors":"Vasco Lopes, Bruno Degardin, L. Alexandre","doi":"10.48550/arXiv.2303.16938","DOIUrl":null,"url":null,"abstract":"Neural Architecture Search (NAS) benchmarks significantly improved the capability of developing and comparing NAS methods while at the same time drastically reduced the computational overhead by providing meta-information about thousands of trained neural networks. However, tabular benchmarks have several drawbacks that can hinder fair comparisons and provide unreliable results. These usually focus on providing a small pool of operations in heavily constrained search spaces -- usually cell-based neural networks with pre-defined outer-skeletons. In this work, we conducted an empirical analysis of the widely used NAS-Bench-101, NAS-Bench-201 and TransNAS-Bench-101 benchmarks in terms of their generability and how different operations influence the performance of the generated architectures. We found that only a subset of the operation pool is required to generate architectures close to the upper-bound of the performance range. Also, the performance distribution is negatively skewed, having a higher density of architectures in the upper-bound range. We consistently found convolution layers to have the highest impact on the architecture's performance, and that specific combination of operations favors top-scoring architectures. These findings shed insights on the correct evaluation and comparison of NAS methods using NAS benchmarks, showing that directly searching on NAS-Bench-201, ImageNet16-120 and TransNAS-Bench-101 produces more reliable results than searching only on CIFAR-10. Furthermore, with this work we provide suggestions for future benchmark evaluations and design. The code used to conduct the evaluations is available at https://github.com/VascoLopes/NAS-Benchmark-Evaluation.","PeriodicalId":13641,"journal":{"name":"Inf. Sci.","volume":"5 1","pages":"119695"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Inf. Sci.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2303.16938","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Neural Architecture Search (NAS) benchmarks have significantly improved the development and comparison of NAS methods, while drastically reducing the computational overhead by providing meta-information about thousands of trained neural networks. However, tabular benchmarks have several drawbacks that can hinder fair comparisons and yield unreliable results. They usually offer only a small pool of operations in heavily constrained search spaces, typically cell-based neural networks with pre-defined outer skeletons. In this work, we conducted an empirical analysis of the widely used NAS-Bench-101, NAS-Bench-201 and TransNAS-Bench-101 benchmarks in terms of their generalizability and of how different operations influence the performance of the generated architectures. We found that only a subset of the operation pool is required to generate architectures close to the upper bound of the performance range. We also found that the performance distribution is negatively skewed, with a higher density of architectures near the upper bound. Convolution layers consistently had the highest impact on an architecture's performance, and specific combinations of operations favored top-scoring architectures. These findings provide insights into the correct evaluation and comparison of NAS methods on NAS benchmarks, showing that directly searching on NAS-Bench-201 (ImageNet16-120) and TransNAS-Bench-101 produces more reliable results than searching only on CIFAR-10. Furthermore, with this work we provide suggestions for the evaluation and design of future benchmarks. The code used to conduct the evaluations is available at https://github.com/VascoLopes/NAS-Benchmark-Evaluation.
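
The abstract's two central measurements, the skew of a benchmark's accuracy distribution and the impact of individual operations, can be reproduced from any table that pairs architectures with their trained accuracies. The sketch below is a minimal, self-contained illustration on synthetic data only: the operation names, the toy accuracy model, and the `synthetic_architectures` helper are placeholders invented for this example, not the paper's actual data or any real benchmark API.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)

# Stand-in for a tabular benchmark: each record pairs an architecture's
# cell operations with its final test accuracy. Real benchmarks such as
# NAS-Bench-201 expose this information through their own query APIs.
OPS = ["conv_3x3", "conv_1x1", "avg_pool_3x3", "skip_connect", "none"]

def synthetic_architectures(n=5000):
    """Generate (operations, accuracy) pairs from a toy accuracy model."""
    archs = []
    for _ in range(n):
        ops = list(rng.choice(OPS, size=6))  # 6 cell edges, as in NAS-Bench-201
        # Toy model (an assumption): convolutions help, 'none' hurts, plus noise.
        acc = (70.0
               + 4.0 * ops.count("conv_3x3")
               + 2.0 * ops.count("conv_1x1")
               - 5.0 * ops.count("none")
               + rng.normal(0.0, 1.5))
        archs.append((ops, acc))
    return archs

archs = synthetic_architectures()
accs = np.array([acc for _, acc in archs])

# 1) Shape of the performance distribution: a negative skewness means the
#    mass of architectures is concentrated toward the upper accuracy bound.
print(f"skewness of accuracy distribution: {skew(accs):.3f}")

# 2) Crude operation-importance estimate: mean accuracy of architectures
#    containing an operation minus the mean of those without it.
for op in OPS:
    with_op = accs[[op in ops for ops, _ in archs]]
    without = accs[[op not in ops for ops, _ in archs]]
    print(f"{op:>14}: delta acc = {with_op.mean() - without.mean():+.2f}")
```

By construction, the toy model rewards convolutions, so they dominate the ranking here; on a real benchmark the same two statistics would be computed from the tabulated accuracies instead.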