NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance
Raphael T. Husistein, Markus Reiher, Marco Eckhoff
{"title":"NEAR:机器学习模型性能的免训练预估器","authors":"Raphael T. Husistein, Markus Reiher, Marco Eckhoff","doi":"arxiv-2408.08776","DOIUrl":null,"url":null,"abstract":"Artificial neural networks have been shown to be state-of-the-art machine\nlearning models in a wide variety of applications, including natural language\nprocessing and image recognition. However, building a performant neural network\nis a laborious task and requires substantial computing power. Neural\nArchitecture Search (NAS) addresses this issue by an automatic selection of the\noptimal network from a set of potential candidates. While many NAS methods\nstill require training of (some) neural networks, zero-cost proxies promise to\nidentify the optimal network without training. In this work, we propose the\nzero-cost proxy Network Expressivity by Activation Rank (NEAR). It is based on\nthe effective rank of the pre- and post-activation matrix, i.e., the values of\na neural network layer before and after applying its activation function. We\ndemonstrate the cutting-edge correlation between this network score and the\nmodel accuracy on NAS-Bench-101 and NATS-Bench-SSS/TSS. In addition, we present\na simple approach to estimate the optimal layer sizes in multi-layer\nperceptrons. Furthermore, we show that this score can be utilized to select\nhyperparameters such as the activation function and the neural network weight\ninitialization scheme.","PeriodicalId":501065,"journal":{"name":"arXiv - PHYS - Data Analysis, Statistics and Probability","volume":"15 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance\",\"authors\":\"Raphael T. Husistein, Markus Reiher, Marco Eckhoff\",\"doi\":\"arxiv-2408.08776\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Artificial neural networks have been shown to be state-of-the-art machine\\nlearning models in a wide variety of applications, including natural language\\nprocessing and image recognition. However, building a performant neural network\\nis a laborious task and requires substantial computing power. Neural\\nArchitecture Search (NAS) addresses this issue by an automatic selection of the\\noptimal network from a set of potential candidates. While many NAS methods\\nstill require training of (some) neural networks, zero-cost proxies promise to\\nidentify the optimal network without training. In this work, we propose the\\nzero-cost proxy Network Expressivity by Activation Rank (NEAR). It is based on\\nthe effective rank of the pre- and post-activation matrix, i.e., the values of\\na neural network layer before and after applying its activation function. We\\ndemonstrate the cutting-edge correlation between this network score and the\\nmodel accuracy on NAS-Bench-101 and NATS-Bench-SSS/TSS. In addition, we present\\na simple approach to estimate the optimal layer sizes in multi-layer\\nperceptrons. 
Furthermore, we show that this score can be utilized to select\\nhyperparameters such as the activation function and the neural network weight\\ninitialization scheme.\",\"PeriodicalId\":501065,\"journal\":{\"name\":\"arXiv - PHYS - Data Analysis, Statistics and Probability\",\"volume\":\"15 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - PHYS - Data Analysis, Statistics and Probability\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.08776\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - PHYS - Data Analysis, Statistics and Probability","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.08776","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Artificial neural networks have been shown to be state-of-the-art machine learning models in a wide variety of applications, including natural language processing and image recognition. However, building a performant neural network is a laborious task and requires substantial computing power. Neural Architecture Search (NAS) addresses this issue by automatically selecting the optimal network from a set of potential candidates. While many NAS methods still require training of (some) neural networks, zero-cost proxies promise to identify the optimal network without training. In this work, we propose the zero-cost proxy Network Expressivity by Activation Rank (NEAR). It is based on the effective rank of the pre- and post-activation matrices, i.e., the values of a neural network layer before and after applying its activation function. We demonstrate the cutting-edge correlation between this network score and the model accuracy on NAS-Bench-101 and NATS-Bench-SSS/TSS. In addition, we present a simple approach to estimate the optimal layer sizes in multi-layer perceptrons. Furthermore, we show that this score can be utilized to select hyperparameters such as the activation function and the neural network weight initialization scheme.
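
The abstract does not spell out how the score is computed, but the core idea of an effective-rank-based, training-free layer score can be illustrated with a short sketch. The example below is a minimal illustration only: it assumes the entropy-based effective rank of Roy and Vetterli and a single dense layer, and the function names (`effective_rank`, `near_layer_score`) and the aggregation of pre- and post-activation ranks are assumptions for illustration, not the exact NEAR formulation from the paper.

```python
# Minimal sketch of an effective-rank-based layer score (illustrative,
# not the exact NEAR method from arXiv:2408.08776).
import numpy as np


def effective_rank(matrix: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank (Roy & Vetterli): exp of the Shannon
    entropy of the normalized singular value distribution."""
    s = np.linalg.svd(matrix, compute_uv=False)
    p = s / (s.sum() + eps)                  # normalized singular values
    entropy = -np.sum(p * np.log(p + eps))   # Shannon entropy
    return float(np.exp(entropy))


def near_layer_score(x: np.ndarray, weight: np.ndarray, bias: np.ndarray,
                     activation=np.tanh) -> float:
    """Score one dense layer by the effective ranks of its pre- and
    post-activation matrices for a batch of inputs x (assumed aggregation:
    simple sum)."""
    pre = x @ weight.T + bias    # pre-activation matrix (batch x units)
    post = activation(pre)       # post-activation matrix
    return effective_rank(pre) + effective_rank(post)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((128, 64))        # random input batch
    w = rng.standard_normal((32, 64)) * 0.1   # hypothetical layer weights
    b = np.zeros(32)
    print(f"layer score: {near_layer_score(x, w, b):.2f}")
```

In this sketch, a larger score means the layer produces a richer (higher effective-rank) representation of the input batch without any training. Comparing such scores across candidate layer sizes, activation functions, or weight initialization schemes would follow the spirit of the hyperparameter selection described in the abstract, though the paper's actual procedure may differ.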