NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance
Raphael T. Husistein, Markus Reiher, Marco Eckhoff
{"title":"NEAR:机器学习模型性能的免训练预估器","authors":"Raphael T. Husistein, Markus Reiher, Marco Eckhoff","doi":"arxiv-2408.08776","DOIUrl":null,"url":null,"abstract":"Artificial neural networks have been shown to be state-of-the-art machine\nlearning models in a wide variety of applications, including natural language\nprocessing and image recognition. However, building a performant neural network\nis a laborious task and requires substantial computing power. Neural\nArchitecture Search (NAS) addresses this issue by an automatic selection of the\noptimal network from a set of potential candidates. While many NAS methods\nstill require training of (some) neural networks, zero-cost proxies promise to\nidentify the optimal network without training. In this work, we propose the\nzero-cost proxy Network Expressivity by Activation Rank (NEAR). It is based on\nthe effective rank of the pre- and post-activation matrix, i.e., the values of\na neural network layer before and after applying its activation function. We\ndemonstrate the cutting-edge correlation between this network score and the\nmodel accuracy on NAS-Bench-101 and NATS-Bench-SSS/TSS. In addition, we present\na simple approach to estimate the optimal layer sizes in multi-layer\nperceptrons. Furthermore, we show that this score can be utilized to select\nhyperparameters such as the activation function and the neural network weight\ninitialization scheme.","PeriodicalId":501065,"journal":{"name":"arXiv - PHYS - Data Analysis, Statistics and Probability","volume":"15 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance\",\"authors\":\"Raphael T. Husistein, Markus Reiher, Marco Eckhoff\",\"doi\":\"arxiv-2408.08776\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Artificial neural networks have been shown to be state-of-the-art machine\\nlearning models in a wide variety of applications, including natural language\\nprocessing and image recognition. However, building a performant neural network\\nis a laborious task and requires substantial computing power. Neural\\nArchitecture Search (NAS) addresses this issue by an automatic selection of the\\noptimal network from a set of potential candidates. While many NAS methods\\nstill require training of (some) neural networks, zero-cost proxies promise to\\nidentify the optimal network without training. In this work, we propose the\\nzero-cost proxy Network Expressivity by Activation Rank (NEAR). It is based on\\nthe effective rank of the pre- and post-activation matrix, i.e., the values of\\na neural network layer before and after applying its activation function. We\\ndemonstrate the cutting-edge correlation between this network score and the\\nmodel accuracy on NAS-Bench-101 and NATS-Bench-SSS/TSS. In addition, we present\\na simple approach to estimate the optimal layer sizes in multi-layer\\nperceptrons. 
Furthermore, we show that this score can be utilized to select\\nhyperparameters such as the activation function and the neural network weight\\ninitialization scheme.\",\"PeriodicalId\":501065,\"journal\":{\"name\":\"arXiv - PHYS - Data Analysis, Statistics and Probability\",\"volume\":\"15 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - PHYS - Data Analysis, Statistics and Probability\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.08776\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - PHYS - Data Analysis, Statistics and Probability","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.08776","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Artificial neural networks have been shown to be state-of-the-art machine learning models in a wide variety of applications, including natural language processing and image recognition. However, building a performant neural network is a laborious task and requires substantial computing power. Neural Architecture Search (NAS) addresses this issue by automatically selecting the optimal network from a set of potential candidates. While many NAS methods still require training of (some) neural networks, zero-cost proxies promise to identify the optimal network without training. In this work, we propose the zero-cost proxy Network Expressivity by Activation Rank (NEAR). It is based on the effective rank of the pre- and post-activation matrices, i.e., the values of a neural network layer before and after applying its activation function. We demonstrate the cutting-edge correlation between this network score and the model accuracy on NAS-Bench-101 and NATS-Bench-SSS/TSS. In addition, we present a simple approach to estimate the optimal layer sizes in multi-layer perceptrons. Furthermore, we show that this score can be utilized to select hyperparameters such as the activation function and the neural network weight initialization scheme.
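
The abstract does not spell out how the score is computed, but the core idea of an effective-rank-based, training-free layer score can be illustrated with a short sketch. The example below is a minimal illustration only: it assumes the entropy-based effective rank of Roy and Vetterli and a single dense layer, and the function names (`effective_rank`, `near_layer_score`) and the aggregation of pre- and post-activation ranks are assumptions for illustration, not the exact NEAR formulation from the paper.

```python
# Minimal sketch of an effective-rank-based layer score (illustrative,
# not the exact NEAR method from arXiv:2408.08776).
import numpy as np


def effective_rank(matrix: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank (Roy & Vetterli): exp of the Shannon
    entropy of the normalized singular value distribution."""
    s = np.linalg.svd(matrix, compute_uv=False)
    p = s / (s.sum() + eps)                  # normalized singular values
    entropy = -np.sum(p * np.log(p + eps))   # Shannon entropy
    return float(np.exp(entropy))


def near_layer_score(x: np.ndarray, weight: np.ndarray, bias: np.ndarray,
                     activation=np.tanh) -> float:
    """Score one dense layer by the effective ranks of its pre- and
    post-activation matrices for a batch of inputs x (assumed aggregation:
    simple sum)."""
    pre = x @ weight.T + bias    # pre-activation matrix (batch x units)
    post = activation(pre)       # post-activation matrix
    return effective_rank(pre) + effective_rank(post)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((128, 64))        # random input batch
    w = rng.standard_normal((32, 64)) * 0.1   # hypothetical layer weights
    b = np.zeros(32)
    print(f"layer score: {near_layer_score(x, w, b):.2f}")
```

In this sketch, a larger score means the layer produces a richer (higher effective-rank) representation of the input batch without any training. Comparing such scores across candidate layer sizes, activation functions, or weight initialization schemes would follow the spirit of the hyperparameter selection described in the abstract, though the paper's actual procedure may differ.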