Removing Neurons From Deep Neural Networks Trained With Tabular Data

Antti Klemetti;Mikko Raatikainen;Juhani Kivimäki;Lalli Myllyaho;Jukka K. Nurminen
{"title":"Removing Neurons From Deep Neural Networks Trained With Tabular Data","authors":"Antti Klemetti;Mikko Raatikainen;Juhani Kivimäki;Lalli Myllyaho;Jukka K. Nurminen","doi":"10.1109/OJCS.2024.3467182","DOIUrl":null,"url":null,"abstract":"Deep neural networks bear substantial cloud computational loads and often surpass client devices' capabilities. Research has concentrated on reducing the inference burden of convolutional neural networks processing images. Unstructured pruning, which leads to sparse matrices requiring specialized hardware, has been extensively studied. However, neural networks trained with tabular data and structured pruning, which produces dense matrices handled by standard hardware, are less explored. We compare two approaches: 1) Removing neurons followed by training from scratch, and 2) Structured pruning followed by fine-tuning through additional training over a limited number of epochs. We evaluate these approaches using three models of varying sizes (1.5, 9.2, and 118.7 million parameters) from Kaggle-winning neural networks trained with tabular data. Approach 1 consistently outperformed Approach 2 in predictive performance. The models from Approach 1 had 52%, 8%, and 12% fewer parameters than the original models, with latency reductions of 18%, 5%, and 5%, respectively. Approach 2 required at least one epoch of fine-tuning for recovering predictive performance, with further fine-tuning offering diminishing returns. Approach 1 yields lighter models for retraining in the presence of concept drift and avoids shifting computational load from inference to training, which is inherent in Approach 2. However, Approach 2 can be used to pinpoint the layers that have the least impact on the model's predictive performance when neurons are removed. We found that the feed-forward component of the transformer architecture used in large language models is a promising target for neuron removal.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"542-552"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10693557","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Open Journal of the Computer Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10693557/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Deep neural networks impose substantial computational loads in the cloud and often exceed the capabilities of client devices. Research has concentrated on reducing the inference burden of convolutional neural networks processing images. Unstructured pruning, which leads to sparse matrices requiring specialized hardware, has been extensively studied. However, neural networks trained with tabular data and structured pruning, which produces dense matrices handled by standard hardware, are less explored. We compare two approaches: 1) Removing neurons followed by training from scratch, and 2) Structured pruning followed by fine-tuning through additional training over a limited number of epochs. We evaluate these approaches using three models of varying sizes (1.5, 9.2, and 118.7 million parameters) from Kaggle-winning neural networks trained with tabular data. Approach 1 consistently outperformed Approach 2 in predictive performance. The models from Approach 1 had 52%, 8%, and 12% fewer parameters than the original models, with latency reductions of 18%, 5%, and 5%, respectively. Approach 2 required at least one epoch of fine-tuning for recovering predictive performance, with further fine-tuning offering diminishing returns. Approach 1 yields lighter models for retraining in the presence of concept drift and avoids shifting computational load from inference to training, which is inherent in Approach 2. However, Approach 2 can be used to pinpoint the layers that have the least impact on the model's predictive performance when neurons are removed. We found that the feed-forward component of the transformer architecture used in large language models is a promising target for neuron removal.
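To make the structured-pruning idea concrete, the sketch below shows one way to remove hidden neurons from a two-layer feed-forward block, the transformer component the abstract identifies as a promising target. This is an illustrative PyTorch sketch rather than the authors' implementation: the `remove_neurons` helper, the L2-norm importance score, and the layer sizes are assumptions made here for demonstration. Removing a hidden neuron drops one row of the first layer's weight matrix (and its bias entry) and the matching column of the second layer's weight matrix, so the pruned layers remain dense and run on standard hardware.

```python
# Illustrative sketch (not the paper's code): structured removal of neurons
# from a feed-forward block in PyTorch.
import torch
import torch.nn as nn


def remove_neurons(fc1: nn.Linear, fc2: nn.Linear, keep_ratio: float = 0.5):
    """Return smaller copies of fc1/fc2 with low-magnitude hidden neurons
    removed. Importance is scored by the L2 norm of each fc1 weight row,
    one common heuristic among many."""
    hidden = fc1.out_features
    n_keep = max(1, int(hidden * keep_ratio))

    # Score each hidden neuron and keep the strongest ones.
    scores = fc1.weight.detach().norm(p=2, dim=1)        # shape: (hidden,)
    keep = torch.topk(scores, n_keep).indices.sort().values

    new_fc1 = nn.Linear(fc1.in_features, n_keep, bias=fc1.bias is not None)
    new_fc2 = nn.Linear(n_keep, fc2.out_features, bias=fc2.bias is not None)

    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[keep, :])        # drop weight rows
        if fc1.bias is not None:
            new_fc1.bias.copy_(fc1.bias[keep])
        new_fc2.weight.copy_(fc2.weight[:, keep])        # drop weight columns
        if fc2.bias is not None:
            new_fc2.bias.copy_(fc2.bias)

    return new_fc1, new_fc2


# Example: shrink a 256-wide feed-forward block to 128 hidden neurons.
fc1, fc2 = nn.Linear(64, 256), nn.Linear(256, 64)
small_fc1, small_fc2 = remove_neurons(fc1, fc2, keep_ratio=0.5)
```

The resulting smaller layers could then either be fine-tuned for a few epochs (as in Approach 2) or treated purely as an architecture specification, reinitialized, and trained from scratch (as in Approach 1).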