Removing Neurons From Deep Neural Networks Trained With Tabular Data

Antti Klemetti;Mikko Raatikainen;Juhani Kivimäki;Lalli Myllyaho;Jukka K. Nurminen
{"title":"Removing Neurons From Deep Neural Networks Trained With Tabular Data","authors":"Antti Klemetti;Mikko Raatikainen;Juhani Kivimäki;Lalli Myllyaho;Jukka K. Nurminen","doi":"10.1109/OJCS.2024.3467182","DOIUrl":null,"url":null,"abstract":"Deep neural networks bear substantial cloud computational loads and often surpass client devices' capabilities. Research has concentrated on reducing the inference burden of convolutional neural networks processing images. Unstructured pruning, which leads to sparse matrices requiring specialized hardware, has been extensively studied. However, neural networks trained with tabular data and structured pruning, which produces dense matrices handled by standard hardware, are less explored. We compare two approaches: 1) Removing neurons followed by training from scratch, and 2) Structured pruning followed by fine-tuning through additional training over a limited number of epochs. We evaluate these approaches using three models of varying sizes (1.5, 9.2, and 118.7 million parameters) from Kaggle-winning neural networks trained with tabular data. Approach 1 consistently outperformed Approach 2 in predictive performance. The models from Approach 1 had 52%, 8%, and 12% fewer parameters than the original models, with latency reductions of 18%, 5%, and 5%, respectively. Approach 2 required at least one epoch of fine-tuning for recovering predictive performance, with further fine-tuning offering diminishing returns. Approach 1 yields lighter models for retraining in the presence of concept drift and avoids shifting computational load from inference to training, which is inherent in Approach 2. However, Approach 2 can be used to pinpoint the layers that have the least impact on the model's predictive performance when neurons are removed. We found that the feed-forward component of the transformer architecture used in large language models is a promising target for neuron removal.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"542-552"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10693557","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Open Journal of the Computer Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10693557/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Deep neural networks impose substantial computational loads in the cloud and often exceed the capabilities of client devices. Research has concentrated on reducing the inference burden of convolutional neural networks processing images. Unstructured pruning, which leads to sparse matrices requiring specialized hardware, has been extensively studied. However, neural networks trained with tabular data and structured pruning, which produces dense matrices handled by standard hardware, are less explored. We compare two approaches: 1) Removing neurons followed by training from scratch, and 2) Structured pruning followed by fine-tuning through additional training over a limited number of epochs. We evaluate these approaches using three models of varying sizes (1.5, 9.2, and 118.7 million parameters) from Kaggle-winning neural networks trained with tabular data. Approach 1 consistently outperformed Approach 2 in predictive performance. The models from Approach 1 had 52%, 8%, and 12% fewer parameters than the original models, with latency reductions of 18%, 5%, and 5%, respectively. Approach 2 required at least one epoch of fine-tuning for recovering predictive performance, with further fine-tuning offering diminishing returns. Approach 1 yields lighter models for retraining in the presence of concept drift and avoids shifting computational load from inference to training, which is inherent in Approach 2. However, Approach 2 can be used to pinpoint the layers that have the least impact on the model's predictive performance when neurons are removed. We found that the feed-forward component of the transformer architecture used in large language models is a promising target for neuron removal.
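To make the structured-pruning idea concrete, the sketch below shows one way to remove hidden neurons from a two-layer feed-forward block, the transformer component the abstract identifies as a promising target. This is an illustrative PyTorch sketch rather than the authors' implementation: the `remove_neurons` helper, the L2-norm importance score, and the layer sizes are assumptions made here for demonstration. Removing a hidden neuron drops one row of the first layer's weight matrix (and its bias entry) and the matching column of the second layer's weight matrix, so the pruned layers remain dense and run on standard hardware.

```python
# Illustrative sketch (not the paper's code): structured removal of neurons
# from a feed-forward block in PyTorch.
import torch
import torch.nn as nn


def remove_neurons(fc1: nn.Linear, fc2: nn.Linear, keep_ratio: float = 0.5):
    """Return smaller copies of fc1/fc2 with low-magnitude hidden neurons
    removed. Importance is scored by the L2 norm of each fc1 weight row,
    one common heuristic among many."""
    hidden = fc1.out_features
    n_keep = max(1, int(hidden * keep_ratio))

    # Score each hidden neuron and keep the strongest ones.
    scores = fc1.weight.detach().norm(p=2, dim=1)        # shape: (hidden,)
    keep = torch.topk(scores, n_keep).indices.sort().values

    new_fc1 = nn.Linear(fc1.in_features, n_keep, bias=fc1.bias is not None)
    new_fc2 = nn.Linear(n_keep, fc2.out_features, bias=fc2.bias is not None)

    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[keep, :])        # drop weight rows
        if fc1.bias is not None:
            new_fc1.bias.copy_(fc1.bias[keep])
        new_fc2.weight.copy_(fc2.weight[:, keep])        # drop weight columns
        if fc2.bias is not None:
            new_fc2.bias.copy_(fc2.bias)

    return new_fc1, new_fc2


# Example: shrink a 256-wide feed-forward block to 128 hidden neurons.
fc1, fc2 = nn.Linear(64, 256), nn.Linear(256, 64)
small_fc1, small_fc2 = remove_neurons(fc1, fc2, keep_ratio=0.5)
```

The resulting smaller layers could then either be fine-tuned for a few epochs (as in Approach 2) or treated purely as an architecture specification, reinitialized, and trained from scratch (as in Approach 1).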