Uncertainty-based knowledge distillation for Bayesian deep neural network compression

International Journal of Approximate Reasoning; Pub Date: 2024-10-01; DOI: 10.1016/j.ijar.2024.109301; IF 3.2; Zone 3 (Computer Science); JCR Q2 (Computer Science, Artificial Intelligence)

Mina Hemmatian, Ali Shahzadi, Saeed Mozaffari

Volume 175, Article 109301. Full text: https://www.sciencedirect.com/science/article/pii/S0888613X24001889

Citations: 0

Abstract



Uncertainty-based knowledge distillation for Bayesian deep neural network compression

Deep learning models are widely used across many fields. In real-world scenarios, especially safety-critical applications, quantifying uncertainty is as crucial as achieving high accuracy. Bayesian deep neural networks (BDNNs) address this need by estimating two types of uncertainty: aleatoric and epistemic. However, deploying a BDNN on resource-constrained devices is challenging because approximate inference techniques impose substantial computational and storage costs, so efficient compression methods are needed. We propose an uncertainty-based knowledge distillation method to compress BDNNs. Knowledge distillation is a model compression technique that transfers knowledge from a complex network, the teacher, to a simpler one, the student. Our method incorporates uncertainty into knowledge distillation to address situations where inappropriate teacher supervision undermines compression performance. We use the epistemic uncertainty of the teacher's predictions to tailor supervision to each sample individually, accounting for the teacher's limited knowledge. Additionally, we adjust the distillation temperature for each sample based on the aleatoric uncertainty of the teacher's predictions, so that the student receives appropriate supervision even when the data are ambiguous. As a result, the Bayesian student network is trained under both appropriate supervision from the Bayesian teacher network and the ground-truth labels. We evaluated our method on the CIFAR-10, CIFAR-100, and RAF-DB datasets, demonstrating notable improvements in accuracy over state-of-the-art knowledge distillation methods. We further assessed robustness by testing with weakly trained teacher networks and by analyzing blurred and low-resolution data, which have high uncertainty. Experimental results show that the proposed method outperformed existing methods.
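
The abstract describes two per-sample mechanisms, epistemic uncertainty shaping how much the teacher is trusted and aleatoric uncertainty setting the distillation temperature, but it gives no formulas. The NumPy sketch below is one possible reading, not the authors' implementation. Everything specific in it is an assumption made for illustration: the entropy-based decomposition of uncertainty from Monte Carlo forward passes, the linear mappings from uncertainty to temperature and teacher weight, the direction of those mappings, the bounds `t_min` and `t_max`, the use of a single deterministic student pass, and the function names `decompose_uncertainty` and `uncertainty_kd_loss`.

```python
import numpy as np


def softmax(logits, axis=-1):
    # Numerically stable softmax.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def entropy(p, axis=-1, eps=1e-12):
    # Shannon entropy in nats.
    return -(p * np.log(p + eps)).sum(axis=axis)


def decompose_uncertainty(mc_logits):
    """Split predictive uncertainty using S stochastic teacher passes.

    mc_logits has shape (S, N, C). This is the common entropy-based
    decomposition (an assumption here, not taken from the paper):
      total     = entropy of the mean prediction
      aleatoric = mean entropy of the individual predictions
      epistemic = total - aleatoric (mutual information)
    """
    probs = softmax(mc_logits)                  # (S, N, C)
    mean_probs = probs.mean(axis=0)             # (N, C)
    total = entropy(mean_probs)                 # (N,)
    aleatoric = entropy(probs).mean(axis=0)     # (N,)
    epistemic = np.maximum(total - aleatoric, 0.0)
    return mean_probs, aleatoric, epistemic


def uncertainty_kd_loss(student_logits, teacher_mc_logits, labels,
                        t_min=1.0, t_max=4.0, eps=1e-12):
    """Hypothetical per-sample distillation loss.

    Assumed mappings (illustrative only):
      - higher aleatoric uncertainty -> higher temperature (softer targets)
      - higher epistemic uncertainty -> lower weight on the teacher term
    """
    num_classes = student_logits.shape[-1]
    max_ent = np.log(num_classes)
    _, aleatoric, epistemic = decompose_uncertainty(teacher_mc_logits)

    temperature = t_min + (t_max - t_min) * np.clip(aleatoric / max_ent, 0.0, 1.0)
    teacher_weight = 1.0 - np.clip(epistemic / max_ent, 0.0, 1.0)

    # Teacher target: softmax of the pass-averaged logits at the sample's temperature.
    t = temperature[:, None]
    p_teacher = softmax(teacher_mc_logits.mean(axis=0) / t)
    p_student = softmax(student_logits / t)

    # KL(teacher || student), scaled by T^2 as in standard distillation.
    kd = (p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps))).sum(axis=-1)
    kd = kd * temperature ** 2

    # Cross-entropy against the ground-truth labels at temperature 1.
    log_p = np.log(softmax(student_logits) + eps)
    ce = -log_p[np.arange(len(labels)), labels]

    per_sample = teacher_weight * kd + (1.0 - teacher_weight) * ce
    return per_sample.mean(), temperature, teacher_weight
```

Calling `uncertainty_kd_loss(student_logits, teacher_mc_logits, labels)` with arrays of shape `(N, C)`, `(S, N, C)`, and `(N,)` returns the batch loss together with the per-sample temperatures and teacher weights, which makes it easy to inspect how individual samples are treated. When all teacher passes agree, the epistemic term is zero and the sample is supervised by the teacher term alone under this assumed weighting.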


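Source journal

International Journal of Approximate Reasoning (Engineering and Technology: Computer Science, Artificial Intelligence)

CiteScore: 6.90

Self-citation rate: 12.80%

Articles published: 170

Review time: 67 days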

Journal description: The International Journal of Approximate Reasoning is intended to serve as a forum for the treatment of imprecision and uncertainty in Artificial and Computational Intelligence, covering both the foundations of uncertainty theories, and the design of intelligent systems for scientific and engineering applications. It publishes high-quality research papers describing theoretical developments or innovative applications, as well as review articles on topics of general interest. Relevant topics include, but are not limited to, probabilistic reasoning and Bayesian networks, imprecise probabilities, random sets, belief functions (Dempster-Shafer theory), possibility theory, fuzzy sets, rough sets, decision theory, non-additive measures and integrals, qualitative reasoning about uncertainty, comparative probability orderings, game-theoretic probability, default reasoning, nonstandard logics, argumentation systems, inconsistency-tolerant reasoning, elicitation techniques, philosophical foundations and psychological models of uncertain reasoning. Domains of application for uncertain reasoning systems include risk analysis and assessment, information retrieval and database design, information fusion, machine learning, data and web mining, computer vision, image and signal processing, intelligent data analysis, statistics, multi-agent systems, etc.
