论随机森林对抗无目标数据中毒的鲁棒性:基于集合的方法

IF 3 3区 计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE IEEE Transactions on Sustainable Computing Pub Date : 2023-07-07 DOI:10.1109/TSUSC.2023.3293269
Marco Anisetti;Claudio A. Ardagna;Alessandro Balestrucci;Nicola Bena;Ernesto Damiani;Chan Yeob Yeun
{"title":"论随机森林对抗无目标数据中毒的鲁棒性:基于集合的方法","authors":"Marco Anisetti;Claudio A. Ardagna;Alessandro Balestrucci;Nicola Bena;Ernesto Damiani;Chan Yeob Yeun","doi":"10.1109/TSUSC.2023.3293269","DOIUrl":null,"url":null,"abstract":"Machine learning is becoming ubiquitous. From finance to medicine, machine learning models are boosting decision-making processes and even outperforming humans in some tasks. This huge progress in terms of prediction quality does not however find a counterpart in the security of such models and corresponding predictions, where perturbations of fractions of the training set (poisoning) can seriously undermine the model accuracy. Research on poisoning attacks and defenses received increasing attention in the last decade, leading to several promising solutions aiming to increase the robustness of machine learning. Among them, ensemble-based defenses, where different models are trained on portions of the training set and their predictions are then aggregated, provide strong theoretical guarantees at the price of a linear overhead. Surprisingly, ensemble-based defenses, which do not pose any restrictions on the base model, have not been applied to increase the robustness of random forest. The work in this paper aims to fill in this gap by designing and implementing a novel hash-based ensemble approach that protects random forest against untargeted, random poisoning attacks. An extensive experimental evaluation measures the performance of our approach against a variety of attacks, as well as its sustainability in terms of resource consumption and performance, and compares it with a traditional monolithic model based on random forest. A final discussion presents our main findings and compares our approach with existing poisoning defenses targeting random forests.","PeriodicalId":13268,"journal":{"name":"IEEE Transactions on Sustainable Computing","volume":"8 4","pages":"540-554"},"PeriodicalIF":3.0000,"publicationDate":"2023-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On the Robustness of Random Forest Against Untargeted Data Poisoning: An Ensemble-Based Approach\",\"authors\":\"Marco Anisetti;Claudio A. Ardagna;Alessandro Balestrucci;Nicola Bena;Ernesto Damiani;Chan Yeob Yeun\",\"doi\":\"10.1109/TSUSC.2023.3293269\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine learning is becoming ubiquitous. From finance to medicine, machine learning models are boosting decision-making processes and even outperforming humans in some tasks. This huge progress in terms of prediction quality does not however find a counterpart in the security of such models and corresponding predictions, where perturbations of fractions of the training set (poisoning) can seriously undermine the model accuracy. Research on poisoning attacks and defenses received increasing attention in the last decade, leading to several promising solutions aiming to increase the robustness of machine learning. Among them, ensemble-based defenses, where different models are trained on portions of the training set and their predictions are then aggregated, provide strong theoretical guarantees at the price of a linear overhead. Surprisingly, ensemble-based defenses, which do not pose any restrictions on the base model, have not been applied to increase the robustness of random forest. The work in this paper aims to fill in this gap by designing and implementing a novel hash-based ensemble approach that protects random forest against untargeted, random poisoning attacks. An extensive experimental evaluation measures the performance of our approach against a variety of attacks, as well as its sustainability in terms of resource consumption and performance, and compares it with a traditional monolithic model based on random forest. A final discussion presents our main findings and compares our approach with existing poisoning defenses targeting random forests.\",\"PeriodicalId\":13268,\"journal\":{\"name\":\"IEEE Transactions on Sustainable Computing\",\"volume\":\"8 4\",\"pages\":\"540-554\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2023-07-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Sustainable Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10175648/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Sustainable Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10175648/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

摘要

机器学习正变得无处不在。从金融到医学,机器学习模型正在推动决策过程,甚至在某些任务中超越人类。然而,在预测质量方面取得的这一巨大进步并没有在此类模型和相应预测的安全性方面找到对应的解决方案,对训练集的部分内容进行扰动(中毒)会严重破坏模型的准确性。在过去十年中,有关中毒攻击和防御的研究受到越来越多的关注,并产生了几种有望提高机器学习鲁棒性的解决方案。其中,基于集合的防御,即在部分训练集上训练不同的模型,然后汇总它们的预测结果,以线性开销为代价,提供了强有力的理论保证。令人惊讶的是,基于集合的防御方法对基础模型不做任何限制,但却没有应用于提高随机森林的鲁棒性。本文的研究旨在通过设计和实施一种新颖的基于哈希值的集合方法来填补这一空白,从而保护随机森林免受无针对性的随机中毒攻击。广泛的实验评估衡量了我们的方法抵御各种攻击的性能,以及在资源消耗和性能方面的可持续性,并将其与基于随机森林的传统单一模型进行了比较。最后的讨论介绍了我们的主要发现,并将我们的方法与现有的针对随机森林的中毒防御进行了比较。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
On the Robustness of Random Forest Against Untargeted Data Poisoning: An Ensemble-Based Approach
Machine learning is becoming ubiquitous. From finance to medicine, machine learning models are boosting decision-making processes and even outperforming humans in some tasks. This huge progress in terms of prediction quality does not however find a counterpart in the security of such models and corresponding predictions, where perturbations of fractions of the training set (poisoning) can seriously undermine the model accuracy. Research on poisoning attacks and defenses received increasing attention in the last decade, leading to several promising solutions aiming to increase the robustness of machine learning. Among them, ensemble-based defenses, where different models are trained on portions of the training set and their predictions are then aggregated, provide strong theoretical guarantees at the price of a linear overhead. Surprisingly, ensemble-based defenses, which do not pose any restrictions on the base model, have not been applied to increase the robustness of random forest. The work in this paper aims to fill in this gap by designing and implementing a novel hash-based ensemble approach that protects random forest against untargeted, random poisoning attacks. An extensive experimental evaluation measures the performance of our approach against a variety of attacks, as well as its sustainability in terms of resource consumption and performance, and compares it with a traditional monolithic model based on random forest. A final discussion presents our main findings and compares our approach with existing poisoning defenses targeting random forests.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
IEEE Transactions on Sustainable Computing
IEEE Transactions on Sustainable Computing Mathematics-Control and Optimization
CiteScore
7.70
自引率
2.60%
发文量
54
期刊最新文献
Editorial Dynamic Event-Triggered State Estimation for Power Harmonics With Quantization Effects: A Zonotopic Set-Membership Approach 2024 Reviewers List Deadline-Aware Cost and Energy Efficient Offloading in Mobile Edge Computing Impacts of Increasing Temperature and Relative Humidity in Air-Cooled Tropical Data Centers
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1