Explainability of SMOTE Based Oversampling for Imbalanced Dataset Problems

Aum Patil, Aman Framewala, F. Kazi
Published in: 2020 3rd International Conference on Information and Computer Technologies (ICICT), 2020-03-01
DOI: 10.1109/ICICT50521.2020.00015 (https://doi.org/10.1109/ICICT50521.2020.00015)
Citations: 6

Abstract

Since the advent of Artificial Intelligence (AI), imbalanced datasets and the lack of interpretability of complex AI models have been matters of concern for the research community. Imbalanced datasets contain a very low proportion of one class (the minority class) and a very large proportion of another (the majority class). Although the minority class is quantitatively underrepresented, it has high qualitative importance, because the cost of misclassification in these domains is very high. This paper presents a novel solution to the imbalanced dataset problem that uses the proven resampling method Synthetic Minority Oversampling Technique (SMOTE). The interpretability of this approach is then demonstrated with eXplainable AI (XAI) techniques such as LRP, SHAP, and LIME. State-of-the-art models, namely Deep Learning and Boosting classifiers, were trained to classify fraud instances with high accuracy, and their reliability was supported by producing explanations for their predicted instances. The confusion matrices and the explanations indicate excellent performance and reliability of the models.
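To illustrate the oversampling idea the abstract refers to, the following is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the authors' implementation: the function name smote_sketch, its parameters, and the toy data are assumptions made for illustration only. Each synthetic point is drawn at a random position on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority points by SMOTE-style interpolation.

    minority: list of equal-length feature tuples (minority class only).
    k: number of nearest minority neighbours considered per base point.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Interpolate: a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 4 minority ("fraud") points versus 20 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_count = 20
new_points = smote_sketch(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region the minority class already occupies instead of being exact duplicates.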