XAI-reduct: accuracy preservation despite dimensionality reduction for heart disease classification using explainable AI.

Journal of Supercomputing | IF 2.5 | Zone 3 (Computer Science) | JCR Q2 (Computer Science, Hardware & Architecture) | Pub Date: 2023-05-12 | DOI: 10.1007/s11227-023-05356-3
Surajit Das, Mahamuda Sultana, Suman Bhattacharya, Diganta Sengupta, Debashis De
Publication type: Journal Article. Pages: 1-31. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10177719/pdf/
Citations: 0

Abstract

Machine learning (ML) has been used to classify heart diseases for almost a decade, yet understanding the internal workings of black-box (i.e., non-interpretable) models remains a demanding problem. Another major challenge for such ML models is the curse of dimensionality, which makes classification with the comprehensive feature vector (CFV) resource intensive. This study focuses on dimensionality reduction using explainable artificial intelligence without compromising accuracy in heart disease classification. Four explainable ML models, using SHAP, were used for classification; their explanations reflected the feature contribution (FC) and feature weight (FW) of each feature in the CFV toward the final results. FC and FW were taken into account in generating the reduced-dimensional feature subset (FS). The findings of the study are as follows: (a) XGBoost classifies heart diseases best with explanations, with a 2% increase in model accuracy over the best existing proposals; (b) explainable classification using FS exhibits better accuracy than most proposals in the literature; (c) accuracy can be preserved as explainability increases when the XGBoost classifier is used for classifying heart diseases; and (d) the top four features responsible for the diagnosis of heart disease are identified, which occur in common across all the explanations produced by the five explainable techniques applied to the XGBoost classifier based on feature contributions. To the best of our knowledge, this is the first attempt to explain XGBoost classification for the diagnosis of heart diseases using five explainable techniques.
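
The idea of using feature contributions to build a reduced feature subset can be illustrated with a minimal sketch. It is not the authors' pipeline: it computes exact Shapley values by enumerating feature coalitions in pure Python rather than calling the SHAP library, and `toy_model`, the feature names `f0`..`f3`, the all-zero `baseline`, the sample values, and the cut-off `k = 3` are all hypothetical stand-ins chosen for illustration (the paper uses trained classifiers such as XGBoost on a heart disease dataset).

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x.

    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features, so this is
    only practical for small illustrative cases.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size (excluding i).
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if (j in coalition or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def rank_features(predict, samples, baseline, names):
    """Rank features by mean absolute Shapley value over all samples."""
    totals = [0.0] * len(names)
    for x in samples:
        for i, value in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(value)
    means = [t / len(samples) for t in totals]
    return sorted(zip(names, means), key=lambda pair: pair[1], reverse=True)


def select_top_k(ranking, k):
    """Keep the names of the k highest-ranked features."""
    return [name for name, _ in ranking[:k]]


# Hypothetical stand-in for a trained classifier's score function.
# It ignores the fourth feature entirely, so that feature should rank last.
def toy_model(f):
    return 2.0 * f[0] + f[1] * f[2]


names = ["f0", "f1", "f2", "f3"]          # hypothetical feature names
baseline = [0.0, 0.0, 0.0, 0.0]           # assumed reference input
samples = [
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 0.0, 1.0, 3.0],
    [0.0, 2.0, 2.0, 1.0],
]

ranking = rank_features(toy_model, samples, baseline, names)
reduced_subset = select_top_k(ranking, 3)  # k = 3 is an arbitrary choice
print(ranking)
print(reduced_subset)
```

In a setting like the one the abstract describes, the model would then be retrained on the reduced subset and its accuracy compared with the accuracy obtained on the full feature vector; that comparison step is omitted here.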

Source journal
Journal of Supercomputing (Engineering & Technology: Electrical & Electronic Engineering)
CiteScore: 6.30
Self-citation rate: 12.10%
Articles published: 734
Review time: 13 months
About the journal: The Journal of Supercomputing publishes papers on the technology, architecture and systems, algorithms, languages and programs, performance measures and methods, and applications of all aspects of Supercomputing. Tutorial and survey papers are intended for workers and students in the fields associated with and employing advanced computer systems. The journal also publishes letters to the editor, especially in areas relating to policy, succinct statements of paradoxes, intuitively puzzling results, partial results and real needs. Published theoretical and practical papers are advanced, in-depth treatments describing new developments and new ideas. Each includes an introduction summarizing prior, directly pertinent work that is useful for the reader to understand, in order to appreciate the advances being described.