Surajit Das, Mahamuda Sultana, Suman Bhattacharya, Diganta Sengupta, Debashis De
XAI-reduct: accuracy preservation despite dimensionality reduction for heart disease classification using explainable AI
Journal of Supercomputing, pp. 1-31. Published 2023-05-12. DOI: 10.1007/s11227-023-05356-3 (https://doi.org/10.1007/s11227-023-05356-3)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10177719/pdf/
Citations: 0
Abstract
Machine learning (ML) has been used to classify heart diseases for almost a decade, yet understanding the internal workings of black-box, i.e., non-interpretable, models remains a demanding problem. Another major challenge for such ML models is the curse of dimensionality, which makes classification with the comprehensive feature vector (CFV) resource intensive. This study focuses on dimensionality reduction using explainable artificial intelligence without compromising accuracy in heart disease classification. Four explainable ML models, using SHAP, were used for classification; they reflected the feature contributions (FC) and feature weights (FW) of each feature in the CFV in generating the final results. FC and FW were taken into account in generating the reduced-dimensional feature subset (FS). The findings of the study are as follows: (a) XGBoost classifies heart diseases best with explanations, with an increase of 2% in model accuracy over the best existing proposals; (b) explainable classification using FS exhibits better accuracy than most proposals in the literature; (c) with the increase in explainability, accuracy can be preserved using the XGBoost classifier for classifying heart diseases; and (d) the top four features responsible for the diagnosis of heart disease are identified, which occur in all the explanations produced by the five explainable techniques applied to the XGBoost classifier based on feature contributions. To the best of our knowledge, this is the first attempt to explain XGBoost classification for the diagnosis of heart diseases using five explainable techniques.
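The idea of building a reduced feature subset from feature contributions can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a hypothetical four-feature scoring function (`toy_risk_score`) in place of XGBoost, computes exact Shapley values by enumerating feature coalitions instead of calling the SHAP library, and uses made-up sample data. Features are ranked by mean absolute contribution and the top-ranked ones are kept.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value.
    The cost is exponential in the number of features, so this is only
    practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def rank_features(predict, samples, baseline):
    """Rank feature indices by mean absolute Shapley value, largest first."""
    n = len(baseline)
    totals = [0.0] * n
    for x in samples:
        for i, value in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(value)
    mean_abs = [t / len(samples) for t in totals]
    order = sorted(range(n), key=lambda i: mean_abs[i], reverse=True)
    return order, mean_abs


def toy_risk_score(features):
    # Hypothetical model for illustration only; the last feature is ignored.
    a, b, c, _unused = features
    return 2.0 * a + 0.2 * b + a * c


# Made-up samples; the baseline is the per-feature mean.
samples = [[1, 0, 1, 5], [0, 1, 1, 2], [1, 1, 0, 7], [0, 0, 0, 1]]
baseline = [sum(col) / len(samples) for col in zip(*samples)]

order, mean_abs = rank_features(toy_risk_score, samples, baseline)
feature_subset = order[:2]  # keep the two most influential features
print("ranking:", order)
print("reduced subset:", feature_subset)
```

In this sketch the ignored feature receives a contribution of exactly zero and is ranked last, which is the property that makes contribution-based ranking usable for dimensionality reduction. For tree ensembles, the SHAP library provides far more efficient estimators than this brute-force enumeration.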
About the journal:
The Journal of Supercomputing publishes papers on the technology, architecture and systems, algorithms, languages and programs, performance measures and methods, and applications of all aspects of Supercomputing. Tutorial and survey papers are intended for workers and students in the fields associated with and employing advanced computer systems. The journal also publishes letters to the editor, especially in areas relating to policy, succinct statements of paradoxes, intuitively puzzling results, partial results and real needs.
Published theoretical and practical papers are advanced, in-depth treatments describing new developments and new ideas. Each includes an introduction summarizing prior, directly pertinent work that is useful for the reader to understand, in order to appreciate the advances being described.