Prediction and Analysis of Heart Disease Using Machine Learning
Yu Lin
2021 IEEE International Conference on Robotics, Automation and Artificial Intelligence (RAAI), 2021-04-21
DOI: https://doi.org/10.1109/RAAI52226.2021.9507928
Citations: 0
Abstract
Heart disease is one of the leading causes of death worldwide, and its complexity and rate of misdiagnosis pose a serious challenge to medical workers. Because machine learning has proven effective in decision-making and prediction, a machine learning model is a worthwhile aid to heart disease diagnosis. In this paper, a heart-disease dataset from Cleveland was analyzed and preprocessed (cleaning, one-hot encoding, and standardization), and preliminary findings were obtained on features associated with heart disease, such as male sex. Six machine learning algorithms (Logistic Regression, K-Nearest Neighbors, AdaBoost, CART, Random Forest, and XGBoost) were trained, and after hyperparameter tuning and cross-validation the Random Forest was selected as the best model, outperforming the others with an accuracy of 0.848, an F1 score of 0.829, a PRC-AUC of 0.909, and a ROC-AUC of 0.917. In addition, the most relevant features, including a reversible defect on the thallium stress test, high ST depression induced by exercise relative to rest, and asymptomatic chest pain, were identified by plotting the permutation feature importance and partial dependence plots of the Random Forest classifier.
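The abstract names permutation feature importance as the way the most relevant features were found. The idea can be illustrated with a minimal sketch, which is not the author's code: a feature's importance is estimated as the average drop in a score when that feature's column is shuffled. Everything named here is an assumption made for illustration, including the use of accuracy as the score, the hypothetical `toy_model` standing in for the trained Random Forest, and the synthetic two-feature data.

```python
import random
from typing import Callable, List

Row = List[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: List[Row], y: List[int]) -> float:
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(1 for row, label in zip(X, y) if model(row) == label)
    return correct / len(y)


def permutation_importance(
    model: Model, X: List[Row], y: List[int], n_repeats: int = 10, seed: int = 0
) -> List[float]:
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, breaking its link to the labels.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: Row) -> int:
    """Hypothetical stand-in classifier: uses feature 0 only, ignores feature 1."""
    return 1 if row[0] > 0.5 else 0


# Synthetic data (assumption): labels are generated by the toy model itself,
# so baseline accuracy is perfect and only feature 0 carries signal.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

In this toy setup, shuffling the ignored feature leaves accuracy unchanged, so its importance is zero, while shuffling the informative feature produces a large drop. With a real classifier the same procedure would normally be run on held-out data.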