Performance evaluation of machine learning based big data processing framework for prediction of heart disease
Authors: Abderrahmane Ed-daoudy, K. Maalmi
Published in: 2019 International Conference on Intelligent Systems and Advanced Computing Sciences (ISACS), 2019-12-01
DOI: 10.1109/ISACS48493.2019.9068901 (https://doi.org/10.1109/ISACS48493.2019.9068901)
Citations: 3
Abstract
Heart disease is one of the most prominent and dangerous diseases threatening public health worldwide, often leading to heart attacks and strokes. In recent years, the number of heart disease patients under supervision has increased significantly, owing to a lack of awareness and to lifestyle-related factors. There is therefore a need for an effective and scalable solution that can detect and help prevent heart disease within a short, well-defined time frame. In this work, the performance of four well-known classification algorithms (SVM, Decision Tree, Random Forest and Logistic Regression) was evaluated for heart disease prediction using Apache Spark, a fast and general engine for big data processing, together with its machine learning library, MLlib, for batch data processing. The algorithms were compared in terms of prediction accuracy, model building time and prediction time. Experimental results on the processed Cleveland data from the heart disease dataset show that Random Forest achieved the highest classification accuracy, 87.50%, with sensitivity and specificity of 86.67% and 88.37%, respectively. Logistic Regression, on the other hand, was the fastest algorithm.
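The three quality measures quoted in the abstract (accuracy, sensitivity, specificity) all follow from the four cells of a binary confusion matrix. The sketch below shows the arithmetic in plain Python. The abstract does not give the confusion matrix, so the counts used here are hypothetical: they are one combination that happens to be consistent with the reported percentages, not figures taken from the paper. The class and function names are likewise illustrative and are not part of Spark MLlib.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (positive = heart disease present)."""
    tp: int  # diseased patients predicted as diseased
    fn: int  # diseased patients predicted as healthy
    tn: int  # healthy patients predicted as healthy
    fp: int  # healthy patients predicted as diseased


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all test cases that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: fraction of diseased patients that were detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: fraction of healthy patients correctly cleared."""
    return c.tn / (c.tn + c.fp)


# HYPOTHETICAL counts, not from the paper: chosen only because they are
# consistent with the percentages reported in the abstract.
counts = ConfusionCounts(tp=39, fn=6, tn=38, fp=5)

print(f"accuracy    = {accuracy(counts):.2%}")
print(f"sensitivity = {sensitivity(counts):.2%}")
print(f"specificity = {specificity(counts):.2%}")
```

Reporting sensitivity and specificity alongside accuracy matters in a screening setting, because a missed diagnosis (false negative) and a false alarm (false positive) carry very different costs and accuracy alone does not distinguish them.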