A. Mosa, Chalermpon Thongmotai, Humayera Islam, Tanmoy Paul, K. S. M. T. Hossain, Vasanthi Mandhadi
Title: Evaluation of machine learning applications using real-world EHR data for predicting diabetes-related long-term complications
Journal: Journal of Business Analytics
Published: 2021-09-21
DOI: 10.1080/2573234X.2021.1979901
Citations: 1
Abstract
The biggest concern about diabetes-related complications is that they go unrecognised in the early stages but can become irreversible and devastating over time. Identifying the population at high risk of developing such complications makes it possible to intervene with preventative care at an early stage. This study presents a data-driven approach to predicting which patients are at higher risk of diabetes-related complications using real-world data. We used comorbid diagnostic features from the "Cerner Health Facts EMR Data" electronic health records to build machine learning-based prediction models for three diabetes-related long-term complications: (a) eye diseases, (b) kidney diseases, and (c) neuropathy. Our pipeline generated highly accurate prediction models. Judging by the F1-scores, applying class balancing techniques improved the overall performance of the models, and SVM with oversampling was the most consistent classifier across all three cohorts.
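The abstract names two ideas that are easy to show on a small scale: balancing classes by oversampling, and judging models by F1-score. The sketch below is illustrative only and is not the authors' pipeline. The abstract does not say which oversampling variant or SVM settings were used, so plain random oversampling is assumed here, the classifier is left out, and the helper names (`random_oversample`, `f1_score`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (assumed variant)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up toy cohort: 1 = developed the complication, 0 = did not.
rows = [[i] for i in range(100)]
labels = [1] * 10 + [0] * 90

# After oversampling, both classes have 90 rows.
bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)

# A model that always predicts the majority class looks accurate
# but never finds a positive case, so its F1-score is zero.
always_negative = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_negative)) / len(labels)
f1 = f1_score(labels, always_negative)
```

On this toy cohort the always-negative model is 90% accurate yet scores an F1 of zero, which is why F1 is a more informative measure than accuracy when complications are rare. In practice, oversampling is applied to the training split only, so duplicated rows cannot leak into the evaluation data.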