Machine Learning Models for the Identification of Cardiovascular Diseases Using UK Biobank Data
Sheikh Mohammed Shariful Islam, Moloud Abrar, Teketo Tegegne, Liliana Loranjo, Chandan Karmakar, Md Abdul Awal, Md. Shahadat Hossain, Muhammad Ashad Kabir, Mufti Mahmud, Abbas Khosravi, George Siopis, Jeban C Moses, Ralph Maddison
arXiv - QuanBio - Quantitative Methods, published 2024-07-23. DOI: arxiv-2407.16721
Abstract
Machine learning models have the potential to identify cardiovascular
diseases (CVDs) early and accurately in primary healthcare settings, which is
crucial for delivering timely treatment and management. Although
population-based CVD risk models have been used traditionally, these models
often do not consider variations in lifestyles, socioeconomic conditions, or
genetic predispositions. Therefore, we aimed to develop machine learning models
for CVD detection using primary healthcare data, compare the performance of
different models, and identify the best models. We used data from the UK
Biobank study, which included over 500,000 middle-aged participants from
different primary healthcare centers in the UK. Data collected at baseline
(2006-2010) and during imaging visits after 2014 were used in this study.
Baseline characteristics, including sex, age, and the Townsend Deprivation
Index, were included. Participants were classified as having CVD if they
reported at least one of the following conditions: heart attack, angina,
stroke, or high blood pressure. Cardiac imaging data such as electrocardiogram
and echocardiography data, including left ventricular size and function,
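cardiac output, and stroke volume, were also used. We used 9 machine learning
models: linear support vector machine (LSVM), radial basis function support
vector machine (RBFSVM), Gaussian process (GP), decision tree (DT), random
forest (RF), neural network (NN), AdaBoost, naive Bayes (NB), and quadratic
discriminant analysis (QDA), which are explainable and easily interpretable.
We reported the accuracy, precision, recall, and F1 scores; confusion
matrices; and the area under the receiver operating characteristic curve (AUC).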
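
The labeling rule and the reported metrics can be illustrated with a minimal sketch. It is not the authors' code: the field names, the six toy participants, the scores, and the 0.5 decision threshold are all assumptions made up for illustration, and only the Python standard library is used. It shows how a participant would be labeled as having CVD when at least one of the four listed conditions is reported, and how accuracy, precision, recall, F1, the confusion matrix, and AUC follow from predictions.

```python
# Conditions named in the abstract; the keys are hypothetical field names.
CVD_CONDITIONS = ("heart_attack", "angina", "stroke", "high_blood_pressure")


def has_cvd(self_report):
    """Return 1 if at least one listed condition is reported, else 0."""
    return int(any(self_report.get(name, False) for name in CVD_CONDITIONS))


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels in {0, 1}."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_scores(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from the confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy self-reports (invented for illustration, not UK Biobank data).
participants = [
    {"heart_attack": True},
    {"angina": False, "stroke": False},
    {"high_blood_pressure": True, "stroke": True},
    {},
    {"angina": True},
    {"high_blood_pressure": False},
]
y_true = [has_cvd(p) for p in participants]

# Hypothetical model scores, thresholded at an assumed cut-off of 0.5.
y_score = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1]
y_pred = [int(s >= 0.5) for s in y_score]

scores = classification_scores(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(confusion_matrix(y_true, y_pred), scores, auc)
```

In practice the same quantities would more likely come from a library such as scikit-learn; the hand-written versions are used here only to keep the definitions visible.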