Machine Learning Models for ASCVD Risk Prediction in an Asian Population - How to Validate the Model is Important.
Authors: Yu-Chung Hsiao, Chen-Yuan Kuo, Fang-Ju Lin, Yen-Wen Wu, Tsung-Hsien Lin, Hung-I Yeh, Jaw-Wen Chen, Chau-Chung Wu
Journal: Acta Cardiologica Sinica, vol. 39, no. 6, pp. 901-912. Published 2023-11-01.
DOI: 10.6515/ACS.202311_39(6).20230528A (https://doi.org/10.6515/ACS.202311_39(6).20230528A)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10646597/pdf/
Citations: 0
Abstract
Introduction: Atherosclerotic cardiovascular disease (ASCVD) is prevalent worldwide, including in Taiwan; however, widely accepted tools for assessing ASCVD risk are lacking in Taiwan. Machine learning models are potentially useful for risk evaluation. In this study, we used two cohorts to test the feasibility of machine learning with transfer learning for developing an ASCVD risk prediction model in Taiwan.
Methods: Two multi-center observational registry cohorts, T-SPARCLE and T-PPARCLE, were used in this study. The variables were selected based on European, U.S. and Asian guidelines. Both registries recorded the ASCVD outcomes of the patients. Ten-fold cross-validation and temporal validation were used to evaluate the performance of the binary classification analysis [prediction of major adverse cardiovascular (CV) events within one year]. Time-to-event analyses were also performed.
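The two validation schemes named in the Methods can be illustrated with a minimal sketch. The abstract does not give the registries' field names, the cutoff date, or the software used, so the enrollment dates, the `cutoff` value, and both function names below are hypothetical and only show the general idea: ten-fold cross-validation shuffles patients into ten held-out folds, while temporal validation trains on earlier patients and tests on later ones.

```python
import random
from datetime import date

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slice the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

def temporal_split(enrollment_dates, cutoff):
    """Train on patients enrolled before the cutoff, test on those enrolled later."""
    train_idx = [i for i, d in enumerate(enrollment_dates) if d < cutoff]
    test_idx = [i for i, d in enumerate(enrollment_dates) if d >= cutoff]
    return train_idx, test_idx

# Hypothetical enrollment dates (assumption: not taken from the registries).
dates = [date(2010 + i % 8, 1 + i % 12, 1) for i in range(40)]

splits = list(k_fold_splits(len(dates), k=10))
train_idx, test_idx = temporal_split(dates, cutoff=date(2015, 1, 1))
print(len(splits), len(train_idx), len(test_idx))
```

Because the temporal split never mixes earlier and later patients, it exposes shifts in feature distributions and event rates over time that a shuffled split averages away.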
Results: In the binary classification analysis, eXtreme Gradient Boosting (XGBoost) and random forest had the best performance, with areas under the receiver operating characteristic curve (AUC-ROC) of 0.72 (0.68-0.76) and 0.73 (0.69-0.77), respectively, although they were not significantly better than the other models. Temporal validation was also performed, and the data showed significant differences in the distributions of various features and in the event rate. In temporal validation, the AUC-ROC of XGBoost dropped to 0.66 (0.59-0.73) and that of random forest dropped to 0.69 (0.62-0.76), and both became numerically worse than the logistic regression model. In the time-to-event analysis, most models had a concordance index of around 0.70.
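For readers less familiar with the two metrics reported in the Results, the following is a minimal pure-Python sketch of AUC-ROC with an interval estimate and of a concordance index. The abstract does not state how its intervals were computed, so the percentile bootstrap here is an assumption, as are the function names and the toy data. The concordance index follows Harrell's pairwise definition in simplified form (pairs with tied times are ignored).

```python
import random

def auc_roc(y_true, y_score):
    """Probability that a random event case outscores a random non-event case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, not from the paper)."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc_roc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[round(alpha / 2 * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def concordance_index(times, events, risks):
    """Simplified Harrell's C: among comparable pairs, how often the earlier event has higher risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                concordant += (risks[i] > risks[j]) + 0.5 * (risks[i] == risks[j])
    return concordant / comparable

# Toy data (hypothetical, for illustration only).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = auc_roc(y_true, y_score)                       # 0.75
ci = bootstrap_ci(y_true, y_score, n_boot=200)
c_index = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.2])  # 0.8
print(auc, ci, c_index)
```

Both metrics are rank-based: a value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why an AUC-ROC near 0.72 and a concordance index near 0.70 indicate similar, moderate discrimination.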
Conclusions: Machine learning models with appropriate transfer learning may be a useful tool for the development of CV risk prediction models and may help improve patient care in the future.
About the journal:
Acta Cardiologica Sinica welcomes papers in all fields related to cardiovascular medicine, including basic research, vascular biology, clinical pharmacology, clinical trials, critical care medicine, coronary artery disease, interventional cardiology, arrhythmia and electrophysiology, atherosclerosis, hypertension, cardiomyopathy and heart failure, valvular and structural cardiac disease, pediatric cardiology, and cardiovascular surgery. The journal has received papers from more than 20 countries and areas. Currently, 40% of submissions come from Taiwan, 20% from China, and 20% from other countries and areas. The overall acceptance rate is around 50%.