Impact of machine learning and feature selection on type 2 diabetes risk prediction
Päivi Riihimaa
Journal of medical artificial intelligence, 2020-06-01. Journal article.
DOI: 10.21037/jmai-20-4 (https://doi.org/10.21037/jmai-20-4)
Citations: 4
Abstract
This survey summarizes the state of the art in type 2 diabetes mellitus (T2DM) prediction and compares the prediction accuracies obtained with conventional statistical regression and with machine learning methods, including deep learning. It also reviews how feature selection and the inclusion of clinical and genomic data affect T2DM risk prediction accuracy. The results show that machine learning algorithms tend to outperform logistic regression in T2DM prediction accuracy. Adding clinical data and biomarkers to the core feature set improves accuracy, whereas incorporating genetic markers into prediction models remains challenging because of the high dimensionality of genetic marker data and the genetic heterogeneity of T2DM.
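The interplay between feature selection, dimensionality, and logistic regression mentioned in the abstract can be illustrated with a minimal, self-contained sketch. This is not the survey's method. The synthetic cohort, the correlation-based filter (a generic univariate feature-selection technique), and every function name (`make_synthetic_data`, `correlation_scores`, `select_top_k`, `train_logistic`) are hypothetical illustrations. The toy data have three features that drive the outcome and 100 pure-noise features, a rough stand-in for many uninformative genetic markers. The sketch keeps the top-k features by absolute correlation with the label on the training split only, then fits plain logistic regression by gradient descent.

```python
import math
import random


def make_synthetic_data(n_samples, n_noise, seed=0):
    """Hypothetical toy cohort: 3 informative features plus n_noise pure-noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.gauss(0.0, 1.0) for _ in range(3 + n_noise)]
        # Only the first three columns drive the synthetic outcome.
        z = row[0] + row[1] + row[2] + rng.gauss(0.0, 0.5)
        X.append(row)
        y.append(1 if z > 0 else 0)
    return X, y


def correlation_scores(X, y):
    """Filter score: |Pearson correlation| between each feature column and the 0/1 label."""
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        var_x = sum((a - mean_x) ** 2 for a in col)
        denom = math.sqrt(var_x * var_y)
        scores.append(abs(cov) / denom if denom > 0 else 0.0)
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=200):
    """Logistic regression fitted by full-batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    """Fraction of rows whose thresholded prediction (p >= 0.5) matches the label."""
    hits = sum(
        (sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) >= 0.5) == bool(t)
        for row, t in zip(X, y)
    )
    return hits / len(y)


def project(rows, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in rows]


X, y = make_synthetic_data(700, n_noise=100, seed=42)
X_train, y_train, X_test, y_test = X[:500], y[:500], X[500:], y[500:]

# Select features on the training split only, so test labels never influence selection.
selected = select_top_k(correlation_scores(X_train, y_train), k=3)
w, b = train_logistic(project(X_train, selected), y_train)
test_acc = accuracy(w, b, project(X_test, selected), y_test)
print("selected features:", sorted(selected))
print("held-out accuracy:", round(test_acc, 3))
```

Running the filter on the full dataset instead of the training split would leak held-out labels into the model and inflate the accuracy estimate. The toy data only illustrate the mechanics and say nothing about real T2DM cohorts or about the accuracies reported in the survey.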