Purpose
Although type 2 diabetes (T2DM) is largely heritable, several environmental factors may contribute to its occurrence. The current study employed machine learning (ML) algorithms to predict the risk of T2DM using the social determinants of health reported in the All of Us Research Program data.
Methods
Data were sourced from the All of Us Research Program. All participants with a complete record of the social determinants of health survey were included in the analysis. Participants were categorized as having a history of diabetes (case = 1) or no history of diabetes (case = 0). The ML models tested were gradient boosting, random forest, and support vector machines. Model performance was measured by accuracy, precision, and recall. Feature importance was evaluated using the mean decrease in accuracy score obtained from the best-performing model.
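As an illustration of the performance measures named above, the following is a minimal sketch of how accuracy, precision, and recall can be computed for the binary case label (1 = history of diabetes, 0 = no history). The function name `classification_metrics` and the label vectors are hypothetical and invented for illustration; they are not the study's code or data.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, and recall for binary labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall is the same quantity as sensitivity.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for illustration only (not study data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Recall and sensitivity are the same measure: the proportion of true cases that the model correctly identifies.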
Results
Overall, including the social determinants of health improved the performance of the ML models in predicting the risk of T2DM. The accuracy of the ML models ranged from 88% to 92%. The sensitivity of all models was greater than 90%. In addition, the most important features among the social determinants were reported as predictors of T2DM.
Conclusion
Machine learning algorithms were able to predict the risk of diabetes from the social determinants of health reported in the All of Us dataset. These factors could be used to screen patients at risk of T2DM.