Background: Exposure to heavy metals in the environment has long been a focus of public concern. Growing evidence suggests that heavy metal exposure may lead to bone degeneration and an increased risk of pathological fractures. In this study, we analyzed data from the National Health and Nutrition Examination Survey (NHANES) and applied nine machine learning models to examine the relationship between heavy metal exposure and osteoporosis.
Methods: Data from the NHANES 2003-2004, 2005-2006, 2007-2008, 2009-2010, 2013-2014, and 2017-2018 cycles were used to develop the machine learning models. Spearman correlation analysis was used to assess the relationships among all independent variables, and the Boruta algorithm was used for feature selection. The selected data were balanced with the synthetic minority over-sampling technique (SMOTE) and split into training and testing sets in a 7:3 ratio. Nine models were constructed: Support Vector Machine, Gradient Boosting Machine, Neural Network, Random Forest, XGBoost, K-Nearest Neighbors, AdaBoost, LightGBM, and CatBoost. The optimal model was selected for further analysis based on the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, precision, and F1 score. SHapley Additive exPlanations (SHAP) were used to interpret the contribution of each variable to the machine learning model.
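The six selection criteria follow standard definitions for binary classifiers. The pure-Python sketch below shows how they can be computed for an osteoporosis label, assuming the coding 1 = osteoporosis and 0 = no osteoporosis and a 0.5 decision threshold; the function names and the toy data are hypothetical illustrations, not the authors' code, and in practice a library such as scikit-learn would typically be used instead.

```python
"""Minimal sketch of the model-selection metrics (assumed label coding: 1 = osteoporosis)."""
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall), specificity, precision and F1 from hard predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold(scores: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Convert predicted probabilities into hard labels (cutoff is an assumption)."""
    return [1 if s >= cutoff else 0 for s in scores]


if __name__ == "__main__":
    # Toy data for illustration only; these are not NHANES values.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
    print(classification_metrics(y_true, threshold(scores)))
    print(roc_auc(y_true, scores))
```

AUC is computed from the continuous scores, whereas the other five metrics depend on the chosen threshold, which is one reason they are usually reported together.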
Results: Among the nine machine learning models, XGBoost showed the best and most balanced performance in evaluating the association between heavy metal exposure and osteoporosis (AUC = 0.834), significantly outperforming the other eight models. It achieved an accuracy of 0.822, a sensitivity of 0.709, and a specificity of 0.830. Age was the most influential factor in the model (mean |SHAP| = 0.30). Ranked by SHAP feature importance in descending order, the metals were Tl, Pb, Cd, Ba, Mo, Sb, Cs, Co, and Tu, with Tl contributing most strongly to osteoporosis prediction. SHAP dependence plots and waterfall plots further illustrate the model's decision-making process.
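The rankings above are based on mean absolute SHAP values, a common global importance measure: for each feature, the absolute SHAP values are averaged over all samples and the features are sorted by that average. The sketch below illustrates this computation on a hypothetical per-sample SHAP matrix (rows = participants, columns = features, such as a matrix produced by a SHAP explainer); the feature names and numbers are made-up toy values, not the study's results.

```python
"""Minimal sketch: global feature importance as mean |SHAP| (toy values, not study data)."""
from typing import Dict, List, Sequence, Tuple


def mean_abs_shap(shap_values: Sequence[Sequence[float]],
                  feature_names: Sequence[str]) -> Dict[str, float]:
    """Average the absolute SHAP value of each feature over all samples."""
    n = len(shap_values)
    if n == 0:
        raise ValueError("need at least one sample")
    importance: Dict[str, float] = {}
    for j, name in enumerate(feature_names):
        importance[name] = sum(abs(row[j]) for row in shap_values) / n
    return importance


def rank_features(importance: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort features from most to least important."""
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical SHAP values for three participants and three features.
    features = ["Age", "Tl", "Pb"]
    shap_matrix = [
        [0.40, -0.10, 0.05],
        [-0.20, 0.15, -0.02],
        [0.30, 0.05, 0.03],
    ]
    for name, value in rank_features(mean_abs_shap(shap_matrix, features)):
        print(f"{name}: {value:.3f}")
```

Taking the absolute value matters because SHAP values can be positive or negative for the same feature; averaging the signed values would let opposing contributions cancel out.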
Conclusions: In this study, the XGBoost model performed better than the other eight models. Among the nine urinary metals, thallium (Tl) was the most important variable for predicting osteoporosis in the machine learning model. Among all independent variables, age and gender were the most important components of the model. Future research should develop more sophisticated algorithms to validate these findings and tune the relevant parameters to improve the model's precision.