Using a robust model to detect the association between anthropometric factors and T2DM: machine learning approaches
Nafiseh Hosseini, Hamid Tanzadehpanah, Amin Mansoori, Mostafa Sabzekar, Gordon A Ferns, Habibollah Esmaily, Majid Ghayour-Mobarhan
BMC Medical Informatics and Decision Making, 25(1):49. Published 2025-01-31. DOI: 10.1186/s12911-025-02887-y (https://doi.org/10.1186/s12911-025-02887-y)
Abstract
Background: This study aimed to evaluate candidate models for identifying the anthropometric factors most strongly associated with type 2 diabetes mellitus (T2DM).
Method: The dataset was derived from the Mashhad Stroke and Heart Atherosclerotic Disorders (MASHAD) study and comprised 9354 subjects aged 35 to 65 years. Of these, 25% (2336 people) were diabetic and 75% (7018 people) were non-diabetic. Ten anthropometric factors and age, measured in all subjects, were analyzed. A K-nearest neighbor (KNN) model was used to assess the association between T2DM and the selected factors. The model was evaluated using accuracy, sensitivity, specificity, precision, and F1-measure. The receiver operating characteristic (ROC) curve and factor importance were also determined. The performance of the KNN model was compared with that of artificial neural network (ANN) and support vector machine (SVM) models.
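The five evaluation measures named in the Method can all be derived from the four cells of a binary confusion matrix. The following is a minimal Python sketch under that assumption; the class name `ConfusionCounts`, the function `classification_metrics`, and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # diabetic subjects predicted diabetic
    fp: int  # non-diabetic subjects predicted diabetic
    tn: int  # non-diabetic subjects predicted non-diabetic
    fn: int  # diabetic subjects predicted non-diabetic


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute accuracy, sensitivity, specificity, precision and F1."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": sensitivity,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "precision": precision,
        # F1 is the harmonic mean of precision and sensitivity.
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical counts, invented for illustration only (not study results).
example = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)
metrics = classification_metrics(example)
```

Because about 75% of the subjects are non-diabetic, accuracy alone can look favorable for a model that misses many diabetic cases, which is one reason to report sensitivity and specificity alongside it.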
Result: After feature selection and assessment of multicollinearity, six factors were used in the final model: mid-arm circumference (MAC), waist circumference (WC), body roundness index (BRI), body adiposity index (BAI), body mass index (BMI), and age. BRI, BAI, and MAC in males and BMI, BRI, and MAC in females had the strongest association with T2DM. The accuracy of the KNN model was approximately 93% for both genders. The best K (number of neighbors) for the model was 4, which gave the lowest error rate. The area under the ROC curve (AUC) was 0.985 for men and 0.986 for women. The KNN model achieved the best results among the models explored.
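Three of the selected factors (BMI, BAI, BRI) are indices computed from basic body measurements. The abstract does not state which formulas the authors used, so the Python sketch below relies on commonly cited definitions as an assumption; the function names and unit conventions are illustrative.

```python
import math


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def bai(hip_cm: float, height_m: float) -> float:
    """Body adiposity index (commonly cited form, assumed here):
    hip circumference (cm) / height (m)^1.5 - 18."""
    return hip_cm / height_m ** 1.5 - 18.0


def bri(waist_m: float, height_m: float) -> float:
    """Body roundness index (commonly cited form, assumed here).

    The body is modeled as an ellipse with semi-minor axis waist / (2*pi)
    and semi-major axis height / 2; BRI is a linear function of the
    ellipse's eccentricity. Waist and height must share the same unit.
    """
    eccentricity_sq = 1.0 - (waist_m / (2 * math.pi)) ** 2 / (0.5 * height_m) ** 2
    return 364.2 - 365.5 * math.sqrt(eccentricity_sq)


# Hypothetical measurements for one subject, for illustration only.
subject = {"weight_kg": 70.0, "height_m": 1.75, "waist_m": 0.90, "hip_cm": 100.0}
indices = {
    "BMI": bmi(subject["weight_kg"], subject["height_m"]),
    "BAI": bai(subject["hip_cm"], subject["height_m"]),
    "BRI": bri(subject["waist_m"], subject["height_m"]),
}
```

BMI, BAI, and BRI all depend on height, and BRI also depends on waist circumference, so these indices tend to be correlated with each other and with WC. This is consistent with the authors assessing multicollinearity before fixing the final feature set.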
Conclusion: The KNN model achieved high accuracy (93%) in detecting the association between anthropometric factors and T2DM. Selecting the K parameter (number of nearest neighbors) has an essential impact on reducing the error rate. Feature selection reduces the dimensionality of the KNN model and increases the accuracy of the final results.
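The role of K can be illustrated by scoring several candidate values on held-out data and keeping the one with the lowest error rate. The sketch below is a minimal pure-Python KNN on synthetic two-dimensional points. It is not the authors' implementation: the hold-out split, the Euclidean distance, the data, and all function names are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points nearest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def error_rate(train_x, train_y, val_x, val_y, k):
    """Fraction of validation points that are misclassified for a given k."""
    wrong = sum(knn_predict(train_x, train_y, x, k) != y for x, y in zip(val_x, val_y))
    return wrong / len(val_x)


def select_k(train_x, train_y, val_x, val_y, candidates):
    """Return the candidate k with the lowest validation error, plus all rates."""
    rates = {k: error_rate(train_x, train_y, val_x, val_y, k) for k in candidates}
    return min(rates, key=rates.get), rates


def make_cluster(rng, n, center, label):
    """Draw n synthetic 2-D points around `center` (illustrative data only)."""
    points = [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(n)]
    return points, [label] * n


rng = random.Random(0)  # fixed seed so the sketch is reproducible
x0, y0 = make_cluster(rng, 60, (0.0, 0.0), 0)
x1, y1 = make_cluster(rng, 60, (2.0, 2.0), 1)

# Simple hold-out split: first 40 points of each class train, the rest validate.
train_x, train_y = x0[:40] + x1[:40], y0[:40] + y1[:40]
val_x, val_y = x0[40:] + x1[40:], y0[40:] + y1[40:]

chosen_k, rates = select_k(train_x, train_y, val_x, val_y, candidates=range(1, 11))
```

In practice the error rate would more likely be estimated with cross-validation on standardized features, since KNN distances are sensitive to the scale of each variable.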
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.