A Hybrid Machine Learning Method for Diabetes Detection based on Unsupervised Clustering
Junhong Liu, Bo Peng, Zezhao Yin
Proceedings of the 2023 7th International Conference on Machine Learning and Soft Computing, 2023-01-05
DOI: https://doi.org/10.1145/3583788.3583809
Citation count: 0
Abstract
Diabetes is a common disease whose incidence is increasing year by year. However, most cases are not easily detected at an early stage because the symptoms are not obvious. The objective of this study is to propose a machine-learning method based on unsupervised clustering to improve the accuracy of diabetes detection. Because of the large volume of unlabeled data and the problems of the traditional K-means clustering algorithm, we adopt the fuzzy c-means (FCM) clustering algorithm with an improved calculation of the parameter m. Our method combines principal component analysis (PCA), the improved FCM clustering algorithm, and a K-nearest neighbor (KNN) classifier with an optimized value of K. Over ten repetitions of 10-fold cross-validation, the proposed method reaches an average accuracy of 99.31%, which is higher than that of the other machine learning models compared. These results indicate that our method is better suited to detecting diabetes. Further experiments on a new data set validate the applicability of our method to diabetes detection in a more practical setting.
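
For readers unfamiliar with the clustering step, the following is a minimal sketch of standard fuzzy c-means, written with only the Python standard library. It is not the authors' improved variant: the abstract does not describe how the parameter m is calculated, so m is simply fixed at 2.0 here as an assumption. The function name `fuzzy_c_means`, its parameters, and the synthetic two-blob data are all hypothetical and chosen for illustration only; the PCA and KNN stages are not shown.

```python
import math
import random


def fuzzy_c_means(points, c=2, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (NOT the paper's improved variant).

    points: list of equal-length tuples of floats.
    c: number of clusters; m: fuzzifier (> 1), fixed here by assumption.
    Returns (centers, memberships), where memberships[i][j] is the degree
    to which point i belongs to cluster j; each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / w_sum
                for d in range(dim)
            ))

        # Membership update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            # Clamp distances away from zero to avoid division by zero.
            dists = [max(math.dist(points[i], ctr), 1e-12) for ctr in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u


# Synthetic demo data (an assumption, unrelated to the paper's data sets):
# two well-separated 2-D Gaussian blobs of 30 points each.
demo_rng = random.Random(42)
blob_a = [(demo_rng.gauss(0.0, 0.5), demo_rng.gauss(0.0, 0.5)) for _ in range(30)]
blob_b = [(demo_rng.gauss(5.0, 0.5), demo_rng.gauss(5.0, 0.5)) for _ in range(30)]
data = blob_a + blob_b

centers, memberships = fuzzy_c_means(data, c=2, m=2.0)
# Hard labels: the cluster with the highest membership for each point.
labels = [max(range(2), key=lambda j: row[j]) for row in memberships]
```

Unlike K-means, each point receives a graded membership in every cluster rather than a single hard assignment. Larger values of m make the memberships softer, which is presumably why the choice of m matters to the method.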