HuiZhi Shao, Xiang Liu, DaShuai Zong, QingJun Song
{"title":"Optimization of diabetes prediction methods based on combinatorial balancing algorithm.","authors":"HuiZhi Shao, Xiang Liu, DaShuai Zong, QingJun Song","doi":"10.1038/s41387-024-00324-z","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Diabetes, as a significant disease affecting public health, requires early detection for effective management and intervention. However, imbalanced datasets pose a challenge to accurate diabetes prediction. This imbalance often results in models performing poorly in predicting minority classes, affecting overall diagnostic performance.</p><p><strong>Objectives: </strong>To address this issue, this study employs a combination of Synthetic Minority Over-sampling Technique (SMOTE) and Random Under-Sampling (RUS) for data balancing and uses Optuna for hyperparameter optimization of machine learning models. This approach aims to fill the gap in current research concerning data balancing and model optimization, thereby improving prediction accuracy and computational efficiency.</p><p><strong>Methods: </strong>First, the study uses SMOTE and RUS methods to process the imbalanced diabetes dataset, balancing the data distribution. Then, Optuna is utilized to optimize the hyperparameters of the LightGBM model to enhance its performance. During the experiment, the effectiveness of the proposed methods is evaluated by comparing the training results of the dataset before and after balancing.</p><p><strong>Results: </strong>The experimental results show that the enhanced LightGBM-Optuna model improves the accuracy from 97.07% to 97.11%, and the precision from 97.17% to 98.99%. The time required for a single search is only 2.5 seconds. These results demonstrate the superiority of the proposed method in handling imbalanced datasets and optimizing model performance.</p><p><strong>Conclusions: </strong>The study indicates that combining SMOTE and RUS data balancing algorithms with Optuna for hyperparameter optimization can effectively enhance machine learning models, especially in dealing with imbalanced datasets for diabetes prediction.</p>","PeriodicalId":19339,"journal":{"name":"Nutrition & Diabetes","volume":null,"pages":null},"PeriodicalIF":4.6000,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11324958/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nutrition & Diabetes","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1038/s41387-024-00324-z","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENDOCRINOLOGY & METABOLISM","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Diabetes, as a significant disease affecting public health, requires early detection for effective management and intervention. However, imbalanced datasets pose a challenge to accurate diabetes prediction. This imbalance often results in models performing poorly in predicting minority classes, affecting overall diagnostic performance.
Objectives: To address this issue, this study employs a combination of Synthetic Minority Over-sampling Technique (SMOTE) and Random Under-Sampling (RUS) for data balancing and uses Optuna for hyperparameter optimization of machine learning models. This approach aims to fill the gap in current research concerning data balancing and model optimization, thereby improving prediction accuracy and computational efficiency.
Methods: First, the study uses SMOTE and RUS methods to process the imbalanced diabetes dataset, balancing the data distribution. Then, Optuna is utilized to optimize the hyperparameters of the LightGBM model to enhance its performance. During the experiment, the effectiveness of the proposed methods is evaluated by comparing the training results of the dataset before and after balancing.
Results: The experimental results show that the enhanced LightGBM-Optuna model improves the accuracy from 97.07% to 97.11%, and the precision from 97.17% to 98.99%. The time required for a single search is only 2.5 seconds. These results demonstrate the superiority of the proposed method in handling imbalanced datasets and optimizing model performance.
Conclusions: The study indicates that combining SMOTE and RUS data balancing algorithms with Optuna for hyperparameter optimization can effectively enhance machine learning models, especially in dealing with imbalanced datasets for diabetes prediction.
期刊介绍:
Nutrition & Diabetes is a peer-reviewed, online, open access journal bringing to the fore outstanding research in the areas of nutrition and chronic disease, including diabetes, from the molecular to the population level.