Weighted SMOTE Algorithm: A Tool To Improve Disease Prediction With Imbalanced Data
Authors: Rakesh Kumar Patnaik, Ming-Chih Ho, J. A. Yeh
Published in: 2023 International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan), 2023-07-17
DOI: https://doi.org/10.1109/ICCE-Taiwan58799.2023.10226703
In the medical field, acquiring a sufficient number of samples can be challenging, and the datasets that are collected are often both small and imbalanced. To address these issues, we propose a weighted SMOTE algorithm that targets imbalanced datasets. We apply the technique to a dataset in which breath biomarkers of liver disease serve as the feature set, and evaluate it with a supervised learning model. Our results show that the proposed method significantly improves the predicted probabilities and the classification performance of the chosen model on both the original imbalanced dataset and the balanced dataset. This study demonstrates the potential of the proposed approach to enhance machine learning performance on small and imbalanced datasets in medical applications.
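The abstract does not say how the weights in the proposed algorithm are defined, so the following is only a minimal sketch of one way a weighted SMOTE-style oversampler could look, not the authors' method. It assumes that each minority sample carries a sampling weight, that seed samples are drawn in proportion to those weights, and that synthetic points are interpolated toward one of the seed's k nearest minority neighbors as in standard SMOTE. The function name `weighted_smote` and the parameters `weights`, `k`, and `seed` are hypothetical.

```python
import numpy as np


def weighted_smote(X_min, n_new, weights=None, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    `weights` is a hypothetical per-sample sampling weight (an assumption,
    not taken from the paper). With uniform weights, seed selection is the
    same as in plain SMOTE.
    """
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbors than other samples
    rng = np.random.default_rng(seed)

    if weights is None:
        weights = np.ones(n)
    weights = np.asarray(weights, dtype=float)
    probs = weights / weights.sum()  # normalize to a probability vector

    # Pairwise Euclidean distances among minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Draw seed samples in proportion to their weights.
    seeds = rng.choice(n, size=n_new, p=probs)
    # For each seed, pick one of its k nearest neighbors uniformly at random.
    picks = neighbors[seeds, rng.integers(0, k, size=n_new)]
    # Interpolate a random fraction of the way from seed to neighbor.
    gaps = rng.random((n_new, 1))
    return X_min[seeds] + gaps * (X_min[picks] - X_min[seeds])
```

As a usage example, `weighted_smote(X_minority, n_new=50, weights=w)` would return 50 synthetic rows that can be stacked onto the training set with the minority label. How `w` should be chosen (for instance, by emphasizing samples near the class boundary) is left open here because the abstract does not specify it.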