Vikas Khullar, Mohit Angurala, K. Singh, P. Prasant, V. Pabbi, Veeramanickam M.R. M
Title: Exploring Methods for Dealing with Class Imbalances in Supervised Machine Learning Structured Datasets
Published in: 2023 3rd International Conference on Advances in Computing, Communication, Embedded and Secure Systems (ACCESS)
Publication date: 2023-05-18
DOI: 10.1109/ACCESS57397.2023.10199296
URL: https://doi.org/10.1109/ACCESS57397.2023.10199296
Citation count: 0
Abstract
Class-imbalanced datasets are a major challenge for classification techniques. This paper experimentally examines the role of, and the options for, handling imbalanced classes in structured, tabular datasets. Diverse oversampling and undersampling techniques were applied and compared on accuracy, precision, recall, and F1-score. The experimental study used the Haberman Breast Cancer, Pima Indian Diabetes, and synthetic datasets, all three of which are imbalanced. The datasets were first analyzed with classification algorithms in their imbalanced form. Class-balancing techniques (oversampling and undersampling) were then applied, and the supervised classification algorithms were run again and evaluated on the same metrics. The results report the best-scoring metrics for both undersampling and oversampling methods. In conclusion, the best technique among those implemented was identified and proposed for future use.
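The abstract does not name the specific resampling methods or classifiers used. As a minimal sketch, assuming the simplest variants (random oversampling and random undersampling) and a binary task, the following shows how a dataset might be rebalanced and how the four reported metrics are computed. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(X, y):
    """Collect feature rows under their class label."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    return groups


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy data: 8 majority-class rows (label 0), 2 minority rows (label 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

# A model that always predicts the majority class scores 0.8 accuracy
# while precision, recall and F1 for the minority class are all 0.0.
print(binary_metrics(y, [0] * len(y)))

_, y_over = random_oversample(X, y)
_, y_under = random_undersample(X, y)
print(Counter(y_over))   # both classes raised to 8 rows
print(Counter(y_under))  # both classes reduced to 2 rows
```

The always-majority baseline illustrates why accuracy alone can mislead on imbalanced data and why precision, recall, and F1-score are reported alongside it. In practice, resampling would be applied to the training split only, so that the evaluation data keeps its original class distribution.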