{"title":"使用不同分类器的SMOTE、Borderline-SMOTE和ADASYN过采样技术的比较研究","authors":"I. Dey, Vibhav Pratap","doi":"10.1109/ICSMDI57622.2023.00060","DOIUrl":null,"url":null,"abstract":"With the advent of machine learning and its numerous techniques, many real-world problems have been solved like credit card fraud detection, cancer susceptibility and survival prediction, identification of spam, and customer segmentation, to name a few. Machine learning works on huge loads of data to give the correct prediction and maximum accuracy. Now, accuracy of any machine learning model depends on the dataset been fed into that model, in the first place. And from here comes the concept of oversampling and under-sampling. Under-sampling is the process of shortening the majority class or deleting samples from the majority class in order to balance the dataset, and over-sampling is the process of adding additional synthetic samples to the minority class. So, this study is based on the three methods namely, SMOTE, Borderline-SMOTE, and ADASYN. This study includes the collation of the above-mentioned oversampling techniques based on their accuracy, precision, recall, F1-measure and ROC curve.","PeriodicalId":373017,"journal":{"name":"2023 3rd International Conference on Smart Data Intelligence (ICSMDI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Comparative Study of SMOTE, Borderline-SMOTE, and ADASYN Oversampling Techniques using Different Classifiers\",\"authors\":\"I. Dey, Vibhav Pratap\",\"doi\":\"10.1109/ICSMDI57622.2023.00060\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the advent of machine learning and its numerous techniques, many real-world problems have been solved like credit card fraud detection, cancer susceptibility and survival prediction, identification of spam, and customer segmentation, to name a few. Machine learning works on huge loads of data to give the correct prediction and maximum accuracy. Now, accuracy of any machine learning model depends on the dataset been fed into that model, in the first place. And from here comes the concept of oversampling and under-sampling. Under-sampling is the process of shortening the majority class or deleting samples from the majority class in order to balance the dataset, and over-sampling is the process of adding additional synthetic samples to the minority class. So, this study is based on the three methods namely, SMOTE, Borderline-SMOTE, and ADASYN. 
This study includes the collation of the above-mentioned oversampling techniques based on their accuracy, precision, recall, F1-measure and ROC curve.\",\"PeriodicalId\":373017,\"journal\":{\"name\":\"2023 3rd International Conference on Smart Data Intelligence (ICSMDI)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 3rd International Conference on Smart Data Intelligence (ICSMDI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSMDI57622.2023.00060\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 3rd International Conference on Smart Data Intelligence (ICSMDI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSMDI57622.2023.00060","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Comparative Study of SMOTE, Borderline-SMOTE, and ADASYN Oversampling Techniques using Different Classifiers
With the advent of machine learning and its many techniques, numerous real-world problems have been addressed, such as credit card fraud detection, cancer susceptibility and survival prediction, spam identification, and customer segmentation. Machine learning models rely on large volumes of data to produce accurate predictions, and the accuracy of any model depends in the first place on the dataset fed into it. This is where the concepts of oversampling and under-sampling arise. Under-sampling balances the dataset by removing samples from the majority class, whereas oversampling adds synthetic samples to the minority class. This study focuses on three oversampling methods, namely SMOTE, Borderline-SMOTE, and ADASYN, and compares them on the basis of accuracy, precision, recall, F1-measure, and ROC curve.
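To illustrate the kind of comparison the abstract describes, the following is a minimal sketch using the imbalanced-learn and scikit-learn libraries. The synthetic dataset, the 90:10 class ratio, and the logistic regression classifier are assumptions for illustration only; the paper's actual datasets and classifiers are not specified here.

# Minimal sketch (not the paper's code): compare SMOTE, Borderline-SMOTE, and
# ADASYN on a synthetic imbalanced dataset using one placeholder classifier.
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)
from imblearn.over_sampling import SMOTE, BorderlineSMOTE, ADASYN

# Synthetic binary classification problem with a 90:10 class imbalance (assumed).
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42)

samplers = {
    "SMOTE": SMOTE(random_state=42),
    "Borderline-SMOTE": BorderlineSMOTE(random_state=42),
    "ADASYN": ADASYN(random_state=42),
}

for name, sampler in samplers.items():
    # Oversample only the training split; the test split stays untouched.
    X_res, y_res = sampler.fit_resample(X_train, y_train)
    clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
    y_pred = clf.predict(X_test)
    y_score = clf.predict_proba(X_test)[:, 1]
    print(name, dict(Counter(y_res)),
          "acc=%.3f" % accuracy_score(y_test, y_pred),
          "prec=%.3f" % precision_score(y_test, y_pred),
          "rec=%.3f" % recall_score(y_test, y_pred),
          "f1=%.3f" % f1_score(y_test, y_pred),
          "auc=%.3f" % roc_auc_score(y_test, y_score))

Swapping in other classifiers (e.g. decision trees or k-NN) and real imbalanced datasets would follow the same pattern: resample the training split, fit, and report the metrics listed in the abstract.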