Breast Cancer Prediction Using Random Forest and Gaussian Naïve Bayes Algorithms
Ita Sulistiani, Windu Wulandari, Fathia Dwi Astuti, Widodo
2022 1st International Conference on Information System & Information Technology (ICISIT), published 2022-07-27
DOI: 10.1109/ICISIT54091.2022.9872808 (https://doi.org/10.1109/ICISIT54091.2022.9872808)
Citations: 0
Abstract
Breast cancer is the second deadliest cancer after lung cancer. In 2021, the American Society of Clinical Oncology (ASCO) stated that female invasive breast cancer increased by half a percent from 2008 to 2017. Breast cancer is induced by a "misspelling" in a cell's instructions, which causes the cell to grow uncontrollably. If the problem is not treated, within a few months a large number of cells carrying the wrong instructions can accumulate and be detected as cancer. Machine learning has been widely used to develop breast cancer prediction models. Unfortunately, the problem of imbalanced datasets has received little to no attention in previous machine learning research. This research aimed to develop breast cancer prediction models using the Random Forest and Gaussian Naïve Bayes classifiers. The Borderline Synthetic Minority Oversampling Technique (BSM) was applied to handle the imbalanced dataset problem, and the Random Forest and Gaussian Naïve Bayes algorithms were used to build the prediction models. On the UCI Machine Learning Wisconsin Breast Cancer Dataset (WBCD), the combination of BSM and Random Forest achieved the highest recall, approximately 99.8%, whereas the combination of BSM and Gaussian Naïve Bayes produced the lowest recall among the generated models, 78.2%.
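The two ideas the abstract relies on, borderline oversampling of the minority class and recall as the evaluation metric, can be illustrated with a small dependency-free sketch. This is not the authors' implementation and does not use the WBCD data: the function names, the parameters `m` and `k`, and the toy points are all assumptions made for illustration, following the commonly described Borderline-SMOTE1 rule (a minority sample is in "danger" when at least half, but not all, of its `m` nearest neighbours belong to the majority class). The classifiers themselves are omitted; in practice one would more likely rely on library implementations such as `BorderlineSMOTE` in imbalanced-learn and `RandomForestClassifier` / `GaussianNB` in scikit-learn.

```python
import math
import random


def borderline_smote(X, y, minority=1, m=5, k=3, seed=0):
    """Borderline-SMOTE1-style sketch: one synthetic point per 'danger' sample."""
    rng = random.Random(seed)
    synthetic = []
    for i, label in enumerate(y):
        if label != minority:
            continue
        # All other samples, ordered by Euclidean distance to sample i.
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: math.dist(X[i], X[j]))
        n_major = sum(y[j] != minority for j in others[:m])
        # Keep only "danger" samples; skip "safe" ones (mostly minority
        # neighbours) and "noise" (all m neighbours are majority).
        if not (m / 2 <= n_major < m):
            continue
        # Interpolate towards one of the k nearest minority neighbours.
        mates = [j for j in others if y[j] == minority][:k]
        j = rng.choice(mates)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
    return synthetic


def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


# Toy, hypothetical data: 8 majority points (label 0) and 5 minority points
# (label 1), two of which sit next to the majority cluster.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
     (2.0, 0.0), (2.0, 1.0), (0.0, 2.0), (1.0, 2.0),
     (2.2, 1.5), (2.5, 2.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
y = [0] * 8 + [1] * 5

synthetic = borderline_smote(X, y)
print(len(synthetic), "synthetic minority samples:", synthetic)
print("recall on a made-up prediction:", recall([1, 1, 0, 1], [1, 0, 0, 1]))
```

Only the minority samples near the class boundary generate new points, which is what distinguishes the borderline variant from plain SMOTE. Recall is the natural metric here because it measures how many truly positive cases a model catches, and a missed positive is the costly error in a screening setting.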