A New Range-based Breast Cancer Prediction Model Using the Bayes' Theorem and Ensemble Learning
Sam Khozama, Ali Mahmoud Mayya
Information Technology and Control, pp. 757-770, published 2022-12-12. DOI: 10.5755/j01.itc.51.4.31347
Breast cancer prediction is essential for preventing and treating cancer. This research introduces a novel breast cancer prediction model that provides a range-based cancer score instead of a binary classification result (yes or no). The Breast Cancer Surveillance Consortium (BCSC) dataset is modified by applying a proposed probabilistic model to obtain the range-based cancer score. The model analyses a subset of the full BCSC dataset containing 67,632 records and 13 risk factors. Three types of statistics are derived: the overall cancer and non-cancer probabilities, prior medical knowledge, and the likelihood of each risk factor given each prediction class. The model also applies a weighting scheme to achieve the best fusion of the BCSC risk factors. The final prediction score is computed as the posterior probability of the weighted combination of risk factors, using the three statistics obtained from the probabilistic model. This score is added to the BCSC dataset, and the resulting new version of the dataset is used to train an ensemble model of 30 learners. Experiments are conducted on both the subset and the full dataset (317,880 medical records). The results indicate that the range-based model is accurate and robust, achieving an accuracy of 91.33%, a false rejection rate of 1.12%, and an AUC of 0.9795. The new version of the BCSC dataset can be used for further research and analysis.
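The scoring step (class priors, per-factor likelihoods, a weighted combination, and a posterior probability) can be illustrated with a minimal Python sketch. It assumes a weighted naive-Bayes formulation, in which risk factors are treated as conditionally independent given the class and each factor's log-likelihood ratio is scaled by a weight. The abstract does not state that this is the authors' exact formula or how their weights are chosen, so the weights, the add-one (Laplace) smoothing, the toy records, and the factor names (`age_group`, `family_history`, `density`) are all hypothetical. The AUC is computed with the pairwise (Mann-Whitney) definition.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL toy records: (risk-factor values, label), label 1 = cancer, 0 = no cancer.
# Factor names and categories are illustrative only, not actual BCSC field codes.
records = [
    ({"age_group": "50-59", "family_history": "yes", "density": "dense"}, 1),
    ({"age_group": "60-69", "family_history": "yes", "density": "dense"}, 1),
    ({"age_group": "50-59", "family_history": "no", "density": "dense"}, 1),
    ({"age_group": "40-49", "family_history": "no", "density": "fatty"}, 0),
    ({"age_group": "40-49", "family_history": "no", "density": "dense"}, 0),
    ({"age_group": "60-69", "family_history": "no", "density": "fatty"}, 0),
    ({"age_group": "50-59", "family_history": "no", "density": "fatty"}, 0),
    ({"age_group": "40-49", "family_history": "yes", "density": "fatty"}, 0),
]

# HYPOTHETICAL per-factor weights; the paper's weighting method is not reproduced here.
weights = {"age_group": 0.5, "family_history": 1.5, "density": 1.0}


def fit_statistics(records, alpha=1.0):
    """Estimate class priors P(c) and a smoothed likelihood function P(x_i = v | c)."""
    class_counts = Counter(label for _, label in records)
    priors = {c: class_counts[c] / len(records) for c in (0, 1)}
    value_counts = defaultdict(Counter)  # (factor, class) -> counts of each value
    values_seen = defaultdict(set)       # factor -> observed categories
    for factors, label in records:
        for name, value in factors.items():
            value_counts[(name, label)][value] += 1
            values_seen[name].add(value)

    def likelihood(name, value, c):
        # Add-alpha smoothing keeps unseen values from producing log(0).
        k = len(values_seen[name])
        return (value_counts[(name, c)][value] + alpha) / (class_counts[c] + alpha * k)

    return priors, likelihood


def sigmoid(z):
    """Numerically stable logistic function: converts log-odds to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def risk_score(factors, priors, likelihood, weights):
    """Posterior P(cancer | factors) under a weighted naive-Bayes assumption, in [0, 1]."""
    log_odds = math.log(priors[1] / priors[0])  # prior log-odds
    for name, value in factors.items():
        w = weights.get(name, 1.0)
        # Each factor adds its weighted log-likelihood ratio P(v | cancer) / P(v | no cancer).
        log_odds += w * math.log(likelihood(name, value, 1) / likelihood(name, value, 0))
    return sigmoid(log_odds)


def auc(scores, labels):
    """ROC AUC: probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


priors, likelihood = fit_statistics(records)
scores = [risk_score(f, priors, likelihood, weights) for f, _ in records]
labels = [y for _, y in records]
print([round(s, 3) for s in scores])
print(auc(scores, labels))
```

Setting every weight to 1.0 recovers the ordinary naive-Bayes posterior, and setting every weight to 0.0 reduces the score to the class prior. The weights only rescale how much each factor shifts the log-odds. The continuous score is the kind of value that could be appended to each record as an extra feature before ensemble training. The 30-learner ensemble itself is not sketched here.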
About the journal:
The journal covers a wide range of problems in computer science and control systems, including:
-Software and hardware engineering;
-Management systems engineering;
-Information systems and databases;
-Embedded systems;
-Physical systems modelling and application;
-Computer networks and cloud computing;
-Data visualization;
-Human-computer interface;
-Computer graphics, visual analytics, and multimedia systems.