Breast Cancer Classification Using an Extreme Gradient Boosting Model with F-Score Feature Selection Technique

Journal of Advances in Information Technology (IF 1.5, Q4, Computer Science, Information Systems). Pub Date: 2023-01-01. DOI: 10.12720/jait.14.2.363-372

Tina Elizabeth Mathew


Citations: 3

Abstract

Breast cancer is considered the most problematic of all cancers affecting women. With high incidence and mortality rates, it is ranked as the most significant health hazard for women globally. Early detection of the disease is key to patient survival. Several medical techniques, including mammography, magnetic resonance imaging, and thermography, are available to detect the disease, but they can cause the patient considerable stress and pain, and some rely on harmful radiation. Other categories of techniques can therefore be used for early detection, and machine-learning-assisted detection and classification is one such alternative. This paper proposes a hyperparameter-optimized extreme gradient boosting (XGBoost) model combined with F-Score feature selection and uses it to classify breast tumors as malignant or benign on the Wisconsin Breast Cancer dataset. Feature importance is assessed using the F-Score, which is used to select the features most relevant to the target variable; classification is then performed on the selected features. Experiments were run with different training-testing partitions, and the proposed XGBoost and F-Score model performed best on the 80-20 partition, with an accuracy of 99.27%.
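The abstract does not define how the F-Score is computed. In XGBoost tooling the term commonly refers to the "weight" importance, that is, the number of times a feature is used as a split across all trees in the ensemble. Assuming that meaning, the sketch below counts splits per feature and keeps the top-ranked features. The tree structures, the feature names (`f0`, `f1`, `f2`), and the helper functions `f_scores` and `select_top_k` are hypothetical and purely illustrative; they are not taken from the paper or from the XGBoost API.

```python
from collections import Counter

# Hypothetical toy ensemble (NOT from the paper): each tree is a nested dict.
# Internal nodes name the feature they split on; leaves hold only a score.
TREES = [
    {"feature": "f2",
     "left": {"feature": "f0", "left": {"leaf": -0.4}, "right": {"leaf": 0.1}},
     "right": {"leaf": 0.5}},
    {"feature": "f2",
     "left": {"leaf": -0.3},
     "right": {"feature": "f1", "left": {"leaf": 0.2}, "right": {"leaf": 0.6}}},
    {"feature": "f0",
     "left": {"leaf": -0.1},
     "right": {"feature": "f2", "left": {"leaf": 0.0}, "right": {"leaf": 0.3}}},
]


def count_splits(node, counts):
    """Recursively count how often each feature appears as a split node."""
    if "leaf" in node:
        return
    counts[node["feature"]] += 1
    count_splits(node["left"], counts)
    count_splits(node["right"], counts)


def f_scores(trees):
    """Split-count importance per feature, summed over all trees."""
    counts = Counter()
    for tree in trees:
        count_splits(tree, counts)
    return dict(counts)


def select_top_k(scores, k):
    """Return the k feature names with the highest scores.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


scores = f_scores(TREES)
selected = select_top_k(scores, 2)
print(scores)
print(selected)
```

In a real pipeline the split counts would come from a trained booster rather than hand-written trees, and the model would then be retrained on the selected columns only. The cutoff `k` is a free choice here; the abstract does not state how many features were retained.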


Source journal

Journal of Advances in Information Technology (Computer Science, Information Systems)

CiteScore

4.20

Self-citation rate

20.00%

Number of articles published