Breast Cancer Classification Using an Extreme Gradient Boosting Model with F-Score Feature Selection Technique

Pub Date: 2023-01-01   DOI: 10.12720/jait.14.2.363-372
Tina Elizabeth Mathew
Citations: 3

Abstract

Breast cancer is considered the most problematic of all cancers affecting women. With high incidence and mortality rates, it ranks as the primary and most significant health hazard for women globally. Early detection is key to ensuring the patient's survival. Several medical techniques, including mammography, magnetic resonance imaging, and thermography, are available to detect the disease, but they can cause the patient considerable stress and pain, and some, such as mammography, expose the patient to ionizing radiation. Other categories of techniques can therefore be used for early detection, and machine-learning-assisted detection and classification is one such alternative. This paper proposes a hyperparameter-optimized extreme gradient boosting (XGBoost) model combined with F-Score feature selection and uses it to classify breast tumors as malignant or benign on the Wisconsin Breast Cancer dataset. Feature importance is assessed with the F-Score, which is used to select the features most relevant to the target variable, and classification is performed on the selected features. Experiments were run with several training-testing partitions; the proposed XGBoost and F-Score model performed best on the 80-20 partition, reaching an accuracy of 99.27%.
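The abstract does not define which F-Score it uses. The following is a minimal sketch, assuming it is the Fisher-style F-score commonly used to rank features for binary classification: for each feature, the squared distances of the two class means from the overall mean are divided by the sum of the two within-class sample variances. The helper names `fisher_f_score` and `select_top_k` and the toy data are hypothetical illustrations, not the author's code or the Wisconsin dataset.

```python
from statistics import mean, variance


def fisher_f_score(X, y):
    """Fisher-style F-score for each feature column; y holds binary labels (0/1).

    F(i) = [(mean_pos_i - mean_i)^2 + (mean_neg_i - mean_i)^2]
           / [var_pos_i + var_neg_i]
    where the variances are sample variances (n - 1 denominator).
    A larger F(i) means feature i separates the two classes more strongly.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two samples in each class")
    scores = []
    for i in range(len(X[0])):
        col = [row[i] for row in X]
        col_pos = [row[i] for row in pos]
        col_neg = [row[i] for row in neg]
        m, m_pos, m_neg = mean(col), mean(col_pos), mean(col_neg)
        numerator = (m_pos - m) ** 2 + (m_neg - m) ** 2
        # variance(data, xbar) reuses the already-computed class mean.
        denominator = variance(col_pos, m_pos) + variance(col_neg, m_neg)
        if denominator == 0:
            # Constant within each class: perfectly separable if the class means differ.
            scores.append(float("inf") if numerator > 0 else 0.0)
        else:
            scores.append(numerator / denominator)
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features, best first (ties keep column order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical toy data (NOT the Wisconsin dataset): 6 samples, 3 features.
TOY_X = [
    [1.0, 5.0, 2.0],
    [1.2, 3.0, 2.5],
    [0.9, 4.0, 1.5],
    [3.0, 4.5, 3.0],
    [3.1, 3.5, 2.0],
    [2.9, 4.0, 3.5],
]
TOY_Y = [0, 0, 0, 1, 1, 1]  # 0 = benign, 1 = malignant (illustrative labels only)

if __name__ == "__main__":
    scores = fisher_f_score(TOY_X, TOY_Y)
    print([round(s, 3) for s in scores])  # → [58.017, 0.0, 0.417]
    print(select_top_k(scores, 2))        # → [0, 2]
```

In an 80-20 experiment like the one described, the scores would be computed on the training partition only, so that the test partition does not influence which features are kept. If the paper's F-Score instead refers to XGBoost's split-count importance, the xgboost library exposes that quantity through `Booster.get_score(importance_type="weight")`, and `xgboost.plot_importance` labels it "F score" on its x-axis.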