Efficient breast cancer detection using sequential feature selection techniques

Taha M. Mohamed
2015 IEEE Seventh International Conference on Intelligent Computing and Information Systems (ICICIS), published 2015-12-01
DOI: 10.1109/INTELCIS.2015.7397261 (https://doi.org/10.1109/INTELCIS.2015.7397261)
Citations: 5

Abstract

Breast cancer is one of the most dangerous cancers in the world, especially in the Arab countries and Egypt. Because the disease is so widespread, automatic recognition systems can help physicians classify tumors as benign or malignant. However, performing many pathological analyses consumes time and money. In this paper, we propose an algorithm for decreasing the number of features required to detect the tumor. Two classifiers, linear and quadratic, are chosen to test the classification accuracy. The experimental results show that there are strong correlations between the features in the data set. With the sequential feature selection algorithm, discarding more than 50% of the features causes no significant loss in classification accuracy when the quadratic discriminant classifier is used. Additionally, with the linear discriminant classifier, only four PCA components achieve the same accuracy as nine components. The outliers in the data set have no notable effect on the classification accuracy. The data set is shown to be homogeneous using the k-means clustering algorithm.
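
To make the idea of sequential feature selection concrete, the following is a minimal sketch of greedy forward selection in plain Python. It is not the paper's implementation: the nearest-centroid classifier is a simple stand-in for the linear and quadratic discriminant classifiers named in the abstract, the leave-one-out scoring is an assumption, and the six-sample data set and all function names are hypothetical, chosen only so that one feature is informative, one is a redundant copy of it, and one is uninformative.

```python
from statistics import mean


def centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on a feature subset.

    Stand-in scorer (assumption): the paper uses discriminant classifiers instead.
    """
    correct = 0
    for i in range(len(X)):
        # Build one centroid per class from every sample except the held-out one.
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(r[f] for r in rows) for f in features]
        point = [X[i][f] for f in features]
        # Predict the class whose centroid is closest in squared Euclidean distance.
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == y[i]
    return correct / len(X)


def sequential_forward_selection(X, y, score, tol=1e-9):
    """Greedily add the feature that most improves the score; stop when none helps."""
    remaining = list(range(len(X[0])))
    selected, best = [], 0.0
    while remaining:
        trial_scores = [(score(X, y, selected + [f]), f) for f in remaining]
        # On ties, max() keeps the first candidate, i.e. the lowest feature index.
        top_score, top_feature = max(trial_scores, key=lambda t: t[0])
        if top_score <= best + tol:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Hypothetical toy data: feature 0 separates the classes, feature 1 is an exact
# multiple of feature 0 (strongly correlated), feature 2 carries no class signal.
X = [
    [1.0, 2.0, 5.0], [2.0, 4.0, 1.0], [1.5, 3.0, 3.0],     # class 0
    [8.0, 16.0, 5.0], [9.0, 18.0, 1.0], [8.5, 17.0, 3.0],  # class 1
]
y = [0, 0, 0, 1, 1, 1]

selected, best = sequential_forward_selection(X, y, centroid_accuracy)
print(selected, best)
```

On this toy data the search keeps only feature 0: the correlated copy and the uninformative column add nothing to the score, which mirrors in miniature the abstract's observation that strongly correlated features can be discarded without a loss in accuracy.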