Classification of Breast Cancer Subtypes using Microarray RNA Expression Data

Muhammad Shazwan Suhiman, Sayang Mohd Deni, Ahmad Zia Ul-Saufie Mohamad Japeri, Aszila Asmat, Lirong Wang
Journal of Advanced Research in Applied Sciences and Engineering Technology, published 2024-06-04
DOI: 10.37934/araset.46.1.7585 (https://doi.org/10.37934/araset.46.1.7585)

Abstract

Breast cancer is a heterogeneous disease involving molecular alterations, cellular alterations and clinical outcomes, which makes its classification a diagnostic challenge. Current practice uses immunohistochemistry markers and clinical variables to classify breast cancer, but this approach has limitations due to the inclusion of other tumour subtypes and healthy individuals. Machine learning approaches based on mRNA expression data offer researchers new possibilities to investigate molecular biomarkers as diagnostic characteristics. The purpose of this study is to evaluate the ranking of features (genes) produced by feature selection methods for a breast cancer diagnostic test. Three feature selection methods (IG, ReliefF and mRMR) were applied, and subsets of the top 100, 50, 25, 10, 5 and 3 ranked genes were created. Each subset was tested with SVM, LR and RF classifiers, and performance was assessed using a confusion matrix. The study found that feature selection with IG, ReliefF and mRMR was able to achieve the highest accuracy with the SVM, LR and RF classifiers. mRMR with the RF classifier achieved the highest accuracy with the fewest top-ranked genes (25). A hybrid feature selection approach (mRMR + SVM) improved the accuracy obtained with the top 3 ranked genes for the SVM, LR and RF classifiers. Future work should apply other feature selection methods and classifiers to explore the classification accuracy achievable with the smallest feature subsets in multiclass cancer datasets.
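
The ranking, subsetting and scoring steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes IG denotes information gain, binarizes each gene at its median (an illustrative choice), and uses hypothetical gene names, class labels and predictions. ReliefF, mRMR and the SVM, LR and RF classifiers are not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """H(labels) - H(labels | feature), with the feature binarized
    at its median. The median cut is an illustrative assumption."""
    cut = sorted(values)[len(values) // 2]
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v >= cut, []).append(y)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_genes(expression, labels):
    """Gene names sorted by decreasing information gain."""
    scores = {g: information_gain(v, labels) for g, v in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


def top_k_subsets(ranked, sizes=(100, 50, 25, 10, 5, 3)):
    """Top-k gene subsets for every size that fits the ranking."""
    return {k: ranked[:k] for k in sizes if k <= len(ranked)}


def confusion_matrix(y_true, y_pred):
    """Counts of (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(n for (t, p), n in cm.items() if t == p)
    return correct / sum(cm.values())


# Hypothetical toy data: three genes measured on six samples.
expression = {
    "gene_a": [0.1, 0.2, 0.3, 2.1, 2.2, 2.3],
    "gene_b": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
    "gene_c": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
}
labels = ["subtype_1"] * 3 + ["subtype_2"] * 3

ranked = rank_genes(expression, labels)
subsets = top_k_subsets(ranked, sizes=(2, 1))

# Hypothetical classifier output with one misclassified sample.
predicted = ["subtype_1", "subtype_1", "subtype_2"] + ["subtype_2"] * 3
cm = confusion_matrix(labels, predicted)

print(ranked)
print(subsets)
print(accuracy(cm))
```

With real microarray data, each top-k subset would be passed to a classifier and the resulting predictions scored in the same way, so that accuracy can be compared across subset sizes.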