Intersection Three Feature Selection and Machine Learning Approaches for Cancer Classification

Mahmood Khalsan, Mu Mu, E. Al-Shamery, Lee Machado, Michael Opoku Agyeman, S. Ajit
Published in: 2023 International Conference on System Science and Engineering (ICSSE)
Publication date: 2023-07-27
DOI: 10.1109/ICSSE58758.2023.10227163 (https://doi.org/10.1109/ICSSE58758.2023.10227163)

Abstract

Machine learning (ML) methods have played an important role in classification and prediction in most fields. However, analyzing gene expression for cancer classification remains complex because of the high dimensionality of gene expression datasets. Consequently, an intersection-based three feature selection method (ITFS) was developed to select optimal features (genes) that serve as identifiers for classification and to reduce the dimensionality of the available gene expression data. ITFS employs three feature selection methods: Mutual Information (MI), F-ClassIf, and Minimum Redundancy Maximum Relevance (mRMR). It then applies the intersection concept, retaining only the genes that were selected by all three feature selection techniques. These selected genes are used as identifiers for training the classifier model. Our study applied the proposed ITFS to six gene expression datasets (microarray and RNA-seq data) to validate the effectiveness of ITFS on classifier methods. Across the six datasets, the highest average accuracy improvement occurred when Multilayer Perceptron (MLP) and ITFS were employed together, compared with employing MLP alone. The proposed ITFS-MLP model produced classification accuracy between 92% and 100% on the six datasets, with an average accuracy of 96%.
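
The intersection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how many genes each method keeps or how mRMR is configured, so the cutoff `k`, the median binarization, the MI-based redundancy term, and the toy expression values below are all assumptions made for illustration. F-ClassIf is assumed here to mean a one-way ANOVA F-test. Only the Python standard library is used.

```python
import math
from collections import Counter
from statistics import mean

def f_score(x, y):
    """One-way ANOVA F statistic of feature x across the class labels y."""
    groups = {}
    for value, label in zip(x, y):
        groups.setdefault(label, []).append(value)
    grand = mean(x)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_between, df_within = len(groups) - 1, len(x) - len(groups)
    if within == 0:
        return float("inf")
    return (between / df_between) / (within / df_within)

def binarize(x):
    """Discretize a feature at its upper median (assumed preprocessing)."""
    threshold = sorted(x)[len(x) // 2]
    return [int(v >= threshold) for v in x]

def mutual_info(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[u] * pb[v]))
               for (u, v), c in pab.items())

def top_k(scores, k):
    """Names of the k highest-scoring features."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def mrmr(X, y, k):
    """Greedy mRMR: maximize relevance minus mean redundancy (MI difference)."""
    bins = {g: binarize(x) for g, x in X.items()}
    relevance = {g: mutual_info(b, y) for g, b in bins.items()}
    selected = []
    while len(selected) < min(k, len(X)):
        def score(g):
            if not selected:
                return relevance[g]
            redundancy = mean([mutual_info(bins[g], bins[s]) for s in selected])
            return relevance[g] - redundancy
        best = max((g for g in X if g not in selected), key=score)
        selected.append(best)
    return set(selected)

def itfs(X, y, k):
    """Keep only the features chosen by all three selectors."""
    mi_sel = top_k({g: mutual_info(binarize(x), y) for g, x in X.items()}, k)
    f_sel = top_k({g: f_score(x, y) for g, x in X.items()}, k)
    return mi_sel & f_sel & mrmr(X, y, k)

# Hypothetical toy data: 8 samples, 2 classes, 4 made-up "genes".
y = [0, 0, 0, 0, 1, 1, 1, 1]
X = {
    "geneA": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],  # separates the classes
    "geneB": [1.1, 1.0, 1.2, 0.9, 3.1, 2.9, 3.2, 3.0],  # also separates them
    "geneC": [2.0, 1.0, 2.1, 0.9, 2.0, 1.1, 1.9, 1.0],  # unrelated to the class
    "geneD": [0.5, 0.6, 0.4, 2.5, 2.4, 2.6, 2.5, 0.5],  # partly informative
}
selected = itfs(X, y, k=2)
print(sorted(selected))
```

The genes in `selected` would then be the only input columns passed to a classifier such as an MLP. Because the result is a set intersection, it can contain fewer than `k` genes, or none at all when the three selectors disagree.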