Machine Learning Model for Cancer Diagnosis based on RNAseq Microarray

Hanaa Torkey, Mostafa Atlam, N. El-Fishawy, Hanaa Salem
Journal: Menoufia Journal of Electronic Engineering Research, vol. 26(1)
Published: 2020-03-18
DOI: https://doi.org/10.21608/mjeer.2020.20533.1000
Citations: 6

Abstract

Microarray technology is one of the most important recent breakthroughs in experimental molecular biology. It allows the expression levels of thousands of genes in cells to be monitored concurrently, and it is increasingly used in cancer research to understand the molecular variations among tumors so that more reliable classification becomes attainable. Machine learning techniques are widely used to build robust and accurate classification models. This paper proposes a function called Feature Reduction Classification Optimization (FeRCO). FeRCO applies machine learning techniques to RNAseq microarray data to predict whether a patient is diseased. Its main purpose is to determine the minimum number of features, together with the reduction technique and the classification technique, that gives the highest classification accuracy. The classification techniques are Support Vector Machine (SVM), both linear (LSVM) and kernel, Decision Trees (DT), Random Forest (RF), K-Nearest Neighbours (KNN), and Naïve Bayes (NB). The reduction techniques are Principal Component Analysis (PCA), both linear (LPCA) and kernel (KPCA), Linear Discriminant Analysis (LDA), and Factor Analysis (FA); combined with the different classifiers, they were used to find a lower-dimensional subspace with more discriminative features. The outcomes can serve as a roadmap for researchers interested in this field when choosing the most suitable machine learning algorithm, whether for classification or for reduction. The results show that FA and LPCA are the best reduction techniques across the three datasets, giving accuracy of up to 100% on the TCGA and simulation datasets and up to 97.86% on the WDBC dataset. LSVM is the best classification technique to use with LPCA, FA, and LDA, and RF is the best to use with KPCA.
Keywords: Cancer Classification, Diagnosis, Gene Expression, Gene Reduction, Machine Learning.
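The search the abstract describes (try each reduction technique with each classifier, then keep the combination that reaches the highest accuracy with the fewest features) can be sketched as follows. This is not the authors' FeRCO implementation. It is a minimal illustration that assumes scikit-learn, uses the library's bundled Wisconsin Diagnostic Breast Cancer data as a stand-in for the WDBC dataset, and picks an arbitrary component grid and 3-fold cross-validation. The names `REDUCERS`, `CLASSIFIERS`, and `search_best_combination` are hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA, FactorAnalysis, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Reduction techniques named in the abstract; each factory takes the
# target dimension k. Hyperparameters here are assumptions.
REDUCERS = {
    "LPCA": lambda k: PCA(n_components=k),
    "KPCA": lambda k: KernelPCA(n_components=k, kernel="rbf"),
    "LDA": lambda k: LinearDiscriminantAnalysis(n_components=k),
    "FA": lambda k: FactorAnalysis(n_components=k, random_state=0),
}

# Classification techniques named in the abstract (default settings assumed).
CLASSIFIERS = {
    "LSVM": lambda: SVC(kernel="linear"),
    "KSVM": lambda: SVC(kernel="rbf"),
    "DT": lambda: DecisionTreeClassifier(random_state=0),
    "RF": lambda: RandomForestClassifier(n_estimators=50, random_state=0),
    "KNN": lambda: KNeighborsClassifier(),
    "NB": lambda: GaussianNB(),
}


def search_best_combination(X, y, component_grid=(1, 2, 5), cv=3):
    """Score every reducer/classifier/k combination with cross-validation."""
    n_classes = len(set(y))
    results = []
    for r_name, make_reducer in REDUCERS.items():
        for k in component_grid:
            # LDA can produce at most (n_classes - 1) components.
            if r_name == "LDA" and k > min(n_classes - 1, X.shape[1]):
                continue
            for c_name, make_clf in CLASSIFIERS.items():
                pipe = make_pipeline(StandardScaler(), make_reducer(k), make_clf())
                acc = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean()
                results.append((acc, k, r_name, c_name))
    # Highest accuracy wins; ties go to the smaller number of components.
    best = max(results, key=lambda r: (r[0], -r[1]))
    return best, results


X, y = load_breast_cancer(return_X_y=True)
best, results = search_best_combination(X, y)
acc, k, reducer, clf = best
print(f"best: {reducer} + {clf}, k={k}, CV accuracy={acc:.4f}")
```

Putting the scaler and the reducer inside the pipeline means both are refit on each training fold, so the held-out fold never influences the learned projection. The accuracies this sketch prints depend on the assumed grid and settings and are not expected to reproduce the figures reported in the abstract.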