Characterization of Cancer Types by Applying Machine Learning Methods on Blood RNA-Sequencing Data

2019 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT). Pub Date: 2019-10-01. DOI: 10.1109/ISMSIT.2019.8932905

Cem Bugra Alkan, Z. Işik

Citations: 2

Abstract

RNA-sequencing data are used to measure the mRNA levels of genes in tissue or blood samples. Critical changes in the transcriptome can be observed more accurately with RNA-sequencing data, which eventually leads to a better understanding of how the disease behaves. This study compares feature selection methods and machine learning algorithms for the accurate classification of cancer types from RNA-sequencing data of blood samples. Seven cancer types were compared with each other and with healthy samples. Correlation coefficient and information gain analyses were applied as feature selection methods, and the selected genes were provided as input to Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest (RF) classifiers. All machine learning methods were evaluated with 10-fold cross-validation. In the experiments, the models achieved higher than 85% accuracy in discriminating hepatobiliary, lung, and pancreatic cancer types. In terms of accuracy, RF and SVM were more successful than NB in many cases.
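
The feature selection and evaluation steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The abstract does not say which correlation coefficient or which discretization was used, so Pearson correlation and a median split are assumptions made here. The gene names and expression values are hypothetical toy data, the folds are not stratified, and the SVM, NB, and RF classifiers are left out.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one gene, binarized at its median (assumption)."""
    cut = sorted(values)[len(values) // 2]
    low = [y for v, y in zip(values, labels) if v < cut]
    high = [y for v, y in zip(values, labels) if v >= cut]
    parts = [p for p in (low, high) if p]
    cond = sum(len(p) / len(labels) * entropy(p) for p in parts)
    return entropy(labels) - cond


def rank_genes(expr, labels, score, top_k):
    """Return the top_k gene names ordered by descending score."""
    scored = {gene: score(vals, labels) for gene, vals in expr.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k interleaved, unstratified folds."""
    for fold in range(k):
        test = list(range(fold, n_samples, k))
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test


# Hypothetical toy data: 0 = healthy, 1 = cancer.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expr = {
    "GENE_A": [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.8, 5.1],  # separates the classes
    "GENE_B": [2.0, 3.0, 2.0, 3.0, 2.0, 3.0, 2.0, 3.0],  # uninformative
}

by_corr = rank_genes(expr, labels, lambda v, y: abs(pearson(v, y)), top_k=1)
by_ig = rank_genes(expr, labels, information_gain, top_k=1)
print(by_corr, by_ig)
```

In a fuller pipeline, the ranking would be recomputed on the training indices of each fold produced by `k_fold_indices`, so that held-out samples do not influence which genes are selected. Whether the study did this is not stated in the abstract.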
