Characterization of Cancer Types by Applying Machine Learning Methods on Blood RNA-Sequencing Data
Authors: Cem Bugra Alkan, Z. Işik
Published in: 2019 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), 2019-10-01
DOI: 10.1109/ISMSIT.2019.8932905 (https://doi.org/10.1109/ISMSIT.2019.8932905)
Citations: 2
Abstract
RNA-sequencing data measure the mRNA levels of genes in tissue or blood samples. Critical changes in the transcriptome can be observed more accurately with RNA-sequencing data, which ultimately helps in understanding the different behaviors of a disease. In this study, different feature selection methods and machine learning algorithms are compared for the accurate classification of cancer types using RNA-sequencing data from blood samples. In the analysis, seven cancer types were compared with each other and with healthy samples. Correlation coefficient and information gain analyses were applied as feature selection methods. The selected genes were provided as input to Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest (RF) classifiers. All models were evaluated with 10-fold cross-validation. In the experiments, the models achieved accuracy above 85% in discriminating hepatobiliary, lung, and pancreatic cancer types. In terms of accuracy, RF and SVM outperformed NB in many cases.
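The two feature selection scores named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The gene names, the toy expression values, the binary labels, and the choice to binarize expression at the median before computing information gain are all assumptions made for illustration. The abstract does not say how expression values were discretized or how many genes were kept.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(xs, labels):
    """Information gain of the labels after splitting xs at its median.

    Assumption: median binarization is one simple discretization;
    the source does not specify which one was used.
    """
    threshold = sorted(xs)[len(xs) // 2]
    low = [lab for x, lab in zip(xs, labels) if x < threshold]
    high = [lab for x, lab in zip(xs, labels) if x >= threshold]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (low, high) if part)
    return entropy(labels) - conditional


def rank_genes(expression, labels, score):
    """Return gene names sorted from highest to lowest score."""
    return sorted(expression, key=lambda g: score(expression[g], labels), reverse=True)


# Hypothetical toy data: 0 = healthy, 1 = cancer.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expression = {
    "gene_a": [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1],  # separates the classes
    "gene_b": [2.0, 3.0, 2.0, 3.0, 2.0, 3.0, 2.0, 3.0],  # unrelated to the classes
}

by_info_gain = rank_genes(expression, labels, information_gain)
by_correlation = rank_genes(expression, labels, lambda xs, ys: abs(pearson(xs, ys)))
print(by_info_gain, by_correlation)
```

In a pipeline like the one the abstract describes, the top-ranked genes from either score would then be passed to a classifier and evaluated with cross-validation. Ranking by the absolute correlation treats strongly negative and strongly positive associations as equally informative.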