Integrative analysis of RNA expression data unveils distinct cancer types through machine learning techniques
Saad Awadh Alanazi, Nasser Alshammari, Maddalah Alruwaili, Kashaf Junaid, Muhammad Rizwan Abid, Fahad Ahmad
Saudi Journal of Biological Sciences, Volume 31, Issue 3, Article 103918. Published 2023-12-30. DOI: 10.1016/j.sjbs.2023.103918
https://www.sciencedirect.com/science/article/pii/S1319562X23003637
Abstract
Cancer is a highly complex and heterogeneous disease. Traditional cancer classification based on histopathology has limitations in guiding personalized prognosis and therapy. Gene expression profiling provides a powerful approach to unraveling molecular intricacies and better stratifying cancer subtypes. In this study, we performed an integrative analysis of RNA sequencing data from five cancer types: BRCA, KIRC, COAD, LUAD, and PRAD. We implemented a machine learning workflow consisting of dataset identification, normalization, feature selection, dimensionality reduction, clustering, and classification. The k-means algorithm was applied to group samples into distinct clusters based solely on gene expression patterns. Five unique clusters emerged from this unsupervised analysis, correlating significantly with the known cancer types. BRCA aligned predominantly with one cluster, while COAD spanned three clusters. KIRC was represented within two main clusters. LUAD was strongly associated with a single cluster and PRAD with another. This demonstrates the ability of machine learning approaches to unravel complex signatures within transcriptomic profiles that can delineate cancer subtypes. The study highlights the potential of integrative analytics to derive meaningful biological insights from high-dimensional omics datasets. Molecular subtyping through machine learning clustering enhances our understanding of the intrinsic heterogeneity of different cancers and the pathways dysregulated in them. Overall, this study exemplifies a powerful computational framework for classifying the gene expression profiles of patients with different cancer types and guiding personalized therapeutic decisions. Finally, in the classification step, a Wide Neural Network achieved an accuracy of 99.834% on the validation set and 99.995% on the test set.
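The clustering step described in the abstract (normalize expression values, run k-means with five clusters, then compare the clusters with the known cancer types) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the helper names (`make_synthetic`, `standardize`, `kmeans`, `cross_tab`) are hypothetical, feature selection and dimensionality reduction are omitted, and k-means is written in plain Python only to keep the example self-contained. A real analysis would more likely rely on an established library implementation.

```python
import random
from collections import Counter

CANCER_TYPES = ["BRCA", "KIRC", "COAD", "LUAD", "PRAD"]

def make_synthetic(n_per_type=20, n_genes=8, seed=42):
    """Synthetic stand-in for expression data: one 'marker gene' per type (assumption)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i, name in enumerate(CANCER_TYPES):
        for _ in range(n_per_type):
            rows.append([rng.gauss(5.0 if g == i else 0.0, 0.5) for g in range(n_genes)])
            labels.append(name)
    return rows, labels

def standardize(rows):
    """Scale each gene (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(row, centers):
    """Index of the center closest to this sample."""
    return min(range(len(centers)), key=lambda j: sq_dist(row, centers[j]))

def kmeans(rows, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with random initial centers drawn from the samples."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)
    for _ in range(n_iter):
        assign = [nearest(r, centers) for r in rows]
        new_centers = []
        for j in range(k):
            members = [r for r, a in zip(rows, assign) if a == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append([sum(c) / len(members) for c in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    # Final assignment against the final centers.
    return [nearest(r, centers) for r in rows]

def cross_tab(labels, assign):
    """Count how many samples of each known type fall into each cluster."""
    table = {}
    for lab, cl in zip(labels, assign):
        table.setdefault(lab, Counter())[cl] += 1
    return table

rows, labels = make_synthetic()
scaled = standardize(rows)
assign = kmeans(scaled, k=len(CANCER_TYPES))
table = cross_tab(labels, assign)
for name in CANCER_TYPES:
    print(name, dict(table[name]))
```

The printed table plays the role of the cluster-versus-type comparison in the abstract: a type that maps to one cluster appears with a single entry, while a type spread over several clusters (as reported for COAD) would show several. Because k-means depends on its initial centers, the exact split can vary with the seed.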
About the journal:
Saudi Journal of Biological Sciences is an English language, peer-reviewed scholarly publication in the area of biological sciences. Saudi Journal of Biological Sciences publishes original papers, reviews and short communications on, but not limited to:
• Biology, Ecology and Ecosystems, Environmental and Biodiversity
• Conservation
• Microbiology
• Physiology
• Genetics and Epidemiology
Saudi Journal of Biological Sciences is the official publication of the Saudi Society for Biological Sciences. It is published by King Saud University in collaboration with Elsevier and is edited by an international group of eminent researchers.