Ali J. Abidalkareem, Ali K. Ibrahim, Moaed A. Abd, Oneeb Rehman, Hanqi Zhuang
Cancers. Published 2024-05-14. DOI: 10.3390/cancers16101864 (https://doi.org/10.3390/cancers16101864)
Identification of Gene Expression in Different Stages of Breast Cancer with Machine Learning
Determining the tumor origin in humans is vital in clinical applications of molecular diagnostics. Metastatic cancer is usually a very aggressive disease with limited diagnostic procedures, even though many protocols have been evaluated for their effectiveness in prognostication. Research has shown that dysregulation of miRNAs (a class of non-coding, regulatory RNAs) is strongly involved in oncogenic conditions. This research paper aims to develop a machine learning model that processes an array of miRNAs in 1097 metastatic tissue samples from patients with various stages of breast cancer. The model is fed miRNA quantitative read-count data taken from The Cancer Genome Atlas Data Repository. Two feature-selection techniques, Neighborhood Component Analysis and Minimum Redundancy Maximum Relevance, are used to identify the most discriminant and relevant miRNAs in their up-regulated and down-regulated states. These miRNAs are then validated as biological identifiers for each of the four cancer stages in breast tumors. Both machine learning algorithms yield performance scores that are significantly higher than those of the traditional fold-change (FC) approach, particularly in earlier stages of cancer, with Neighborhood Component Analysis and Minimum Redundancy Maximum Relevance achieving accuracy scores of up to 0.983 and 0.931, respectively, compared to 0.920 for the FC method. This study underscores the potential of advanced feature-selection methods in enhancing the accuracy of cancer stage identification, paving the way for improved diagnostic and therapeutic strategies in oncology.
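For readers unfamiliar with the fold-change baseline mentioned in the abstract, the following is a minimal sketch of how such a ranking is commonly computed. It is not taken from the paper: the function names, the pseudocount of 1.0, the one-stage-versus-rest comparison, and the toy read counts are all assumptions made for illustration, and the authors' exact procedure may differ. A positive log2 fold change marks a miRNA as up-regulated in the chosen stage, and a negative value marks it as down-regulated.

```python
import math
from statistics import mean


def log2_fold_change(counts, labels, stage, pseudocount=1.0):
    """Return {feature: log2FC} for samples of `stage` versus all other samples.

    counts: dict mapping a feature name to a list of read counts, one per sample.
    labels: list of stage labels, one per sample, in the same order as the counts.
    pseudocount: added to both means to avoid division by zero (an assumption).
    """
    in_stage = [i for i, lab in enumerate(labels) if lab == stage]
    others = [i for i, lab in enumerate(labels) if lab != stage]
    if not in_stage or not others:
        raise ValueError("need at least one sample inside and outside the stage")

    result = {}
    for name, values in counts.items():
        mean_in = mean(values[i] for i in in_stage)
        mean_out = mean(values[i] for i in others)
        result[name] = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
    return result


def rank_by_fold_change(counts, labels, stage, top_k=2):
    """Return the top_k (feature, log2FC) pairs ordered by absolute log2FC."""
    fc = log2_fold_change(counts, labels, stage)
    ranked = sorted(fc.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:top_k]


# Hypothetical toy data: three made-up features, four samples, two stages.
toy_counts = {
    "mir-a": [7, 7, 1, 1],  # higher in stage I
    "mir-b": [1, 1, 7, 7],  # lower in stage I
    "mir-c": [3, 3, 3, 3],  # unchanged
}
toy_labels = ["I", "I", "II", "II"]

top = rank_by_fold_change(toy_counts, toy_labels, stage="I")
print(top)
```

A ranking like this scores each miRNA independently, so it cannot account for redundancy between features or for how features combine to separate stages. Those are the aspects that methods such as Minimum Redundancy Maximum Relevance and Neighborhood Component Analysis are designed to address.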