Taejin Ahn, Kidong Kim, Hyojin Kim, Sarah Kim, Sangick Park, Kyoungbun Lee
{"title":"A transcriptome-Based Deep Neural Network Classifier for Identifying the Site of Origin in Mucinous Cancer.","authors":"Taejin Ahn, Kidong Kim, Hyojin Kim, Sarah Kim, Sangick Park, Kyoungbun Lee","doi":"10.1177/11769351221135141","DOIUrl":null,"url":null,"abstract":"Purpose: There is a lack of tools for identifying the site of origin in mucinous cancer. This study aimed to evaluate the performance of a transcriptome-based classifier for identifying the site of origin in mucinous cancer. Materials And Methods: Transcriptomic data of 1878 non-mucinous and 82 mucinous cancer specimens, with 7 sites of origin, namely, the uterine cervix (CESC), colon (COAD), pancreas (PAAD), stomach (STAD), uterine endometrium (UCEC), uterine carcinosarcoma (UCS), and ovary (OV), obtained from The Cancer Genome Atlas, were used as the training and validation sets, respectively. Transcriptomic data of 14 mucinous cancer specimens from a tissue archive were used as the test set. For identifying the site of origin, a set of 100 differentially expressed genes for each site of origin was selected. After removing multiple iterations of the same gene, 427 genes were chosen, and their RNA expression profiles, at each site of origin, were used to train the deep neural network classifier. The performance of the classifier was estimated using the training, validation, and test sets. Results: The accuracy of the model in the training set was 0.998, while that in the validation set was 0.939 (77/82). In the test set which is newly sequenced from a tissue archive, the model showed an accuracy of 0.857 (12/14). t-SNE analysis revealed that samples in the test set were part of the clusters obtained for the training set. Conclusion: Although limited by small sample size, we showed that a transcriptome-based classifier could correctly identify the site of origin of mucinous cancer.","PeriodicalId":35418,"journal":{"name":"Cancer Informatics","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2022-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9669684/pdf/","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cancer Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/11769351221135141","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2022/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 1
Abstract
Purpose: There is a lack of tools for identifying the site of origin in mucinous cancer. This study aimed to evaluate the performance of a transcriptome-based classifier for identifying the site of origin in mucinous cancer. Materials And Methods: Transcriptomic data of 1878 non-mucinous and 82 mucinous cancer specimens, with 7 sites of origin, namely, the uterine cervix (CESC), colon (COAD), pancreas (PAAD), stomach (STAD), uterine endometrium (UCEC), uterine carcinosarcoma (UCS), and ovary (OV), obtained from The Cancer Genome Atlas, were used as the training and validation sets, respectively. Transcriptomic data of 14 mucinous cancer specimens from a tissue archive were used as the test set. For identifying the site of origin, a set of 100 differentially expressed genes for each site of origin was selected. After removing multiple iterations of the same gene, 427 genes were chosen, and their RNA expression profiles, at each site of origin, were used to train the deep neural network classifier. The performance of the classifier was estimated using the training, validation, and test sets. Results: The accuracy of the model in the training set was 0.998, while that in the validation set was 0.939 (77/82). In the test set which is newly sequenced from a tissue archive, the model showed an accuracy of 0.857 (12/14). t-SNE analysis revealed that samples in the test set were part of the clusters obtained for the training set. Conclusion: Although limited by small sample size, we showed that a transcriptome-based classifier could correctly identify the site of origin of mucinous cancer.
目的:目前缺乏确定黏液癌起源部位的工具。本研究旨在评估基于转录组的分类器在鉴别黏液癌起源部位方面的性能。材料与方法:将来自the cancer Genome Atlas的1878例非黏液性癌和82例黏液性癌的转录组学数据分别作为训练集和验证集,这些样本分别来自宫颈(CESC)、结肠(COAD)、胰腺(PAAD)、胃(STAD)、子宫内膜(UCEC)、子宫癌肉瘤(UCS)和卵巢(OV)等7个起源部位。采用组织档案中14例黏液癌标本的转录组学数据作为测试集。为了确定起源位点,每个起源位点选择了一组100个差异表达基因。在去除同一基因的多次迭代后,选择了427个基因,并使用它们在每个起源位点的RNA表达谱来训练深度神经网络分类器。使用训练集、验证集和测试集来估计分类器的性能。结果:模型在训练集中的准确率为0.998,在验证集中的准确率为0.939(77/82)。在组织档案新测序的测试集中,该模型的准确率为0.857(12/14)。t-SNE分析显示,测试集中的样本是为训练集获得的聚类的一部分。结论:虽然样本量有限,但我们发现基于转录组的分类器可以正确识别黏液癌的起源部位。
期刊介绍:
The field of cancer research relies on advances in many other disciplines, including omics technology, mass spectrometry, radio imaging, computer science, and biostatistics. Cancer Informatics provides open access to peer-reviewed high-quality manuscripts reporting bioinformatics analysis of molecular genetics and/or clinical data pertaining to cancer, emphasizing the use of machine learning, artificial intelligence, statistical algorithms, advanced imaging techniques, data visualization, and high-throughput technologies. As the leading journal dedicated exclusively to the report of the use of computational methods in cancer research and practice, Cancer Informatics leverages methodological improvements in systems biology, genomics, proteomics, metabolomics, and molecular biochemistry into the fields of cancer detection, treatment, classification, risk-prediction, prevention, outcome, and modeling.