A transcriptome-Based Deep Neural Network Classifier for Identifying the Site of Origin in Mucinous Cancer.

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Cancer Informatics Pub Date : 2022-11-15 eCollection Date: 2022-01-01 DOI:10.1177/11769351221135141
Taejin Ahn, Kidong Kim, Hyojin Kim, Sarah Kim, Sangick Park, Kyoungbun Lee
{"title":"A transcriptome-Based Deep Neural Network Classifier for Identifying the Site of Origin in Mucinous Cancer.","authors":"Taejin Ahn, Kidong Kim, Hyojin Kim, Sarah Kim, Sangick Park, Kyoungbun Lee","doi":"10.1177/11769351221135141","DOIUrl":null,"url":null,"abstract":"Purpose: There is a lack of tools for identifying the site of origin in mucinous cancer. This study aimed to evaluate the performance of a transcriptome-based classifier for identifying the site of origin in mucinous cancer. Materials And Methods: Transcriptomic data of 1878 non-mucinous and 82 mucinous cancer specimens, with 7 sites of origin, namely, the uterine cervix (CESC), colon (COAD), pancreas (PAAD), stomach (STAD), uterine endometrium (UCEC), uterine carcinosarcoma (UCS), and ovary (OV), obtained from The Cancer Genome Atlas, were used as the training and validation sets, respectively. Transcriptomic data of 14 mucinous cancer specimens from a tissue archive were used as the test set. For identifying the site of origin, a set of 100 differentially expressed genes for each site of origin was selected. After removing multiple iterations of the same gene, 427 genes were chosen, and their RNA expression profiles, at each site of origin, were used to train the deep neural network classifier. The performance of the classifier was estimated using the training, validation, and test sets. Results: The accuracy of the model in the training set was 0.998, while that in the validation set was 0.939 (77/82). In the test set which is newly sequenced from a tissue archive, the model showed an accuracy of 0.857 (12/14). t-SNE analysis revealed that samples in the test set were part of the clusters obtained for the training set. Conclusion: Although limited by small sample size, we showed that a transcriptome-based classifier could correctly identify the site of origin of mucinous cancer.","PeriodicalId":35418,"journal":{"name":"Cancer Informatics","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2022-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9669684/pdf/","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cancer Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/11769351221135141","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2022/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 1

Abstract

Purpose: There is a lack of tools for identifying the site of origin in mucinous cancer. This study aimed to evaluate the performance of a transcriptome-based classifier for identifying the site of origin in mucinous cancer. Materials And Methods: Transcriptomic data of 1878 non-mucinous and 82 mucinous cancer specimens, with 7 sites of origin, namely, the uterine cervix (CESC), colon (COAD), pancreas (PAAD), stomach (STAD), uterine endometrium (UCEC), uterine carcinosarcoma (UCS), and ovary (OV), obtained from The Cancer Genome Atlas, were used as the training and validation sets, respectively. Transcriptomic data of 14 mucinous cancer specimens from a tissue archive were used as the test set. For identifying the site of origin, a set of 100 differentially expressed genes for each site of origin was selected. After removing multiple iterations of the same gene, 427 genes were chosen, and their RNA expression profiles, at each site of origin, were used to train the deep neural network classifier. The performance of the classifier was estimated using the training, validation, and test sets. Results: The accuracy of the model in the training set was 0.998, while that in the validation set was 0.939 (77/82). In the test set which is newly sequenced from a tissue archive, the model showed an accuracy of 0.857 (12/14). t-SNE analysis revealed that samples in the test set were part of the clusters obtained for the training set. Conclusion: Although limited by small sample size, we showed that a transcriptome-based classifier could correctly identify the site of origin of mucinous cancer.

Abstract Image

Abstract Image

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于转录组的深层神经网络分类器识别黏液癌起源部位。
目的:目前缺乏确定黏液癌起源部位的工具。本研究旨在评估基于转录组的分类器在鉴别黏液癌起源部位方面的性能。材料与方法:将来自the cancer Genome Atlas的1878例非黏液性癌和82例黏液性癌的转录组学数据分别作为训练集和验证集,这些样本分别来自宫颈(CESC)、结肠(COAD)、胰腺(PAAD)、胃(STAD)、子宫内膜(UCEC)、子宫癌肉瘤(UCS)和卵巢(OV)等7个起源部位。采用组织档案中14例黏液癌标本的转录组学数据作为测试集。为了确定起源位点,每个起源位点选择了一组100个差异表达基因。在去除同一基因的多次迭代后,选择了427个基因,并使用它们在每个起源位点的RNA表达谱来训练深度神经网络分类器。使用训练集、验证集和测试集来估计分类器的性能。结果:模型在训练集中的准确率为0.998,在验证集中的准确率为0.939(77/82)。在组织档案新测序的测试集中,该模型的准确率为0.857(12/14)。t-SNE分析显示,测试集中的样本是为训练集获得的聚类的一部分。结论:虽然样本量有限,但我们发现基于转录组的分类器可以正确识别黏液癌的起源部位。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Cancer Informatics
Cancer Informatics Medicine-Oncology
CiteScore
3.00
自引率
5.00%
发文量
30
审稿时长
8 weeks
期刊介绍: The field of cancer research relies on advances in many other disciplines, including omics technology, mass spectrometry, radio imaging, computer science, and biostatistics. Cancer Informatics provides open access to peer-reviewed high-quality manuscripts reporting bioinformatics analysis of molecular genetics and/or clinical data pertaining to cancer, emphasizing the use of machine learning, artificial intelligence, statistical algorithms, advanced imaging techniques, data visualization, and high-throughput technologies. As the leading journal dedicated exclusively to the report of the use of computational methods in cancer research and practice, Cancer Informatics leverages methodological improvements in systems biology, genomics, proteomics, metabolomics, and molecular biochemistry into the fields of cancer detection, treatment, classification, risk-prediction, prevention, outcome, and modeling.
期刊最新文献
Understanding the Biological Basis of Polygenic Risk Scores and Disparities in Prostate Cancer: A Comprehensive Genomic Analysis. Machine Learning for Dynamic Prognostication of Patients With Hepatocellular Carcinoma Using Time-Series Data: Survival Path Versus Dynamic-DeepHit HCC Model. Advancements and Challenges in the Image-Based Diagnosis of Lung and Colon Cancer: A Comprehensive Review. Prediction of Treatment Recommendations Via Ensemble Machine Learning Algorithms for Non-Small Cell Lung Cancer Patients in Personalized Medicine. Multicategory Survival Outcomes Classification via Overlapping Group Screening Process Based on Multinomial Logistic Regression Model With Application to TCGA Transcriptomic Data.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1