Sparse-Input Neural Networks to Differentiate 32 Primary Cancer Types on the Basis of Somatic Point Mutations

Onco. Published 2022-03-31. DOI: 10.3390/onco2020005
Nikolaos Dikaios

Abstract

Background and Objective: This paper aimed to differentiate primary cancer types from primary tumor samples on the basis of somatic point mutations (SPMs). Identifying the primary cancer site is necessary for site-specific and potentially targeted treatment. Current methods such as histopathology and laboratory tests cannot accurately determine the cancer's origin, which leads to empirical treatment of patients and poor survival rates. The availability of large DNA sequencing datasets has allowed researchers to examine how well somatic mutations can classify primary cancer sites. These datasets are highly sparse, because most genes are not mutated in a given sample. They also have a low signal-to-noise ratio and are often imbalanced, because rare cancers have fewer samples.

Methods: To overcome these limitations, a sparse-input neural network (SPINN) is proposed. SPINN projects the input data into a lower-dimensional space in which the more informative genes are used for learning. To train and evaluate SPINN, an extensive SPM dataset containing 7624 samples spanning 32 cancer types was collected from The Cancer Genome Atlas (TCGA). Different sampling strategies were applied to balance the dataset. SPINN was further validated on an independent International Cancer Genome Consortium (ICGC) dataset containing 226 samples spanning four cancer types.

Results and Conclusions: SPINN consistently outperformed classification algorithms such as extreme gradient boosting, deep neural networks, and support vector machines, achieving an accuracy of up to 73% on independent test data. Certain primary cancer types/subtypes (e.g., lung, brain, colon, esophagus, skin, and thyroid) were classified with an F-score > 0.80.
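The abstract does not say how SPINN decides which genes are "more informative." One common way to build a sparse-input network, used here as an assumption rather than as the paper's stated method, is to place a sparse group lasso penalty on the first-layer weights. All outgoing weights of each input gene form one group, and training uses proximal gradient descent. The pure-Python sketch below shows only the update and shrinkage step. The function names, gene labels (GENE_A, GENE_B, GENE_C), and penalty values are hypothetical.

```python
import math
from typing import List

# Rows = input genes, columns = hidden units (hypothetical toy layout).
Matrix = List[List[float]]


def soft_threshold(x: float, t: float) -> float:
    """Lasso shrinkage: move x toward zero by t, clipping at zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sparse_group_prox(w_first: Matrix, step: float,
                      lam_l1: float, lam_group: float) -> Matrix:
    """Proximal operator of step * (lam_l1 * ||W||_1 + lam_group * sum_j ||W_j||_2).

    Each row W_j holds every outgoing weight of input gene j, so when the
    group term drives a row to zero, that gene no longer reaches the network.
    """
    out: Matrix = []
    for row in w_first:
        # Lasso part: shrink individual weights toward zero.
        shrunk = [soft_threshold(v, step * lam_l1) for v in row]
        # Group part: shrink the whole row's Euclidean norm; zero it if too small.
        norm = math.sqrt(sum(v * v for v in shrunk))
        scale = max(0.0, 1.0 - step * lam_group / norm) if norm > 0.0 else 0.0
        out.append([scale * v for v in shrunk])
    return out


def proximal_gradient_step(w_first: Matrix, grad: Matrix, step: float,
                           lam_l1: float, lam_group: float) -> Matrix:
    """Gradient step on the (unpenalized) classification loss, then shrinkage."""
    moved = [[w - step * g for w, g in zip(w_row, g_row)]
             for w_row, g_row in zip(w_first, grad)]
    return sparse_group_prox(moved, step, lam_l1, lam_group)


def selected_genes(w_first: Matrix, genes: List[str]) -> List[str]:
    """Genes whose first-layer row is not entirely zero."""
    return [g for g, row in zip(genes, w_first) if any(v != 0.0 for v in row)]


if __name__ == "__main__":
    genes = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical gene labels
    w = [[0.9, -0.7], [0.4, 0.3], [0.05, -0.02]]
    zero_grad = [[0.0, 0.0] for _ in genes]
    w_new = proximal_gradient_step(w, zero_grad, step=1.0,
                                   lam_l1=0.01, lam_group=0.2)
    print(selected_genes(w_new, genes))  # ['GENE_A', 'GENE_B']
```

With these toy numbers, the weakly weighted GENE_C row is driven exactly to zero, so that gene no longer feeds the hidden layer. The lower-dimensional representation is then built only from the surviving genes. Applying the elementwise lasso shrinkage before the group shrinkage gives the exact proximal operator of the combined penalty. In a full model, this step would follow each gradient update of the loss over all cancer classes.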