Hierarchical learning of gastric cancer molecular subtypes by integrating multi‐modal DNA‐level omics data and clinical stratification

Binyu Yang, Siying Liu, Jiemin Xie, Xi Tang, Pan Guan, Yifan Zhu, Xuemei Liu, Yunhui Xiong, Zuli Yang, Weiyao Li, Yonghua Wang, Wen Chen, Qingjiao Li, Li C. Xia
Quantitative Biology. Published 2024-05-13. DOI: 10.1002/qub2.45 (https://doi.org/10.1002/qub2.45)

Abstract

Molecular subtyping of gastric cancer (GC) aims to characterize its genetic landscape. However, the efficacy of current subtyping methods is hampered by their mixed use of molecular features, a lack of strategy optimization, and the limited availability of public GC datasets. There is a pressing need for a precise and easily adoptable subtyping approach for early DNA‐based screening and treatment. Based on the TCGA subtypes, we developed a novel DNA‐based hierarchical classifier for gastric cancer molecular subtyping (HCG), which employs gene mutations, copy number aberrations, and methylation patterns as predictors. By incorporating the closely related esophageal adenocarcinoma dataset, we expanded the TCGA GC dataset used for training and testing HCG (n = 453). HCG was optimized by comparing three hierarchical strategies built on Lasso‐Logistic regression. The strategies were evaluated by their overall area under the receiver operating characteristic curve (auROC), accuracy, F1 score, and area under the precision‐recall curve (auPRC), and by their capability for clinical stratification as assessed with multivariate survival analysis. Subtype‐specific DNA alteration biomarkers were identified through difference tests on the HCG‐defined subtypes. Our HCG classifier demonstrated superior performance in terms of overall auROC (0.95), accuracy (0.88), F1 score (0.87), and auPRC (0.86), and significantly improved the clinical stratification of patients (overall p‐value = 0.032). Difference tests identified 25 subtype‐specific DNA alterations, including a high mutation rate in the SYNE1, ITGB4, and COL22A1 genes for the MSI subtype, and hypermethylation of the ALS2CL, KIAA0406, and RPRD1B genes for the EBV subtype. HCG is an accurate and robust classifier for DNA‐based GC molecular subtyping with highly predictive clinical stratification performance. The training and test datasets, along with the analysis programs of HCG, are accessible on GitHub (github.com/LabxSCUT).
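To make the idea of a hierarchical Lasso‐Logistic classifier concrete, the following is a minimal sketch in plain Python and is not the authors' HCG implementation. It assumes one particular split order (EBV, then MSI, then GS, with CIN as the fallback), whereas the abstract does not state which of the three strategies was selected. It also uses synthetic continuous features in place of real mutation, copy number, and methylation data, fits each binary node with a simple proximal gradient solver, and reports plain accuracy only. All function names and parameter values here are hypothetical.

```python
import math
import random

SUBTYPES = ["EBV", "MSI", "GS", "CIN"]  # the four TCGA gastric cancer subtypes
LEVELS = SUBTYPES[:-1]                  # assumed split order; CIN is the final fallback

def sigmoid(z):
    z = max(-30.0, min(30.0, z))        # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(v, t):
    # Proximal operator of the L1 (Lasso) penalty: shrink toward zero by t.
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_lasso_logistic(X, y, lam=0.02, lr=0.3, epochs=300):
    # L1-penalised logistic regression fitted by proximal gradient descent.
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err / n
            for j in range(p):
                gw[j] += err * xi[j] / n
        b -= lr * gb                    # the intercept is not penalised
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, gw)]
    return w, b

def fit_hierarchy(X, labels):
    # One binary model per level: current subtype versus all samples still remaining.
    models, remaining = [], list(zip(X, labels))
    for subtype in LEVELS:
        Xs = [x for x, _ in remaining]
        ys = [1.0 if lab == subtype else 0.0 for _, lab in remaining]
        models.append((subtype, fit_lasso_logistic(Xs, ys)))
        remaining = [(x, lab) for x, lab in remaining if lab != subtype]
    return models

def predict(models, x):
    # Walk down the hierarchy and stop at the first node that claims the sample.
    for subtype, (w, b) in models:
        if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5:
            return subtype
    return SUBTYPES[-1]

def make_synthetic(n_per_class, rng, n_noise=2, shift=3.0):
    # Synthetic stand-in data: one informative feature per subtype plus noise features.
    X, labels = [], []
    for k, subtype in enumerate(SUBTYPES):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 1.0) for _ in range(len(SUBTYPES) + n_noise)]
            x[k] += shift
            X.append(x)
            labels.append(subtype)
    return X, labels

rng = random.Random(0)
X_train, y_train = make_synthetic(40, rng)
X_test, y_test = make_synthetic(20, rng)
models = fit_hierarchy(X_train, y_train)
y_pred = [predict(models, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Comparing alternative hierarchies would amount to changing the order in `LEVELS` and scoring each variant on the same held-out split. The auROC, F1, auPRC, and survival analyses reported in the abstract are outside the scope of this sketch.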