基于图神经网络的乳腺癌蛋白群体分层映射方法。

IF 2.9 3区 生物学 Q2 BIOCHEMICAL RESEARCH METHODS BMC Bioinformatics Pub Date : 2025-01-21 DOI:10.1186/s12859-024-06015-x
Xiao Zhang, Qian Liu
{"title":"基于图神经网络的乳腺癌蛋白群体分层映射方法。","authors":"Xiao Zhang, Qian Liu","doi":"10.1186/s12859-024-06015-x","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Comprehensively mapping the hierarchical structure of breast cancer protein communities and identifying potential biomarkers from them is a promising way for breast cancer research. Existing approaches are subjective and fail to take information from protein sequences into consideration. Deep learning can automatically learn features from protein sequences and protein-protein interactions for hierarchical clustering.</p><p><strong>Results: </strong>Using a large amount of publicly available proteomics data, we created a hierarchical tree for breast cancer protein communities using a novel hierarchical graph neural network, with the supervision of gene ontology terms and assistance of a pre-trained deep contextual language model. Then, a group-lasso algorithm was applied to identify protein communities that are under both mutation burden and survival burden, undergo significant alterations when targeted by specific drug molecules, and show cancer-dependent perturbations. The resulting hierarchical map of protein communities shows how gene-level mutations and survival information converge on protein communities at different scales. Internal validity of the model was established through the convergence on BRCA2 as a breast cancer hotspot. Further overlaps with breast cancer cell dependencies revealed SUPT6H and RAD21, along with their respective protein systems, HOST:37 and HOST:861, as potential biomarkers. Using gene-level perturbation data of the HOST:37 and HOST:861 gene sets, three FDA-approved drugs with high therapeutic value were selected as potential treatments to be further evaluated. These drugs include mercaptopurine, pioglitazone, and colchicine.</p><p><strong>Conclusion: </strong>The proposed graph neural network approach to analyzing breast cancer protein communities in a hierarchical structure provides a novel perspective on breast cancer prognosis and treatment. By targeting entire gene sets, we were able to evaluate the prognostic and therapeutic value of genes (or gene sets) at different levels, from gene-level to system-level biology. Cancer-specific gene dependencies provide additional context for pinpointing cancer-related systems and drug-induced alterations can highlight potential therapeutic targets. These identified protein communities, in conjunction with other protein communities under strong mutation and survival burdens, can potentially be used as clinical biomarkers for breast cancer.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"23"},"PeriodicalIF":2.9000,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11749236/pdf/","citationCount":"0","resultStr":"{\"title\":\"A graph neural network approach for hierarchical mapping of breast cancer protein communities.\",\"authors\":\"Xiao Zhang, Qian Liu\",\"doi\":\"10.1186/s12859-024-06015-x\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Comprehensively mapping the hierarchical structure of breast cancer protein communities and identifying potential biomarkers from them is a promising way for breast cancer research. Existing approaches are subjective and fail to take information from protein sequences into consideration. Deep learning can automatically learn features from protein sequences and protein-protein interactions for hierarchical clustering.</p><p><strong>Results: </strong>Using a large amount of publicly available proteomics data, we created a hierarchical tree for breast cancer protein communities using a novel hierarchical graph neural network, with the supervision of gene ontology terms and assistance of a pre-trained deep contextual language model. Then, a group-lasso algorithm was applied to identify protein communities that are under both mutation burden and survival burden, undergo significant alterations when targeted by specific drug molecules, and show cancer-dependent perturbations. The resulting hierarchical map of protein communities shows how gene-level mutations and survival information converge on protein communities at different scales. Internal validity of the model was established through the convergence on BRCA2 as a breast cancer hotspot. Further overlaps with breast cancer cell dependencies revealed SUPT6H and RAD21, along with their respective protein systems, HOST:37 and HOST:861, as potential biomarkers. Using gene-level perturbation data of the HOST:37 and HOST:861 gene sets, three FDA-approved drugs with high therapeutic value were selected as potential treatments to be further evaluated. These drugs include mercaptopurine, pioglitazone, and colchicine.</p><p><strong>Conclusion: </strong>The proposed graph neural network approach to analyzing breast cancer protein communities in a hierarchical structure provides a novel perspective on breast cancer prognosis and treatment. By targeting entire gene sets, we were able to evaluate the prognostic and therapeutic value of genes (or gene sets) at different levels, from gene-level to system-level biology. Cancer-specific gene dependencies provide additional context for pinpointing cancer-related systems and drug-induced alterations can highlight potential therapeutic targets. These identified protein communities, in conjunction with other protein communities under strong mutation and survival burdens, can potentially be used as clinical biomarkers for breast cancer.</p>\",\"PeriodicalId\":8958,\"journal\":{\"name\":\"BMC Bioinformatics\",\"volume\":\"26 1\",\"pages\":\"23\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2025-01-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11749236/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMC Bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s12859-024-06015-x\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-024-06015-x","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

摘要

背景:全面绘制乳腺癌蛋白群落的层次结构并从中识别潜在的生物标志物是乳腺癌研究的一种很有前途的途径。现有的方法是主观的,没有考虑到蛋白质序列的信息。深度学习可以自动从蛋白质序列和蛋白质-蛋白质相互作用中学习特征,用于分层聚类。结果:利用大量公开可用的蛋白质组学数据,我们使用一种新的层次图神经网络,在基因本体术语的监督和预训练的深度上下文语言模型的帮助下,为乳腺癌蛋白质社区创建了一个层次树。然后,应用群-套索算法来识别同时承受突变负担和生存负担的蛋白质群落,当特定药物分子靶向时发生显着改变,并显示出癌症依赖性扰动。由此产生的蛋白质群落层次图显示了基因水平的突变和生存信息如何在不同尺度上汇聚到蛋白质群落中。通过将BRCA2趋同为乳腺癌热点,建立了模型的内部有效性。与乳腺癌细胞依赖性的进一步重叠表明,SUPT6H和RAD21以及它们各自的蛋白质系统HOST:37和HOST:861是潜在的生物标志物。利用HOST:37和HOST:861基因集的基因水平扰动数据,选择3种fda批准的具有较高治疗价值的药物作为潜在治疗药物进行进一步评价。这些药物包括巯基嘌呤、吡格列酮和秋水仙碱。结论:本文提出的图神经网络方法在层次结构中分析乳腺癌蛋白群落,为乳腺癌的预后和治疗提供了新的视角。通过靶向整个基因集,我们能够在不同水平上评估基因(或基因集)的预后和治疗价值,从基因水平到系统水平生物学。癌症特异性基因依赖性为精确定位癌症相关系统提供了额外的背景,药物诱导的改变可以突出潜在的治疗靶点。这些已确定的蛋白质群落,与其他具有强突变和生存负担的蛋白质群落一起,可能被用作乳腺癌的临床生物标志物。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
A graph neural network approach for hierarchical mapping of breast cancer protein communities.

Background: Comprehensively mapping the hierarchical structure of breast cancer protein communities and identifying potential biomarkers from them is a promising way for breast cancer research. Existing approaches are subjective and fail to take information from protein sequences into consideration. Deep learning can automatically learn features from protein sequences and protein-protein interactions for hierarchical clustering.

Results: Using a large amount of publicly available proteomics data, we created a hierarchical tree for breast cancer protein communities using a novel hierarchical graph neural network, with the supervision of gene ontology terms and assistance of a pre-trained deep contextual language model. Then, a group-lasso algorithm was applied to identify protein communities that are under both mutation burden and survival burden, undergo significant alterations when targeted by specific drug molecules, and show cancer-dependent perturbations. The resulting hierarchical map of protein communities shows how gene-level mutations and survival information converge on protein communities at different scales. Internal validity of the model was established through the convergence on BRCA2 as a breast cancer hotspot. Further overlaps with breast cancer cell dependencies revealed SUPT6H and RAD21, along with their respective protein systems, HOST:37 and HOST:861, as potential biomarkers. Using gene-level perturbation data of the HOST:37 and HOST:861 gene sets, three FDA-approved drugs with high therapeutic value were selected as potential treatments to be further evaluated. These drugs include mercaptopurine, pioglitazone, and colchicine.

Conclusion: The proposed graph neural network approach to analyzing breast cancer protein communities in a hierarchical structure provides a novel perspective on breast cancer prognosis and treatment. By targeting entire gene sets, we were able to evaluate the prognostic and therapeutic value of genes (or gene sets) at different levels, from gene-level to system-level biology. Cancer-specific gene dependencies provide additional context for pinpointing cancer-related systems and drug-induced alterations can highlight potential therapeutic targets. These identified protein communities, in conjunction with other protein communities under strong mutation and survival burdens, can potentially be used as clinical biomarkers for breast cancer.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
BMC Bioinformatics
BMC Bioinformatics 生物-生化研究方法
CiteScore
5.70
自引率
3.30%
发文量
506
审稿时长
4.3 months
期刊介绍: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.
期刊最新文献
BioLake: an RNA expression analysis framework for prostate cancer biomarker powered by data lakehouse. CellMAP: an open-source software tool to batch-process cell topography and stiffness maps collected with an atomic force microscope. Accurate assembly of full-length consensus for viral quasispecies. Flexible analysis of spatial transcriptomics data (FAST): a deconvolution approach. Biomedical named entity recognition using improved green anaconda-assisted Bi-GRU-based hierarchical ResNet model.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1