Briefings in Bioinformatics: latest articles

miCGR: interpretable deep neural network for predicting both site-level and gene-level functional targets of microRNA.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbae616
Xiaolong Wu, Lehan Zhang, Xiaochu Tong, Yitian Wang, Zimei Zhang, Xiangtai Kong, Shengkun Ni, Xiaomin Luo, Mingyue Zheng, Yun Tang, Xutong Li

MicroRNAs (miRNAs) are critical regulators of various biological processes that cleave messenger RNAs (mRNAs) or repress their translation. Accurately predicting miRNA targets is essential for developing miRNA-based therapies for diseases such as cancer and cardiovascular disease. Traditional miRNA target prediction methods often struggle because knowledge of miRNA-target interactions is incomplete, and they lack interpretability. To address these limitations, we propose miCGR, an end-to-end deep learning framework for predicting functional miRNA targets. miCGR employs 2D convolutional neural networks alongside an enhanced Chaos Game Representation (CGR) of both miRNA sequences and their candidate target sites (CTSs) on mRNA. This advanced CGR transforms genetic sequences into informative 2D graphical representations based on sequence composition and subsequence frequencies, and explicitly incorporates important prior knowledge of seed regions and subsequence positions. Unlike one-dimensional methods based solely on sequence characters, this approach identifies functional motifs within sequences, even if they are distant in the original sequences. Our model outperforms existing methods in predicting functional targets at both the site and gene levels. To enhance interpretability, we incorporate Shapley value analysis for each subsequence within both miRNA sequences and their target sites, allowing miCGR to achieve improved accuracy, particularly with more lenient CTS selection criteria. Finally, two case studies demonstrate the practical applicability of miCGR, highlighting its potential to provide insights for optimizing artificial miRNA analogs that surpass endogenous counterparts.
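
To make the representation in this abstract more concrete, the following is a minimal Python sketch of the classic Chaos Game Representation and its frequency-matrix form (FCGR). It is not the enhanced CGR of miCGR: the seed-region and position information described above is not modeled, and the corner assignment, the function names `cgr_points` and `fcgr`, and the grid orientation are assumptions chosen for illustration.

```python
# Corner assignment is one common convention, assumed here (not taken from the paper).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "U": (1.0, 0.0)}


def cgr_points(seq):
    """Return the chaos-game trajectory of an RNA/DNA sequence.

    Each point lies halfway between the previous point and the corner
    that belongs to the current base.
    """
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper().replace("T", "U"):
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def fcgr(seq, k):
    """Count CGR points on a 2^k x 2^k grid.

    From the k-th point onwards, the cell a point falls into is fixed by
    the last k bases, so each cell count is a k-mer frequency.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq)[k - 1:]:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid
```

A grid like the one returned by `fcgr` is the kind of 2D input a convolutional network can consume; subsequences that share a suffix land in neighbouring cells regardless of where they occur in the sequence.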

Citations: 0
A transformer-based deep learning survival prediction model and an explainable XGBoost anti-PD-1/PD-L1 outcome prediction model based on the cGAS-STING-centered pathways in hepatocellular carcinoma.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbae686
Ren Wang, Qiumei Liu, Wenhua You, Huiyu Wang, Yun Chen

Recent studies suggest that the cGAS-STING pathway may play a crucial role in the genesis and development of hepatocellular carcinoma (HCC) and is closely associated with classical pathways and tumor immunity. We aimed to develop models predicting survival and anti-PD-1/PD-L1 outcomes centered on the cGAS-STING pathway in HCC. We identified classical pathways highly correlated with the cGAS-STING pathway and constructed a transformer survival model that preserves the raw structure of the pathways. We also developed an explainable XGBoost model for predicting anti-PD-1/PD-L1 outcomes using the SHAP algorithm. We trained and validated the transformer survival model on a pan-cancer cohort and tested it on three independent HCC cohorts. Using 0.5 as the threshold across cohorts, we divided each HCC cohort into two groups and calculated P values with the log-rank test. TCGA-LIHC: C-index = 0.750, P = 1.52e-11; ICGC-LIRI-JP: C-index = 0.741, P = .00138; GSE144269: C-index = 0.647, P = .0233. We trained and validated [area under the receiver operating characteristic curve (AUC) = 0.777] the XGBoost model on immunotherapy datasets and tested it on GSE78220 (AUC = 0.789); we also tested the XGBoost model on an HCC anti-PD-L1 cohort (AUC = 0.719). Our deep learning model and XGBoost model demonstrate potential in predicting survival risks and anti-PD-1/PD-L1 outcomes in HCC. We deployed these two prediction models to GitHub repositories and provided detailed instructions for their usage: deep learning survival model, https://github.com/mlwalker123/CSP_survival_model; XGBoost immunotherapy model, https://github.com/mlwalker123/CSP_immunotherapy_model.
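
The C-index values and the 0.5 cutoff reported in this abstract can be illustrated with a small, dependency-free Python sketch. It computes Harrell's concordance index and splits subjects into two groups at a fixed threshold. The function names, the assumption that risk scores lie in [0, 1] with higher meaning worse prognosis, and the choice of `>=` at the cutoff are illustrative assumptions, not details taken from the paper or its repositories.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index.

    Among comparable pairs (the earlier subject has an observed event),
    return the fraction in which the earlier-failing subject has the
    higher predicted risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable: subject i has an observed event strictly before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def split_by_threshold(risks, threshold=0.5):
    """Label each subject 'high' or 'low' risk at a fixed cutoff (assumed >=)."""
    return ["high" if r >= threshold else "low" for r in risks]
```

The two groups produced by `split_by_threshold` are what a log-rank test would then compare; that test is not implemented here and would normally come from a survival-analysis library.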

Citations: 0
ET-PROTACs: modeling ternary complex interactions using cross-modal learning and ternary attention for accurate PROTAC-induced degradation prediction.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbae654
Lijun Cai, Guanyu Yue, Yifan Chen, Li Wang, Xiaojun Yao, Quan Zou, Xiangzheng Fu, Dongsheng Cao

Motivation: Accurately predicting the degradation capabilities of proteolysis-targeting chimeras (PROTACs) for given target proteins and E3 ligases is important for PROTAC design. The distinctive ternary structure of PROTACs presents a challenge to traditional drug-target interaction prediction methods, necessitating more innovative approaches. While current state-of-the-art (SOTA) methods using graph neural networks (GNNs) can discern the molecular structure of PROTACs and proteins, thus enabling the efficient prediction of PROTACs' degradation capabilities, they rely heavily on limited crystal structure data of the POI-PROTAC-E3 ternary complex. This reliance underutilizes rich PROTAC experimental data and neglects intricate interaction relationships within ternary complexes.

Results: In this study, we propose ET-PROTACs, a model based on a cross-modal strategy and ternary attention, to predict the targeted degradation capabilities of PROTACs. Our model capitalizes on the strengths of cross-modal methods by using equivariant GNNs to process the graph structure and spatial coordinates of PROTAC molecules concurrently, while using sequence-based methods to learn protein sequence information. The cross-modal information is then channeled into a ternary attention mechanism, tailored to the unique structure of PROTACs, which models the PROTAC and protein modalities jointly. Experimental results demonstrate that the ET-PROTACs model outperforms existing SOTA methods. Moreover, visualizing attention scores highlights the residues and atoms that are pivotal in specific POI-PROTAC-E3 interactions, offering insights and guidance for future pharmaceutical research.

Availability and implementation: The codes of our model are available at https://github.com/GuanyuYue/ET-PROTACs.

Citations: 0
Multimodal deep learning approaches for precision oncology: a comprehensive review.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbae699
Huan Yang, Minglei Yang, Jiani Chen, Guocong Yao, Quan Zou, Linpei Jia

The burgeoning accumulation of large-scale biomedical data in oncology, alongside significant strides in deep learning (DL) technologies, has established multimodal DL (MDL) as a cornerstone of precision oncology. This review provides an overview of MDL applications in this field, based on an extensive literature survey. In total, 651 articles published before September 2024 are included. We first outline publicly available multimodal datasets that support cancer research. Then, we discuss key DL training methods, data representation techniques, and fusion strategies for integrating multimodal data. The review also examines MDL applications in tumor segmentation, detection, diagnosis, prognosis, treatment selection, and therapy response monitoring. Finally, we critically assess the limitations of current approaches and propose directions for future research. By synthesizing current progress and identifying challenges, this review aims to guide future efforts in leveraging MDL to advance precision oncology.

Citations: 0
RiceSNP-ABST: a deep learning approach to identify abiotic stress-associated single nucleotide polymorphisms in rice.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbae702
Quan Lu, Jiajun Xu, Renyi Zhang, Hangcheng Liu, Meng Wang, Xiaoshuang Liu, Zhenyu Yue, Yujia Gao

Given the adverse effects of abiotic stresses on rice, the precise and rapid identification of single nucleotide polymorphisms (SNPs) associated with abiotic stress traits (ABST-SNPs) in rice is crucial for developing resistant rice varieties. The scarcity of high-quality data related to abiotic stress in rice has hindered the development of computational models and constrained research efforts aimed at rice improvement and breeding. Genome-wide association studies provide better statistical power to consider ABST-SNPs in rice. Meanwhile, deep learning methods have shown their capability in predicting disease- or phenotype-associated loci, but have primarily focused on humans. Therefore, developing predictive models for identifying ABST-SNPs in rice is both urgent and valuable. In this paper, a model called RiceSNP-ABST is proposed for predicting ABST-SNPs in rice. Firstly, six training datasets were generated using a novel strategy for negative sample construction. Secondly, four feature encoding methods were proposed based on DNA sequence fragments, followed by feature selection. Finally, convolutional neural networks with residual connections were used to determine whether the sequences contained rice ABST-SNPs. RiceSNP-ABST outperformed traditional machine learning and state-of-the-art methods on the benchmark dataset and demonstrated consistent generalization on an independent dataset and cross-species datasets. Notably, multi-granularity causal structure learning was employed to elucidate the relationships among DNA structural features, aiming to identify key genetic variants more effectively. The web-based tool for RiceSNP-ABST can be accessed at http://rice-snp-abst.aielab.cc.
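
The abstract does not say which four feature encodings RiceSNP-ABST uses. As a generic illustration of encoding a DNA fragment around an SNP for a convolutional network, the following Python sketch extracts a fixed-length window and one-hot encodes it. The helper names `extract_window` and `one_hot`, the `N` padding, and the flank size are all hypothetical choices, not the paper's method.

```python
BASES = "ACGT"


def extract_window(genome_seq, snp_index, flank):
    """Return the fragment of length 2*flank+1 centred on the SNP.

    Positions that fall outside the sequence are padded with 'N'.
    """
    left = max(0, snp_index - flank)
    right = min(len(genome_seq), snp_index + flank + 1)
    pad_left = "N" * (flank - (snp_index - left))
    pad_right = "N" * (flank - (right - snp_index - 1))
    return pad_left + genome_seq[left:right] + pad_right


def one_hot(fragment):
    """Encode each base as a 4-element indicator vector (A, C, G, T).

    Unknown bases such as 'N' become all zeros.
    """
    return [[1 if base == b else 0 for b in BASES] for base in fragment.upper()]
```

Every window has the same length, so the encoded fragments stack into a fixed-shape input (length by 4) regardless of where the SNP sits in the sequence.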

Citations: 0
Multimodal multiobjective optimization with structural network control principles to optimize personalized drug targets for drug discovery of individual patients.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbaf007
Jing Liang, Zhuo Hu, Ying Bi, Han Cheng, Wei-Feng Guo

Structural network control principles provide novel and efficient clues for optimizing personalized drug targets (PDTs) related to the state transitions of individual patients. However, most existing methods focus on one subnetwork or module as drug targets by identifying the minimal set of driver nodes, and ignore the state transition capabilities of other modules with different configurations of drug targets [i.e. multimodal drug targets (MDTs)] that embed the knowledge of previous drug targets (i.e. multiobjective optimization). Therefore, a novel multimodal multiobjective evolutionary optimization framework (called MMONCP) is proposed to optimize PDTs with network control principles. The key points of MMONCP are that a constrained multimodal multiobjective optimization problem is formed with discrete constraints on the decision space and multimodality characteristics, and that a novel evolutionary algorithm, denoted CMMOEA-GLS-WSCD, is designed by combining a global and local search strategy with a weighting-based special crowding distance strategy to balance diversity in both the objective and decision spaces. Experimental results on three cancer genomics datasets from The Cancer Genome Atlas indicate that MMONCP achieves higher performance than advanced algorithms in terms of algorithm convergence and diversity, the fraction of identified MDTs, and the area under the curve score. Additionally, MMONCP can detect the early state from the difference between the target activity and toxicity of MDTs and provide early treatment options for cancer treatment in precision medicine.
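
As background for the multiobjective terms in this abstract, the following Python sketch shows two standard building blocks of evolutionary multiobjective optimization: Pareto-front filtering and the NSGA-II crowding distance in objective space. It is not CMMOEA-GLS-WSCD; the weighting-based special crowding distance in the paper also accounts for the decision space, which this sketch does not. All objectives are assumed to be minimized, and the function names are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Indices of points that no other point dominates."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def crowding_distance(points):
    """Standard NSGA-II crowding distance; boundary points get infinity."""
    n = len(points)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(points[0])):
        order = sorted(range(n), key=lambda i: points[i][m])
        lo, hi = points[order[0]][m], points[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # all values equal in this objective, nothing to add
        for pos in range(1, n - 1):
            gap = points[order[pos + 1]][m] - points[order[pos - 1]][m]
            dist[order[pos]] += gap / (hi - lo)
    return dist
```

Solutions with a larger crowding distance sit in sparser regions of the front, so preferring them during selection keeps the population spread out; a multimodal variant would apply a similar measure to the decision variables as well.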

Citations: 0
CapHLA: a comprehensive tool to predict peptide presentation and binding to HLA class I and class II.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbae595
Yunjian Chang, Ligang Wu

Human leukocyte antigen class I (HLA-I) and class II (HLA-II) proteins play an essential role in epitope binding and presentation to initiate an immune response. Accurate prediction of peptide-HLA (pHLA) binding and presentation is critical for developing effective immunotherapies. However, current tools can predict antigens exclusively for pHLA-I or pHLA-II, but not both; have constraints on peptide length; and commonly show unsatisfactory predictive accuracy. Here, we developed a convolution and attention-based model, CapHLA, trained with eluted ligand and binding affinity mass spectrometry data, to predict peptide presentation probability (PB) and binding affinities (BA) for HLA-I and HLA-II. In comparison with 11 other methods, CapHLA consistently showed improved performance in predicting pHLA BA and PB, particularly in HLA-II and non-classical peptide length datasets. Using CapHLA PB and BA predictions in combination with antigen expression level (EP) from transcriptomic data, we developed a neoantigen quality model for predicting immunotherapy response. In analyses of clinical response among 276 cancer patients given immunotherapy and overall survival in 7228 cancer patients, our neoantigen quality model outperformed other genetics-based models in predicting response to checkpoint inhibitors and patient prognosis. This study provides a versatile neoantigen screening tool, illustrating the prognostic value of neoantigen quality.

Briefings in bioinformatics, vol. 26(1). PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11650860/pdf/
Citations: 0
STRsensor: a computationally efficient method for STR allele-typing from massively parallel sequencing data.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbae637
Xiaolong Zhang, Xianchao Ji, Lingxiang Wang, Lianjiang Chi, Chengtao Li, Shaoqing Wen, Hua Chen

Short tandem repeats (STRs) are among the most polymorphic variations in the human genome and are widely used in forensics, population genetics, and medical genetics. Compared with the traditional capillary electrophoresis (CE) method, genotyping STRs with massively parallel sequencing offers enhanced sensitivity and accuracy. However, current methods are mainly designed for targeted sequencing with high coverage at specific STR loci, which limits the utility of STRs in low- and medium-coverage whole genome sequencing (WGS) data. Here, we introduce STRsensor, a method designed to type STR alleles in low-coverage WGS data and targeted sequencing data with a high detection ratio and accuracy. STRsensor employs two methods for STR allele-typing: the Kmers-based method and the CIGAR-based method. Furthermore, by incorporating a model for PCR stutters, STRsensor greatly enhances the accuracy of STR allele typing. With simulation data, we demonstrate that STRsensor achieves a detection ratio of 100% and an accuracy of 99.37% for 30× WGS data, outperforming existing methods such as STRait Razor, STRinNGS, and HipSTR. When applied to real targeted sequencing data from 687 individuals, STRsensor achieves a detection ratio of 99.64% and an accuracy of 99.99%. Moreover, STRsensor is computationally efficient, running 79 times faster than HipSTR and 10 000 times faster than STRinNGS. STRsensor is freely available on GitHub: https://github.com/ChenHuaLab/STRsensor.
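
To make the idea of typing an STR allele from reads concrete, the following is a minimal sketch of repeat-count genotyping. It is not STRsensor's Kmers-based or CIGAR-based method and it does not model PCR stutter; it only counts consecutive motif copies between two flanking sequences and keeps the best-supported counts. The function names, the flank-matching strategy, and the `min_support` threshold are all assumptions made for illustration.

```python
import re
from collections import Counter


def longest_repeat_run(read: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `read`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(read)]
    return max(runs, default=0)


def call_alleles(reads, motif, left_flank, right_flank, min_support=2):
    """Tally repeat counts over reads that span the locus (hypothetical helper).

    A read is used only if both flanks are found, so partial reads are skipped.
    Returns up to two repeat counts with at least `min_support` reads each.
    """
    counts = Counter()
    for read in reads:
        start = read.find(left_flank)
        if start == -1:
            continue
        region_start = start + len(left_flank)
        end = read.find(right_flank, region_start)
        if end == -1:
            continue
        counts[longest_repeat_run(read[region_start:end], motif)] += 1
    # Low-support counts (for example a single stutter-like read) are dropped.
    supported = [allele for allele, n in counts.most_common() if n >= min_support]
    return sorted(supported[:2])


if __name__ == "__main__":
    def make_read(copies):
        return "CCACGT" + "GATA" * copies + "TTGAGG"

    example_reads = [make_read(5)] * 3 + [make_read(7)] * 3 + [make_read(6)]
    print(call_alleles(example_reads, "GATA", "ACGT", "TTGA"))  # [5, 7]
```

A simple support threshold like this discards a lone off-by-one read, but it cannot separate a true allele from stutter at low coverage, which is the situation the abstract's stutter model is meant to address.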

Briefings in bioinformatics, vol. 26(1). PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11635639/pdf/
Citations: 0
Techniques for learning and transferring knowledge for microbiome-based classification and prediction: review and assessment.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbaf015
Jin Han, Haohong Zhang, Kang Ning

The volume of microbiome data is growing at an exponential rate, and the current methodologies for big data mining are encountering substantial obstacles. Effectively managing and extracting valuable insights from these vast microbiome datasets has emerged as a significant challenge in the field of contemporary microbiome research. This comprehensive review delves into the utilization of foundation models and transfer learning techniques within the context of microbiome-based classification and prediction tasks, advocating for a transition away from traditional task-specific or scenario-specific models towards more adaptable, continuous learning models. The article underscores the practicality and benefits of initially constructing a robust foundation model, which can then be fine-tuned using transfer learning to tackle specific context tasks. In real-world scenarios, the application of transfer learning empowers models to leverage disease-related data from one geographical area and enhance diagnostic precision in different regions. This transition from relying on "good models" to embracing "adaptive models" resonates with the philosophy of "teaching a man to fish" thereby paving the way for advancements in personalized medicine and accurate diagnosis. Empirical research suggests that the integration of foundation models with transfer learning methodologies substantially boosts the performance of models when dealing with large-scale and diverse microbiome datasets, effectively mitigating the challenges posed by data heterogeneity.

Briefings in bioinformatics, vol. 26(1). PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11737891/pdf/
Citations: 0
Precise identification of somatic and germline variants in the absence of matched normal samples.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbae677
Hui Li, Lu Meng, Hongke Wang, Liang Cui, Heyu Sheng, Peiyan Zhao, Shuo Hong, Xinhua Du, Shi Yan, Yun Xing, Shicheng Feng, Yan Zhang, Huan Fang, Jing Bai, Yan Liu, Shaowei Lan, Tao Liu, Yanfang Guan, Xuefeng Xia, Xin Yi, Ying Cheng

Somatic variants play a crucial role in the occurrence and progression of cancer. However, in the absence of matched normal controls, distinguishing between germline and somatic variants in tumor samples is challenging. Existing tumor-only genomic analysis methods suffer from either limited performance or insufficient interpretability due to an excess of features. Therefore, there is an urgent need for an alternative approach that addresses these issues and has practical value. Here, we present OncoTOP, a computational method for genomic analysis without matched normal samples, which can accurately distinguish somatic mutations from germline variants. Reference sample analysis revealed a 0% false positive rate and 99.7% reproducibility for variant calling. Assessing 2864 tumor samples across 18 cancer types yielded a 99.8% overall positive percent agreement and a 99.9% positive predictive value. OncoTOP can also accurately detect clinically actionable variants and subclonal mutations associated with drug resistance. For the prediction of mutation origins, the positive percent agreement was 97.4% for somatic mutations and 95.7% for germline mutations. High consistency of tumor mutational burden (TMB) was observed between the results generated by OncoTOP and tumor-normal paired analysis. In a cohort of 97 lung cancer patients treated with immunotherapy, TMB-high patients had prolonged PFS (P = .02), supporting the reliability of our approach in estimating TMB to predict therapy response. Furthermore, microsatellite instability status showed strong concordance (97%) with polymerase chain reaction results, and leukocyte antigen class I subtypes and homozygosity achieved concordance rates of 99.3% and 99.9%, respectively, compared with tumor-normal paired analysis. Thus, OncoTOP exhibited high reliability in variant calling, mutation origin prediction, and biomarker estimation. Its application promises substantial advantages for clinical genomic testing.
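
For readers less familiar with the two headline metrics, positive percent agreement (PPA) is TP / (TP + FN) and positive predictive value (PPV) is TP / (TP + FP), where a true positive is a call that also appears in the comparator. The sketch below computes both from two sets of variant calls. It illustrates the standard definitions only, not OncoTOP's evaluation pipeline; the `Variant` tuple layout and the function name are assumptions.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def agreement_metrics(called: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    """Compare a call set against a comparator set (e.g. tumor-normal paired calls)."""
    tp = len(called & truth)   # called and present in the comparator
    fp = len(called - truth)   # called but absent from the comparator
    fn = len(truth - called)   # present in the comparator but missed
    ppa = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "ppa": ppa, "ppv": ppv}


if __name__ == "__main__":
    truth = {
        ("chr7", 100, "A", "T"),
        ("chr7", 200, "G", "C"),
        ("chr12", 300, "C", "T"),
        ("chr17", 400, "G", "A"),
        ("chr17", 500, "T", "C"),
    }
    called = truth - {("chr17", 500, "T", "C")}  # one comparator variant missed
    result = agreement_metrics(called, truth)
    print(result["ppa"], result["ppv"])  # 0.8 1.0
```

PPA is reported instead of sensitivity when the comparator is itself an assay and not an absolute ground truth, which fits a comparison against tumor-normal paired analysis.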

Briefings in bioinformatics, vol. 26(1). PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11684894/pdf/
Citations: 0