Journal of Bioinformatics and Computational Biology: Latest Articles

Mendelian randomization and AlphaFold3 analysis suggest putative causal plasma proteins in Graves' disease.

IF 0.7, Zone 4 (Biology), Q4 Mathematical & Computational Biology

Journal of Bioinformatics and Computational Biology

Pub Date : 2025-12-01 DOI: 10.1142/S0219720025500234

Bin Deng, Zhanlin Liao, Liangzhi Huang, Ting Chen, Qiao Chen, Shuai Zhong, Zugui Huang

Graves' disease (GD) is a common autoimmune disorder. However, the circulating proteins that causally drive its pathogenesis remain largely unconfirmed by genetic evidence. Identifying such proteins is critical for developing novel therapeutics. Methods: We performed a two-sample Mendelian randomization (MR) analysis using genetic instruments for 4907 plasma proteins and summary statistics from a GD genome-wide association study (GWAS) of European ancestry to identify causal proteins. We subsequently conducted protein-protein interaction (PPI) network analysis and used AlphaFold3 to model the structural impact of a key variant, rs41271951, in the top candidate protein, cathepsin S (CTSS). Results: MR analysis identified 23 plasma proteins with putative causal effects on GD risk. Among these, CD5L showed the strongest evidence for colocalization (posterior [Formula: see text]), suggesting a shared causal variant. Network analysis revealed that these proteins converge on a novel complement-ECM-coagulation axis in GD pathogenesis. CTSS emerged as a central hub in this network. AlphaFold3 modeling suggested that the CTSS variant rs41271951 (p.Val7Ala), located within the signal peptide, induces subtle structural perturbations. The primary and most plausible consequence is a reduction in CTSS secretion and circulating levels, as supported by the pQTL data. Conclusion: This multi-omics analysis proposes a novel complement-ECM-coagulation axis in GD. By structurally and functionally linking reduced CTSS abundance and secretion to genetic variation, we identify CTSS as a potential candidate for therapeutic repurposing in GD.
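As background for the two-sample MR step, the following is a minimal sketch of two standard estimators: the per-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) estimate. The abstract does not say which estimator the authors used, so this is an illustration of the general technique rather than their pipeline; the function names and the numbers in the usage note are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Per-variant causal estimate with a first-order standard error.

    beta_exposure: variant effect on the protein level (pQTL effect).
    beta_outcome:  variant effect on disease risk (GWAS log-odds).
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate across independent instruments.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome**2.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)
```

With made-up inputs such as `ivw_estimate([0.1, 0.2], [0.05, 0.10], [0.01, 0.02])`, both instruments imply the same ratio, so the pooled estimate equals that ratio and only the standard error shrinks.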

Volume 23, Issue 6, Article 2550023.

Citations: 0

Multi-Classification of Drug-Drug interaction based on a complete graph convolutional neural network and explainable artificial intelligence.

Pub Date : 2025-12-01 DOI: 10.1142/S0219720025500210

Samar Monem, Ashraf Darwish, Aboul Ella Hassanien, Heba M Afify

Multi-drug therapy has become more common in recent years, especially among older people with multiple illnesses. However, patients are put at risk when unanticipated drug-drug interactions (DDIs) cause adverse reactions or serious toxicity. Predicting possible DDIs with computational models improves the drug design process and reduces unexpected drug interactions and research expenses. In this paper, the proposed model is built as a complete graph convolutional network (GCN) on publicly available DDI data from DrugBank, covering 65 DDI classes. The data comprise 37,264 drug-pair samples described by three feature types: chemical, target, and enzyme. The multi-classification model consists of three phases: drug preprocessing, three GCN layers, and a fully connected network. The proposed model achieved an accuracy of 95.12%, the best result reported among previous works on the same data. Although the data were imbalanced, this paper's primary contribution is to improve both computational time and classification metrics relative to other state-of-the-art models. For explainable artificial intelligence (XAI), SHapley Additive exPlanations (SHAP) is applied to the proposed model to help avoid misclassification and produce easily comprehensible results. The proposed model will help explore possible drug hazards and support intelligent pharmaceutical management.
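To make the GCN component concrete, here is a minimal sketch of one graph-convolution step using the widely used symmetric normalization, ReLU(D^{-1/2} (A + I) D^{-1/2} X W). It assumes that "complete graph" means every node is connected to every other node; the abstract does not spell out the graph construction, and the helper names below are hypothetical, not the authors' code.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, value) for value in row] for row in hidden]
```

On a complete graph the normalized adjacency is uniform, so a single layer averages the features of all nodes before the learned projection; stacking three such layers, as the abstract describes, would repeat this step with separate weight matrices.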

Volume 23, Issue 6, Article 2550021.

Citations: 0

PLMABFW: A deep learning framework for predicting Antibody-Antigen interactions using protein language model.

Pub Date : 2025-12-01 DOI: 10.1142/S0219720025500209

Yongbing Chen, Qianyi Jia, Xinyue Jia, Zhiguo Fu, Pingping Sun, Bo Li, Zilin Ren

The emergence of SARS-CoV-2 has highlighted the need for computational methods to identify neutralizing antibodies. Existing sequence-based tools for predicting antigen-antibody interactions struggle to effectively identify antibodies capable of neutralizing different variants due to high sequence similarity among SARS-CoV-2 strains and the similarity in the framework regions (FWRs) of antibodies. To address this challenge, particularly the issue of high sequence similarity among homologous antigens that impedes accurate prediction of antigen-antibody interactions, we developed a deep learning framework named PLMABFW. It differentiates homologous antigens using encoding techniques and network architecture design. It employs pre-trained protein language models ESM-2 for antigens and AntiBERTy for antibodies to encode sequences and capture additional features. The framework also incorporates both antigen features and their transposed versions to enhance antigen information capture. To validate the performance of PLMABFW, we collected a SARS-CoV-2 neutralization dataset. PLMABFW outperformed existing neutralizing antibody prediction tools (AbAgIntPre, DeepAAI) and docking tools (HDOCK, LSTM-PHV) in predicting neutralizing antibodies for homologous antigens. Furthermore, it effectively learned the interactions between the antibody's CDR-H3 region and antigens via a partial masking strategy. The model code is available on GitHub for customization and adaptation to diverse research needs.

Volume 23, Issue 6, Article 2550020.

Citations: 0

Early lifespan prediction in Caenorhabditis elegans via contrastive learning and channel attention.

Pub Date : 2025-12-01 Epub Date: 2025-12-06 DOI: 10.1142/S0219720025500167

Miaomiao Jin, Weiyang Chen, Yi Pan

Early lifespan prediction in Caenorhabditis elegans faces the challenges of indistinct discriminative signals, subtle and localized key features, difficulty in data annotation, and poor generalization. We propose Contrastive Learning-guided Channel Attention Modulation (CLCAM), in which supervised contrastive learning clusters individuals with the same lifespan and separates different classes. The resulting embedding drives channel-wise gains that are additively coupled to the backbone, thereby amplifying subtle morphological cues. At inference, the contrastive branch is removed, keeping FLOPs essentially unchanged with a modest runtime cost on our hardware. On a public dataset, CLCAM achieves an AUC-ROC of 0.84, showing a consistent improvement over the EfficientNet-B3 baseline (0.82) and a substantial gain over the prior WormNet model (0.61). Grad-CAM indicates attention focused on the pharynx and body-wall musculature, supporting the biological plausibility of the model's decisions. CLCAM offers a clear, low-overhead paradigm for early lifespan phenotyping. CLCAM code is available at https://github.com/JMM502/CLCAM/tree/master/clcam.
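The clustering behavior described for the contrastive branch can be illustrated with a minimal sketch of a supervised contrastive loss in its commonly used form: embeddings are L2-normalized, and each anchor is pulled toward samples with the same label and pushed away from all others. The abstract does not give the exact loss or temperature used in CLCAM, so the formulation, the default `temperature`, and the function names are assumptions.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors that have a positive."""
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        log_denominator = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[j] - log_denominator for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

The loss is small when same-label embeddings point in the same direction and large when labels cut across the geometry, which is the pressure that groups individuals of the same lifespan class.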

Article 2550016.

Citations: 0

Study of the mechanism of step-by-step interaction of viral proteins during replication and transcription.

Pub Date : 2025-12-01 Epub Date: 2025-11-26 DOI: 10.1142/S0219720025500180

Tatiana V Koshlan, Kirill G Kulikov

This study investigates the thermodynamic behavior of molecular self-assembly along biochemical pathways leading to the formation of higher-order complexes. We specifically examine how thermodynamic parameters, such as the dissociation constant [Formula: see text], the entropic contribution [Formula: see text], and the stability parameter of the interaction matrix [Formula: see text], evolve as molecular complexity increases from monomers to dimers, trimers, and tetramers. A central hypothesis is that stepwise thermodynamic modeling allows prediction of assembly pathways, identification of dead-end points, and detection of co-directional changes in thermodynamic variables during complex formation, which reflect a preference for the main direction of biocomplex formation. We also introduce a practical rule to classify dead-end intermediates: a pathway step is considered a dead end if the minimum [Formula: see text] occurs at a non-final intermediate or if [Formula: see text] falls below zero, indicating an entropic barrier. This criterion provides a reproducible way to flag non-viable assembly routes. We apply this analysis to several biologically relevant molecular systems, including the complex of LGP2 bound to an 8-base-pair double-stranded RNA molecule, the VP35 protein dimer interacting with double-stranded RNA, and hexamer formations.
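The dead-end rule stated in the abstract can be sketched as a small function. Because the symbols were lost in extraction, the two inputs are deliberately named `first_quantity` (the quantity whose minimum is tracked) and `second_quantity` (the quantity compared against zero); both names, and the per-step list layout, are placeholders and not the authors' notation.

```python
def flag_dead_ends(first_quantity, second_quantity):
    """Flag each step of one assembly pathway as a dead end or not.

    Both lists hold one value per step, ordered from the first
    intermediate to the final complex. A step is flagged when
      * the minimum of first_quantity falls on it and it is not the
        final step, or
      * second_quantity at that step is below zero.
    """
    if not first_quantity or len(first_quantity) != len(second_quantity):
        raise ValueError("both lists must be non-empty and of equal length")
    last = len(first_quantity) - 1
    argmin = min(range(len(first_quantity)), key=first_quantity.__getitem__)
    return [(step == argmin and step != last) or second_quantity[step] < 0
            for step in range(len(first_quantity))]
```

For example, a pathway with values `[3.0, 1.0, 2.0]` and `[0.5, 0.2, 0.1]` would have only its middle step flagged, because the minimum of the first quantity falls on a non-final intermediate.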

Volume 23, Issue 6, Article 2550018.

Citations: 0

Predicting ncRNA-Protein interactions with a graph attention model exploiting personalized subgraphs.

Pub Date : 2025-12-01 Epub Date: 2025-12-06 DOI: 10.1142/S0219720025500192

Fatemeh Khoushehgir, Zahra Noshad

Predicting interactions between ncRNAs and proteins is crucial for advancing our understanding of gene regulation, disease mechanisms, targeted drug design, and biomarker discovery, thereby driving innovation in research and therapeutic development. Numerous computational methods, particularly those employing machine learning and deep learning, have been proposed to address this challenge. Recent studies show that graph neural networks (GNNs) enhance ncRNA-protein interaction prediction accuracy by capturing intricate relationships and structural details in molecular data. However, current GNN approaches frequently rely on fixed-hop subgraphs for structural analysis, limiting their capacity to capture diverse interaction patterns fully. This fixed-hop approach may omit crucial nodes and edges outside the predefined neighborhood, potentially reducing prediction accuracy. To overcome this constraint, we introduce a novel method for ncRNA-protein interaction prediction by extracting the most informative subgraphs around each interaction using the personalized subgraph selection framework. These subgraphs are then utilized in a graph attention network (GAT) to learn node representations. K-mer frequencies are used to capture sequence-level features, while node2vec embeddings capture structural information, providing the GNN with a robust set of features. Experimental results on relevant datasets indicate a significant improvement in predicting ncRNA-protein interactions, with the algorithm maintaining an acceptable level of computational complexity even on large datasets. By integrating both sequence and structural insights through personalized subgraphs, this approach delivers a more accurate and scalable solution for predicting ncRNA-protein interactions.
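The sequence-level features mentioned above can be illustrated with a minimal k-mer frequency sketch. The value of k, the RNA alphabet, and the decision to ignore windows containing other characters are assumptions made for this example; the abstract does not state the settings used.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGU"):
    """Return a fixed-length vector of k-mer frequencies.

    The vector has len(alphabet) ** k entries, ordered lexicographically
    by the alphabet. Windows containing characters outside the alphabet
    are ignored, and frequencies are normalized over the valid windows.
    """
    kmers = ["".join(chars) for chars in product(alphabet, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]
```

Each ncRNA or protein sequence maps to a vector of the same length regardless of sequence length, which is what makes the result usable as a node feature alongside graph-derived embeddings.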

Article 2550019.

Citations: 0

The novel design of a multi-epitope vaccine candidate against the dengue virus using advanced immunoinformatics and structural analysis.

Pub Date : 2025-10-01 Epub Date: 2025-10-07 DOI: 10.1142/S0219720025500143

Mohsen Karami Fath

Background: Dengue virus (DENV) remains a major public health challenge with limited vaccine options, and current licensed vaccines exhibit restricted efficacy and safety concerns in certain populations. Advanced immunoinformatics approaches offer opportunities for designing multi-epitope vaccines targeting conserved and immunogenic regions of viral proteins. Objective: To design and computationally evaluate a novel multi-epitope vaccine targeting the Envelope (E) and Non-Structural protein 1 (NSP1) of DENV-1 and DENV-2 using integrated immunoinformatics and structural bioinformatics. Methods: CTL, HTL, and B-cell epitopes were predicted from the E and NSP1 proteins and screened for antigenicity, non-allergenicity, and non-toxicity. High-affinity epitopes were linked with appropriate spacers and adjuvants (human β-defensin-3 or 50S ribosomal protein L7/L12) to construct two vaccine candidates. Molecular docking with TLR2/TLR4, molecular dynamics (MD) simulations, MM/GBSA binding free energy analysis, population coverage assessment, codon optimization, and immune simulations were conducted. Control docking using scrambled peptides was included to evaluate binding specificity. Results: Both vaccine constructs were predicted to be stable, soluble, non-allergenic, and non-toxic. Vaccine 2 showed higher antigenicity (VaxiJen: 0.6127) and stronger TLR2 binding ([Formula: see text]: -110.37 kcal/mol), whereas vaccine 1 demonstrated better solubility and TLR4 interaction stability. Control docking with scrambled peptides produced less favorable binding energies, supporting specificity. MD simulations confirmed structural stability, and immune simulations predicted robust humoral and cellular responses with high IFN-γ production. Population coverage exceeded 98% in most regions. Conclusion: The designed multi-epitope vaccines demonstrate promising immunogenic potential in silico. Experimental validation is required to confirm safety, efficacy, and protective capability against multiple DENV serotypes.

Mohsen Karami Fath. The novel design of a multi-epitope vaccine candidate against the dengue virus using advanced immunoinformatics and structural analysis. Journal of Bioinformatics and Computational Biology 23(5), 2550014. Published 2025-10-01. https://doi.org/10.1142/S0219720025500143

Citations: 0

Cancer classification and functional pathway discovery using TCGA transcriptomic profiles: A matched case-control framework.

IF 0.7 · Zone 4 (Biology) · Q4 · MATHEMATICAL & COMPUTATIONAL BIOLOGY

Journal of Bioinformatics and Computational Biology

Pub Date : 2025-10-01 Epub Date: 2025-10-07 DOI: 10.1142/S0219720025500155

Jie-Huei Wang, Tzung-Ying Guo, Yen-Yi Pai, Po-Lin Hou, Himani Kumari, Michael W Y Chan

Leveraging high-dimensional transcriptomic data from The Cancer Genome Atlas (TCGA) for cancer classification holds critical significance for advancing precision oncology. Matched Case-control Design (MCCD), by pairing similar cases with controls, can enhance statistical power and reduce confounding bias. However, high-dimensional data present challenges such as overfitting, instability, and difficulty in interpretation, collectively referred to as the "curse of dimensionality." Feature selection can help mitigate these problems by identifying representative variables and reducing redundancy. This study's innovation lies in integrating a set of existing techniques into a unified analytical workflow tailored specifically for MCCD, validated through both simulated and real TCGA datasets. We compared the performance of paired versus unpaired feature selection approaches under simulated 1:1 MCCD scenarios, and developed a modular, pluggable pipeline. This includes mean-centering, gene filtering, and a Corrected Feature Matrix (CFM) transformation step that explicitly preserves the matched structure. This transformation is then combined with machine learning classifiers to predict cancer status. We also incorporated Incremental Feature Selection (IFS) to refine gene subsets and employed gene set enrichment analysis to enhance biological interpretability. While the individual components we used, such as paired testing, CFM, IFS, and model-based gene set analysis, are not novel in themselves, we demonstrate an integrated workflow optimized for MCCD tasks. This workflow outperforms uncorrected approaches in terms of classification accuracy, feature stability, and interpretability. Our results indicate that this method can enhance cancer classification accuracy, facilitate biomarker discovery, and aid in building interpretable diagnostic models, providing a practical and scalable tool for precision medicine.
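The claim that matching increases statistical power can be made concrete with a small simulation. The sketch below draws 1:1 matched pairs that share a large pair-specific baseline, then compares a paired t-statistic (computed on within-pair differences) with an unpaired Welch t-statistic on the same data. All parameter values, function names, and the single-gene setup are assumptions chosen for illustration; this is not the authors' pipeline and it does not implement their CFM transformation.

```python
import math
import random
import statistics


def simulate_pairs(n_pairs=30, effect=1.0, seed=0):
    """Simulate one gene measured in matched case-control pairs.

    Each pair shares a baseline (the matched confounder); the case adds
    a fixed `effect`. All values are illustrative assumptions.
    """
    rng = random.Random(seed)
    cases, controls = [], []
    for _ in range(n_pairs):
        baseline = rng.gauss(0.0, 3.0)  # shared within the pair
        cases.append(baseline + effect + rng.gauss(0.0, 0.5))
        controls.append(baseline + rng.gauss(0.0, 0.5))
    return cases, controls


def paired_t(cases, controls):
    """t-statistic on within-pair differences (uses the matching)."""
    diffs = [a - b for a, b in zip(cases, controls)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def welch_t(cases, controls):
    """Unpaired Welch t-statistic (ignores the matching)."""
    se = math.sqrt(
        statistics.variance(cases) / len(cases)
        + statistics.variance(controls) / len(controls)
    )
    return (statistics.mean(cases) - statistics.mean(controls)) / se


cases, controls = simulate_pairs()
print("paired t:  ", round(paired_t(cases, controls), 2))
print("unpaired t:", round(welch_t(cases, controls), 2))
```

Both statistics share the same numerator, the mean case-control difference. The paired version has a smaller standard error whenever the two members of a pair are positively correlated, because differencing cancels the shared baseline; this is the rationale for ranking features with a paired test before classification.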


Jie-Huei Wang, Tzung-Ying Guo, Yen-Yi Pai, Po-Lin Hou, Himani Kumari, Michael W Y Chan. Cancer classification and functional pathway discovery using TCGA transcriptomic profiles: A matched case-control framework. Journal of Bioinformatics and Computational Biology 23(5), 2550015. Published 2025-10-01. https://doi.org/10.1142/S0219720025500155

Citations: 0

Modelling and optimizing combination therapeutic strategies for KRAS- and EGFR-mutant lung cancer.

IF 0.7 · Zone 4 (Biology) · Q4 · MATHEMATICAL & COMPUTATIONAL BIOLOGY

Journal of Bioinformatics and Computational Biology

Pub Date : 2025-10-01 DOI: 10.1142/S0219720025500179

Lanqi Wu, Ruocheng Yu, Minghui Yao, Md Matiur Rahaman, Zhaoyuan Fang

Non-small cell lung carcinoma (NSCLC) is well known for its high incidence (about 80% of lung cancers) and genetic heterogeneity. Driver mutations such as those in EGFR and KRAS have enabled targeted therapies with kinase inhibitors, whereas immune checkpoint inhibitors (ICIs) have revolutionized immunotherapy. However, challenges such as frequent drug resistance and low response rates highlight the need for novel therapeutic strategies. Boolean network modeling is a powerful mathematical tool for simulating complex biological processes and optimizing potential treatment strategies. This study developed a Boolean network model for NSCLC patients with different mutational backgrounds and evaluated therapeutic effects by incorporating inhibitors targeting key kinase mutations and immunological interventions. Simulations in both the Boolean network model and a second, quantitative model consistently suggested that the optimal therapeutic strategy for KRAS-mutant patients is a combination of a KRAS inhibitor and an ICI, which is in line with mouse model studies and the KRYSTAL-7 phase 2 clinical trial data. Further validation can reasonably be expected from the recently announced KRYSTAL-7 phase 3 clinical trial, which compares the combination therapy with pembrolizumab monotherapy. Our approach highlights the value of computational modeling for evaluating and refining therapeutic strategies in precision oncology.
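For readers unfamiliar with Boolean network modeling, the mechanics can be sketched with a deliberately tiny toy. Every node is either on or off, all nodes are updated synchronously from the previous state, and the long-run behavior (the attractor) is read off for each treatment combination. The four nodes and their update rules below are hypothetical and were hand-picked so that the toy is easy to follow; they are not the network from the study and carry no evidential weight.

```python
from itertools import product

NODES = ("Signal", "Checkpoint", "Immune", "Growth")


def step(state, mutant, inhibitor, ici):
    """One synchronous update: every rule reads the previous state.

    Hypothetical toy rules, not the published model.
    """
    return {
        "Signal": mutant and not inhibitor,        # oncogenic signalling
        "Checkpoint": mutant,                      # immune checkpoint active
        "Immune": ici or not state["Checkpoint"],  # effective immune response
        "Growth": state["Signal"] or not state["Immune"],
    }


def attractor(mutant, inhibitor, ici, max_steps=50):
    """Iterate from the all-off state until a state repeats; return the cycle."""
    state = dict.fromkeys(NODES, False)
    seen = []
    for _ in range(max_steps):
        if state in seen:
            return seen[seen.index(state):]
        seen.append(state)
        state = step(state, mutant, inhibitor, ici)
    raise RuntimeError("no attractor found within max_steps")


def growth_outcomes(mutant=True):
    """Map (inhibitor, ici) to whether Growth is on anywhere in the attractor."""
    results = {}
    for inhibitor, ici in product((False, True), repeat=2):
        cycle = attractor(mutant, inhibitor, ici)
        results[(inhibitor, ici)] = any(s["Growth"] for s in cycle)
    return results


for (inhibitor, ici), grows in growth_outcomes().items():
    print(f"inhibitor={inhibitor!s:5} ici={ici!s:5} -> growth={grows}")
```

In this toy, growth is switched off only when both interventions are applied, which is a consequence of the hand-written rule for `Growth` rather than a finding. A realistic model would have many more nodes, rules curated from the literature, and possibly asynchronous updating.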


Lanqi Wu, Ruocheng Yu, Minghui Yao, Md Matiur Rahaman, Zhaoyuan Fang. Modelling and optimizing combination therapeutic strategies for KRAS- and EGFR-mutant lung cancer. Journal of Bioinformatics and Computational Biology 23(5), 2550017. Published 2025-10-01. https://doi.org/10.1142/S0219720025500179

Citations: 0

Editorial: Guidelines for Credible Machine Learning in Computational Biology.

IF 0.7 · Zone 4 (Biology) · Q4 · MATHEMATICAL & COMPUTATIONAL BIOLOGY

Journal of Bioinformatics and Computational Biology

Pub Date : 2025-10-01 DOI: 10.1142/S0219720025010012

Limsoon Wong

Citations: 0