Frontiers in Bioinformatics: latest articles

Unraveling the molecular basis of snake venom nerve growth factor: human TrkA recognition through molecular dynamics simulation and comparison with human nerve growth factor.
IF 3.9 (Q2, Mathematical & Computational Biology). Pub Date: 2025-10-24. eCollection Date: 2025-01-01. DOI: 10.3389/fbinf.2025.1674791
Shrudhi Devi, Gurunathan Jayaraman

Introduction: Neurodegenerative diseases pose significant challenges owing to the limited number of effective therapies. Nerve growth factor (NGF) plays a crucial role in neuronal survival and differentiation through tropomyosin receptor kinase A (TrkA). Although snake venom NGF (sNGF) has been studied for its ability to activate TrkA, the binding modes and associated dynamics remain unclear compared to those of human NGF (hNGF). Herein, we explored the possibilities of NGFs from Daboia russelii and Naja naja as potential therapeutic alternatives to hNGF by comparing the structural similarities and conserved binding residues.

Methods: The active sites were identified through a literature review, molecular docking was performed using HADDOCK, and molecular dynamics simulation was performed to analyse the stabilities of the complexes; then, PRODIGY and molecular mechanics Poisson-Boltzmann surface area were used to determine the binding affinities.
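
The two affinity estimates named in the Methods rest on standard thermodynamic relations, which can be sketched as follows. This is a minimal illustration, not the authors' workflow: the function names and the numeric inputs are hypothetical, and it assumes an end-state estimate of the form dG_bind = G_complex - (G_receptor + G_ligand) together with the conversion Kd = exp(dG / RT).

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(g_complex, g_receptor, g_ligand):
    """End-state estimate: dG_bind = G_complex - (G_receptor + G_ligand)."""
    return g_complex - (g_receptor + g_ligand)

def dissociation_constant(delta_g, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to Kd (mol/L) via Kd = exp(dG / RT)."""
    return math.exp(delta_g / (R_KCAL * temperature_k))

# Hypothetical energies in kcal/mol; these are NOT values from the study.
dg = binding_free_energy(-1250.0, -800.0, -438.0)
kd = dissociation_constant(dg)
print(f"dG = {dg:.1f} kcal/mol, Kd = {kd:.2e} M")
```

A more negative dG maps to a smaller Kd, which is the sense in which one complex is said to bind "more strongly" than another.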

Results: The different sNGFs exhibited stronger binding affinities and stabilities than hNGF, while principal component analysis and the free energy landscape indicated constrained conformational flexibilities suggestive of an adaptive mechanism in sNGF for effective receptor engagement. A network coevolutionary analysis showed the pattern in which the amino acids had coevolved and remained conserved throughout the simulations.
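
The abstract does not say how the free energy landscape was computed. A common approach is Boltzmann inversion of the population density along the first two principal components, G = -kT ln(P / P_max). The sketch below assumes that approach; the function name, the bin count, and the temperature are illustrative choices, and the inputs are taken to be per-frame projections onto PC1 and PC2.

```python
import math
from collections import Counter

KB_KJ = 0.0083144626  # molar gas constant in kJ/(mol*K)

def free_energy_landscape(pc1, pc2, bins=10, temperature_k=300.0):
    """Return {(i, j): G} in kJ/mol for each populated bin; the most populated bin has G = 0."""
    def bin_index(value, lo, hi):
        if hi == lo:
            return 0
        # Clamp the maximum value into the last bin.
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    lo1, hi1, lo2, hi2 = min(pc1), max(pc1), min(pc2), max(pc2)
    counts = Counter(
        (bin_index(x, lo1, hi1), bin_index(y, lo2, hi2)) for x, y in zip(pc1, pc2)
    )
    max_count = max(counts.values())
    kt = KB_KJ * temperature_k
    return {cell: -kt * math.log(n / max_count) for cell, n in counts.items()}
```

A landscape with few, deep basins corresponds to the "constrained conformational flexibility" described above, while many shallow basins would indicate a more mobile complex.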

Discussion: These findings indicate that NGFs from D. russelii and N. naja are promising therapeutic candidates for treating neurodegenerative disorders and warrant further in vivo validation.

Citations: 0
Drug repositioning pipeline integrating community analysis in drug-drug similarity networks and automated ATC community labeling to foster molecular docking analysis.
IF 3.9 (Q2, Mathematical & Computational Biology). Pub Date: 2025-10-23. eCollection Date: 2025-01-01. DOI: 10.3389/fbinf.2025.1666716
Daiana Colibăşanu, Vlad Groza, Maria Antonietta Occhiuzzi, Fedora Grande, Mihai Udrescu, Lucreția Udrescu

Introduction: Drug repositioning, that is, finding new therapeutic uses for existing drugs, can dramatically reduce development time and cost, but it requires efficient computational frameworks to generate and validate repositioning hypotheses. Network-based methods can uncover drug communities with shared pharmacological properties, while molecular docking offers mechanistic insights by predicting drug-target binding.

Methods: We introduce an end-to-end, fully automated pipeline that (1) constructs a tripartite drug-gene-disease network from DrugBank and DisGeNET, (2) projects it into a drug-drug similarity network for community detection, (3) labels communities via Anatomical Therapeutic Chemical (ATC) codes to generate repositioning hints and identify relevant targets, (4) validates hints through automated literature searches, and (5) prioritizes candidates via targeted molecular docking.
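
Steps (2) and (3) can be sketched on toy data as follows. Several choices here are assumptions made for illustration, since the abstract does not specify them: drug-drug similarity is taken as the Jaccard index over shared genes, connected components stand in for a real community-detection algorithm, and the community label is the majority ATC level-1 letter (the first character of an ATC code). All drug names, gene names, and codes are placeholders.

```python
from collections import Counter
from itertools import combinations

def project_drug_similarity(drug_genes, min_jaccard=0.2):
    """Weighted drug-drug edges from shared gene neighbours (Jaccard index)."""
    edges = {}
    for a, b in combinations(sorted(drug_genes), 2):
        shared = drug_genes[a] & drug_genes[b]
        union = drug_genes[a] | drug_genes[b]
        if union and len(shared) / len(union) >= min_jaccard:
            edges[(a, b)] = len(shared) / len(union)
    return edges

def connected_components(nodes, edges):
    """Stand-in for community detection: group drugs linked by any retained edge."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda g: sorted(g))

def label_community(community, atc_codes):
    """Majority ATC level-1 letter among the community's drugs, or None if no codes."""
    letters = Counter(code[0] for d in community for code in atc_codes.get(d, []))
    return letters.most_common(1)[0][0] if letters else None

# Hypothetical toy data; real input would come from DrugBank and DisGeNET.
drug_genes = {
    "drugA": {"G1", "G2", "G3"},
    "drugB": {"G2", "G3"},
    "drugC": {"G7"},
    "drugD": {"G7", "G8"},
}
edges = project_drug_similarity(drug_genes)
communities = connected_components(drug_genes, edges)
```

Under this scheme, a drug whose own ATC level-1 letter differs from its community label is the kind of mismatch that the pipeline treats as a repositioning hint.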

Results: After filtering for connectivity and size, 12 robust communities emerged from the initial 34 clusters. The pipeline correctly matched 53.4% of drugs to their ATC level 1 community label via database entries; literature validation confirmed an additional 20.2%, yielding 73.6% overall accuracy. The remaining 26.4% of drugs were flagged as repositioning candidates. To illustrate the advantages of our pipeline, molecular docking studies of chloramphenicol demonstrated stable binding and interaction profiles similar to those of known inhibitors, reinforcing its potential as an anticancer agent.

Conclusion: Our pipeline combines network-based community analysis and automated ATC labeling with literature and docking analysis, narrowing the search space for in silico and experimental follow-up. The chloramphenicol example illustrates its utility for uncovering non-obvious repositioning opportunities. Future work will extend similarity definitions (e.g., to higher-order network motifs) and incorporate wet-lab validation of top candidates.

Citations: 0
Inferred global dense residue transition graphs from primary structure sequences enable protein interaction prediction via directed graph convolutional neural networks.
IF 3.9 (Q2, Mathematical & Computational Biology). Pub Date: 2025-10-22. eCollection Date: 2025-01-01. DOI: 10.3389/fbinf.2025.1651623
Islam Akef Ebeid, Haoteng Tang, Pengfei Gu

Introduction: Accurate prediction of protein-protein interactions (PPIs) is crucial for understanding cellular functions and advancing the development of drugs. While existing in-silico methods leverage direct sequence embeddings from Protein Language Models (PLMs) or apply Graph Neural Networks (GNNs) to 3D protein structures, the main focus of this study is to investigate less computationally intensive alternatives. This work introduces a novel framework for the downstream task of PPI prediction via link prediction.

Methods: We introduce a two-stage graph representation learning framework, ProtGram-DirectGCN. First, we developed ProtGram, a novel approach that models a protein's primary structure as a hierarchy of globally inferred n-gram graphs. In these graphs, residue transition probabilities, aggregated from a large sequence corpus, define the edge weights of a directed graph of paired residues. Second, we propose a custom directed graph convolutional neural network, DirectGCN, which features a unique convolutional layer that processes information through separate path-specific (incoming, outgoing, undirected) and shared transformations, combined via a learnable gating mechanism. DirectGCN is applied to the ProtGram graphs to learn residue-level embeddings, which are then pooled via an attention mechanism to generate protein-level embeddings for the prediction task.
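
The first stage, a globally inferred n-gram transition graph, can be sketched as below. This is one plausible reading of the description, not the ProtGram implementation: it assumes nodes are overlapping n-grams, an edge runs from each n-gram to the one starting at the next position, and edge weights are transition counts pooled over the corpus and normalized per source node. The function name and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict

def ngram_transition_graph(sequences, n=1):
    """Directed graph {source: {target: P(target | source)}} over overlapping n-grams."""
    counts = defaultdict(Counter)
    for seq in sequences:
        grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
        # Each n-gram points to the n-gram that starts one residue later.
        for src, dst in zip(grams, grams[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: c / sum(targets.values()) for dst, c in targets.items()}
        for src, targets in counts.items()
    }

corpus = ["MKVLA", "MKKLA", "AKVL"]  # hypothetical toy corpus
graph = ngram_transition_graph(corpus, n=1)
graph2 = ngram_transition_graph(corpus, n=2)
```

Calling the function for n = 1, 2, 3, ... yields the hierarchy of graphs mentioned above. Because the weights come from the whole corpus, every protein shares the same graph at a given n, which is what makes the representation "global".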

Results: The efficacy of the DirectGCN model was first established on standard node classification benchmarks, where its performance is comparable to that of established methods on general datasets, while demonstrating specialization for complex, directed, and dense heterophilic graph structures. When applied to PPI prediction, the full ProtGram-DirectGCN framework achieves robust predictive power despite being trained on limited data.

Discussion: Our results suggest that a globally inferred, directed graph-based representation of sequence transitions offers a potent and computationally distinct alternative to resource-intensive PLMs for the task of PPI prediction. Future work will involve testing ProtGram-DirectGCN on a wider range of bioinformatics tasks.

Citations: 0
Enhancing drug-target interaction prediction with graph representation learning and knowledge-based regularization.
IF 3.9 (Q2, Mathematical & Computational Biology). Pub Date: 2025-10-21. eCollection Date: 2025-01-01. DOI: 10.3389/fbinf.2025.1649337
Qihuan Yao, Zhen Chen, Ye Cao, Huijing Hu

Introduction: Accurately predicting drug-target interactions (DTIs) is crucial for accelerating drug discovery and repurposing. Despite recent advances in deep learning-based methods, challenges remain in effectively capturing the complex relationships between drugs and targets while incorporating prior biological knowledge.

Methods: We introduce a novel framework that combines graph neural networks with knowledge integration for DTI prediction. Our approach learns representations from molecular structures and protein sequences through a customized graph-based message passing scheme. We integrate domain knowledge from biomedical ontologies and databases using a knowledge-based regularization strategy to infuse biological context into the learned representations.
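
The abstract does not give the form of the knowledge-based regularizer. One common way to infuse prior knowledge is to add a penalty that pulls together the embeddings of entities linked in an ontology or database. The sketch below assumes that form; the function names, the weight `lam`, and the pairing of entities are all hypothetical and should not be read as the authors' objective.

```python
import math

def bce(labels, probs, eps=1e-12):
    """Mean binary cross-entropy for predicted interaction probabilities."""
    return -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for y, p in zip(labels, probs)
    ) / len(labels)

def knowledge_penalty(embeddings, related_pairs):
    """Mean squared distance between embeddings of entities linked in a knowledge source."""
    if not related_pairs:
        return 0.0
    total = 0.0
    for a, b in related_pairs:
        total += sum((x - y) ** 2 for x, y in zip(embeddings[a], embeddings[b]))
    return total / len(related_pairs)

def regularized_loss(labels, probs, embeddings, related_pairs, lam=0.1):
    """Prediction loss plus a weighted knowledge-consistency penalty (assumed form)."""
    return bce(labels, probs) + lam * knowledge_penalty(embeddings, related_pairs)
```

With `lam=0` the objective reduces to plain cross-entropy. Larger values trade predictive fit for agreement with the prior knowledge.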

Results: We evaluated our model on multiple benchmark datasets, achieving an average AUC of 0.98 and an average AUPR of 0.89, surpassing existing state-of-the-art methods by a considerable margin. Visualization of learned attention weights identified salient molecular substructures and protein motifs driving the predicted interactions, demonstrating model interpretability.

Discussion: We validated the practical utility by predicting novel DTIs for FDA-approved drugs and experimentally confirming a high proportion of predictions. Our framework offers a powerful and interpretable solution for DTI prediction with the potential to substantially accelerate the identification of new drug candidates and therapeutic targets.

Citations: 0
Fish isoallergens and variants: database compilation, in silico allergenicity prediction challenges, and epitope-based threshold optimization.
IF 3.9 (Q2, Mathematical & Computational Biology). Pub Date: 2025-10-20. eCollection Date: 2025-01-01. DOI: 10.3389/fbinf.2025.1669237
Vachiranee Limviphuvadh, Thimo Ruethers, Minh N Nguyen, Dean R Jerry, Benjamin P C Smith, Yulan Wang, Yansong Miao, Anand Kumar Andiappan, Andreas L Lopata, Sebastian Maurer-Stroh

Introduction: Fish is a major food allergy trigger with a complex variety of allergenic protein isoforms and vast species diversity exhibiting variable allergenicity. This is the first study to systematically compile fish isoallergen and variant entries associated with ingestion-related allergic reactions.

Methods: Entries were compiled from four major allergen databases: World Health Organization and International Union of Immunological Societies (WHO/IUIS), AllergenOnline, Comprehensive Protein Allergen Resource (COMPARE), and Allergome, including evidence from in vitro IgE-binding assays and complete amino acid sequences. Challenges in predicting the allergenicity of fish isoallergens and variants were evaluated, and the sensitivity of five widely used in silico tools (AllerCatPro 2.0, AlgPred 2.0, pLM4Alg, AllergenFP v.1.0, and AllerTop v.2.0) was assessed. Epitope mapping and phylogenetic analyses were performed for the major fish allergen parvalbumin, incorporating experimentally validated B-cell epitope data from the Immune Epitope Database (IEDB) and evolutionary relationships.

Results: A comprehensive dataset of 79 unique fish isoallergen and variant entries from 34 fish species was identified, with 25 entries common across all four databases. AllerCatPro 2.0 achieved the highest sensitivity (97.5%). A phylogenetic tree was constructed, integrating epitope data to optimize protein family-specific thresholds for differentiating allergenic from less/non-allergenic parvalbumins. A threshold of ≥4 IEDB-mapped epitopes allowing up to two mismatches captured 52 out of 54 parvalbumin sequences (96%) in the dataset, effectively distinguishing between parvalbumin classes.
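
The threshold rule (at least four mapped epitopes, each allowing up to two mismatches) can be sketched as below. The abstract does not define how mismatches are counted, so this assumes substitutions within an ungapped sliding window (Hamming distance). The function names and all sequences used with them are hypothetical.

```python
def epitope_matches(sequence, epitope, max_mismatches=2):
    """True if any ungapped window of the sequence matches the epitope within the mismatch budget."""
    k = len(epitope)
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if sum(a != b for a, b in zip(window, epitope)) <= max_mismatches:
            return True
    return False

def count_mapped_epitopes(sequence, epitopes, max_mismatches=2):
    """Number of epitopes that map somewhere on the sequence."""
    return sum(epitope_matches(sequence, e, max_mismatches) for e in epitopes)

def passes_threshold(sequence, epitopes, min_epitopes=4, max_mismatches=2):
    """Apply the '>= min_epitopes mapped epitopes' rule to one sequence."""
    return count_mapped_epitopes(sequence, epitopes, max_mismatches) >= min_epitopes
```

Applied to a set of parvalbumin sequences, the fraction that pass gives the capture rate. The reported 52 of 54 corresponds to roughly 96%.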

Discussion: This study enhances understanding of fish allergy by systematically compiling fish isoallergens and variants and integrating B-cell epitope data. The optimized thresholds improve the performance of allergenicity prediction tools and can be applied to other protein families in future studies.

Citations: 0
CoMPHI: a novel composite machine learning approach utilizing multiple feature representation to predict hosts of bacteriophages.
IF 3.9 (Q2, Mathematical & Computational Biology). Pub Date: 2025-10-16. eCollection Date: 2025-01-01. DOI: 10.3389/fbinf.2025.1622931
Shreyashi Bodaka, Narasaiah Kolliputi

Phage therapy has reemerged as a compelling alternative to antibiotics in treating bacterial infections, especially for superbugs that have developed antibiotic resistance. The challenge in the broader application of phage therapy is identifying host targets for the vast array of uncharacterized phages obtained through next-generation sequencing. We introduce a Composite Model for Phage Host Interaction (CoMPHI) that integrates alignment-based approaches with machine learning. The model generates multiple feature encodings from nucleotide and protein sequences of both phages and hosts. It incorporates alignment scores between phage-phage, phage-host, and host-host pairs, creating a composite prediction framework. During 5-fold cross-validation, CoMPHI achieved Area Under the ROC Curve (AUC-ROC) values of 94-96.7% and accuracies of 92.3-95.1% across taxonomic levels from species to phylum. Comparative analysis showed a 6-8% performance improvement when alignment scores were included. Ablation studies demonstrated that combining nucleotide and protein encodings, along with phage-host, host-host, and phage-phage alignment scores, significantly enhanced prediction accuracy. CoMPHI provides a robust and comprehensive framework for predicting phage-host interactions. By combining sequence features and alignment information, the model advances computational tools that can accelerate the application of phage therapy in modern medicine.
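
The idea of combining sequence encodings with alignment scores can be sketched as follows. The abstract does not list the specific encodings or the aligner, so this assumes normalized k-mer frequencies as one possible encoding and treats the alignment scores as precomputed numbers supplied by the caller. The function names are hypothetical and the downstream classifier is left out.

```python
from itertools import product

def kmer_frequencies(sequence, k=2, alphabet="ACGT"):
    """Normalized k-mer frequency vector in a fixed lexicographic order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # skip windows containing symbols outside the alphabet
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

def composite_features(phage_seq, host_seq, alignment_scores, k=2):
    """Concatenate the phage encoding, the host encoding, and precomputed alignment scores."""
    return kmer_frequencies(phage_seq, k) + kmer_frequencies(host_seq, k) + list(alignment_scores)
```

Dropping the `alignment_scores` part of the vector and retraining is the kind of ablation that the reported 6-8% difference refers to.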

Citations: 0
Multimodal knowledge expansion widget powered by plant protein phosphorylation database and ChatGPT.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-15 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1687687
Chunhui Xu, Yang Yu, Govardhan Khadakkar, Jiacheng Xie, Dong Xu, Qiuming Yao

Biological databases are essential for providing curated knowledge, but their rigid data structures and restrictive query formats often limit flexible and exploratory user interactions. In the field of plant phosphorylation, manually curated and reviewed data represent only a small portion of the available knowledge, and users often seek information that goes beyond what is provided in structured databases. While large language models (LLMs) like ChatGPT-4o possess extensive contextual knowledge, integrating this capability into bioinformatics tools remains an open challenge. Here, we present a multimodal question-answering widget that integrates ChatGPT-4o with our Plant Protein Phosphorylation Database (P3DB). This system supports natural language queries and dynamic prompt formulation, enabling users to explore phosphorylation events, kinase-substrate relationships, and protein-protein interactions through a global entry. In another application, the widget leverages ChatGPT's image interpretation functionality to extract regulatory pathways and phosphorylation markers from complex scientific figures. To build this widget effectively, we have explored multiple prompt strategies, including one-step, two-step, few-shot, and image-cropping techniques, demonstrating their impact on output accuracy and consistency. In addition, recent multimodal LLMs such as ChatGPT-5 and Gemini 1.5 have demonstrated comparable capabilities and adaptability when applied to our test cases and the developed widgets. Together, our application widget and results highlight the development of the ChatGPT-P3DB integration as a system that enhances user accessibility, enables visual extraction, and extends the current utility of biological knowledgebases through a flexible and adaptive framework. Our "ChatGPT-P3DB" is open-source and can be accessed on GitHub (https://github.com/yao-laboratory/p3db-chat). 
The frontend interface, "P3DB askAI" web module, can be accessed freely through https://www.p3db.org/ask-ai.

Citations: 0
Analysis of breast region segmentation in thermal images using U-Net deep neural network variants.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-10 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1609004
Rafhanah Shazwani Rosli, Mohamed Hadi Habaebi, Md Rafiqul Islam, Mohammed Abdulla Salim Al Hussaini

Introduction: Breast cancer detection using thermal imaging relies on accurate segmentation of the breast region from adjacent body areas. Reliable segmentation is essential to improve the effectiveness of computer-aided diagnosis systems.

Methods: This study evaluated three segmentation models (U-Net, U-Net with Spatial Attention, and U-Net++) using five optimization algorithms (ADAM, NADAM, RMSPROP, SGDM, and ADADELTA). Performance was assessed through k-fold cross-validation with metrics including Intersection over Union (IoU), Dice coefficient, precision, recall, sensitivity, specificity, pixel accuracy, ROC-AUC, and PR-AUC, with Grad-CAM heatmaps used for qualitative analysis.

Results: The ADAM optimizer consistently outperformed the others, yielding superior accuracy and reduced loss. Among the models, the baseline U-Net, despite being less complex, demonstrated the most effective performance, with precision of 0.9721, recall of 0.9559, specificity of 0.9801, ROC-AUC of 0.9680, and PR-AUC of 0.9472. U-Net also achieved higher robustness in breast region overlap and noise handling compared to its more complex variants. The findings indicate that greater architectural complexity does not necessarily lead to improved outcomes.

Discussion: This research highlights that the original U-Net, when trained with the ADAM optimizer, remains highly effective for breast region segmentation in thermal images. The insights contribute to guiding the selection of suitable deep learning models and optimizers for medical image analysis, with the potential to enhance the efficiency and accuracy of breast cancer diagnosis using thermal imaging.
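The overlap metrics named in the Methods (IoU and Dice coefficient) can be sketched on binary masks as follows. This is a minimal standard-library illustration, not the authors' evaluation code; the function names and the toy masks are assumptions made for the example.

```python
def confusion_counts(pred, truth):
    """Count pixel-wise TP, FP, FN, TN for two same-shaped binary masks.

    pred, truth: lists of rows, each row a list of 0/1 values.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def iou(pred, truth):
    """Intersection over Union: TP / (TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    union = tp + fp + fn
    # Two empty masks agree perfectly, so define IoU as 1.0 in that case.
    return tp / union if union else 1.0


def dice(pred, truth):
    """Dice coefficient: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Toy 2x3 masks (illustrative only, not data from the study).
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
print(iou(predicted, reference), dice(predicted, reference))
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank models identically on a single image, although averages over many images can differ.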

Citations: 0
Unveiling the impact of interferon genes on the immune microenvironment of triple-negative breast cancer: identification of therapeutic targets.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-08 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1629526
Ying Liu, Jiayi Cai, Aamir Fahira, Kai Zhuang, Jiaojiao Wang, Zhi Zhang, Lin Yan, Yong Liu, Defang Ouyang, Zunnan Huang

Objective: Triple-negative breast cancer (TNBC), a classic subtype of breast cancer, is challenging to treat due to the lack of drug-targeting receptors. This study aims to identify interferon-related prognostic molecular biomarkers in TNBC and their potential competing endogenous RNA (ceRNA) regulatory network.

Methods: RNA expression profiles and interferon genes were downloaded from the Cancer Genome Atlas (TCGA) database and the Gene Set Enrichment Analysis (GSEA) website, respectively. Univariate and multivariate Cox regression analyses were performed to identify prognostic genes and construct a risk model. Single-sample GSEA (ssGSEA) and the CellMiner database were used to explore the relationships between the prognostic genes and the tumor immune microenvironment and drug sensitivity, respectively. The lncRNA-miRNA-mRNA network associated with prognosis was constructed using the ENCORI database. Finally, the potential interferon-associated lncRNA/miRNA/mRNA regulatory axis was identified through correlation analysis. The abnormal expression of the prognostic genes was validated in three TNBC tumor cell lines against normal mammary epithelial cells using quantitative real-time polymerase chain reaction (qRT-PCR).

Results: A TNBC prognostic signature comprising four interferon genes (STXBP1, LAMP3, CD276, and POLR2F) was identified; their expression correlated significantly with the infiltration abundance of multiple immune cells and with sensitivity to 30 drugs (including ARQ-680, fluphenazine, and chelerythrine). An interferon-related prognostic ceRNA network was then constructed, consisting of 248 lncRNAs, 66 miRNAs, and 4 mRNAs. From it, five interferon-related ceRNA regulatory axes (AC124067.4/hsa-miR-455-3p/STXBP1, RBPMS-AS1/hsa-miR-455-3p/STXBP1, DNMBP-AS1/hsa-miR-455-3p/STXBP1, FAM198B-AS1/hsa-miR-455-3p/STXBP1, LIFR-AS1/hsa-miR-455-3p/STXBP1) associated with TNBC progression were identified. qRT-PCR results showed that all four prognostic mRNAs were upregulated in TNBC cells.

Conclusion: This study established a prognostic signature and a ceRNA network associated with interferon in TNBC, and identified five key regulatory axes. In the prognostic signature and the ceRNA axes, STXBP1, RBPMS-AS1, and FAM198B-AS1 were first reported as potential biomarkers of TNBC. These findings have the potential to provide new insights into the mechanisms driving TNBC tumorigenesis and development.
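A Cox-derived risk model of the kind described in the Methods is commonly applied as a weighted sum of gene expression values, with patients then split at the median score. The sketch below assumes that standard form; the abstract does not state the exact formula, cutoff, or coefficients, so the function names, coefficient values, and expression values here are placeholders and are not taken from the study.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear risk score: sum over signature genes of coefficient * expression."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify(scores):
    """Label each sample 'high' if its score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {sample: ("high" if value > cutoff else "low")
            for sample, value in scores.items()}


# PLACEHOLDER coefficients: invented for illustration, NOT values from the paper.
placeholder_coefficients = {"STXBP1": 0.5, "LAMP3": -0.3, "CD276": 0.4, "POLR2F": 0.2}

# PLACEHOLDER expression values for three hypothetical samples.
placeholder_cohort = {
    "sample_1": {"STXBP1": 2.0, "LAMP3": 1.0, "CD276": 3.0, "POLR2F": 1.0},
    "sample_2": {"STXBP1": 0.5, "LAMP3": 4.0, "CD276": 1.0, "POLR2F": 2.0},
    "sample_3": {"STXBP1": 1.0, "LAMP3": 2.0, "CD276": 2.0, "POLR2F": 1.5},
}

scores = {name: risk_score(expr, placeholder_coefficients)
          for name, expr in placeholder_cohort.items()}
groups = stratify(scores)
```

The resulting high-risk and low-risk groups are what survival comparisons would then be run on; fitting the coefficients themselves requires a Cox regression on survival data, which this sketch deliberately leaves out.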

Citations: 0
Optimizing clustering of CDR3 sequences using natural language processing, Word2Vec, and KMeans.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-02 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1623488
Sanskriti Baranwal, Ricardo Avila Sanchez, Clement-Andi Edet, Erick Chastain, Inimary Toby

T-cell receptor (TCR) sequencing has emerged as a powerful tool for understanding adaptive immune responses, yet challenges persist in deciphering the immense diversity of Complementarity-Determining Region 3 (CDR3) sequences. This study presents a novel natural language processing (NLP)-based pipeline to cluster CDR3 sequences from TCR β-chain repertoires using Word2Vec embeddings, principal component analysis (PCA), and KMeans clustering. Focusing on Acute Respiratory Distress Syndrome (ARDS), a life-threatening inflammatory lung condition, we trained Word2Vec models on healthy controls and applied unsupervised clustering across ARDS, non-ARDS, and control datasets. Dimensionality-reduced embeddings revealed clear distinctions in repertoire structure: control samples exhibited tight, low-diversity clusters; ARDS patients showed high dispersion and numerous diffuse clusters indicative of repertoire disruption; and non-ARDS samples displayed intermediate organization. These differences suggest that immune activation states are embedded in the structural topology of the CDR3 space. Our framework successfully captured these latent patterns, offering a scalable approach to biomarker discovery. This study not only reinforces the utility of NLP in immunological analysis but also paves the way for data-driven immune monitoring in critical care and personalized diagnostics.
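Before Word2Vec can be trained on CDR3 sequences, each amino-acid string has to be turned into a "sentence" of tokens. The abstract does not say how the authors tokenized their sequences; a common choice, assumed here purely for illustration, is to split each sequence into overlapping k-mers. The function names and the example sequences are hypothetical.

```python
def kmer_tokens(cdr3, k=3):
    """Split an amino-acid string into overlapping k-mers (the 'words')."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [cdr3[i:i + k] for i in range(len(cdr3) - k + 1)]


def build_corpus(sequences, k=3):
    """Turn a list of CDR3 strings into token lists, skipping sequences shorter than k."""
    return [kmer_tokens(seq, k) for seq in sequences if len(seq) >= k]


# Illustrative sequences only, not data from the study.
example_sequences = ["CASSLG", "CASSPG", "CA"]
corpus = build_corpus(example_sequences, k=3)
print(corpus)
```

A corpus in this list-of-token-lists shape is what embedding trainers such as Word2Vec typically consume; per-sequence vectors (for example the mean of a sequence's k-mer vectors) could then be reduced with PCA and grouped with KMeans, as the abstract outlines.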

Citations: 0