Pub Date: 2025-10-31; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1705252
Yaan J Jang
Title: Editorial: Computational protein function prediction based on sequence and/or structural data. Frontiers in bioinformatics, vol. 5, 1705252. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12615499/pdf/
Pub Date: 2025-10-31; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1645520
Helya Goharbavang, Artem T Ashitkov, Athira Pillai, Joshua D Wythe, Guoning Chen, David Mayerich
Recent advances in three-dimensional microscopy enable imaging of whole-organ microvascular networks in small animals. Since microvasculature plays a crucial role in tissue development and function, its structure may provide diagnostic biomarkers and insight into disease progression. However, the microscopy community currently lacks benchmarks for scalable algorithms to measure these potential biomarkers. While many algorithms exist for segmenting vessel-like structures and extracting their surface features and connectivity, they have not been thoroughly evaluated on modern gigavoxel-scale images. In this paper, we present a comprehensive yet compact survey of available algorithms. We focus on essential features for microvascular analysis, including extracting vessel surfaces and the network's associated connectivity. We select a series of algorithms based on popularity and availability and provide a thorough quantitative analysis of their performance on datasets acquired using light sheet fluorescence microscopy (LSFM), knife-edge scanning microscopy (KESM), and X-ray microtomography (µ-CT).
Title: Segmentation and modeling of large-scale microvascular networks: a survey. Frontiers in bioinformatics, vol. 5, 1645520. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12616183/pdf/
Pub Date: 2025-10-30; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1693343
Genevieve Laprade, Quinn Lee, Kristin L Gallik, Michael Nelson, Natalie Woo, Celina Terán Ramírez, Alexis Ricardo Becerril Cuevas, Kevin W Eliceiri, Corinne Esquibel
The fields of bioimaging and image analysis are rapidly expanding as new technologies transform biological questions into novel insights. While professionals of varying expertise are essential to achieving these advancements, early-career scientists, a prominent user group within the imaging community, are often assumed to have the prerequisite knowledge and ability to use these tools. This demographic, consisting of students, post-docs, and bioimage analysis trainees, is critical for the field to continue to evolve and flourish. However, obstacles such as geographic location, language barriers, insufficient funding or training, and instrument availability hinder access to resources and introduce significant knowledge gaps, especially for scientists in early-career stages. Democratized resources for bioimaging and analysis such as forums, community organizations, and publicly available datasets have been helpful in overcoming barriers to access for early-career scientists. Here, we discuss the current tools and resources available for early-career researchers, highlight their limitations from the learners' perspective, and propose strategies to better support this group. As bioimage analysis extends broadly into many scientific disciplines, we implore all members of this community, regardless of experience level, to empower next-generation scientists.
Title: The importance of democratized resources in early-career training for bioimage analysts and bioimaging scientists. Frontiers in bioinformatics, vol. 5, 1693343. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12611831/pdf/
Pub Date: 2025-10-29; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1674179
Priscila Galvão Doria, Gisele Vieira Rocha, Vanessa Dybal Bertoni, Roberto de Souza Batista Dos Santos, Mariana Araújo-Pereira, Clarissa Gurgel
Introduction: Colon cancer is a common and heterogeneous disease, yet it is treated with few chemotherapeutic agents given in similar treatment sequences. A significant proportion of patients are diagnosed with metastasis, and resistance to antineoplastic drugs is associated with disease progression and therapeutic failure. The tumor microenvironment is known to play an essential role in cancer progression, contributing to processes that may be associated with therapeutic resistance mechanisms in colon cancer. In this study, we aim to identify a gene expression signature and its relationship with immune cell infiltration in colon cancer, contributing to the identification of potential resistance biomarkers.
Methods: An in silico study was conducted using RNA-seq data from The Cancer Genome Atlas Program (TCGA) samples, subdivided into two groups (treatment-resistant and non-resistant), taking into account the molecular subgroups (CMS1, CMS2, CMS3, and CMS4). The following algorithms were used: i. Limma was applied to identify differentially expressed genes; ii. WGCNA was applied to construct co-expression networks; iii. CIBERSORT was applied to estimate the proportion of infiltrating immune cells; and iv. TIMER was applied to explore the relationship between core genes and immune cell content.
Results: Twenty differentially expressed genes (DEGs) were found, 18 of which were related to the group considered resistant to oncologic treatment and to poorer overall survival. Resting memory CD4 T cells and M0 and M2 macrophages were found in higher proportions in the analyzed samples, and their infiltration of the tumor microenvironment increased with the expression of some of these resistance DEGs. Additionally, these genes correlate with biological aspects of neuronal differentiation, axogenesis, and synaptic transmission.
Conclusion: The gene expression signature suggests the presence of differentially expressed synaptic membrane genes, which may be involved in neuronal pathways that influence the tumor microenvironment, potentially serving as future biomarkers. Furthermore, the presence of M0 and M2 macrophages and T CD4 memory resting cells suggests a potential interaction that may play a role in therapeutic resistance.
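The reported link between DEG expression and immune infiltration can be illustrated with a rank correlation. The following is a minimal sketch, assuming a Spearman coefficient is an acceptable measure; it is not the TIMER procedure itself, and the expression values and macrophage fractions are hypothetical toy numbers.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-sample values: expression of one DEG and the
# estimated M2 macrophage fraction in the same five samples.
expression = [2.1, 3.5, 1.0, 4.2, 2.9]
m2_fraction = [0.10, 0.22, 0.05, 0.20, 0.15]
rho = spearman(expression, m2_fraction)
```

A coefficient near 1 would indicate that samples with higher expression of the gene also tend to have higher estimated infiltration; it says nothing about causation.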
Title: Gene expression profile in colon cancer therapeutic resistance and its relationship with the tumor microenvironment. Frontiers in bioinformatics, vol. 5, 1674179. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12604976/pdf/
Pub Date: 2025-10-24; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1708311
Giulia Ghisleni, Christian Stolte, Megan Gozzard, Lea Von Soosten, Antonia Bruno
This perspective paper examines the profound cognitive and methodological parallels between scientific and artistic research, challenging the traditional distinction between the two domains. While science and art use different languages, both emerge from the human drive for creativity and understanding. We argue that scientific inquiry, often presented as strictly objective and methodical, inherently shares with art the need for imagination, flexibility, and interpretative thinking. Drawing on neuroscience, education, design theory, and the visual arts, we highlight how artistic practices, particularly in the visual arts, can enhance scientific learning, innovation, and public engagement. We advocate integrating art into scientific training and research to foster a more creative and inclusive epistemology. Through examples in microbiology, education, and data visualization, we show how the arts can support deeper understanding, cross-disciplinary collaboration, and more effective science communication. Ultimately, we call for a shift toward a more integrated approach that embraces the complementary strengths of both art and science in advancing knowledge and societal impact.
Title: Why science needs art. Frontiers in bioinformatics, vol. 5, 1708311. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12592062/pdf/
Pub Date: 2025-10-24; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1674791
Shrudhi Devi, Gurunathan Jayaraman
Introduction: Neurodegenerative diseases pose significant challenges owing to the limited number of effective therapies. Nerve growth factor (NGF) plays a crucial role in neuronal survival and differentiation through tropomyosin receptor kinase A (TrkA). Although snake venom NGF (sNGF) has been studied for its ability to activate TrkA, the binding modes and associated dynamics remain unclear compared to those of human NGF (hNGF). Herein, we explored the possibilities of NGFs from Daboia russelii and Naja naja as potential therapeutic alternatives to hNGF by comparing the structural similarities and conserved binding residues.
Methods: The active sites were identified through a literature review, molecular docking was performed using HADDOCK, and molecular dynamics simulation was performed to analyse the stabilities of the complexes; then, PRODIGY and molecular mechanics Poisson-Boltzmann surface area were used to determine the binding affinities.
Results: The different sNGFs exhibited stronger binding affinities and stabilities than hNGF, while principal component analysis and the free energy landscape indicated constrained conformational flexibilities suggestive of an adaptive mechanism in sNGF for effective receptor engagement. A network coevolutionary analysis was also performed, revealing the pattern in which amino acids coevolved and remained conserved throughout the simulations.
Discussion: These findings indicate that NGFs from D. russelii and N. naja are promising therapeutic candidates for treating neurodegenerative disorders and warrant further in vivo validation.
Title: Unraveling the molecular basis of snake venom nerve growth factor: human TrkA recognition through molecular dynamics simulation and comparison with human nerve growth factor. Frontiers in bioinformatics, vol. 5, 1674791. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12592128/pdf/
Pub Date: 2025-10-23; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1666716
Daiana Colibăşanu, Vlad Groza, Maria Antonietta Occhiuzzi, Fedora Grande, Mihai Udrescu, Lucreția Udrescu
Introduction: Drug repositioning, the search for new therapeutic uses for existing drugs, can dramatically reduce development time and cost, but it requires efficient computational frameworks to generate and validate repositioning hypotheses. Network-based methods can uncover drug communities with shared pharmacological properties, while molecular docking offers mechanistic insights by predicting drug-target binding.
Methods: We introduce an end-to-end, fully automated pipeline that (1) constructs a tripartite drug-gene-disease network from DrugBank and DisGeNET, (2) projects it into a drug-drug similarity network for community detection, (3) labels communities via Anatomical Therapeutic Chemical (ATC) codes to generate repositioning hints and identify relevant targets, (4) validates hints through automated literature searches, and (5) prioritizes candidates via targeted molecular docking.
Results: After filtering for connectivity and size, 12 robust communities emerged from the initial 34 clusters. The pipeline correctly matched 53.4% of drugs to their ATC level 1 community label via database entries; literature validation confirmed an additional 20.2%, yielding 73.6% overall accuracy. The remaining 26.4% of drugs were flagged as repositioning candidates. To illustrate the advantages of our pipeline, molecular docking studies of chloramphenicol demonstrated stable binding and interaction profiles similar to those of known inhibitors, reinforcing its potential as an anticancer agent.
Conclusion: Our pipeline effectively integrates network-based community analysis and automated ATC labeling with literature and docking analysis, narrowing the search space for in silico and experimental follow-up. The chloramphenicol example illustrates its utility for uncovering non-obvious repositioning opportunities. Future work will extend similarity definitions (e.g., to higher-order network motifs) and incorporate wet-lab validation of top candidates.
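The projection step in the Methods, and the percentages in the Results, can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it uses only a drug-gene layer rather than the full tripartite network, it assumes a Jaccard index as the similarity weight, and the drug and gene names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy drug -> gene-target associations (not DrugBank data).
drug_targets = {
    "drugA": {"G1", "G2", "G3"},
    "drugB": {"G2", "G3"},
    "drugC": {"G4"},
    "drugD": {"G3", "G4"},
}


def project_to_drug_similarity(targets, min_similarity=0.0):
    """Project a drug-gene bipartite graph onto drugs.

    Each drug-drug edge is weighted by the Jaccard index of the two
    target sets; pairs with no shared target get no edge.
    """
    edges = {}
    for a, b in combinations(sorted(targets), 2):
        shared = targets[a] & targets[b]
        if not shared:
            continue
        weight = len(shared) / len(targets[a] | targets[b])
        if weight > min_similarity:
            edges[(a, b)] = weight
    return edges


edges = project_to_drug_similarity(drug_targets)

# Arithmetic behind the reported accuracy figures (percent of drugs).
db_matched, literature_confirmed = 53.4, 20.2
overall = round(db_matched + literature_confirmed, 1)  # matched by either route
candidates = round(100 - overall, 1)                   # flagged for repositioning
```

The resulting weighted edge list is the kind of input a community detection algorithm would consume. The last three lines only restate how the 73.6% and 26.4% figures follow from the two reported components.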
Title: Drug repositioning pipeline integrating community analysis in drug-drug similarity networks and automated ATC community labeling to foster molecular docking analysis. Frontiers in bioinformatics, vol. 5, 1666716. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12589059/pdf/
Pub Date: 2025-10-22; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1651623
Islam Akef Ebeid, Haoteng Tang, Pengfei Gu
Introduction: Accurate prediction of protein-protein interactions (PPIs) is crucial for understanding cellular functions and advancing drug development. While existing in silico methods leverage direct sequence embeddings from Protein Language Models (PLMs) or apply Graph Neural Networks (GNNs) to 3D protein structures, the main focus of this study is to investigate less computationally intensive alternatives. This work introduces a novel framework for the downstream task of PPI prediction via link prediction.
Methods: We introduce a two-stage graph representation learning framework, ProtGram-DirectGCN. First, we developed ProtGram, a novel approach that models a protein's primary structure as a hierarchy of globally inferred n-gram graphs. In these graphs, residue transition probabilities, aggregated from a large sequence corpus, define the edge weights of a directed graph of paired residues. Second, we propose a custom directed graph convolutional neural network, DirectGCN, which features a unique convolutional layer that processes information through separate path-specific (incoming, outgoing, undirected) and shared transformations, combined via a learnable gating mechanism. DirectGCN is applied to the ProtGram graphs to learn residue-level embeddings, which are then pooled via an attention mechanism to generate protein-level embeddings for the prediction task.
Results: The efficacy of the DirectGCN model was first established on standard node classification benchmarks, where its performance is comparable to that of established methods on general datasets, while demonstrating specialization for complex, directed, and dense heterophilic graph structures. When applied to PPI prediction, the full ProtGram-DirectGCN framework achieves robust predictive power despite being trained on limited data.
Discussion: Our results suggest that a globally inferred, directed graph-based representation of sequence transitions offers a potent and computationally distinct alternative to resource-intensive PLMs for the task of PPI prediction. Future work will involve testing ProtGram-DirectGCN on a wider range of bioinformatics tasks.
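The idea of a globally inferred residue transition graph can be illustrated as follows. This is a minimal sketch under stated assumptions, not the ProtGram code: it treats overlapping n-grams as nodes, pools counts over the whole corpus, and normalizes each node's outgoing counts into probabilities. The function name and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def ngram_transition_graph(sequences, n=1):
    """Build a directed graph over residue n-grams.

    Nodes are overlapping n-grams; the weight of edge src -> dst is the
    estimated probability that dst follows src, with counts pooled over
    every sequence in the corpus.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
        for src, dst in zip(grams, grams[1:]):
            counts[src][dst] += 1
    graph = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        graph[src] = {dst: c / total for dst, c in outgoing.items()}
    return graph


corpus = ["MKVM", "MKKV"]  # hypothetical toy sequences
graph = ngram_transition_graph(corpus, n=1)
```

Because the edge weights come from corpus-wide counts, the graph is shared by all proteins rather than built per protein, which is one reading of "globally inferred". Calling the function with larger `n` would give the higher levels of an n-gram hierarchy.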
Title: Inferred global dense residue transition graphs from primary structure sequences enable protein interaction prediction via directed graph convolutional neural networks. Frontiers in bioinformatics, vol. 5, 1651623. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12585958/pdf/
Pub Date: 2025-10-21; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1649337
Qihuan Yao, Zhen Chen, Ye Cao, Huijing Hu
Introduction: Accurately predicting drug-target interactions (DTIs) is crucial for accelerating drug discovery and repurposing. Despite recent advances in deep learning-based methods, challenges remain in effectively capturing the complex relationships between drugs and targets while incorporating prior biological knowledge.
Methods: We introduce a novel framework that combines graph neural networks with knowledge integration for DTI prediction. Our approach learns representations from molecular structures and protein sequences through a customized graph-based message passing scheme. We integrate domain knowledge from biomedical ontologies and databases using a knowledge-based regularization strategy to infuse biological context into the learned representations.
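One common way to realize a knowledge-based regularization term is to add a penalty that pulls together the embeddings of entities that a knowledge source marks as related. The sketch below assumes that form: a binary cross-entropy prediction loss plus a weighted mean squared distance over related pairs. The names `knowledge_penalty`, `regularized_loss`, `related_pairs`, and the weight `lam` are hypothetical, and the abstract does not state which regularizer the authors actually use.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over predicted interaction probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def knowledge_penalty(embeddings, related_pairs):
    """Mean squared Euclidean distance between embeddings of related entities.

    Assumption: `related_pairs` lists entity pairs that an ontology or
    database marks as similar (hypothetical input format).
    """
    if not related_pairs:
        return 0.0
    total = 0.0
    for a, b in related_pairs:
        total += sum((x - y) ** 2 for x, y in zip(embeddings[a], embeddings[b]))
    return total / len(related_pairs)


def regularized_loss(y_true, y_prob, embeddings, related_pairs, lam=0.1):
    """Prediction loss plus lam times the knowledge penalty."""
    return (binary_cross_entropy(y_true, y_prob)
            + lam * knowledge_penalty(embeddings, related_pairs))


# Toy embeddings for two hypothetical drugs that a knowledge base links.
toy_embeddings = {"drug_a": [0.0, 0.0], "drug_b": [3.0, 4.0]}
toy_loss = regularized_loss([1], [1.0], toy_embeddings, [("drug_a", "drug_b")])
```

With `lam = 0` the model ignores prior knowledge entirely; larger values trade prediction fit for agreement with the knowledge source.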
Results: We evaluated our model on multiple benchmark datasets, achieving an average AUC of 0.98 and an average AUPR of 0.89, surpassing existing state-of-the-art methods by a considerable margin. Visualization of learned attention weights identified salient molecular substructures and protein motifs driving the predicted interactions, demonstrating model interpretability.
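For readers less familiar with the two reported metrics, the following sketch computes them from scratch on toy data. AUC is taken as the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair; AUPR is approximated by average precision, which is one common estimator and may not be the one the authors used. The labels and scores are invented for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one positive example")
    return sum(precisions) / len(precisions)


# Toy example: 1 = interacting pair, 0 = non-interacting pair.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

An "average" AUC or AUPR across benchmarks is then simply the mean of the per-dataset values. AUPR is usually the more informative of the two when interacting pairs are rare relative to non-interacting ones.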
Discussion: We validated the practical utility by predicting novel DTIs for FDA-approved drugs and experimentally confirming a high proportion of predictions. Our framework offers a powerful and interpretable solution for DTI prediction with the potential to substantially accelerate the identification of new drug candidates and therapeutic targets.
{"title":"Enhancing drug-target interaction prediction with graph representation learning and knowledge-based regularization.","authors":"Qihuan Yao, Zhen Chen, Ye Cao, Huijing Hu","doi":"10.3389/fbinf.2025.1649337","DOIUrl":"10.3389/fbinf.2025.1649337","url":null,"abstract":"<p><strong>Introduction: </strong>Accurately predicting drug-target interactions (DTIs) is crucial for accelerating drug discovery and repurposing. Despite recent advances in deep learning-based methods, challenges remain in effectively capturing the complex relationships between drugs and targets while incorporating prior biological knowledge.</p><p><strong>Methods: </strong>We introduce a novel framework that combines graph neural networks with knowledge integration for DTI prediction. Our approach learns representations from molecular structures and protein sequences through a customized graph-based message passing scheme. We integrate domain knowledge from biomedical ontologies and databases using a knowledge-based regularization strategy to infuse biological context into the learned representations.</p><p><strong>Results: </strong>We evaluated our model on multiple benchmark datasets, achieving an average AUC of 0.98 and an average AUPR of 0.89, surpassing existing state-of-the-art methods by a considerable margin. Visualization of learned attention weights identified salient molecular substructures and protein motifs driving the predicted interactions, demonstrating model interpretability.</p><p><strong>Discussion: </strong>We validated the practical utility by predicting novel DTIs for FDA-approved drugs and experimentally confirming a high proportion of predictions. 
Our framework offers a powerful and interpretable solution for DTI prediction with the potential to substantially accelerate the identification of new drug candidates and therapeutic targets.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1649337"},"PeriodicalIF":3.9,"publicationDate":"2025-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12583218/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145454181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-20; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1669237
Vachiranee Limviphuvadh, Thimo Ruethers, Minh N Nguyen, Dean R Jerry, Benjamin P C Smith, Yulan Wang, Yansong Miao, Anand Kumar Andiappan, Andreas L Lopata, Sebastian Maurer-Stroh
Introduction: Fish is a major food allergy trigger with a complex variety of allergenic protein isoforms and vast species diversity exhibiting variable allergenicity. This is the first study to systematically compile fish isoallergen and variant entries associated with ingestion-related allergic reactions.
Methods: Entries were compiled from four major allergen databases: World Health Organization and International Union of Immunological Societies (WHO/IUIS), AllergenOnline, Comprehensive Protein Allergen Resource (COMPARE), and Allergome, including evidence from in vitro IgE-binding assays and complete amino acid sequences. Challenges in predicting the allergenicity of fish isoallergens and variants were evaluated, and the sensitivity of five widely used in silico tools (AllerCatPro 2.0, AlgPred 2.0, pLM4Alg, AllergenFP v.1.0, and AllerTop v.2.0) was assessed. Epitope mapping and phylogenetic analyses were performed for the major fish allergen parvalbumin, incorporating experimentally validated B-cell epitope data from the Immune Epitope Database (IEDB) and evolutionary relationships.
Results: A comprehensive dataset of 79 unique fish isoallergen and variant entries from 34 fish species was identified, with 25 entries common across all four databases. AllerCatPro 2.0 achieved the highest sensitivity (97.5%). A phylogenetic tree was constructed, integrating epitope data to optimize protein family-specific thresholds for differentiating allergenic from less/non-allergenic parvalbumins. A threshold of ≥4 IEDB-mapped epitopes allowing up to two mismatches captured 52 out of 54 parvalbumin sequences (96%) in the dataset, effectively distinguishing between parvalbumin classes.
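The epitope-count rule reported above can be sketched as follows, under the assumption that "mapped with up to two mismatches" means an ungapped match of the epitope somewhere in the sequence with at most two substituted residues. The function names and the toy sequence and epitopes are hypothetical; real inputs would come from IEDB and the compiled parvalbumin dataset.

```python
def matches_with_mismatches(sequence, epitope, max_mismatches=2):
    """True if the epitope aligns to some window with few enough mismatches.

    Assumption: ungapped sliding-window comparison (substitutions only).
    """
    k = len(epitope)
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        mismatches = sum(1 for a, b in zip(window, epitope) if a != b)
        if mismatches <= max_mismatches:
            return True
    return False


def count_mapped_epitopes(sequence, epitopes, max_mismatches=2):
    """Number of distinct epitopes that map onto the sequence."""
    return sum(matches_with_mismatches(sequence, e, max_mismatches)
               for e in epitopes)


def passes_threshold(sequence, epitopes, min_epitopes=4, max_mismatches=2):
    """Apply the rule: at least `min_epitopes` epitopes must map."""
    return count_mapped_epitopes(sequence, epitopes, max_mismatches) >= min_epitopes


# Toy data: a made-up sequence and made-up epitopes, not real parvalbumin.
toy_sequence = "ACDEFGHIKLMNPQRSTVWY"
toy_epitopes = ["ACDEF", "GHIKA", "MNPAA", "STVWY", "WWWWW"]
toy_count = count_mapped_epitopes(toy_sequence, toy_epitopes)

# Fraction of parvalbumin sequences captured, from the counts in the abstract.
captured_fraction = 52 / 54
```

Here four of the five toy epitopes map (two exactly, one with one mismatch, one with two), so the toy sequence passes the threshold of four; requiring exact matches would leave only two mapped epitopes and the sequence would fail. The reported 52 of 54 corresponds to about 96.3%, which the abstract rounds to 96%.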
Discussion: This study enhances understanding of fish allergy by systematically compiling fish isoallergens and variants and integrating B-cell epitope data. The optimized thresholds improve the performance of allergenicity prediction tools and can be applied to other protein families in future studies.
{"title":"Fish isoallergens and variants: database compilation, <i>in silico</i> allergenicity prediction challenges, and epitope-based threshold optimization.","authors":"Vachiranee Limviphuvadh, Thimo Ruethers, Minh N Nguyen, Dean R Jerry, Benjamin P C Smith, Yulan Wang, Yansong Miao, Anand Kumar Andiappan, Andreas L Lopata, Sebastian Maurer-Stroh","doi":"10.3389/fbinf.2025.1669237","DOIUrl":"10.3389/fbinf.2025.1669237","url":null,"abstract":"<p><strong>Introduction: </strong>Fish is a major food allergy trigger with a complex variety of allergenic protein isoforms and vast species diversity exhibiting variable allergenicity. This is the first study to systematically compile fish isoallergen and variant entries associated with ingestion-related allergic reactions.</p><p><strong>Methods: </strong>Entries were compiled from four major allergen databases: World Health Organization and International Union of Immunological Societies (WHO/IUIS), AllergenOnline, Comprehensive Protein Allergen Resource (COMPARE), and Allergome, including evidence from <i>in vitro</i> IgE-binding assays and complete amino acid sequences. Challenges in predicting the allergenicity of fish isoallergens and variants were evaluated, and the sensitivity of five widely used <i>in silico</i> tools (AllerCatPro 2.0, AlgPred 2.0, pLM4Alg, AllergenFP v.1.0, and AllerTop v.2.0) was assessed. Epitope mapping and phylogenetic analyses were performed for the major fish allergen parvalbumin, incorporating experimentally validated B-cell epitope data from the Immune Epitope Database (IEDB) and evolutionary relationships.</p><p><strong>Results: </strong>A comprehensive dataset of 79 unique fish isoallergen and variant entries from 34 fish species was identified, with 25 entries common across all four databases. AllerCatPro 2.0 achieved the highest sensitivity (97.5%). 
A phylogenetic tree was constructed, integrating epitope data to optimize protein family-specific thresholds for differentiating allergenic from less/non-allergenic parvalbumins. A threshold of ≥4 IEDB-mapped epitopes allowing up to two mismatches captured 52 out of 54 parvalbumin sequences (96%) in the dataset, effectively distinguishing between parvalbumin classes.</p><p><strong>Discussion: </strong>This study enhances understanding of fish allergy by systematically compiling fish isoallergens and variants and integrating B-cell epitope data. The optimized thresholds improve the performance of allergenicity prediction tools and can be applied to other protein families in future studies.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1669237"},"PeriodicalIF":3.9,"publicationDate":"2025-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12580176/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145446599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}