José A Sánchez-Villanueva, Lia N'Guyen, Mathilde Poplineau, Estelle Duprez, Élisabeth Remy, Denis Thieffry
Acute Promyelocytic Leukaemia (APL) arises from an aberrant chromosomal translocation involving the Retinoic Acid Receptor Alpha (RARA) gene, predominantly with the Promyelocytic Leukaemia (PML) or Promyelocytic Leukaemia Zinc Finger (PLZF) genes. The resulting oncoproteins block the haematopoietic differentiation program, thereby promoting aberrant proliferation of promyelocytes. Retinoic Acid (RA) therapy is successful in most PML::RARA patients, whereas PLZF::RARA patients frequently become resistant and relapse. Recent studies have pointed to various underlying molecular components, but their precise contributions remain to be deciphered. We developed a logical network model integrating signalling, transcriptional, and epigenetic regulatory mechanisms, which captures key features of APL cell responses to RA depending on the genetic background. The explicit inclusion of the histone methyltransferase EZH2 allowed us to assess its role in the resistance mechanism, distinguishing between its canonical and non-canonical activities. The model dynamics were thoroughly analysed using tools integrated in the public software suite maintained by the CoLoMoTo consortium (https://colomoto.github.io/). The model serves as a solid basis to assess the roles of novel regulatory mechanisms and to explore novel therapeutic approaches in silico.
Predictive modelling of acute Promyelocytic leukaemia resistance to retinoic acid therapy. Briefings in Bioinformatics 26(1). doi:10.1093/bib/bbaf002. Publication date: 2024-11-22. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11729720/pdf/
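The following sketch makes the idea of a "logical network model" concrete. It finds the stable states of a Boolean model by exhaustive enumeration. The three nodes and their rules are hypothetical and are not taken from the published APL model. With input A off there is one stable state. With A on, two stable states coexist. This kind of input-dependent multistability is what logical models use to represent alternative cell phenotypes. Enumeration costs 2^n, so models of realistic size call for dedicated tools such as those in the CoLoMoTo suite.

```python
from itertools import product

# Hypothetical three-node network; node names and rules are illustrative only
# and are NOT taken from the published APL model.
RULES = {
    "A": lambda s: s["A"],                 # input node: keeps its current value
    "B": lambda s: s["A"] and not s["C"],  # B needs A and is inhibited by C
    "C": lambda s: not s["B"],             # C is inhibited by B
}


def stable_states(rules):
    """Return every state s with f(s) == s, i.e. the fixed points of the logical rules."""
    nodes = sorted(rules)
    fixed = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if all(bool(rules[n](state)) == state[n] for n in nodes):
            fixed.append(state)
    return fixed


for s in stable_states(RULES):
    print({node: int(value) for node, value in s.items()})
# {'A': 0, 'B': 0, 'C': 1}
# {'A': 1, 'B': 0, 'C': 1}
# {'A': 1, 'B': 1, 'C': 0}
```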
Cheng Han, Shanshan Fu, Miaomiao Chen, Yujie Gou, Dan Liu, Chi Zhang, Xinhe Huang, Leming Xiao, Miaoying Zhao, Jiayi Zhang, Qiang Xiao, Di Peng, Yu Xue
Protein phosphorylation is dynamically and reversibly regulated by protein kinases and protein phosphatases, and plays an essential role in orchestrating a wide range of biological processes. Although a number of tools have been developed for predicting kinase-specific phosphorylation sites (p-sites), computational prediction of phosphatase-specific dephosphorylation sites remains a great challenge. In this study, we manually curated, from the literature and public databases, 4393 experimentally identified site-specific phosphatase-substrate relationships for 3463 dephosphorylation sites occurring on phosphoserine, phosphothreonine, and/or phosphotyrosine residues. We then developed a hybrid learning framework, the group-based prediction system for the prediction of phosphatase-specific dephosphorylation sites (GPSD). For model training, we integrated 10 types of sequence features and used three types of machine learning methods: penalized logistic regression, deep neural networks, and transformer neural networks. First, a pretrained model was constructed using 561 416 nonredundant p-sites and then fine-tuned to generate computational models for predicting general dephosphorylation sites. In addition, 103 individual phosphatase-specific predictors were constructed via transfer learning and meta-learning. For site prediction, one or more protein sequences can be submitted in FASTA format, and the predictions are shown together with additional annotations, such as protein-protein interactions, structural information, and disorder propensity. The online service of GPSD is freely available at https://gpsd.biocuckoo.cn/. We believe that GPSD can serve as a valuable tool for further analysis of dephosphorylation.
GPSD: a hybrid learning framework for the prediction of phosphatase-specific dephosphorylation sites. Briefings in Bioinformatics 26(1). doi:10.1093/bib/bbae694. Publication date: 2024-11-22. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695897/pdf/
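Site-level predictors like this one usually start by turning the residues around each candidate phospho-Ser/Thr/Tyr into a fixed-length feature vector. The sketch below shows one common encoding: one-hot vectors over a symmetric window. This is an assumption for illustration only, because the abstract does not list GPSD's 10 feature types. The window half-width `flank` and the padding symbol are hypothetical parameters. Vectors like these could then be passed to a classifier such as penalized logistic regression.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "*"  # hypothetical padding symbol for windows that run past a sequence end


def site_window(seq, pos, flank=7):
    """Residues from pos - flank to pos + flank (0-based pos), padded with PAD."""
    return "".join(
        seq[i] if 0 <= i < len(seq) else PAD
        for i in range(pos - flank, pos + flank + 1)
    )


def one_hot(window):
    """Flatten a window into a 0/1 vector of length 20 * len(window).

    PAD and non-standard residues encode as all-zero blocks.
    """
    vec = []
    for residue in window:
        vec.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vec


seq = "MSTPKRSYE"  # toy sequence
candidates = [i for i, r in enumerate(seq) if r in "STY"]  # potential p-sites
win = site_window(seq, candidates[0], flank=3)
print(candidates, win, len(one_hot(win)))  # [1, 2, 6, 7] **MSTPK 140
```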
Bacterial resistance has emerged as one of the greatest threats to human health, and phages have shown tremendous potential against drug-resistant bacteria because they lyse their hosts. The identification of phage-host interactions (PHI) is therefore crucial for addressing bacterial infections. Some existing computational methods for predicting PHI have suboptimal prediction efficiency because only limited types of information are available to them. Although some supporting information has emerged, the small scale of the corresponding databases limits the generalizability of models that use it. Additionally, most existing models overlook the sparsity of association data, which also severely impacts their predictive performance. In this study, we propose a dual-view sparse network model (DSPHI) to predict PHI, which leverages logical probability theory and network sparsification. Specifically, we first construct similarity networks from the sequences of phages and of hosts, respectively, and then sparsify these networks so that the model focuses on key information during learning, thereby improving prediction efficiency. Next, we use logical probability theory to compute high-order logical information between phages (or hosts), referred to as mutual information. We then attach this information as nodes to the sparse phage (host) similarity network. This yields a phage (host) heterogeneous network that better integrates the two information views, reducing the computational complexity of the model and enhancing its information aggregation. The hidden features of phages and hosts are then learned with graph learning algorithms. Experimental results demonstrate that mutual information is informative for predicting PHI and that sparsifying the similarity networks significantly improves the model's predictive performance.
A novel framework for phage-host prediction via logical probability theory and network sparsification. Ankang Wei, Huanghan Zhan, Zhen Xiao, Weizhong Zhao, Xingpeng Jiang. Briefings in Bioinformatics 26(1). doi:10.1093/bib/bbae708. Publication date: 2024-11-22. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11711101/pdf/
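The abstract does not spell out DSPHI's sparsification rule. One widely used option, shown here purely as an assumed stand-in, is k-nearest-neighbour sparsification. Each phage (or host) keeps edges only to its k most similar partners, so weak and noisy similarities stop contributing to message passing. The function name and the union rule for keeping an edge are choices made for this sketch.

```python
def sparsify_top_k(sim, k):
    """Keep, for each node, only the edges to its k most similar neighbours.

    `sim` is a symmetric similarity matrix (list of lists). An edge survives if
    either endpoint selects the other, so the result stays symmetric; all other
    off-diagonal entries, and the diagonal, are set to 0.
    """
    n = len(sim)
    sparse = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        top = sorted(others, key=lambda j: sim[i][j], reverse=True)[:k]
        for j in top:
            sparse[i][j] = sparse[j][i] = sim[i][j]
    return sparse


# Toy similarity matrix for four phages (values are made up).
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
for row in sparsify_top_k(sim, k=1):
    print(row)
```

With k=1 only the edges 0-1 (0.9) and 2-3 (0.8) remain. This splits the toy network into two tight clusters.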
Dominik Lux, Katrin Marcus-Alic, Martin Eisenacher, Julian Uszkoreit
Due to computational resource limitations, mass spectrometry-based proteomics matches measured spectra against only a limited set of peptide sequences. We present an approach that represents proteins as graphs, allowing not only the canonical sequences but also known isoforms, annotated amino acid variations (e.g. originating from genomic mutations), and further common protein sequence features contained in UniProtKB or other protein databases. Our C++ and Python implementation enables a comprehensive characterization of the peptide search space, encompassing for the first time all available annotations in a protein database (in combination, more than $10^{200}$ possibilities). Additionally, it can be used to quickly extract the relevant subset of the search space for peptide-spectrum matching, e.g. by filtering on peptide mass. We demonstrate the advantages of our implementation over previous workflows, and the new findings it enables, by re-analysing publicly available datasets.
ProtGraph: a tool for the quick and comprehensive exploration and exploitation of the peptide search space derived from protein sequence databases using graphs. Briefings in Bioinformatics 26(1). doi:10.1093/bib/bbae671. Publication date: 2024-11-22.
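The following sketch shows why a graph representation helps. Variants and optional features become alternative paths, and the number of paths can be counted by dynamic programming without listing them. This is the general reason totals as large as $10^{200}$ are computable. The toy graph, node IDs and helper names are hypothetical and are not ProtGraph's data model. Digestion into peptides is omitted. The mass filter uses standard monoisotopic residue masses plus one water.

```python
from functools import lru_cache

# Hypothetical protein graph (not ProtGraph's actual data model): each node
# holds one residue; "s"/"e" mark the start and end. The edge s -> n2 models
# optional loss of the initiator Met, and n2v is a variant (A -> V) of n2.
LABEL = {"n1": "M", "n2": "A", "n2v": "V", "n3": "K", "n4": "S", "n5": "R"}
EDGES = {
    "s": ["n1", "n2", "n2v"],
    "n1": ["n2", "n2v"],
    "n2": ["n3"], "n2v": ["n3"],
    "n3": ["n4"], "n4": ["n5"], "n5": ["e"],
    "e": [],
}
# Monoisotopic residue masses (Da) for the residues used above, plus water.
RESIDUE_MASS = {"M": 131.04049, "A": 71.03711, "V": 99.06841,
                "K": 128.09496, "S": 87.03203, "R": 156.10111}
WATER = 18.01056


@lru_cache(maxsize=None)
def count_paths(node="s"):
    """Number of start-to-end paths, by dynamic programming over the DAG."""
    if node == "e":
        return 1
    return sum(count_paths(nxt) for nxt in EDGES[node])


def sequences(node="s", prefix=""):
    """Enumerate the sequence spelled by every start-to-end path."""
    if node == "e":
        yield prefix
        return
    for nxt in EDGES[node]:
        yield from sequences(nxt, prefix + LABEL.get(nxt, ""))


def sequence_mass(seq):
    """Monoisotopic mass of an unmodified sequence: residue masses plus one water."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER


print(count_paths())        # 4
print(sorted(sequences()))  # ['AKSR', 'MAKSR', 'MVKSR', 'VKSR']
print(sorted(p for p in sequences() if sequence_mass(p) < 500.0))  # ['AKSR', 'VKSR']
```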
Correction to: BANMF-S: a blockwise accelerated non-negative matrix factorization framework with structural network constraints for single cell imputation. Briefings in Bioinformatics 26(1). doi:10.1093/bib/bbaf034. Publication date: 2024-11-22. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735464/pdf/
Nandhini Rajagopal, Udit Choudhary, Kenny Tsang, Kyle P Martin, Murat Karadag, Hsin-Ting Chen, Na-Young Kwon, Joseph Mozdzierz, Alexander M Horspool, Li Li, Peter M Tessier, Michael S Marlow, Andrew E Nixon, Sandeep Kumar
Antibody generation requires one or more time-consuming methods, namely animal immunization and in vitro display technologies. However, the recent availability of large amounts of antibody sequence and structural data in the public domain, together with the advent of generative deep learning algorithms, raises the possibility of computationally generating novel antibody sequences with desirable developability attributes. Here, we describe a deep learning model for computationally generating libraries of highly human antibody variable regions whose intrinsic physicochemical properties resemble those of the variable regions of marketed antibody-based biotherapeutics (medicine-likeness). Using a training dataset of 31 416 human antibodies that satisfied our computational developability criteria, we generated 100 000 variable region sequences of antigen-agnostic human antibodies belonging to the IGHV3-IGKV1 germline pair. The in-silico generated antibodies recapitulate the intrinsic sequence, structural, and physicochemical properties of the training antibodies. They also compare favorably with the experimentally measured biophysical attributes of 100 variable regions of marketed and clinical-stage antibody-based biotherapeutics. A sample of 51 highly diverse in-silico generated antibodies with >90th percentile medicine-likeness and >90% humanness was evaluated by two independent experimental laboratories. Our data show that, when produced as full-length monoclonal antibodies, the in-silico generated sequences exhibit high expression, monomer content, and thermal stability along with low hydrophobicity, self-association, and non-specific binding. The ability to computationally generate developable human antibody libraries is a first step towards enabling in-silico discovery of antibody-based biotherapeutics.
These findings are expected to accelerate in-silico discovery of antibody-based biotherapeutics and to expand the druggable antigen space to include targets refractory to conventional antibody discovery methods that require in vitro antigen production.
Deep learning-based design and experimental validation of a medicine-like human antibody library. Briefings in Bioinformatics 26(1). doi:10.1093/bib/bbaf023. Publication date: 2024-11-22. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11757908/pdf/
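The candidate selection described above (top-decile medicine-likeness combined with a humanness cutoff) is a two-criterion filter. The minimal sketch below assumes the scores are given. They are randomly generated placeholders, not outputs of the authors' model. Percentile rank is defined here as the share of the library scoring strictly lower, which is one of several percentile conventions. The diversity criterion used to pick the 51 tested antibodies is not modelled.

```python
import random


def percentile_rank(value, population):
    """Percentage of the population scoring strictly below `value`."""
    return 100.0 * sum(v < value for v in population) / len(population)


def select_candidates(library, min_percentile=90.0, min_humanness=0.90):
    """Keep sequences above a medicine-likeness percentile AND a humanness cutoff.

    `library` maps a sequence ID to (medicine_likeness_score, humanness_fraction);
    both scores are hypothetical stand-ins for what a real pipeline would compute.
    """
    scores = [ml for ml, _ in library.values()]
    return sorted(
        seq_id
        for seq_id, (ml, hum) in library.items()
        if percentile_rank(ml, scores) > min_percentile and hum > min_humanness
    )


rng = random.Random(0)  # placeholder scores for a toy library of 1000 sequences
library = {f"seq{i:03d}": (rng.random(), rng.uniform(0.8, 1.0)) for i in range(1000)}
picked = select_candidates(library)
print(len(picked), picked[:3])
```

Applying both thresholds at once means a sequence must pass each criterion independently. Neither score can compensate for the other.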
Kibeom Kim, Juseong Kim, Minwook Kim, Hyewon Lee, Giltae Song
Identifying therapeutic genes is crucial for developing treatments targeting genetic causes of diseases, but experimental trials are costly and time-consuming. Although many deep learning approaches aim to identify biomarker genes, predicting therapeutic target genes remains challenging due to the limited number of known targets. To address this, we propose HIT (Hypergraph Interaction Transformer), a deep hypergraph representation learning model that identifies a gene's therapeutic potential, biomarker status, or lack of association with diseases. HIT uses hypergraph structures of genes, ontologies, diseases, and phenotypes, employing attention-based learning to capture complex relationships. Experiments demonstrate HIT's state-of-the-art performance, explainability, and ability to identify novel therapeutic targets.
Therapeutic gene target prediction using novel deep hypergraph representation learning. Briefings in Bioinformatics 26(1). doi:10.1093/bib/bbaf019. Publication date: 2024-11-22. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11752618/pdf/
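Unlike an ordinary edge, a hyperedge can join many entities at once, such as a gene, a disease and an ontology term. HIT learns attention weights over such structures. The sketch below swaps in the simpler uniform-weight mean aggregation of basic hypergraph neural networks to show what one node-to-hyperedge-to-node update does. It is not HIT's architecture. Node names and features are hypothetical. In an attention-based model, the two uniform means would be replaced by learned weights.

```python
def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def hypergraph_mean_pass(features, hyperedges):
    """One node -> hyperedge -> node message-passing step with uniform (mean) weights.

    features: dict mapping a node ID to its feature vector.
    hyperedges: list of sets of node IDs; a hyperedge may contain any number of nodes.
    Nodes that belong to no hyperedge keep their current features.
    """
    edge_emb = [mean([features[v] for v in edge]) for edge in hyperedges]
    updated = {}
    for node, feat in features.items():
        incident = [edge_emb[k] for k, edge in enumerate(hyperedges) if node in edge]
        updated[node] = mean(incident) if incident else list(feat)
    return updated


# Hypothetical toy hypergraph with made-up 2-D features.
features = {"GENE1": [1.0, 0.0], "GENE2": [0.0, 1.0], "DIS1": [1.0, 1.0], "GO1": [0.0, 0.0]}
hyperedges = [{"GENE1", "DIS1"}, {"GENE1", "GENE2", "GO1"}]
print(hypergraph_mean_pass(features, hyperedges))
```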
Antibiotic resistance poses a significant threat to global health, making the development of alternative strategies to combat bacterial pathogens increasingly urgent. One such promising approach is the strategic use of bacteriophages (or phages) to specifically target and eradicate antibiotic-resistant bacteria. Phages, being among the most prevalent life forms on Earth, play a critical role in maintaining ecological balance by regulating bacterial communities and driving genetic diversity. Accurate prediction of phage hosts is essential for successfully applying phage therapy. However, existing prediction models may not fully encapsulate the complex dynamics of phage-host interactions in diverse microbial environments, indicating a need for improved accuracy through more sophisticated modeling techniques. In response to this challenge, this study introduces a novel phage-host prediction model, PHPGAT, which leverages a multimodal heterogeneous knowledge graph with the GATv2 (Graph Attention Network v2) framework. The model first constructs a multimodal heterogeneous knowledge graph by integrating phage-phage, host-host, and phage-host interactions to capture the intricate connections between biological entities. GATv2 is then employed to extract deep node features and learn dynamic interdependencies, generating context-aware embeddings. Finally, an inner product decoder computes the likelihood of interaction between a phage and a host from the embedding vectors produced by GATv2. Evaluations on two datasets show that PHPGAT predicts phage hosts accurately and outperforms the other models tested. PHPGAT is available at https://github.com/ZhaoZMer/PHPGAT.
PHPGAT: predicting phage hosts based on multimodal heterogeneous knowledge graph with graph attention network. Fu Liu, Zhimiao Zhao, Yun Liu. Briefings in Bioinformatics 26(1). doi:10.1093/bib/bbaf017. Publication date: 2024-11-22. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11745545/pdf/
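The two components named above can be sketched in a few lines. The first is GATv2 attention. It scores neighbour j of node i as a · LeakyReLU(W [h_i ‖ h_j]). Splitting W column-wise into W_s and W_t gives the equivalent form a · LeakyReLU(W_s h_i + W_t h_j) used below. Because the nonlinearity sits before a, the ranking of neighbours can depend on the query node (the "dynamic" attention of GATv2). The second is the inner-product decoder, sigmoid(z_phage · z_host). This is a single-head sketch with identity weights and no self-loops. Reusing W_t for the messages is one common convention, not necessarily PHPGAT's configuration.

```python
import math


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(v, slope=0.2):
    return [x if x > 0 else slope * x for x in v]


def gatv2_attention(h_i, neighbours, W_s, W_t, a):
    """Attention of node i over its neighbours: softmax_j(a . LeakyReLU(W_s h_i + W_t h_j))."""
    src = matvec(W_s, h_i)
    scores = []
    for h_j in neighbours:
        z = leaky_relu([s + t for s, t in zip(src, matvec(W_t, h_j))])
        scores.append(sum(ak * zk for ak, zk in zip(a, z)))
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(alphas, neighbours, W_t):
    """New embedding of node i: attention-weighted sum of W_t h_j over neighbours."""
    msgs = [matvec(W_t, h_j) for h_j in neighbours]
    return [sum(al * msg[k] for al, msg in zip(alphas, msgs)) for k in range(len(msgs[0]))]


def inner_product_score(z_phage, z_host):
    """Inner-product decoder: sigmoid(z_phage . z_host) read as an interaction probability."""
    return 1.0 / (1.0 + math.exp(-sum(p * h for p, h in zip(z_phage, z_host))))


I2 = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the toy example readable
h_i = [1.0, 0.0]
neighbours = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
alphas = gatv2_attention(h_i, neighbours, I2, I2, a=[1.0, 1.0])
z_i = aggregate(alphas, neighbours, I2)
print(alphas, z_i, inner_product_score(z_i, [0.5, 0.5]))
```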
Antibodies play a key role in medical diagnostics and therapeutics. Accurately predicting antibody-antigen binding is essential for developing effective treatments. Traditional protein-protein interaction prediction methods often fall short because they do not account for the unique structural and dynamic properties of antibodies and antigens. In this study, we present AntiBinder, a novel predictive model specifically designed to address these challenges. AntiBinder integrates the unique structural and sequence characteristics of antibodies and antigens into its framework and employs a bidirectional cross-attention mechanism to automatically learn the intrinsic mechanisms of antigen-antibody binding, eliminating the need for manual feature engineering. Our comprehensive experiments, which include predicting interactions between known antigens and new antibodies, predicting the binding of previously unseen antigens, and predicting cross-species antigen-antibody interactions, demonstrate that AntiBinder outperforms existing state-of-the-art methods. Notably, AntiBinder excels in predicting interactions with unseen antigens and maintains a reasonable level of predictive capability in challenging cross-species prediction tasks. AntiBinder's ability to model complex antigen-antibody interactions highlights its potential applications in biomedical research and therapeutic development, including the design of vaccines and antibody therapies for rapidly emerging infectious diseases.
{"title":"AntiBinder: utilizing bidirectional attention and hybrid encoding for precise antibody-antigen interaction prediction.","authors":"Kaiwen Zhang, Yuhao Tao, Fei Wang","doi":"10.1093/bib/bbaf008","DOIUrl":"10.1093/bib/bbaf008","url":null,"abstract":"<p><p>Antibodies play a key role in medical diagnostics and therapeutics. Accurately predicting antibody-antigen binding is essential for developing effective treatments. Traditional protein-protein interaction prediction methods often fall short because they do not account for the unique structural and dynamic properties of antibodies and antigens. In this study, we present AntiBinder, a novel predictive model specifically designed to address these challenges. AntiBinder integrates the unique structural and sequence characteristics of antibodies and antigens into its framework and employs a bidirectional cross-attention mechanism to automatically learn the intrinsic mechanisms of antigen-antibody binding, eliminating the need for manual feature engineering. Our comprehensive experiments, which include predicting interactions between known antigens and new antibodies, predicting the binding of previously unseen antigens, and predicting cross-species antigen-antibody interactions, demonstrate that AntiBinder outperforms existing state-of-the-art methods. Notably, AntiBinder excels in predicting interactions with unseen antigens and maintains a reasonable level of predictive capability in challenging cross-species prediction tasks. AntiBinder's ability to model complex antigen-antibody interactions highlights its potential applications in biomedical research and therapeutic development, including the design of vaccines and antibody therapies for rapidly emerging infectious diseases.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11744619/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143000424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
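The AntiBinder abstract names a bidirectional cross-attention mechanism but gives no architectural details. As a rough illustration of the general technique only, the sketch below lets antibody residue embeddings attend over antigen residue embeddings and vice versa, using single-head scaled dot-product attention. It assumes that per-residue embedding vectors of equal dimension are already available. It uses identity query/key/value projections in place of learned ones. It also adds a hypothetical mean-pool and logistic scoring head (`binding_score`). None of these names or choices come from the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: each query row attends over all key rows."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Output row = attention-weighted sum of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def bidirectional_cross_attention(antibody: Matrix, antigen: Matrix) -> Tuple[Matrix, Matrix]:
    """Antibody residues attend to the antigen, and antigen residues attend to the antibody.

    Identity Q/K/V projections keep the sketch short (an assumption); a trained
    model would apply learned linear maps, and usually several heads, first.
    """
    antibody_ctx = cross_attention(antibody, antigen, antigen)
    antigen_ctx = cross_attention(antigen, antibody, antibody)
    return antibody_ctx, antigen_ctx


def mean_pool(rows: Matrix) -> List[float]:
    # Average over residues to get one fixed-size vector per chain.
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def binding_score(antibody: Matrix, antigen: Matrix, weights: List[float], bias: float = 0.0) -> float:
    """Hypothetical scoring head: pool both contexts, concatenate, apply a logistic unit."""
    antibody_ctx, antigen_ctx = bidirectional_cross_attention(antibody, antigen)
    features = mean_pool(antibody_ctx) + mean_pool(antigen_ctx)
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

Running attention in both directions keeps a per-residue view on each side. A three-residue antibody and a two-residue antigen therefore yield context matrices with three and two rows, respectively. The scoring head (here `weights` must have twice the embedding dimension) can then treat the two chains symmetrically.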
Imaging-based spatial transcriptomics (iST) platforms, such as MERFISH, CosMx SMI, and Xenium, quantify gene expression across cells in space. More importantly, they directly reveal the subcellular distribution of RNA transcripts at single-molecule resolution. The subcellular localization of RNA molecules plays a crucial role in the compartmentalization-dependent regulation of genes within individual cells. Understanding the intracellular spatial distribution of RNA in a particular cell type therefore not only improves the characterization of cell identity but is also essential for elucidating the subcellular regulatory mechanisms specific to that cell type. However, current cell type annotation approaches for iST data rely primarily on gene expression while neglecting the spatial distribution of RNAs within cells. In this work, we introduce Focus, a semi-supervised graph contrastive learning method. To the best of our knowledge, it is the first method to explicitly model the subcellular distribution and communities of RNA molecules to improve cell type annotation. Across a range of spatial transcriptomics platforms, Focus significantly outperforms state-of-the-art algorithms, with improvements of up to 27.8% in accuracy and 51.9% in F1-score for cell type annotation. Furthermore, Focus reveals intricate cell type-specific subcellular spatial gene patterns and provides interpretable subcellular gene analysis, such as a gene importance score. Using this importance score, Focus identifies genes strongly associated with cell type-specific pathways, indicating its potential for uncovering novel regulatory programs across numerous biological systems.
{"title":"Graph contrastive learning of subcellular-resolution spatial transcriptomics improves cell type annotation and reveals critical molecular pathways.","authors":"Qiaolin Lu, Jiayuan Ding, Lingxiao Li, Yi Chang","doi":"10.1093/bib/bbaf020","DOIUrl":"10.1093/bib/bbaf020","url":null,"abstract":"<p><p>Imaging-based spatial transcriptomics (iST), such as MERFISH, CosMx SMI, and Xenium, quantify gene expression level across cells in space, but more importantly, they directly reveal the subcellular distribution of RNA transcripts at the single-molecule resolution. The subcellular localization of RNA molecules plays a crucial role in the compartmentalization-dependent regulation of genes within individual cells. Understanding the intracellular spatial distribution of RNA for a particular cell type thus not only improves the characterization of cell identity but also is of paramount importance in elucidating unique subcellular regulatory mechanisms specific to the cell type. However, current cell type annotation approaches of iST primarily utilize gene expression information while neglecting the spatial distribution of RNAs within cells. In this work, we introduce a semi-supervised graph contrastive learning method called Focus, the first method, to the best of our knowledge, that explicitly models RNA's subcellular distribution and community to improve cell type annotation. Focus demonstrates significant improvements over state-of-the-art algorithms across a range of spatial transcriptomics platforms, achieving improvements up to 27.8% in terms of accuracy and 51.9% in terms of F1-score for cell type annotation. Furthermore, Focus enjoys the advantages of intricate cell type-specific subcellular spatial gene patterns and providing interpretable subcellular gene analysis, such as defining the gene importance score. Importantly, with the importance score, Focus identifies genes harboring strong relevance to cell type-specific pathways, indicating its potential in uncovering novel regulatory programs across numerous biological systems.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11781232/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143063829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
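Focus is described as semi-supervised graph contrastive learning over subcellular RNA distributions. The abstract does not specify how its graph, encoder, augmentations, or loss are defined. The sketch below therefore shows only a generic unsupervised contrastive core, under several assumptions. Individual transcripts are nodes, and edges link each transcript to its `k` nearest neighbours by 2-D in-cell coordinates. A single parameter-free mean-aggregation step stands in for a learned graph encoder. An InfoNCE loss pulls together the two embeddings of the same node taken from two augmented views. The supervised (labelled cell type) part of the objective is omitted, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = List[float]


def knn_graph(coords: Sequence[Tuple[float, float]], k: int) -> Dict[int, List[int]]:
    """Link each transcript to its k nearest other transcripts (Euclidean distance)."""
    graph = {}
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph


def mean_aggregate(features: List[Vec], graph: Dict[int, List[int]]) -> List[Vec]:
    """One message-passing step: average each node's features with its neighbours'."""
    out = []
    for i, f in enumerate(features):
        group = [f] + [features[j] for j in graph[i]]
        out.append([sum(v[d] for v in group) / len(group) for d in range(len(f))])
    return out


def mask_feature(features: List[Vec], dim: int) -> List[Vec]:
    """A simple deterministic augmentation: zero out one feature dimension."""
    return [[0.0 if d == dim else v for d, v in enumerate(f)] for f in features]


def cosine(a: Vec, b: Vec) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    # The small epsilon avoids division by zero for all-zero (fully masked) vectors.
    return sum(x * y for x, y in zip(a, b)) / (na * nb + 1e-12)


def info_nce(view1: List[Vec], view2: List[Vec], tau: float = 0.5) -> float:
    """InfoNCE: node i in view1 should match node i in view2 better than any other node."""
    n = len(view1)
    loss = 0.0
    for i in range(n):
        logits = [cosine(view1[i], view2[j]) / tau for j in range(n)]
        m = max(logits)
        # Stable log-sum-exp over all candidates minus the positive pair's logit.
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / n
```

In use, one would encode the original features and a `mask_feature` view with `mean_aggregate` over the same `knn_graph` and minimise `info_nce` between the two results. The loss is lowest when each transcript's two views agree with each other more than with any other transcript.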