Pub Date : 2024-09-01 DOI: 10.1016/j.jmb.2024.168494
Knowledge of the solvent accessibility of residues in a protein is essential for different applications, including the identification of interacting surfaces in protein–protein interactions and the characterization of variations. We describe E-pRSA, a novel web server that estimates Relative Solvent Accessibility values (RSAs) of residues directly from a protein sequence. The method exploits two complementary Protein Language Models to provide fast and accurate predictions. When benchmarked on different blind test sets, E-pRSA performs at the state of the art and outperforms a previous method we developed, DeepREx, which was based on sequence profiles derived from Multiple Sequence Alignments. The E-pRSA web server is freely available at https://e-prsa.biocomp.unibo.it/main/ where users can submit single-sequence and batch jobs.
Title: E-pRSA: Embeddings Improve the Prediction of Residue Relative Solvent Accessibility in Protein Sequence (Journal of Molecular Biology)
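For readers unfamiliar with the quantity being predicted, the sketch below shows how a relative solvent accessibility value is conventionally defined: the absolute accessible surface area of a residue divided by a maximum area for that residue type. It is not E-pRSA's predictor. The names `MAX_ASA`, `relative_solvent_accessibility` and `label_exposure` are hypothetical, the maximum-area values are placeholders chosen for illustration, and the 0.2 exposure threshold is an assumption.

```python
from typing import Dict, List

# Hypothetical maximum accessible surface areas (square angstroms) per residue
# type. Placeholder values for illustration only; use a published scale in practice.
MAX_ASA: Dict[str, float] = {"A": 130.0, "G": 105.0, "L": 200.0, "K": 235.0}


def relative_solvent_accessibility(
    sequence: str, asa: List[float], max_asa: Dict[str, float] = MAX_ASA
) -> List[float]:
    """Return RSA = ASA / max ASA for each residue, clipped to [0, 1]."""
    if len(sequence) != len(asa):
        raise ValueError("sequence and asa must have the same length")
    rsa = []
    for residue, area in zip(sequence, asa):
        ratio = area / max_asa[residue]
        rsa.append(min(max(ratio, 0.0), 1.0))
    return rsa


def label_exposure(rsa: List[float], threshold: float = 0.2) -> List[str]:
    """Binarize RSA values; the threshold is an assumed cutoff."""
    return ["exposed" if value >= threshold else "buried" for value in rsa]


# Toy absolute areas for a four-residue peptide (invented numbers).
rsa_values = relative_solvent_accessibility("AGLK", [65.0, 105.0, 20.0, 300.0])
labels = label_exposure(rsa_values)
```

A sequence-based predictor outputs the `rsa_values` directly, without needing the structure-derived areas used here.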
Pub Date : 2024-09-01 DOI: 10.1016/j.jmb.2024.168554
Molecular modeling and simulation play an important role in exploring the biological functions of proteins at the molecular level, complementing experiments. CHARMM-GUI (https://www.charmm-gui.org) is a web-based graphical user interface that generates complex molecular simulation systems and input files, and we have been continuously developing and expanding its functionalities to facilitate complex molecular modeling and make molecular dynamics simulations more accessible to the scientific community. Covalent drug discovery is emerging as a popular and important field. A covalent drug forms a chemical bond with specific residues on the target protein, and its prolonged inhibition gives it an advantage in potency. Although demand for modeling PDB protein structures with various covalent ligand types is growing, proper modeling of covalent ligands remains challenging. This work presents a new functionality in CHARMM-GUI PDB Reader & Manipulator that can handle a diversity of ligand-amino acid linkage types, validated by a careful benchmark study using over 1,000 covalent ligand structures in RCSB PDB. We hope that this new functionality can boost modeling and simulation studies of covalent ligands.
Title: CHARMM-GUI PDB Reader and Manipulator: Covalent Ligand Modeling and Simulation (Journal of Molecular Biology)
Pub Date : 2024-09-01 DOI: 10.1016/j.jmb.2024.168687
Anticancer peptides (ACPs) are naturally occurring molecules with remarkable potential to target and kill cancer cells. However, identifying ACPs based solely on their primary amino acid sequences remains a major hurdle in immunoinformatics. Several web-based machine learning (ML) tools have been proposed to assist researchers in identifying potential ACPs for further testing. Notably, our meta-approach method, mACPpred, introduced in 2019, has significantly advanced the field of ACP research. Given the exponential growth in the number of characterized ACPs, there is now a pressing need for an updated version of mACPpred. To develop mACPpred 2.0, we constructed an up-to-date benchmarking dataset by integrating all publicly available ACP datasets. We employed a large set of feature descriptors, encompassing both conventional feature descriptors and advanced pre-trained natural language processing (NLP)-based embeddings. We evaluated their ability to discriminate between ACPs and non-ACPs using eleven different classifiers. Subsequently, we employed a stacked deep learning (SDL) approach, incorporating 1D convolutional neural network (1D CNN) blocks and hybrid features. These features included the top seven performing NLP-based features and 90 probabilistic features, allowing us to identify hidden patterns within these diverse features and improve the accuracy of our ACP prediction model. This is the first study to integrate spatial and probabilistic feature representations for predicting ACPs. Rigorous cross-validation and independent tests demonstrated that mACPpred 2.0 not only surpassed its predecessor (mACPpred) but also outperformed the existing state-of-the-art predictors, highlighting the importance of the advanced feature representation capabilities attained through SDL. To facilitate widespread use and accessibility, we have developed a user-friendly web server for mACPpred 2.0, available at https://balalab-skku.org/mACPpred2/.
Title: mACPpred 2.0: Stacked Deep Learning for Anticancer Peptide Prediction with Integrated Spatial and Probabilistic Feature Representations (Journal of Molecular Biology)
Pub Date : 2024-09-01 DOI: 10.1016/j.jmb.2024.168567
A pervasive question in biological research studying gene regulation, chromatin structure, or genomics is where, and to what extent, does a signal of interest arise genome-wide? This question is addressed using a variety of methods relying on high-throughput sequencing data as their final output, including ChIP-seq for protein-DNA interactions [1], GapR-seq for measuring supercoiling [2], and HBD-seq or DRIP-seq for R-loop positioning [3, 4]. Current computational methods to calculate genome-wide enrichment of the signal of interest usually do not properly handle the count-based nature of sequencing data, they often do not make use of the local correlation structure of sequencing data, and they do not apply any regularization of enrichment estimates. This can result in unrealistic estimates of the true underlying biological enrichment of interest, unrealistically low estimates of confidence in point estimates of enrichment (or no estimates of confidence at all), unrealistic gyrations in enrichment estimates at very close (<10 bp) genomic loci due to noise inherent in sequencing data, and in a multiple-hypothesis testing problem during interpretation of genome-wide enrichment estimates. We developed a tool called Enricherator to infer genome-wide enrichments from sequencing count data. Enricherator uses the variational Bayes algorithm to fit a generalized linear model to sequencing count data and to sample from the approximate posterior distribution of enrichment estimates (https://github.com/jwschroeder3/enricherator). Enrichments inferred by Enricherator more precisely identify known binding sites in cases where low coverage between binding sites leads to false-positive peak calls in these noisy regions of the genome; these benefits extend to published datasets.
Title: Enricherator: A Bayesian Method for Inferring Regularized Genome-wide Enrichments from Sequencing Count Data (Journal of Molecular Biology)
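The problem of noisy enrichment estimates at low coverage, described in the abstract above, can be illustrated with a toy calculation. The sketch below is not Enricherator's model (which fits a generalized linear model by variational Bayes); it only contrasts a naive per-position log2 ratio with the same ratio after adding a pseudocount, a much cruder form of regularization swapped in here for illustration. The function name, the read counts and the pseudocount value are all hypothetical, and sequencing-depth normalization is ignored.

```python
import math
from typing import List


def log2_enrichment(ip_count: int, input_count: int, pseudocount: float = 0.0) -> float:
    """Return log2((ip + pseudocount) / (input + pseudocount))."""
    numerator = ip_count + pseudocount
    denominator = input_count + pseudocount
    if numerator <= 0 or denominator <= 0:
        raise ValueError("counts plus pseudocount must be positive")
    return math.log2(numerator / denominator)


# Invented read counts at four adjacent low-coverage positions.
ip_counts: List[int] = [1, 4, 2, 8]
input_counts: List[int] = [2, 1, 2, 2]

# Naive ratios swing widely because a few reads change the ratio a lot.
raw = [log2_enrichment(i, c) for i, c in zip(ip_counts, input_counts)]

# A pseudocount pulls low-count ratios toward zero enrichment.
shrunk = [log2_enrichment(i, c, pseudocount=8.0) for i, c in zip(ip_counts, input_counts)]

raw_spread = max(raw) - min(raw)
shrunk_spread = max(shrunk) - min(shrunk)
```

With these counts the spread of the shrunken estimates is well under that of the raw ratios, which is the qualitative effect that a properly regularized count model aims for in a principled way.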
Pub Date : 2024-09-01 DOI: 10.1016/j.jmb.2024.168655
Nucleosome dynamics plays important roles in many biological processes, such as DNA replication and gene expression. NucMap (https://ngdc.cncb.ac.cn/nucmap) is the first database of genome-wide nucleosome positioning maps across species. Here, we present an updated version, NucMap 2.0, which incorporates more species and MNase-seq samples. In addition, we integrate other related omics data for each MNase-seq sample, such as gene expression, transcription factor binding sites, histone modifications and DNA methylation, to provide a comprehensive view of nucleosome positioning. In particular, NucMap 2.0 integrates and pre-analyzes RNA-seq data and ChIP-seq data of human-related samples, which facilitates the interpretation of nucleosome positioning in humans. All processed data are integrated into a built-in genome browser, where users can make comprehensive side-by-side analyses. In addition, more online analytical functions have been developed, which allow researchers to identify differential nucleosome regions and explore potential gene regulatory regions. All resources are open access with a user-friendly web interface.
Title: NucMap 2.0: An Updated Database of Genome-wide Nucleosome Positioning Maps Across Species (Journal of Molecular Biology)
Pub Date : 2024-09-01 DOI: 10.1016/j.jmb.2024.168686
The PPInterface dataset contains 815,082 interface structures, providing the most comprehensive structural information on protein–protein interfaces. This resource is extracted from over 215,000 three-dimensional protein structures stored in the Protein Data Bank (PDB). The dataset contains a wide range of protein complexes, providing a wealth of information for researchers investigating the structural properties of protein–protein interactions. The accompanying web server has a user-friendly interface that allows for efficient search and download functions. Researchers can access detailed information on protein interface structures, visualize them, and explore a variety of features, increasing the dataset’s utility and accessibility.
The dataset and web server can be found at https://3dpath.ku.edu.tr/PPInt/.
Title: PPInterface: A Comprehensive Dataset of 3D Protein-Protein Interface Structures (Journal of Molecular Biology)
Pub Date : 2024-09-01 DOI: 10.1016/j.jmb.2024.168551
CATH (https://www.cathdb.info) classifies domain structures from experimental protein structures in the PDB and predicted structures in the AlphaFold Database (AFDB). To cope with the scale of the predicted data, a new NextFlow workflow (CATH-AlphaFlow) has been developed to classify high-quality domains into CATH superfamilies and identify novel fold groups and superfamilies. CATH-AlphaFlow uses a novel state-of-the-art structure-based domain boundary prediction method (ChainSaw) for identifying domains in multi-domain proteins. We applied CATH-AlphaFlow to process PDB structures not classified in CATH and AFDB structures from 21 model organisms, expanding CATH by over 100%.
Domains not classified in existing CATH superfamilies or fold groups were used to seed novel folds, giving 253 new folds from PDB structures (September 2023 release) and 96 from AFDB structures of proteomes of 21 model organisms. Where possible, functional annotations were obtained using (i) predictions from publicly available methods and (ii) annotations from structural relatives in AFDB/UniProt50. We also predicted functional sites and highly conserved residues. Some folds are associated with important functions such as photosynthetic acclimation (in flowering plants), iron permease activity (in fungi) and post-natal spermatogenesis (in mice).
CATH-AlphaFlow will allow us to identify many more CATH relatives in the AFDB, further characterising the protein structure landscape.
Title: CATH 2024: CATH-AlphaFlow Doubles the Number of Structures in CATH and Reveals Nearly 200 New Folds (Journal of Molecular Biology)
Pub Date : 2024-09-01 DOI: 10.1016/j.jmb.2024.168617
In recent years, advancements in deep learning techniques have significantly expanded the structural coverage of the human proteome. GalaxySagittarius-AF translates these achievements in structure prediction into target prediction for druglike compounds by incorporating predicted structures. This web server searches the database of human protein structures using both similarity- and structure-based approaches, suggesting potential targets for a given druglike compound. In comparison to its predecessor, GalaxySagittarius, GalaxySagittarius-AF utilizes an enlarged structure database, incorporating curated AlphaFold model structures alongside their binding sites and ligands, predicted using an updated version of GalaxySite. GalaxySagittarius-AF covers a large human protein space compared to many other available computational target screening methods. The structure-based prediction method enhances the use of expanded structural information, differentiating it from other target prediction servers that rely on ligand-based methods. Additionally, the web server has undergone enhancements, operating two to three times faster than its predecessor. The updated report page provides comprehensive information on the sequence and structure of the predicted protein targets. GalaxySagittarius-AF is accessible at https://galaxy.seoklab.org/sagittarius_af without the need for registration.
Title: GalaxySagittarius-AF: Predicting Targets for Drug-Like Compounds in the Extended Human 3D Proteome (Journal of Molecular Biology)
Pub Date : 2024-09-01 DOI: 10.1016/j.jmb.2024.168613
Fungal pathogens pose significant threats to plant health by secreting effectors that manipulate plant-host defences. However, identifying effector proteins remains challenging, in part because they lack common sequence motifs. Here, we introduce Fungtion (Fungal effector prediction), a toolkit leveraging a hybrid framework to accurately predict and visualize fungal effectors. By combining global patterns learned from pretrained protein language models with refined information from known effectors, Fungtion achieves state-of-the-art prediction performance. Additionally, the interactive visualizations we have developed enable researchers to explore both sequence- and high-level relationships between the predicted and known effectors, facilitating effector function discovery, annotation, and hypothesis formulation regarding plant-pathogen interactions. We anticipate Fungtion to be a valuable resource for biologists seeking deeper insights into fungal effector functions and for computational biologists aiming to develop future methodologies for fungal effector prediction: https://step3.erc.monash.edu/Fungtion/.
Title: Fungtion: A Server for Predicting and Visualizing Fungal Effector Proteins (Journal of Molecular Biology)
Pub Date : 2024-09-01 DOI: 10.1016/j.jmb.2024.168742
Yi Zhang, Yiduo Xiong, Chenxi Yang, Yi Xiao
There is an increasing need to determine the 3D structures of DNAs, e.g., to increase the efficiency of DNA aptamer selection. We recently proposed a computational method for DNA 3D structure prediction, called 3dDNA, which has been integrated into our original web server 3dRNA, now renamed 3dRNA/DNA (http://biophy.hust.edu.cn/new/3dRNA). Currently, 3dDNA can output predicted DNA 3D structures but cannot rank them, because an energy function for assessing DNA 3D structures is still lacking. Here, we first provide a brief introduction to 3dDNA and then introduce a new energy function, 3dDNAscore, for the assessment of DNA 3D structures. 3dDNAscore is an all-atom knowledge-based potential that integrates 86 atom types from nucleic acids. Benchmarks demonstrate that 3dDNAscore can effectively identify near-native structures among the decoys generated by 3dDNA, thus enhancing the completeness of 3dDNA.
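As a rough illustration of what a knowledge-based potential does, the sketch below builds a distance-dependent statistical potential with the generic inverse-Boltzmann relation, E(r) = -ln(P_obs(r) / P_ref(r)), and uses it to rank decoys from lowest to highest energy. It is not the 3dDNAscore implementation: the abstract states only that 86 atom types are used, so the bin width, cutoff, reference state, pseudocount, function names, and toy atom labels here are all assumptions made for the example.

```python
import math
from collections import defaultdict
from itertools import combinations

# Illustrative settings (assumptions, not taken from 3dDNAscore).
BIN_WIDTH = 1.0   # width of each distance bin, in angstroms
CUTOFF = 10.0     # atom pairs at or beyond this distance are ignored
N_BINS = int(CUTOFF / BIN_WIDTH)


def pair_key(type_a, type_b):
    """Order-independent key for a pair of atom types."""
    return (type_a, type_b) if type_a <= type_b else (type_b, type_a)


def distance_bin(pos_a, pos_b):
    """Return the distance bin index, or None if the pair is beyond the cutoff."""
    d = math.dist(pos_a, pos_b)
    return int(d / BIN_WIDTH) if d < CUTOFF else None


def train_potential(structures, pseudocount=1.0):
    """Derive per-pair energy tables from known structures.

    Each structure is a list of (atom_type, (x, y, z)) tuples.
    """
    counts = defaultdict(lambda: [pseudocount] * N_BINS)
    for atoms in structures:
        for (ta, pa), (tb, pb) in combinations(atoms, 2):
            b = distance_bin(pa, pb)
            if b is not None:
                counts[pair_key(ta, tb)][b] += 1

    # Reference state (an assumption): distances pooled over all pair types.
    ref = [sum(column) for column in zip(*counts.values())]
    ref_total = sum(ref)

    potential = {}
    for key, table in counts.items():
        total = sum(table)
        potential[key] = [
            -math.log((c / total) / (r / ref_total))
            for c, r in zip(table, ref)
        ]
    return potential


def score(atoms, potential):
    """Sum pair energies over a structure; lower means more native-like."""
    energy = 0.0
    for (ta, pa), (tb, pb) in combinations(atoms, 2):
        b = distance_bin(pa, pb)
        table = potential.get(pair_key(ta, tb))
        if b is not None and table is not None:
            energy += table[b]
    return energy


def rank_decoys(decoys, potential):
    """Return decoy names sorted from lowest to highest energy."""
    return sorted(decoys, key=lambda name: score(decoys[name], potential))


# Toy data with made-up atom labels "P", "N", "C" (hypothetical).
known = [[("P", (0.0, 0.0, 0.0)), ("N", (3.0, 0.0, 0.0)), ("C", (0.0, 7.0, 0.0))]] * 5
potential = train_potential(known)
decoys = {
    "distant": [("P", (0.0, 0.0, 0.0)), ("N", (8.5, 0.0, 0.0))],
    "near_native": [("P", (0.0, 0.0, 0.0)), ("N", (3.2, 0.0, 0.0))],
}
ranking = rank_decoys(decoys, potential)
print(ranking)
```

In this toy setup the decoy whose P-N distance falls in the bin observed in the training structures receives the lower energy, which is the sense in which such a potential can pick out near-native candidates from a decoy set.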
3dRNA/DNA: 3D Structure Prediction from RNA to DNA. Yi Zhang, Yiduo Xiong, Chenxi Yang, Yi Xiao. Journal of Molecular Biology, 2024-09-01. DOI: 10.1016/j.jmb.2024.168742