Pub Date: 2024-09-01; DOI: 10.1016/j.jmb.2024.168571
Extracellular vesicles and particles (EVPs) play a crucial role in mediating cell-to-cell communication by transporting various molecular cargos, with small non-coding RNAs (ncRNAs) holding particular significance. A thorough investigation into the abundance and sorting mechanisms of ncRNA within EVPs is imperative for advancing their clinical applications. We have developed EVPsort, which not only provides an extensive overview of ncRNA profiling in 3,162 samples across various biofluids, cell lines, and disease contexts but also seamlessly integrates 19 external databases and tools. This integration encompasses information on associations between ncRNAs and RNA-binding proteins (RBPs), motifs, targets, pathways, diseases, and drugs. With its rich resources and powerful analysis tools, EVPsort extends its profiling capabilities to investigate ncRNA sorting, identify relevant RBPs and motifs, and assess functional implications. EVPsort stands as a pioneering database dedicated to comprehensively addressing both the abundance and sorting of ncRNA within EVPs. It is freely accessible at https://bioinfo.vanderbilt.edu/evpsort/.
EVPsort: An Atlas of Small ncRNA Profiling and Sorting in Extracellular Vesicles and Particles. Journal of Molecular Biology 436(17), Article 168571.
Pub Date: 2024-09-01; DOI: 10.1016/j.jmb.2024.168704
Jean-Luc Pons , Victor Reys , François Grand , Violaine Moreau , Jerôme Gracy , Thomas E. Exner , Gilles Labesse
Knowledge of protein–ligand complexes is essential for efficient drug design. Virtual docking can bring important information on putative complexes but it is still far from being simultaneously fast and accurate. Receptors are flexible and adapt to incoming small molecules, while docking is highly sensitive to small conformational deviations. Conformational ensembles provide a means to simulate protein flexibility. However, modeling multiple protein structures for many targets is seldom connected to ligand screening in an efficient and straightforward manner.
@TOME-3 is an updated version of our former pipeline @TOME-2, in which protein structure modeling is now directly interfaced with flexible ligand docking. Sequence-sequence profile comparisons identify suitable PDB templates for structure modeling, and ligands from these templates are used to deduce the binding sites to be screened. In addition, bound ligands can be used as pharmacophoric restraints during the virtual docking. The latter is performed by PLANTS while the docking poses are analysed through multiple chemoinformatics functions. This unique combination of tools allows rapid and efficient ligand docking on multiple receptor conformations in parallel. @TOME-3 is freely available on the web at https://atome.cbs.cnrs.fr.
@TOME 3.0: Interfacing Protein Structure Modeling and Ligand Docking. Journal of Molecular Biology 436(17), Article 168704.
Pub Date: 2024-09-01; DOI: 10.1016/j.jmb.2024.168545
A single protein structure is rarely sufficient to capture the conformational variability of a protein. Both bound and unbound (holo and apo) forms of a protein are essential for understanding its geometry and making meaningful comparisons. Nevertheless, docking or drug design studies often still consider only single protein structures in their holo form, which are for the most part rigid. With the recent explosion in the field of structural biology, large, curated datasets are urgently needed. Here, we use a previously developed application (AHoJ) to perform a comprehensive search for apo-holo pairs for 468,293 biologically relevant protein–ligand interactions across 27,983 proteins. In each search, the binding pocket is captured and mapped across existing structures within the same UniProt entry, and the mapped pockets are annotated as apo or holo, based on the presence or absence of ligands. We assemble the results into a database, AHoJ-DB (www.apoholo.cz/db), that captures the variability of proteins with identical sequences, thereby exposing the agents responsible for the observed differences in geometry. We report several metrics for each annotated pocket, and we also include binding pockets that form at the interface of multiple chains. Analysis of the database shows that about 24% of the binding sites occur at the interface of two or more chains and that less than 50% of the total binding sites processed have an apo form in the PDB. These results can be used to train and evaluate predictors, discover potentially druggable proteins, and reveal protein- and ligand-specific relationships that were previously obscured by intermittent or partial data.
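The annotation rule described in this abstract (a mapped pocket is holo when a ligand is present and apo when none is) can be sketched in a few lines of Python. This is only an illustration of the rule, not the AHoJ implementation: the `MappedPocket` class, the structure identifiers, and the ligand labels are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class MappedPocket:
    """A binding pocket mapped onto one structure (hypothetical container)."""
    structure_id: str
    ligands: list = field(default_factory=list)  # ligands found in the pocket


def annotate(pocket):
    """Label a mapped pocket as holo if any ligand occupies it, else apo."""
    return "holo" if pocket.ligands else "apo"


def summarize(pockets):
    """Count apo and holo labels over a collection of mapped pockets."""
    labels = [annotate(p) for p in pockets]
    return {"apo": labels.count("apo"), "holo": labels.count("holo")}


# Hypothetical mapped pockets for one query interaction.
pockets = [
    MappedPocket("struct_A", ["LIG"]),
    MappedPocket("struct_B"),
    MappedPocket("struct_C", ["LIG", "ION"]),
]
summary = summarize(pockets)
print(summary)
```

A real search would also have to decide which ligands count as biologically relevant and how pockets at chain interfaces are mapped; the sketch leaves both out.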
AHoJ-DB: A PDB-wide Assignment of apo & holo Relationships Based on Individual Protein–Ligand Interactions. Journal of Molecular Biology 436(17), Article 168545.
Pub Date: 2024-09-01; DOI: 10.1016/j.jmb.2024.168520
The red flour beetle Tribolium castaneum has emerged as a powerful model in insect functional genomics. However, a major limitation in the field is the lack of a detailed spatio-temporal view of the genetic signatures underpinning the function of distinct tissues and life stages. Here, we present an ontogenetic and tissue-specific web-based resource for Tribolium transcriptomics: BeetleAtlas (https://www.beetleatlas.org). This web application provides access to a database populated with quantitative expression data for nine adult and seven larval tissues, as well as for four embryonic stages of Tribolium. BeetleAtlas allows one to search for individual Tribolium genes to obtain values of both total gene expression and enrichment in different tissues, together with data for individual isoforms. To facilitate cross-species studies, one can also use Drosophila melanogaster gene identifiers to search for related Tribolium genes. For retrieved genes there are options to identify and display the tissue expression of related Tribolium genes or homologous Drosophila genes. Five additional search modes are available to find genes conforming to any of the following criteria: exhibiting high expression in a particular tissue; showing significant differences in expression between larva and adult; having a peak of expression at a specific stage of embryonic development; belonging to a particular functional category; and displaying a pattern of tissue expression similar to that of a query gene. We illustrate how the different features of BeetleAtlas can be used to illuminate our understanding of the genetic mechanisms underpinning the biology of what is the largest animal group on earth.
BeetleAtlas: An Ontogenetic and Tissue-specific Transcriptomic Atlas of the Red Flour Beetle Tribolium castaneum. Journal of Molecular Biology 436(17), Article 168520.
Pub Date: 2024-09-01; DOI: 10.1016/j.jmb.2024.168694
Predicting the consensus structure of a set of aligned RNA homologs is a convenient method to find conserved structures in an RNA genome, which has many applications including viral diagnostics and therapeutics. However, the most commonly used tool for this task, RNAalifold, is prohibitively slow for long sequences, due to a cubic scaling with the sequence length, taking over a day on 400 SARS-CoV-2 and SARS-related genomes (∼30,000 nt). We present LinearAlifold, a much faster alternative that scales linearly with both the sequence length and the number of sequences, based on our work LinearFold that folds a single RNA in linear time. Our work is orders of magnitude faster than RNAalifold (0.7 h on the above 400 genomes, or ∼36× speedup) and achieves higher accuracies when compared to a database of known structures. More interestingly, LinearAlifold’s prediction on SARS-CoV-2 correlates well with experimentally determined structures, substantially outperforming RNAalifold. Finally, LinearAlifold supports two energy models (Vienna and BL*) and four modes: minimum free energy (MFE), maximum expected accuracy (MEA), ThreshKnot, and stochastic sampling, each of which takes under an hour for hundreds of SARS-CoV variants. Our resource is at:
https://github.com/LinearFold/LinearAlifold (code) and http://linearfold.org/linear-alifold (server).
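To make the scaling argument in this abstract concrete, the short Python sketch below compares how cost grows under cubic and linear scaling as the sequence length increases. It is purely an asymptotic illustration: constant factors are ignored, the 300 nt baseline is an arbitrary value chosen here, and nothing in it models the actual runtime of RNAalifold or LinearAlifold.

```python
def relative_cost(length, base_length, exponent):
    """Growth in cost when going from base_length to length,
    assuming cost is proportional to length ** exponent."""
    return (length / base_length) ** exponent


BASE_NT = 300        # arbitrary short baseline, chosen for illustration
GENOME_NT = 30_000   # approximate genome length quoted in the abstract

cubic_growth = relative_cost(GENOME_NT, BASE_NT, 3)
linear_growth = relative_cost(GENOME_NT, BASE_NT, 1)

print(f"cubic scaling:  cost grows {cubic_growth:,.0f}x")
print(f"linear scaling: cost grows {linear_growth:,.0f}x")
```

Under these assumptions a 100-fold longer sequence costs a million times more with cubic scaling but only 100 times more with linear scaling, which is why the gap widens so quickly at genome length.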
LinearAlifold: Linear-time consensus structure prediction for RNA alignments. Journal of Molecular Biology 436(17), Article 168694.
Pub Date: 2024-09-01; DOI: 10.1016/j.jmb.2024.168705
Shan Wang , Chaohui Bao , Siyue Yang , Chenxu Gao , Chang Lu , Lulu Jiang , Liye Chen , Zheng Wang , Hai Fang
We introduce XGR-model (or XGRm), a web server made accessible at http://www.xgrm.pro, with the aim of meeting the increasing demand for effectively interpreting summary-level genomic data in model organisms. Currently, it hosts two enrichment analysers and two subnetwork analysers to support enrichment and subnetwork analyses for user-input mouse genomic data, whether gene-centric or genomic region-centric. The enrichment analysers identify ontology term enrichments for input genes (GElyser) or for genes linked from input genomic regions (RElyser). The subnetwork analysers rely on our previously established network algorithm to identify gene subnetworks from input gene-centric summary data (GSlyser) or from input region-centric summary data (RSlyser), leveraging network information about either functional interactions or pathway-derived interactions. Collectively, XGRm offers an all-in-one solution for gaining systems biology insights into summary-level genomic data in mice, underpinned by our commitment to regular updates as well as natural extensions to other model organisms.
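The abstract does not say which statistical test the enrichment analysers use. A common choice for ontology term enrichment is the one-sided hypergeometric test, and the sketch below shows that test under this assumption; the function name and the example counts are hypothetical and are not taken from XGRm.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_input, n_overlap):
    """One-sided hypergeometric p-value: the probability of seeing at least
    n_overlap term-annotated genes when n_input genes are drawn without
    replacement from n_background genes, n_term of which carry the term."""
    max_overlap = min(n_term, n_input)
    total = comb(n_background, n_input)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_input - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 5 annotated to the term,
# 5 input genes, all 5 of which carry the term.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
print(p_value)
```

In practice many terms are tested at once, so the resulting p-values would normally be corrected for multiple testing before being reported.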
XGRm: A Web Server for Interpreting Mouse Summary-level Genomic Data. Journal of Molecular Biology 436(17), Article 168705.
Pub Date: 2024-09-01; DOI: 10.1016/j.jmb.2024.168494
Knowledge of the solvent accessibility of residues in a protein is essential for different applications, including the identification of interacting surfaces in protein–protein interactions and the characterization of variations. We describe E-pRSA, a novel web server to estimate Relative Solvent Accessibility values (RSAs) of residues directly from a protein sequence. The method exploits two complementary Protein Language Models to provide fast and accurate predictions. When benchmarked on different blind test sets, E-pRSA performs at the state of the art and outperforms a previous method we developed, DeepREx, which was based on sequence profiles derived from Multiple Sequence Alignments. The E-pRSA web server is freely available at https://e-prsa.biocomp.unibo.it/main/ where users can submit single-sequence and batch jobs.
E-pRSA: Embeddings Improve the Prediction of Residue Relative Solvent Accessibility in Protein Sequence. Journal of Molecular Biology 436(17), Article 168494.
Pub Date: 2024-09-01; DOI: 10.1016/j.jmb.2024.168549
Nearest neighbor thermodynamic parameters are widely used for RNA and DNA secondary structure prediction and to model thermodynamic ensembles of secondary structures. The Nearest Neighbor Database (NNDB) is a freely available web resource (https://rna.urmc.rochester.edu/NNDB) that provides the functional forms, parameter values, and example calculations. The NNDB provides the 1999 and 2004 sets of RNA folding nearest neighbor parameters. We expanded the database to include a set of DNA parameters and a set of RNA parameters that includes m6A in addition to the canonical RNA nucleobases. The site was redesigned using the Quarto open-source publishing system. A downloadable PDF version of the complete resource and downloadable sets of nearest neighbor parameters are available.
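The core idea of a nearest neighbor calculation is that the stability of a helix is estimated by summing one term per pair of adjacent base pairs, plus an initiation term. The Python sketch below shows that summation for a fully Watson-Crick-paired duplex. It is a simplified illustration only: the stacking and initiation values are placeholders invented here and are NOT NNDB parameters, and the functional forms published on the NNDB site include further terms that this sketch omits.

```python
def duplex_stability(top_strand, stack_energies, initiation):
    """Sum nearest-neighbor terms along a fully paired duplex.

    top_strand     -- 5'->3' sequence of one strand; the other strand is
                      assumed to be its perfect Watson-Crick complement
    stack_energies -- dict mapping each 5'->3' dinucleotide step on the top
                      strand to a stacking free energy (kcal/mol)
    initiation     -- duplex initiation term (kcal/mol)
    """
    total = initiation
    for i in range(len(top_strand) - 1):
        total += stack_energies[top_strand[i:i + 2]]
    return total


# Placeholder values for illustration only; real values come from the
# downloadable NNDB parameter sets.
TOY_STACKS = {"GC": -3.0, "CG": -2.0, "GG": -3.0, "CC": -3.0}
TOY_INITIATION = 4.0

dg = duplex_stability("GCGC", TOY_STACKS, TOY_INITIATION)
print(f"toy estimate: {dg:.1f} kcal/mol")
```

With the placeholder table, the three steps GC, CG, and GC contribute -3.0, -2.0, and -3.0, so the toy estimate is 4.0 - 8.0 = -4.0 kcal/mol.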
NNDB: An Expanded Database of Nearest Neighbor Parameters for Predicting Stability of Nucleic Acid Secondary Structures. Journal of Molecular Biology 436(17), Article 168549.
Pub Date: 2024-09-01; DOI: 10.1016/j.jmb.2024.168552
With advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges due to the sparser availability and lower structural diversity of the experimentally resolved RNA structures in comparison to protein structures. These challenges are often poorly addressed by the existing literature, much of which reports inflated performance due to using training and testing sets with significant structural overlap. Further, the most recent Critical Assessment of Structure Prediction (CASP15) has shown that deep learning models for RNA structure are currently outperformed by traditional methods.
In this paper we present RNA3DB, a dataset of structured RNAs, derived from the Protein Data Bank (PDB), that is designed for training and benchmarking deep learning models. The RNA3DB method arranges the RNA 3D chains into distinct groups (Components) that are non-redundant with regard to both sequence and structure, providing a robust way of dividing training, validation, and testing sets. Any split of these structurally-dissimilar Components is guaranteed to produce test and validation sets that are distinct by sequence and structure from those in the training set. We provide the RNA3DB dataset, a particular train/test split of the RNA3DB Components (in an approximate 70/30 ratio) that will be updated periodically. We also provide the RNA3DB methodology along with the source code, with the goal of creating a reproducible and customizable tool for producing structurally-dissimilar dataset splits for structural RNAs.
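The key property of a Component-based split is that a Component is never divided: all of its chains go to the same side, so the guarantee of dissimilarity between Components carries over to the train and test sets. The Python sketch below illustrates that idea with a simple greedy assignment. It is not the RNA3DB source code; the function name, the Component names, and the chain identifiers are hypothetical, and the greedy rule is just one simple way to approach a target ratio.

```python
import random


def split_components(components, test_fraction=0.3, seed=0):
    """Assign whole components to train or test.

    components -- dict mapping a component name to its list of chain IDs
    Returns (train_chains, test_chains). A component is never split, so no
    chain in test shares a component with any chain in train.
    """
    names = sorted(components)
    random.Random(seed).shuffle(names)  # reproducible order
    total = sum(len(chains) for chains in components.values())

    train, test = [], []
    for name in names:
        chains = components[name]
        # Fill the test set until adding a component would exceed the target.
        if len(test) + len(chains) <= test_fraction * total:
            test.extend(chains)
        else:
            train.extend(chains)
    return train, test


# Hypothetical components and chain identifiers, for illustration only.
components = {
    "component_0": ["chain_a", "chain_b", "chain_c", "chain_d"],
    "component_1": ["chain_e", "chain_f"],
    "component_2": ["chain_g"],
    "component_3": ["chain_h", "chain_i", "chain_j"],
}
train, test = split_components(components, test_fraction=0.3)
print(len(train), len(test))
```

Because components differ in size, the achieved ratio is only approximately the requested one, which is consistent with the "approximate 70/30" wording in the abstract.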
{"title":"RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction","authors":"","doi":"10.1016/j.jmb.2024.168552","DOIUrl":"10.1016/j.jmb.2024.168552","url":null,"abstract":"With advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges due to the sparser availability and lower structural diversity of the experimentally resolved RNA structures in comparison to protein structures. These challenges are often poorly addressed in the existing literature, much of which reports inflated performance due to using training and testing sets with significant structural overlap. Further, the most recent Critical Assessment of Structure Prediction (CASP15) has shown that deep learning models for RNA structure are currently outperformed by traditional methods.\n\nIn this paper we present RNA3DB, a dataset of structured RNAs derived from the Protein Data Bank (PDB) and designed for training and benchmarking deep learning models. The RNA3DB method arranges RNA 3D chains into distinct groups (Components) that are non-redundant with regard to both sequence and structure, providing a robust way of dividing training, validation, and testing sets. Any split of these structurally dissimilar Components is guaranteed to produce test and validation sets that are distinct by sequence and structure from the training set. We provide the RNA3DB dataset, a particular train/test split of the RNA3DB Components (in an approximate 70/30 ratio) that will be updated periodically. We also provide the RNA3DB methodology along with the source code, with the goal of creating a reproducible and customizable tool for producing structurally dissimilar dataset splits for structural RNAs.","PeriodicalId":369,"journal":{"name":"Journal of Molecular Biology","volume":"436 17","pages":"Article 168552"},"PeriodicalIF":4.7,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0022283624001475/pdfft?md5=5530a074f00756a90477518772fa34fc&pid=1-s2.0-S0022283624001475-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140326204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
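The component-level split described in the RNA3DB abstract can be illustrated with a minimal sketch. This is not the RNA3DB source code: the function name `split_components`, the greedy assignment strategy, and the chain identifiers are all hypothetical, and the sketch assumes the Components have already been computed so that no two Components share sequence or structural similarity. It only shows the key property, namely that whole Components, never individual chains, are assigned to one side of the split.

```python
import random
from typing import Dict, List, Tuple


def split_components(
    components: Dict[str, List[str]],
    train_fraction: float = 0.7,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Assign whole components to train or test (hypothetical helper).

    `components` maps a component ID to the chain IDs it contains.
    A component is never divided, so chains from one component cannot
    appear on both sides of the split.
    """
    rng = random.Random(seed)
    ids = sorted(components)
    rng.shuffle(ids)  # random order among equally sized components
    # Stable sort, largest first: big components are placed while there is room.
    ids.sort(key=lambda cid: len(components[cid]), reverse=True)

    total_chains = sum(len(chains) for chains in components.values())
    target = train_fraction * total_chains

    train: List[str] = []
    test: List[str] = []
    n_train = 0
    for cid in ids:
        size = len(components[cid])
        if n_train + size <= target:
            train.append(cid)
            n_train += size
        else:
            test.append(cid)
    return train, test


# Made-up chain IDs, for illustration only.
example = {
    "component_0": ["chain_a", "chain_b", "chain_c", "chain_d"],
    "component_1": ["chain_e", "chain_f", "chain_g"],
    "component_2": ["chain_h", "chain_i"],
    "component_3": ["chain_j"],
}
train_ids, test_ids = split_components(example)
train_chains = {c for cid in train_ids for c in example[cid]}
test_chains = {c for cid in test_ids for c in example[cid]}
```

Because assignment happens at the Component level, the resulting ratio is only approximate, which is consistent with the "approximate 70/30 ratio" stated in the abstract.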
Pub Date: 2024-09-01 DOI: 10.1016/j.jmb.2024.168554
Molecular modeling and simulation play an important role in exploring the biological functions of proteins at the molecular level, complementing experiments. CHARMM-GUI (https://www.charmm-gui.org) is a web-based graphical user interface that generates complex molecular simulation systems and input files, and we have been continuously developing and expanding its functionality to facilitate complex molecular modeling and make molecular dynamics simulations more accessible to the scientific community. Covalent drug discovery is currently emerging as a popular and important field. A covalent drug forms a chemical bond with specific residues on the target protein, and its prolonged inhibition effects give it an advantage in potency. Although demand is growing for modeling PDB protein structures with various covalent ligand types, proper modeling of covalent ligands remains challenging. This work presents a new functionality in CHARMM-GUI PDB Reader & Manipulator that can handle a variety of ligand-amino acid linkage types and is validated by a careful benchmark study using over 1,000 covalent ligand structures in the RCSB PDB. We hope that this new functionality can boost modeling and simulation studies of covalent ligands.
{"title":"CHARMM-GUI PDB Reader and Manipulator: Covalent Ligand Modeling and Simulation","authors":"","doi":"10.1016/j.jmb.2024.168554","DOIUrl":"10.1016/j.jmb.2024.168554","url":null,"abstract":"Molecular modeling and simulation play an important role in exploring the biological functions of proteins at the molecular level, complementing experiments. CHARMM-GUI (https://www.charmm-gui.org) is a web-based graphical user interface that generates complex molecular simulation systems and input files, and we have been continuously developing and expanding its functionality to facilitate complex molecular modeling and make molecular dynamics simulations more accessible to the scientific community. Covalent drug discovery is currently emerging as a popular and important field. A covalent drug forms a chemical bond with specific residues on the target protein, and its prolonged inhibition effects give it an advantage in potency. Although demand is growing for modeling PDB protein structures with various covalent ligand types, proper modeling of covalent ligands remains challenging. This work presents a new functionality in CHARMM-GUI PDB Reader & Manipulator that can handle a variety of ligand-amino acid linkage types and is validated by a careful benchmark study using over 1,000 covalent ligand structures in the RCSB PDB. We hope that this new functionality can boost modeling and simulation studies of covalent ligands.","PeriodicalId":369,"journal":{"name":"Journal of Molecular Biology","volume":"436 17","pages":"Article 168554"},"PeriodicalIF":4.7,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0022283624001499/pdfft?md5=c22bc6a24892229f4d80acb3c293965e&pid=1-s2.0-S0022283624001499-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140406025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}