Thyagarajan T Chandrasekaran, Michel Choudalakis, Alexander Bröhm, Sara Weirich, Alexandra G Kouroukli, Ole Ammerpohl, Philipp Rathert, Pavel Bashtrykov, Albert Jeltsch
SETDB1 (SET domain bifurcated histone lysine methyltransferase 1) is a major protein lysine methyltransferase that trimethylates lysine 9 on histone H3 (H3K9) and is involved in heterochromatin formation and silencing of repeat elements (REs). It contains a unique Triple Tudor Domain (3TD), which specifically binds the dual modification of H3K14ac in the presence of H3K9me1/2/3. Here, we explored the role of the interaction between the 3TD and the H3 tail for the H3K9 methylation activity of SETDB1. We generated a binding-reduced 3TD mutant and demonstrate in biochemical methylation assays on peptides and recombinant nucleosomes containing H3K14ac and H3K14ac analogs, respectively, that H3K14 acetylation is crucial for the 3TD-mediated recruitment of SETDB1. We also observe this effect in cells, where SETDB1 binding and activity are globally correlated with H3K14ac, and knockout of the H3K14 acetyltransferase HBO1 causes a drastic reduction in H3K9me3 levels at SETDB1-dependent sites. Regions with DNA hypomethylation after SETDB1 knockout also show an enrichment in SETDB1-dependent H3K9me3 and H3K14ac. Further analyses revealed that the 3TD is particularly important at specific target regions like L1M REs, where H3K9me3 cannot be efficiently reconstituted by the 3TD mutant of SETDB1. In summary, our data demonstrate that H3K9me3 and H3K14ac are not antagonistic marks; rather, the presence of H3K14ac is required for SETDB1 recruitment via 3TD binding to H3K9me1/2/3-K14ac regions and for the establishment of H3K9me3.
{"title":"SETDB1 activity is globally directed by H3K14 acetylation via its Triple Tudor Domain.","authors":"Thyagarajan T Chandrasekaran, Michel Choudalakis, Alexander Bröhm, Sara Weirich, Alexandra G Kouroukli, Ole Ammerpohl, Philipp Rathert, Pavel Bashtrykov, Albert Jeltsch","doi":"10.1093/nar/gkae1053","DOIUrl":"https://doi.org/10.1093/nar/gkae1053","url":null,"PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":" ","pages":""},"PeriodicalIF":16.6,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142624786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel W Udwary, Drew T Doering, Bryce Foster, Tatyana Smirnova, Satria A Kautsar, Nigel J Mouncey
Secondary metabolites are small molecules produced by all corners of life, often with specialized bioactive functions of clinical and environmental relevance. Secondary metabolite biosynthetic gene clusters (BGCs) can often be identified within DNA sequences by various sequence similarity tools, but determining the exact functions of genes in the pathway and predicting their chemical products can often only be done by careful, manual comparative analysis. To facilitate this, we report the first release of the secondary metabolism collaboratory (SMC), which aims to provide a comprehensive, tool-agnostic repository of BGC sequence data drawn from all publicly available and user-submitted bacterial and archaeal genome and contig sources. On the website, users are provided a searchable catalog of putative BGCs identified from each source, along with visualizations of gene and domain annotations derived from multiple sequence analysis tools. SMC's data is also available through publicly accessible application programming interface (API) endpoints to facilitate programmatic access. Users are encouraged to share their findings (and search for others') through comment posts on BGC and source pages. At the time of writing, SMC is the largest repository of BGC information, holding 13.1M BGC regions from 1.3M source sequences and growing, and can be found at https://smc.jgi.doe.gov.
{"title":"The secondary metabolism collaboratory: a database and web discussion portal for secondary metabolite biosynthetic gene clusters.","authors":"Daniel W Udwary, Drew T Doering, Bryce Foster, Tatyana Smirnova, Satria A Kautsar, Nigel J Mouncey","doi":"10.1093/nar/gkae1060","DOIUrl":"https://doi.org/10.1093/nar/gkae1060","url":null,"PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":" ","pages":""},"PeriodicalIF":16.6,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142624790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jack A S Tierney, Michał I Świrski, Håkon Tjeldnes, Anmol M Kiran, Gionmattia Carancini, Stephen J Kiniry, Audrey M Michel, Joanna Kufel, Eivind Valen, Pavel V Baranov
Ribosome profiling (Ribo-Seq) has revolutionised our understanding of translation, but the increasing complexity and volume of Ribo-Seq data present challenges for its reuse. Here, we formally introduce RiboSeq.Org, an integrated suite of resources designed to facilitate Ribo-Seq data analysis and visualisation within a web browser. RiboSeq.Org comprises several interconnected tools: GWIPS-viz for genome-wide visualisation, Trips-Viz for transcriptome-centric analysis, RiboGalaxy for data processing and the newly developed RiboSeq data portal (RDP) for centralised dataset identification and access. The RDP currently hosts preprocessed datasets corresponding to 14840 sequence libraries (samples) from 969 studies across 96 species, in various file formats along with standardised metadata. RiboSeq.Org addresses key challenges in Ribo-Seq data reuse through standardised sample preprocessing, semi-automated metadata curation and programmatic information access via a REST API and command-line utilities. RiboSeq.Org enhances the accessibility and utility of public Ribo-Seq data, enabling researchers to gain new insights into translational regulation and protein synthesis across diverse organisms and conditions. By providing these integrated, user-friendly resources, RiboSeq.Org aims to lower the barrier to reproducible research in the field of translatomics and promote more efficient utilisation of the wealth of available Ribo-Seq data.
{"title":"RiboSeq.Org: an integrated suite of resources for ribosome profiling data analysis and visualization","authors":"Jack A S Tierney, Michał I Świrski, Håkon Tjeldnes, Anmol M Kiran, Gionmattia Carancini, Stephen J Kiniry, Audrey M Michel, Joanna Kufel, Eivind Valen, Pavel V Baranov","doi":"10.1093/nar/gkae1020","DOIUrl":"https://doi.org/10.1093/nar/gkae1020","url":null,"PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"20 1","pages":""},"PeriodicalIF":14.9,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142610485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xuehang Meng, Yujia Du, Chang Liu, Zhaoyu Zhai, Jianbo Pan
Gene therapy, which involves the delivery of genetic material into cells to correct an underlying genetic problem, has emerged as a promising approach for treating various conditions. To promote research in this rapidly evolving field, we developed the Gene Therapy Omnibus (GTO) (http://www.inbirg.com/gto/), a comprehensive resource containing detailed clinical trial data and molecular information related to gene therapy. The GTO includes 6333 clinical trial records and 3466 transcriptome profiles, with information on 614 altered genes and 22 types of gene therapy, including DNA therapies, RNA therapies and genetically modified cell therapies. For each gene therapy product in a clinical trial, detailed information, such as altered gene name, structural components, indication, vector information, phase of the clinical trial, clinical outcomes and adverse effects, is provided when available. Additionally, 345 comparison datasets, including 29 single-cell RNA-sequencing datasets comprising information on both gene therapy and control samples, were established. Differential gene expression and downstream functional enrichment analyses were performed through standardized pipelines to elucidate the molecular alterations induced by gene therapy. The user-friendly interface of the GTO supports efficient data retrieval, visualization and analysis, making it an invaluable resource for researchers and clinicians performing clinical research on gene therapy and the underlying mechanisms.
{"title":"GTO: a comprehensive gene therapy omnibus","authors":"Xuehang Meng, Yujia Du, Chang Liu, Zhaoyu Zhai, Jianbo Pan","doi":"10.1093/nar/gkae1051","DOIUrl":"https://doi.org/10.1093/nar/gkae1051","url":null,"PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"17 1","pages":""},"PeriodicalIF":14.9,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142610554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Roman Tremmel, Yitian Zhou, Mahamadou D Camara, Sofiene Laarif, Erik Eliasson, Volker M Lauschke
Genetic polymorphisms in drug-metabolizing enzymes and drug transporters, as well as in genes encoding the human major histocompatibility complex, contribute to inter-individual differences in drug efficacy and safety. The extent, pattern and complexity of such pharmacogenetic variation differ drastically across human populations. Here, we present PharmFreq, a global repository of pharmacogenetic frequency information that aggregates frequency data of 658 allelic variants from over 10 million individuals collected from >1200 studies across 144 countries. Most investigations were conducted in East Asian and European populations, accounting for 29.4% and 26.6% of all studies, respectively. We find that the number of studies per country and aggregated cohort size correlate significantly with population size (R = 0.55, P = 3 × 10^-9) and country gross domestic product (R = 0.43, P = 2 × 10^-6), with overall population coverage ranging from 5% in Estonia to <0.001% in many countries in Sub-Saharan Africa and Asia. All frequency data are openly accessible via a web-based interactive dashboard at pharmfreq.com that facilitates the exploration, visualization and analysis of country- and population-specific data and their inferred phenotypic consequences. PharmFreq thus presents a comprehensive, freely available resource for pharmacogenetic variant frequencies that can inform about ethnogeographic pharmacogenomic diversity and reveal important inequities that help to focus future research efforts on underrepresented populations.
{"title":"PharmFreq: a comprehensive atlas of ethnogeographic allelic variation in clinically important pharmacogenes.","authors":"Roman Tremmel, Yitian Zhou, Mahamadou D Camara, Sofiene Laarif, Erik Eliasson, Volker M Lauschke","doi":"10.1093/nar/gkae1016","DOIUrl":"https://doi.org/10.1093/nar/gkae1016","url":null,"PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":" ","pages":""},"PeriodicalIF":16.6,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142624781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Typhaine Paysan-Lafosse, Antonina Andreeva, Matthias Blum, Sara Rocio Chuguransky, Tiago Grego, Beatriz Lazaro Pinto, Gustavo A Salazar, Maxwell L Bileschi, Felipe Llinares-López, Laetitia Meng-Papaxanthos, Lucy J Colwell, Nick V Grishin, R Dustin Schaeffer, Damiano Clementel, Silvio C E Tosatto, Erik Sonhammer, Valerie Wood, Alex Bateman
The Pfam protein families database is a comprehensive collection of protein domains and families used for genome annotation and protein structure and function analysis (https://www.ebi.ac.uk/interpro/). This update describes major developments in Pfam since 2020, including decommissioning the Pfam website and integration with InterPro, harmonization with the ECOD structural classification, and expanded curation of metagenomic, microprotein and repeat-containing families. We highlight how AlphaFold structure predictions are being leveraged to refine domain boundaries and identify new domains. New families discovered through large-scale sequence similarity analysis of AlphaFold models are described. We also detail the development of Pfam-N, which uses deep learning to expand family coverage, achieving an 8.8% increase in UniProtKB coverage compared to standard Pfam. We discuss plans for more frequent Pfam releases integrated with InterPro and the potential for artificial intelligence to further assist curation. Despite recent advances, many protein families remain to be classified, and Pfam continues working toward comprehensive coverage of the protein universe.
{"title":"The Pfam protein families database: embracing AI/ML","authors":"Typhaine Paysan-Lafosse, Antonina Andreeva, Matthias Blum, Sara Rocio Chuguransky, Tiago Grego, Beatriz Lazaro Pinto, Gustavo A Salazar, Maxwell L Bileschi, Felipe Llinares-López, Laetitia Meng-Papaxanthos, Lucy J Colwell, Nick V Grishin, R Dustin Schaeffer, Damiano Clementel, Silvio C E Tosatto, Erik Sonhammer, Valerie Wood, Alex Bateman","doi":"10.1093/nar/gkae997","DOIUrl":"https://doi.org/10.1093/nar/gkae997","url":null,"PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"42 1","pages":""},"PeriodicalIF":14.9,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142610556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shusruto Rishik, Pascal Hirsch, Friederike Grandke, Tobias Fehlmann, Andreas Keller
MiRNAs represent a non-coding RNA class that regulates gene expression and pathways. While miRNAs are evolutionarily conserved, most data stem from Homo sapiens and Mus musculus. As miRNA expression is highly tissue-specific, we developed miRNATissueAtlas to comprehensively explore this landscape in H. sapiens. We expanded the H. sapiens tissue repertoire and included M. musculus. In past years, the number of public miRNA expression datasets has grown substantially. Our previous releases of the miRNATissueAtlas represent a great framework for a uniformly pre-processed and label-harmonized resource containing information on these datasets. We incorporate the respective data in the newest release, miRNATissueAtlas 2025, which contains expression data for 9 classes of ncRNA from 799 billion reads across 61 593 samples for H. sapiens and M. musculus. The number of organs and tissues has increased from 28 and 54 to 74 and 373, respectively. This number includes physiological tissues, cell lines and extracellular vesicles. New tissue specificity index calculations build atop the knowledge of previous iterations. Calculations from cell lines enable comparison with physiological tissues, providing a valuable resource for translational research. Finally, between H. sapiens and M. musculus, 35 organs overlap, allowing cross-species comparisons. The updated miRNATissueAtlas 2025 is available at https://www.ccb.uni-saarland.de/tissueatlas2025.
{"title":"miRNATissueAtlas 2025: an update to the uniformly processed and annotated human and mouse non-coding RNA tissue atlas","authors":"Shusruto Rishik, Pascal Hirsch, Friederike Grandke, Tobias Fehlmann, Andreas Keller","doi":"10.1093/nar/gkae1036","DOIUrl":"https://doi.org/10.1093/nar/gkae1036","url":null,"PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"3 1","pages":""},"PeriodicalIF":14.9,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142610481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lantian Yao, Jiahui Guan, Peilin Xie, Chia-Ru Chung, Zhihao Zhao, Danhong Dong, Yilin Guo, Wenyang Zhang, Junyang Deng, Yuxuan Pang, Yulan Liu, Yunlu Peng, Jorng-Tzong Horng, Ying-Chih Chiang, Tzong-Yi Lee
Antimicrobial resistance is one of the most urgent global health threats, especially in the post-pandemic era. Antimicrobial peptides (AMPs) offer a promising alternative to traditional antibiotics, driving growing interest in recent years. dbAMP is a comprehensive database offering extensive annotations on AMPs, including sequence information, functional activity data, physicochemical properties and structural annotations. In this update, dbAMP has curated data from over 5200 publications, encompassing 33,065 AMPs and 2453 antimicrobial proteins from 3534 organisms. Additionally, dbAMP utilizes ESMFold to predict the three-dimensional structures of AMPs, providing over 30,000 structural annotations that facilitate structure-based functional insights for clinical drug development. Furthermore, dbAMP employs molecular docking techniques, providing over 100 docked complexes that contribute useful insights into the potential mechanisms of AMPs. The toxicity and stability of AMPs are critical factors in assessing their potential as clinical drugs. The updated dbAMP introduces an efficient tool for evaluating the hemolytic toxicity and half-life of AMPs, alongside an AMP optimization platform for designing AMPs with high antimicrobial activity, reduced toxicity and increased stability. The updated dbAMP is freely accessible at https://awi.cuhk.edu.cn/dbAMP/. Overall, dbAMP represents a comprehensive and essential resource for AMP analysis and design, poised to advance antimicrobial strategies in the post-pandemic era.
{"title":"dbAMP 3.0: updated resource of antimicrobial activity and structural annotation of peptides in the post-pandemic era","authors":"Lantian Yao, Jiahui Guan, Peilin Xie, Chia-Ru Chung, Zhihao Zhao, Danhong Dong, Yilin Guo, Wenyang Zhang, Junyang Deng, Yuxuan Pang, Yulan Liu, Yunlu Peng, Jorng-Tzong Horng, Ying-Chih Chiang, Tzong-Yi Lee","doi":"10.1093/nar/gkae1019","DOIUrl":"https://doi.org/10.1093/nar/gkae1019","url":null,"PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"64 1","pages":""},"PeriodicalIF":14.9,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142610555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phillip W Gingrich, Rezvan Chitsazi, Ansuman Biswas, Chunjie Jiang, Li Zhao, Joseph E Tym, Kevin M Brammer, Jun Li, Zhigang Shu, David S Maxwell, Jeffrey A Tacy, Ioan L Mica, Michael Darkoh, Patrizio di Micco, Kaitlyn P Russell, Paul Workman, Bissan Al-Lazikani
canSAR (https://cansar.ai) continues to serve as the largest publicly available platform for cancer-focused drug discovery and translational research. It integrates multidisciplinary data from disparate and otherwise siloed public data sources as well as data curated uniquely for canSAR. In addition, canSAR deploys a suite of curation and standardization tools together with AI algorithms to generate new knowledge from these integrated data to inform hypothesis generation. Here we report the latest updates to canSAR. As well as increasing available data, we provide enhancements to our algorithms to improve the offering to the user. Notably, our enhancements include a revised ligandability classifier leveraging Positive Unlabeled Learning that finds twice as many ligandable opportunities across the pocketome, and our revised chemical standardization pipeline and hierarchy better enable the aggregation of structurally related molecular records.
{"title":"canSAR 2024—an update to the public drug discovery knowledgebase","authors":"Phillip W Gingrich, Rezvan Chitsazi, Ansuman Biswas, Chunjie Jiang, Li Zhao, Joseph E Tym, Kevin M Brammer, Jun Li, Zhigang Shu, David S Maxwell, Jeffrey A Tacy, Ioan L Mica, Michael Darkoh, Patrizio di Micco, Kaitlyn P Russell, Paul Workman, Bissan Al-Lazikani","doi":"10.1093/nar/gkae1050","DOIUrl":"https://doi.org/10.1093/nar/gkae1050","url":null,"abstract":"canSAR (https://cansar.ai) continues to serve as the largest publicly available platform for cancer-focused drug discovery and translational research. It integrates multidisciplinary data from disparate and otherwise siloed public data sources as well as data curated uniquely for canSAR. In addition, canSAR deploys a suite of curation and standardization tools together with AI algorithms to generate new knowledge from these integrated data to inform hypothesis generation. Here we report the latest updates to canSAR. As well as increasing available data, we provide enhancements to our algorithms to improve the offering to the user. Notably, our enhancements include a revised ligandability classifier leveraging Positive Unlabeled Learning that finds twice as many ligandable opportunities across the pocketome, and our revised chemical standardization pipeline and hierarchy better enable the aggregation of structurally related molecular records.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"98 1","pages":""},"PeriodicalIF":14.9,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142610486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transcriptome-wide association studies (TWAS) have been successful in identifying disease susceptibility genes by integrating cis-variant-predicted gene expression with genome-wide association studies (GWAS) data. However, trans-variants for predicting gene expression remain largely unexplored. Here, we introduce transTF-TWAS, which incorporates transcription factor (TF)-linked trans-variants to enhance model building for TF downstream target genes. Using data from the Genotype-Tissue Expression project, we predicted gene expression and alternative splicing and applied these prediction models to large GWAS datasets for breast, prostate and lung cancers and other diseases. We demonstrate that transTF-TWAS outperforms other existing TWAS approaches in both constructing gene expression prediction models and identifying disease-associated genes, as shown by simulations and real data analysis. Our transTF-TWAS approach significantly contributes to the discovery of disease risk genes. Findings from this study shed new light on several genetically driven key TF regulators and their associated TF–gene regulatory networks underlying disease susceptibility.
{"title":"Enhancing disease risk gene discovery by integrating transcription factor-linked trans-variants into transcriptome-wide association analyses","authors":"Jingni He, Deshan Perera, Wanqing Wen, Jie Ping, Qing Li, Linshuoshuo Lyu, Zhishan Chen, Xiang Shu, Jirong Long, Qiuyin Cai, Xiao-Ou Shu, Zhijun Yin, Wei Zheng, Quan Long, Xingyi Guo","doi":"10.1093/nar/gkae1035","DOIUrl":"https://doi.org/10.1093/nar/gkae1035","url":null,"abstract":"Transcriptome-wide association studies (TWAS) have been successful in identifying disease susceptibility genes by integrating cis-variant-predicted gene expression with genome-wide association studies (GWAS) data. However, trans-variants for predicting gene expression remain largely unexplored. Here, we introduce transTF-TWAS, which incorporates transcription factor (TF)-linked trans-variants to enhance model building for TF downstream target genes. Using data from the Genotype-Tissue Expression project, we predicted gene expression and alternative splicing and applied these prediction models to large GWAS datasets for breast, prostate and lung cancers and other diseases. We demonstrate that transTF-TWAS outperforms other existing TWAS approaches in both constructing gene expression prediction models and identifying disease-associated genes, as shown by simulations and real data analysis. Our transTF-TWAS approach significantly contributes to the discovery of disease risk genes. Findings from this study shed new light on several genetically driven key TF regulators and their associated TF–gene regulatory networks underlying disease susceptibility.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"24 1","pages":""},"PeriodicalIF":14.9,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142610071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}