Nucleic Acids Research: Latest Articles

SETDB1 activity is globally directed by H3K14 acetylation via its Triple Tudor Domain.
IF 16.6 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-11-14 | DOI: 10.1093/nar/gkae1053
Thyagarajan T Chandrasekaran, Michel Choudalakis, Alexander Bröhm, Sara Weirich, Alexandra G Kouroukli, Ole Ammerpohl, Philipp Rathert, Pavel Bashtrykov, Albert Jeltsch

SETDB1 (SET domain bifurcated histone lysine methyltransferase 1) is a major protein lysine methyltransferase that trimethylates lysine 9 of histone H3 (H3K9) and is involved in heterochromatin formation and the silencing of repeat elements (REs). It contains a unique Triple Tudor Domain (3TD), which specifically binds the dual modification of H3K14ac in the presence of H3K9me1/2/3. Here, we explored the role of the interaction between the 3TD and the H3 tail in the H3K9 methylation activity of SETDB1. We generated a 3TD mutant with reduced binding and demonstrate, in biochemical methylation assays on peptides containing H3K14ac and on recombinant nucleosomes containing H3K14ac analogs, that H3K14 acetylation is crucial for 3TD-mediated recruitment of SETDB1. We also observe this effect in cells, where SETDB1 binding and activity are globally correlated with H3K14ac, and knockout of the H3K14 acetyltransferase HBO1 causes a drastic reduction in H3K9me3 levels at SETDB1-dependent sites. Regions with DNA hypomethylation after SETDB1 knockout also show enrichment of SETDB1-dependent H3K9me3 and H3K14ac. Further analyses revealed that the 3TD is particularly important at specific target regions such as L1M REs, where the 3TD mutant of SETDB1 cannot efficiently reconstitute H3K9me3. In summary, our data demonstrate that H3K9me3 and H3K14ac are not antagonistic marks; rather, the presence of H3K14ac is required for SETDB1 recruitment via 3TD binding to H3K9me1/2/3-K14ac regions and for the establishment of H3K9me3.

Citations: 0
The secondary metabolism collaboratory: a database and web discussion portal for secondary metabolite biosynthetic gene clusters.
IF 16.6 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-11-14 | DOI: 10.1093/nar/gkae1060
Daniel W Udwary, Drew T Doering, Bryce Foster, Tatyana Smirnova, Satria A Kautsar, Nigel J Mouncey

Secondary metabolites are small molecules produced across all domains of life, often with specialized bioactive functions of clinical and environmental relevance. Secondary metabolite biosynthetic gene clusters (BGCs) can often be identified within DNA sequences by various sequence similarity tools, but determining the exact functions of genes in the pathway and predicting their chemical products can often be done only through careful, manual comparative analysis. To facilitate this, we report the first release of the secondary metabolism collaboratory (SMC), which aims to provide a comprehensive, tool-agnostic repository of BGC sequence data drawn from all publicly available and user-submitted bacterial and archaeal genome and contig sources. The website provides a searchable catalog of putative BGCs identified in each source, along with visualizations of gene and domain annotations derived from multiple sequence analysis tools. SMC data are also available through publicly accessible application programming interface (API) endpoints for programmatic access. Users are encouraged to share their findings (and search for those of others) through comments on BGC and source pages. At the time of writing, SMC is the largest repository of BGC information, holding 13.1M BGC regions from 1.3M source sequences and growing; it is available at https://smc.jgi.doe.gov.

Citations: 0
RiboSeq.Org: an integrated suite of resources for ribosome profiling data analysis and visualization
IF 14.9 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-11-14 | DOI: 10.1093/nar/gkae1020
Jack A S Tierney, Michał I Świrski, Håkon Tjeldnes, Anmol M Kiran, Gionmattia Carancini, Stephen J Kiniry, Audrey M Michel, Joanna Kufel, Eivind Valen, Pavel V Baranov
Ribosome profiling (Ribo-Seq) has revolutionised our understanding of translation, but the increasing complexity and volume of Ribo-Seq data present challenges for its reuse. Here, we formally introduce RiboSeq.Org, an integrated suite of resources designed to facilitate Ribo-Seq data analysis and visualisation within a web browser. RiboSeq.Org comprises several interconnected tools: GWIPS-viz for genome-wide visualisation, Trips-Viz for transcriptome-centric analysis, RiboGalaxy for data processing and the newly developed RiboSeq data portal (RDP) for centralised dataset identification and access. The RDP currently hosts preprocessed datasets corresponding to 14840 sequence libraries (samples) from 969 studies across 96 species, in various file formats along with standardised metadata. RiboSeq.Org addresses key challenges in Ribo-Seq data reuse through standardised sample preprocessing, semi-automated metadata curation and programmatic information access via a REST API and command-line utilities. RiboSeq.Org enhances the accessibility and utility of public Ribo-Seq data, enabling researchers to gain new insights into translational regulation and protein synthesis across diverse organisms and conditions. By providing these integrated, user-friendly resources, RiboSeq.Org aims to lower the barrier to reproducible research in the field of translatomics and promote more efficient utilisation of the wealth of available Ribo-Seq data.
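A basic step in many Ribo-Seq analyses is assigning each read to its ribosomal P-site and checking that P-sites fall predominantly in one reading frame (triplet periodicity). The sketch below illustrates that computation only; it is not code from RiboSeq.Org or its component tools. It assumes a fixed P-site offset per read length (12 nt for 28-nt reads is a commonly used approximation, but offsets are normally calibrated per dataset), and the read coordinates in the example are hypothetical.

```python
from collections import Counter


def frame_distribution(read_starts, read_lengths, cds_start, offsets, cds_end=None):
    """Fraction of inferred P-sites falling in frames 0, 1 and 2 of a CDS.

    read_starts  -- 0-based transcript coordinates of each read's 5' end
    read_lengths -- read lengths, in the same order as read_starts
    cds_start    -- 0-based coordinate of the first nucleotide of the start codon
    offsets      -- ASSUMED mapping: read length -> P-site offset from the 5' end
    cds_end      -- optional 0-based coordinate just past the stop codon
    """
    counts = Counter()
    for start, length in zip(read_starts, read_lengths):
        offset = offsets.get(length)
        if offset is None:
            continue  # no calibrated offset for this read length: skip the read
        p_site = start + offset
        if p_site < cds_start or (cds_end is not None and p_site >= cds_end):
            continue  # P-site lies in a UTR, outside the coding sequence
        counts[(p_site - cds_start) % 3] += 1
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(counts[frame] / total for frame in range(3))


# Hypothetical reads: four 28-nt reads, assumed 12-nt offset, CDS starting at 100.
print(frame_distribution([88, 91, 94, 95], [28] * 4, 100, {28: 12}))  # (0.75, 0.25, 0.0)
```

A strong excess in frame 0, as in this toy example, is the pattern expected from actively translating ribosomes.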
Citations: 0
GTO: a comprehensive gene therapy omnibus
IF 14.9 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-11-14 | DOI: 10.1093/nar/gkae1051
Xuehang Meng, Yujia Du, Chang Liu, Zhaoyu Zhai, Jianbo Pan
Gene therapy, which involves the delivery of genetic material into cells to correct an underlying genetic problem, has emerged as a promising approach for treating various conditions. To promote research in this rapidly evolving field, we developed the Gene Therapy Omnibus (GTO) (http://www.inbirg.com/gto/), a comprehensive resource containing detailed clinical trial data and molecular information related to gene therapy. The GTO includes 6333 clinical trial records and 3466 transcriptome profiles, with information on 614 altered genes and 22 types of gene therapy, including DNA therapies, RNA therapies and genetically-modified cell therapies. For each gene therapy product in a clinical trial, detailed information, such as altered gene name, structural components, indication, vector information, phase of the clinical trial, clinical outcomes and adverse effects, is provided when available. Additionally, 345 comparison datasets, including 29 single-cell RNA-sequencing datasets comprising information on both gene therapy and control samples, were established. Differential gene expression and downstream functional enrichment analyses were performed through standardized pipelines to elucidate the molecular alterations induced by gene therapy. The user-friendly interface of the GTO supports efficient data retrieval, visualization and analysis, making it an invaluable resource for researchers and clinicians performing clinical research on gene therapy and the underlying mechanisms.
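The abstract does not specify which enrichment method GTO's pipelines use. One common form of functional enrichment analysis is an over-representation test based on the hypergeometric distribution. The sketch below shows that test in isolation, as an illustration of the general technique rather than GTO's pipeline. In practice, the resulting P-values are also corrected for multiple testing across gene sets.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for the overlap of a gene list with a gene set.

    N -- number of genes in the background universe
    K -- background genes annotated to the gene set (e.g., a pathway)
    n -- genes in the query list (e.g., differentially expressed genes)
    k -- query genes that are also annotated to the gene set
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum hypergeometric probabilities for overlaps k, k+1, ..., min(n, K).
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Toy numbers: 10 background genes, 3 in the set, 3 in the list, 2 overlapping.
print(hypergeom_enrichment_p(2, 3, 3, 10))  # 22/120, about 0.1833
```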
Citations: 0
PharmFreq: a comprehensive atlas of ethnogeographic allelic variation in clinically important pharmacogenes.
IF 16.6 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-11-14 | DOI: 10.1093/nar/gkae1016
Roman Tremmel, Yitian Zhou, Mahamadou D Camara, Sofiene Laarif, Erik Eliasson, Volker M Lauschke

Genetic polymorphisms in drug-metabolizing enzymes, drug transporters and genes encoding the human major histocompatibility complex contribute to inter-individual differences in drug efficacy and safety. The extent, pattern and complexity of such pharmacogenetic variation differ drastically across human populations. Here, we present PharmFreq, a global repository of pharmacogenetic frequency information that aggregates frequency data for 658 allelic variants from over 10 million individuals, collected from >1200 studies across 144 countries. Most investigations were conducted in East Asian and European populations, which accounted for 29.4% and 26.6% of all studies, respectively. We find that the number of studies per country and the aggregated cohort size correlated significantly with population size (R = 0.55, P = 3 × 10⁻⁹) and with country gross domestic product (R = 0.43, P = 2 × 10⁻⁶), while overall population coverage ranged from 5% in Estonia to <0.001% in many countries in Sub-Saharan Africa and Asia. All frequency data are openly accessible via a web-based interactive dashboard at pharmfreq.com that facilitates the exploration, visualization and analysis of country- and population-specific data and their inferred phenotypic consequences. PharmFreq thus provides a comprehensive, freely available resource for pharmacogenetic variant frequencies that can inform on ethnogeographic pharmacogenomic diversity and reveal important inequities, helping to focus future research efforts on underrepresented populations.
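PharmFreq reports the "inferred phenotypic consequences" of allele frequencies, but the abstract does not say how these are inferred. A common approach, assumed here, is to derive diplotype frequencies under Hardy-Weinberg equilibrium and then map each diplotype's summed activity score to a metabolizer phenotype. The alleles, frequencies and activity values below are hypothetical. The phenotype cut-offs are modelled on CPIC's CYP2D6 activity-score bins and should be checked against the relevant guideline before being applied to any real gene.

```python
from itertools import combinations_with_replacement


def phenotype_frequencies(allele_freqs, activity, classify):
    """Expected phenotype frequencies from allele frequencies under Hardy-Weinberg equilibrium.

    allele_freqs -- allele -> population frequency (must sum to 1)
    activity     -- allele -> activity value (hypothetical values in the example)
    classify     -- maps a diplotype's summed activity to a phenotype label
    """
    total = sum(allele_freqs.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"allele frequencies sum to {total}, expected 1")
    phenotypes = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        p, q = allele_freqs[a], allele_freqs[b]
        # Hardy-Weinberg: p^2 for homozygous diplotypes, 2pq for heterozygous ones.
        freq = p * p if a == b else 2 * p * q
        label = classify(activity[a] + activity[b])
        phenotypes[label] = phenotypes.get(label, 0.0) + freq
    return phenotypes


def classify_activity_score(score):
    """Cut-offs modelled on CPIC CYP2D6 activity-score bins (an assumption here)."""
    if score == 0:
        return "poor"
    if score < 1.25:
        return "intermediate"
    if score <= 2.25:
        return "normal"
    return "ultrarapid"


# Hypothetical gene with three alleles.
freqs = {"A": 0.7, "B": 0.2, "C": 0.1}
scores = {"A": 1.0, "B": 0.0, "C": 0.5}
print(phenotype_frequencies(freqs, scores, classify_activity_score))
# roughly normal 0.63, intermediate 0.33, poor 0.04
```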

Citations: 0
The Pfam protein families database: embracing AI/ML
IF 14.9 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-11-14 | DOI: 10.1093/nar/gkae997
Typhaine Paysan-Lafosse, Antonina Andreeva, Matthias Blum, Sara Rocio Chuguransky, Tiago Grego, Beatriz Lazaro Pinto, Gustavo A Salazar, Maxwell L Bileschi, Felipe Llinares-López, Laetitia Meng-Papaxanthos, Lucy J Colwell, Nick V Grishin, R Dustin Schaeffer, Damiano Clementel, Silvio C E Tosatto, Erik Sonhammer, Valerie Wood, Alex Bateman
The Pfam protein families database is a comprehensive collection of protein domains and families used for genome annotation and protein structure and function analysis (https://www.ebi.ac.uk/interpro/). This update describes major developments in Pfam since 2020, including decommissioning the Pfam website and integration with InterPro, harmonization with the ECOD structural classification, and expanded curation of metagenomic, microprotein and repeat-containing families. We highlight how AlphaFold structure predictions are being leveraged to refine domain boundaries and identify new domains. New families discovered through large-scale sequence similarity analysis of AlphaFold models are described. We also detail the development of Pfam-N, which uses deep learning to expand family coverage, achieving an 8.8% increase in UniProtKB coverage compared to standard Pfam. We discuss plans for more frequent Pfam releases integrated with InterPro and the potential for artificial intelligence to further assist curation. Despite recent advances, many protein families remain to be classified, and Pfam continues working toward comprehensive coverage of the protein universe.
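The meaning of the 8.8% figure for Pfam-N depends on how coverage is defined, and the abstract does not say whether it is a relative increase or a gain in percentage points. As an assumption, the sketch below defines sequence coverage as the fraction of sequences with at least one family assignment (Pfam also reports residue-level coverage, which is not shown here). It then contrasts the two ways of expressing an increase. All identifiers and accessions are hypothetical.

```python
def sequence_coverage(sequence_ids, *annotation_sets):
    """Fraction of sequences carrying at least one family assignment.

    sequence_ids    -- all sequence identifiers in the reference set (e.g., UniProtKB)
    annotation_sets -- one or more dicts mapping sequence id -> set of family accessions;
                       several sets are combined by union
    """
    ids = set(sequence_ids)
    if not ids:
        raise ValueError("empty sequence set")
    covered = {seq for annotations in annotation_sets
               for seq, families in annotations.items()
               if families and seq in ids}
    return len(covered) / len(ids)


# Hypothetical identifiers and accessions, for illustration only.
seqs = ["P1", "P2", "P3", "P4"]
curated = {"P1": {"FAM_A"}, "P2": {"FAM_B"}}
predicted = {"P1": {"FAM_C"}, "P3": {"FAM_D"}}

base = sequence_coverage(seqs, curated)                  # 0.5
combined = sequence_coverage(seqs, curated, predicted)   # 0.75
print(combined - base)            # absolute gain: 0.25 (25 percentage points)
print((combined - base) / base)   # relative gain: 0.5 (50%)
```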
Citations: 0
miRNATissueAtlas 2025: an update to the uniformly processed and annotated human and mouse non-coding RNA tissue atlas
IF 14.9 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-11-14 | DOI: 10.1093/nar/gkae1036
Shusruto Rishik, Pascal Hirsch, Friederike Grandke, Tobias Fehlmann, Andreas Keller
MicroRNAs (miRNAs) are a class of non-coding RNAs that regulate gene expression and pathways. While miRNAs are evolutionarily conserved, most data stem from Homo sapiens and Mus musculus. As miRNA expression is highly tissue-specific, we developed miRNATissueAtlas to comprehensively explore this landscape in H. sapiens. We expanded the H. sapiens tissue repertoire and included M. musculus. In past years, the number of public miRNA expression datasets has grown substantially. Our previous releases of the miRNATissueAtlas provide a strong framework for a uniformly pre-processed and label-harmonized resource covering these datasets. We incorporate the respective data in the newest release, miRNATissueAtlas 2025, which contains expression data for 9 classes of ncRNA, derived from 799 billion reads across 61 593 samples from H. sapiens and M. musculus. The numbers of organs and tissues have increased from 28 and 54 to 74 and 373, respectively, including physiological tissues, cell lines and extracellular vesicles. New tissue specificity index calculations build on the previous iterations. Calculations for cell lines enable comparison with physiological tissues, providing a valuable resource for translational research. Finally, 35 organs overlap between H. sapiens and M. musculus, allowing cross-species comparisons. The updated miRNATissueAtlas 2025 is available at https://www.ccb.uni-saarland.de/tissueatlas2025.
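The abstract refers to tissue specificity index (TSI) calculations without giving the formula. A minimal sketch follows, assuming a tau-style index (the sum over tissues of 1 - x_i/x_max, divided by N - 1). This is a widely used definition and, as far as we are aware, the form used in earlier miRNATissueAtlas releases. Whether expression values are log-transformed or otherwise normalized beforehand is also an assumption, and that choice is left to the caller.

```python
def tissue_specificity_index(expression):
    """Tau-style tissue specificity index for one miRNA (assumed definition).

    expression -- non-negative expression values, one per tissue
    Returns a value in [0, 1]: 0 means equal expression in every tissue,
    1 means expression in exactly one tissue.
    TSI = sum_i (1 - x_i / x_max) / (N - 1)
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need at least two tissues")
    if any(x < 0 for x in expression):
        raise ValueError("expression values must be non-negative")
    x_max = max(expression)
    if x_max == 0:
        raise ValueError("miRNA is not expressed in any tissue")
    return sum(1 - x / x_max for x in expression) / (n - 1)


print(tissue_specificity_index([5, 5, 5]))      # 0.0  (ubiquitous)
print(tissue_specificity_index([10, 0, 0, 0]))  # 1.0  (single tissue)
print(tissue_specificity_index([4, 2, 0]))      # 0.75
```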
Citations: 0
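The miRNATissueAtlas 2025 abstract mentions new tissue specificity index (TSI) calculations but does not give the formula. The sketch below assumes the tau-style definition widely used for tissue specificity (Yanai et al., 2005). Expression in each tissue is divided by the maximum across tissues, and the shortfalls from that maximum are averaged over N − 1 tissues, so 0 means uniform expression and 1 means expression in a single tissue. The function name and inputs are illustrative, not the atlas's code.

```python
from typing import Sequence


def tissue_specificity_index(expr: Sequence[float]) -> float:
    """Tau-style tissue specificity index (TSI) for one ncRNA.

    expr holds non-negative expression values, one per tissue.
    Returns 0.0 for uniform expression and 1.0 when only one tissue expresses it.
    """
    n = len(expr)
    if n < 2:
        raise ValueError("need expression values for at least two tissues")
    if any(x < 0 for x in expr):
        raise ValueError("expression values must be non-negative")
    x_max = max(expr)
    if x_max == 0:
        raise ValueError("the ncRNA is not expressed in any tissue")
    # Normalise each tissue by the maximum, then average the shortfall from it.
    return sum(1.0 - x / x_max for x in expr) / (n - 1)
```

For example, `[5, 5, 5, 5]` gives 0.0 and `[10, 0, 0, 0]` gives 1.0. Dividing by the per-ncRNA maximum makes the index scale-free, so it measures how concentrated expression is rather than how high it is.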
dbAMP 3.0: updated resource of antimicrobial activity and structural annotation of peptides in the post-pandemic era
IF 14.9 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-11-14 DOI: 10.1093/nar/gkae1019
Lantian Yao, Jiahui Guan, Peilin Xie, Chia-Ru Chung, Zhihao Zhao, Danhong Dong, Yilin Guo, Wenyang Zhang, Junyang Deng, Yuxuan Pang, Yulan Liu, Yunlu Peng, Jorng-Tzong Horng, Ying-Chih Chiang, Tzong-Yi Lee
Antimicrobial resistance is one of the most urgent global health threats, especially in the post-pandemic era. Antimicrobial peptides (AMPs) offer a promising alternative to traditional antibiotics and have attracted growing interest in recent years. dbAMP is a comprehensive database offering extensive annotations on AMPs, including sequence information, functional activity data, physicochemical properties and structural annotations. In this update, dbAMP has curated data from over 5200 publications, encompassing 33,065 AMPs and 2453 antimicrobial proteins from 3534 organisms. Additionally, dbAMP uses ESMFold to predict the three-dimensional structures of AMPs, providing over 30,000 structural annotations that support structure-based functional insights for clinical drug development. Furthermore, dbAMP employs molecular docking, providing over 100 docked complexes that offer insights into the potential mechanisms of AMPs. The toxicity and stability of AMPs are critical factors in assessing their potential as clinical drugs. The updated dbAMP introduces an efficient tool for evaluating the hemolytic toxicity and half-life of AMPs, alongside an AMP optimization platform for designing AMPs with high antimicrobial activity, reduced toxicity and increased stability. The updated dbAMP is freely accessible at https://awi.cuhk.edu.cn/dbAMP/. Overall, dbAMP is a comprehensive and essential resource for AMP analysis and design, poised to advance antimicrobial strategies in the post-pandemic era.
Citations: 0
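dbAMP annotates the physicochemical properties of each peptide, but the abstract does not say which descriptors or formulas it uses. As a rough illustration only, the hypothetical helper below computes two common sequence-level descriptors. The first is a count-based net charge near pH 7 (Lys/Arg +1, Asp/Glu −1, termini and His ignored). The second is the fraction of residues in an assumed hydrophobic set. Real tools use pKa-based charge models and hydrophobicity scales, so treat these numbers as approximations.

```python
from dataclasses import dataclass

# Assumed residue classes for a crude count-based approximation at pH ~7:
# termini are ignored and histidine is treated as neutral.
POSITIVE = set("KR")
NEGATIVE = set("DE")
HYDROPHOBIC = set("AILMFVW")  # one common choice; definitions vary
VALID = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


@dataclass(frozen=True)
class PeptideDescriptors:
    length: int
    net_charge: int
    hydrophobic_fraction: float


def describe_peptide(seq: str) -> PeptideDescriptors:
    """Compute simple sequence-level descriptors for a candidate AMP (hypothetical helper)."""
    seq = seq.strip().upper()
    if not seq or not set(seq) <= VALID:
        raise ValueError(f"not a standard amino-acid sequence: {seq!r}")
    # Booleans sum as 0/1, giving residue counts per class.
    net = sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)
    hydrophobic = sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
    return PeptideDescriptors(len(seq), net, hydrophobic)
```

For the toy sequence `"KLLKKLLD"`, this returns a length of 8, a net charge of +2 and a hydrophobic fraction of 0.5.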
canSAR 2024—an update to the public drug discovery knowledgebase
IF 14.9 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-11-13 DOI: 10.1093/nar/gkae1050
Phillip W Gingrich, Rezvan Chitsazi, Ansuman Biswas, Chunjie Jiang, Li Zhao, Joseph E Tym, Kevin M Brammer, Jun Li, Zhigang Shu, David S Maxwell, Jeffrey A Tacy, Ioan L Mica, Michael Darkoh, Patrizio di Micco, Kaitlyn P Russell, Paul Workman, Bissan Al-Lazikani
canSAR (https://cansar.ai) continues to serve as the largest publicly available platform for cancer-focused drug discovery and translational research. It integrates multidisciplinary data from disparate and otherwise siloed public data sources, as well as data curated uniquely for canSAR. In addition, canSAR deploys a suite of curation and standardization tools together with AI algorithms to generate new knowledge from these integrated data and inform hypothesis generation. Here we report the latest updates to canSAR. Beyond expanding the available data, we have enhanced our algorithms to improve what the platform offers users. Notably, these enhancements include a revised ligandability classifier based on positive-unlabeled (PU) learning, which finds twice as many ligandable opportunities across the pocketome, and a revised chemical standardization pipeline and hierarchy that better enables the aggregation of structurally related molecular records.
Citations: 0
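The canSAR abstract says its revised ligandability classifier uses positive-unlabeled (PU) learning but does not describe the method. One standard PU correction (Elkan and Noto, 2008) is sketched below. It assumes that labeled positives are a random sample of all true positives. A classifier g(x) is trained to separate labeled pockets from unlabeled ones. The label frequency c is then estimated as the mean of g over held-out labeled positives, and P(y = 1 | x) is recovered as g(x) / c. The pocket identifiers and scores are hypothetical, and this is not canSAR's implementation.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def estimate_label_frequency(labeled_positive_scores: Sequence[float]) -> float:
    """Estimate c = P(labeled | truly positive) as the mean classifier score
    g(x) = P(labeled | x) over held-out labeled positives (Elkan & Noto, 2008)."""
    c = mean(labeled_positive_scores)
    if not 0.0 < c <= 1.0:
        raise ValueError("estimated label frequency must lie in (0, 1]")
    return c


def pu_probability(g: float, c: float) -> float:
    """Convert g(x) into P(truly positive | x) = g(x) / c, clipped at 1."""
    return min(1.0, g / c)


def rank_unlabeled(scores: Dict[str, float], c: float) -> List[Tuple[str, float]]:
    """Rank unlabeled pockets (hypothetical IDs) by corrected probability."""
    corrected = [(pocket, pu_probability(g, c)) for pocket, g in scores.items()]
    return sorted(corrected, key=lambda item: item[1], reverse=True)


# Hypothetical classifier outputs g(x) for held-out labeled ligandable pockets.
label_freq = estimate_label_frequency([0.4, 0.5, 0.6])
# Hypothetical g(x) scores for pockets with no known ligand.
ranking = rank_unlabeled({"pocket_A": 0.3, "pocket_B": 0.45, "pocket_C": 0.1}, label_freq)
```

With these made-up numbers, c is 0.5, so pocket_B (g = 0.45) receives a corrected probability of about 0.9 and ranks first. Dividing by c is monotone, so the ranking matches the raw g(x) ranking, apart from ties created by clipping at 1. The correction matters when a probability threshold is used to call pockets ligandable.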
Enhancing disease risk gene discovery by integrating transcription factor-linked trans-variants into transcriptome-wide association analyses
IF 14.9 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-11-13 DOI: 10.1093/nar/gkae1035
Jingni He, Deshan Perera, Wanqing Wen, Jie Ping, Qing Li, Linshuoshuo Lyu, Zhishan Chen, Xiang Shu, Jirong Long, Qiuyin Cai, Xiao-Ou Shu, Zhijun Yin, Wei Zheng, Quan Long, Xingyi Guo
Transcriptome-wide association studies (TWAS) have successfully identified disease susceptibility genes by integrating gene expression predicted from cis-variants with genome-wide association study (GWAS) data. However, the use of trans-variants to predict gene expression remains largely unexplored. Here, we introduce transTF-TWAS, which incorporates transcription factor (TF)-linked trans-variants to enhance model building for downstream target genes of TFs. Using data from the Genotype-Tissue Expression (GTEx) project, we built models predicting gene expression and alternative splicing and applied them to large GWAS datasets for breast, prostate and lung cancers and other diseases. Simulations and real-data analyses show that transTF-TWAS outperforms existing TWAS approaches both in constructing gene expression prediction models and in identifying disease-associated genes. Our transTF-TWAS approach contributes significantly to the discovery of disease risk genes. Findings from this study shed new light on several genetically driven key TF regulators and their associated TF–gene regulatory networks underlying disease susceptibility.
Citations: 0
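TWAS methods that work from GWAS summary statistics typically combine a gene's prediction weights w with per-variant GWAS z-scores z as z_TWAS = wᵀz / √(wᵀΣw). Here Σ is the LD (correlation) matrix among the model's variants; this is the FUSION-style statistic of Gusev et al., 2016. The transTF-TWAS abstract does not say whether it uses exactly this statistic. The sketch below simply assumes that TF-linked trans-variants enter as extra entries of w alongside the cis-variants. All weights, z-scores and LD values are made up.

```python
import math
from typing import Sequence


def twas_z(weights: Sequence[float], gwas_z: Sequence[float],
           ld: Sequence[Sequence[float]]) -> float:
    """Summary-statistic TWAS z-score: (w . z) / sqrt(w' LD w)."""
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, z-scores and LD matrix must have matching sizes")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Null variance of the weighted z-score sum, accounting for LD between variants.
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    if variance <= 0.0:
        raise ValueError("w' LD w must be positive")
    return numerator / math.sqrt(variance)


# Hypothetical model for one target gene: two cis-variants plus one
# trans-variant linked to a regulating TF (order: cis1, cis2, trans1).
cis_weights = [0.3, 0.2]
trans_weights = [0.1]
weights = cis_weights + trans_weights
gwas_z = [2.0, 1.5, 3.0]
# Assumed LD: the cis-variants are correlated (r = 0.5); the trans-variant
# is distant from the gene and taken as uncorrelated with both.
ld = [[1.0, 0.5, 0.0],
      [0.5, 1.0, 0.0],
      [0.0, 0.0, 1.0]]

z_cis_only = twas_z(cis_weights, gwas_z[:2], [row[:2] for row in ld[:2]])
z_combined = twas_z(weights, gwas_z, ld)
```

In this toy case, adding the trans-variant with a strong GWAS signal raises the statistic from about 2.06 to about 2.68. A trans-variant whose signal disagrees could just as easily lower it, so any improvement depends on the trained weights rather than on the formula.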