Rachel Seongeun Kim, Eli Levy Karin, Milot Mirdita, Rayan Chikhi, Martin Steinegger
The AlphaFold Protein Structure Database (AFDB) is the largest repository of accurately predicted structures with taxonomic labels. Despite providing predictions for over 214 million UniProt entries, the AFDB does not cover viral sequences, severely limiting their study. To address this, we created the Big Fantastic Virus Database (BFVD), a repository of 351 242 protein structures predicted by applying ColabFold to the viral sequence representatives of the UniRef30 clusters. By utilizing homology searches across two petabases of assembled sequencing data, we improved 36% of these structure predictions beyond ColabFold's initial results. BFVD holds a unique repertoire of protein structures as over 62% of its entries show no or low structural similarity to existing repositories. We demonstrate how a substantial fraction of bacteriophage proteins, which remained unannotated based on their sequences, can be matched with similar structures from BFVD. In that, BFVD is on par with the AFDB, while holding nearly three orders of magnitude fewer structures. BFVD is an important virus-specific expansion to protein structure repositories, offering new opportunities to advance viral research. BFVD can be freely downloaded at bfvd.steineggerlab.workers.dev and queried using Foldseek and UniProt labels at bfvd.foldseek.com.
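The "nearly three orders of magnitude" comparison can be checked directly from the two counts quoted in the abstract. The snippet below is only a back-of-the-envelope check; the variable names are illustrative, and the AFDB count is the lower bound "over 214 million" stated above.

```python
import math

# Counts quoted in the abstract (the AFDB figure is a lower bound).
afdb_entries = 214_000_000
bfvd_entries = 351_242

# Ratio of repository sizes and its base-10 logarithm.
size_ratio = afdb_entries / bfvd_entries
orders_of_magnitude = math.log10(size_ratio)

print(f"AFDB is about {size_ratio:.0f}x larger "
      f"({orders_of_magnitude:.2f} orders of magnitude)")
```

A value a little below 3 is consistent with the abstract's wording.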
{"title":"BFVD-a large repository of predicted viral protein structures.","authors":"Rachel Seongeun Kim, Eli Levy Karin, Milot Mirdita, Rayan Chikhi, Martin Steinegger","doi":"10.1093/nar/gkae1119","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-11-22"}
Karen L Kanke, Rachael E Rayner, Jack Bozik, Eli Abel, Aparna Venugopalan, Ma Suu, Reza Nouri, Jacob T Stack, Gongbo Guo, Tatyana A Vetter, Estelle Cormet-Boyaka, Mark E Hester, Sriram Vaidyanathan
Single-stranded DNA (ssDNA) templates along with Cas9 have been used for knocking in exogenous sequences in the genome but suffer from low efficiency. Here, we show that ssDNA with chemical modifications in 12–19% of internal bases, which we denote as enhanced ssDNA (esDNA), improves knock-in (KI) by 2–3-fold compared to end-modified ssDNA in airway basal stem cells (ABCs), CD34+ hematopoietic cells (CD34+ cells), T-cells and endothelial cells. Over 50% of alleles showed KI in three clinically relevant loci (CFTR, HBB and CCR5) in ABCs using esDNA and up to 70% of alleles showed KI in the HBB locus in CD34+ cells in the presence of a DNA-PKcs inhibitor. This level of correction is therapeutically relevant and is comparable to adeno-associated virus-based templates. The esDNA templates did not improve KI in induced pluripotent stem cells (iPSCs). This may be due to the absence of the nuclease TREX1 in iPSCs. Indeed, knocking out TREX1 in other cells improved KI using unmodified ssDNA. esDNA can be used to modify 20–30 bp regions in primary cells for therapeutic applications and biological modeling. The use of this approach for gene-length insertions will require new methods to produce long chemically modified ssDNA in scalable quantities.
{"title":"Single-stranded DNA with internal base modifications mediates highly efficient knock-in in primary cells using CRISPR-Cas9","authors":"Karen L Kanke, Rachael E Rayner, Jack Bozik, Eli Abel, Aparna Venugopalan, Ma Suu, Reza Nouri, Jacob T Stack, Gongbo Guo, Tatyana A Vetter, Estelle Cormet-Boyaka, Mark E Hester, Sriram Vaidyanathan","doi":"10.1093/nar/gkae1069","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-11-21"}
Pablo R Arantes, Xiaoyu Chen, Souvik Sinha, Aakash Saha, Amun C Patel, Matthew Sample, Łukasz Nierzwicki, Audrone Lapinaite, Giulia Palermo
CRISPR-based DNA adenine base editors (ABEs) hold remarkable promise for addressing human genetic diseases caused by point mutations. ABEs were developed by combining CRISPR-Cas9 with a transfer RNA (tRNA) adenosine deaminase enzyme and through directed evolution, conferring the ability to deaminate DNA. However, the molecular mechanisms driving the efficient DNA deamination in the evolved ABEs remain unresolved. Here, extensive molecular simulations and biochemical experiments reveal the biophysical basis behind the astonishing base editing efficiency of ABE8e, the most efficient ABE to date. We demonstrate that the DNA deaminase domain of ABE8e, TadA8e, forms remarkably stable dimers compared to its tRNA-deaminating progenitor and that the strength of TadA dimerization is crucial for DNA deamination. The TadA8e dimer forms robust interactions involving its R98 and R129 residues, the RuvC domain of Cas9 and the DNA. These locking interactions are exclusive to ABE8e, distinguishing it from its predecessor, ABE7.10, and are indispensable to boost DNA deamination. Additionally, we identify three critical residues that drive the evolution of ABE8e toward improved base editing by balancing the enzyme's activity and stability, reinforcing the TadA8e dimer and improving the functionality of ABE8e. These insights offer new directions to engineer superior ABEs, advancing the design of safer precision genome editing tools.
{"title":"Dimerization of the deaminase domain and locking interactions with Cas9 boost base editing efficiency in ABE8e.","authors":"Pablo R Arantes, Xiaoyu Chen, Souvik Sinha, Aakash Saha, Amun C Patel, Matthew Sample, Łukasz Nierzwicki, Audrone Lapinaite, Giulia Palermo","doi":"10.1093/nar/gkae1066","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-11-21"}
Response to spatiotemporal variation in selection gradients resulted in signatures of polygenic adaptation in human genomes. We introduce RAISING, a two-stage deep learning framework that optimizes neural network architecture through hyperparameter tuning before performing feature selection and prediction tasks. We tested RAISING on published and newly designed simulations that incorporate the complex interplay between demographic history and selection gradients. RAISING outperformed Phylogenetic Generalized Least Squares (PGLS), ridge regression and DeepGenomeScan, with significantly higher true positive rates (TPR) in detecting genetic adaptation. It reduced computational time by 60-fold and increased TPR by up to 28% compared to DeepGenomeScan on published data. In more complex demographic simulations, RAISING showed lower false discoveries and significantly higher TPR, up to 17-fold, compared to other methods. RAISING demonstrated robustness with least sensitivity to demographic history, selection gradient and their interactions. We developed a sliding window method for genome-wide implementation of RAISING to overcome the computational challenges of high-dimensional genomic data. Applied to African, European, South Asian and East Asian populations, we identified multiple genomic regions undergoing polygenic selection. Notably, ∼70% of the regions identified in Africans are unique, with broad patterns distinguishing them from non-Africans, corroborating the Out of Africa dispersal model.
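The sliding-window idea mentioned above can be sketched as a generator of overlapping marker ranges, each small enough to be analyzed on its own. This is only an illustration: the abstract does not state the window size, the step, or how per-window results are combined, so `window_size`, `step` and the function name are assumptions.

```python
def sliding_windows(n_markers, window_size, step):
    """Yield half-open (start, end) index ranges that cover 0..n_markers.

    Hypothetical helper: window_size and step are illustrative parameters,
    not values taken from the RAISING paper.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    start = 0
    while start < n_markers:
        end = min(start + window_size, n_markers)
        yield start, end
        if end == n_markers:
            break  # the final window already reaches the last marker
        start += step


# Example: 10 markers, windows of 4 markers, moving 3 markers at a time.
windows = list(sliding_windows(10, window_size=4, step=3))
print(windows)
```

Choosing a step smaller than the window size makes neighbouring windows overlap, so a signal that straddles a window boundary is still seen whole in at least one window.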
{"title":"Deep learning insights into distinct patterns of polygenic adaptation across human populations.","authors":"Devashish Tripathi, Chandrika Bhattacharyya, Analabha Basu","doi":"10.1093/nar/gkae1027","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-11-21"}
Weili Miao, Douglas F Porter, Ya Li, Lindsey M Meservey, Yen-Yu Yang, Chengjie Ma, Ian D Ferguson, Vivian B Tien, Timothy M Jack, Luca Ducoli, Vanessa Lopez-Pajares, Shiying Tao, Paul B Savage, Yinsheng Wang, Paul A Khavari
Elevations in intracellular glucose concentrations are essential for epithelial cell differentiation by mechanisms that are not fully understood. Glucose has recently been found to directly bind several proteins to alter their functions to enhance differentiation. Among the newly identified glucose-binding proteins is NSUN2, an RNA-binding protein that we identified as indispensable for epidermal differentiation. Glucose was found to bind conserved sequences within NSUN2, enhancing its binding to S-adenosyl-L-methionine and boosting its enzymatic activity. Additionally, glucose enhanced NSUN2’s proximity to proteins involved in mRNA translation, with NSUN2 modulating global messenger RNA (mRNA) translation, particularly that of key pro-differentiation mRNAs containing m5C modifications, such as GRHL3. Glucose thus engages diverse molecular mechanisms beyond its energetic roles to facilitate cellular differentiation processes.
{"title":"Glucose binds and activates NSUN2 to promote translation and epidermal differentiation","authors":"Weili Miao, Douglas F Porter, Ya Li, Lindsey M Meservey, Yen-Yu Yang, Chengjie Ma, Ian D Ferguson, Vivian B Tien, Timothy M Jack, Luca Ducoli, Vanessa Lopez-Pajares, Shiying Tao, Paul B Savage, Yinsheng Wang, Paul A Khavari","doi":"10.1093/nar/gkae1097","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-11-20"}
Abubakar Muhammad, Zsuzsa Sarkadi, Agnisrota Mazumder, Anissia Ait Saada, Thomas van Emden, Matias Capella, Gergely Fekete, Vishnu N Suma Sreechakram, Bassem Al-Sady, Sarah A E Lambert, Balázs Papp, Ramón Ramos Barrales, Sigurd Braun
Heterochromatin plays a critical role in regulating gene expression and maintaining genome integrity. While structural and enzymatic components have been linked to heterochromatin establishment, a comprehensive view of the underlying pathways at diverse heterochromatin domains remains elusive. Here, we developed a systematic approach to identify factors involved in heterochromatin silencing at pericentromeres, subtelomeres and the silent mating type locus in Schizosaccharomyces pombe. Using quantitative measures, iterative genetic screening and domain-specific heterochromatin reporters, we identified 369 mutants with different degrees of reduced or enhanced silencing. As expected, mutations in the core heterochromatin machinery globally decreased silencing. However, most other mutants exhibited distinct qualitative and quantitative profiles that indicate heterochromatin domain-specific functions, as seen for example for metabolic pathways affecting primarily subtelomere silencing. Moreover, similar phenotypic profiles revealed shared functions for subunits within complexes. We further discovered that the uncharacterized protein Dhm2 plays a crucial role in heterochromatin maintenance, affecting the inheritance of H3K9 methylation and the clonal propagation of the repressed state. Additionally, Dhm2 loss resulted in delayed S-phase progression and replication stress. Collectively, our systematic approach unveiled a landscape of domain-specific heterochromatin regulators controlling distinct states and identified Dhm2 as a previously unknown factor linked to heterochromatin inheritance and replication fidelity.
{"title":"A systematic quantitative approach comprehensively defines domain-specific functional pathways linked to Schizosaccharomyces pombe heterochromatin regulation.","authors":"Abubakar Muhammad, Zsuzsa Sarkadi, Agnisrota Mazumder, Anissia Ait Saada, Thomas van Emden, Matias Capella, Gergely Fekete, Vishnu N Suma Sreechakram, Bassem Al-Sady, Sarah A E Lambert, Balázs Papp, Ramón Ramos Barrales, Sigurd Braun","doi":"10.1093/nar/gkae1024","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-11-20"}
Trinh Trung Duong Nguyen, Ziaurrehman Tanoli, Saad Hassan, Umut Onur Özcan, Jimmy Caroli, Albert J Kooistra, David E Gloriam, Alexander S Hauser
Pharmacogenomics, the study of how an individual's genetic makeup influences their response to medications, is a rapidly evolving field with significant implications for personalized medicine. As researchers and healthcare professionals face challenges in exploring the intricate relationships between genetic profiles and therapeutic outcomes, the demand for effective and user-friendly tools to access and analyze genetic data related to drug responses continues to grow. To address these challenges, we have developed PGxDB, an interactive, web-based platform specifically designed for comprehensive pharmacogenomics research. PGxDB enables analysis across a wide range of genetic and drug response data types, informing cell-based validations and translational treatment strategies. We developed a pipeline that uniquely links medications indexed by Anatomical Therapeutic Chemical (ATC) codes to their molecular target profiles, together with the genetic variability of those targets and predicted variant effects. This enables scientists from diverse backgrounds, including molecular scientists and clinicians, to link genetic variability to curated drug response variability and investigate indication or treatment associations in a single resource. With PGxDB, we aim to catalyze innovations in pharmacogenomics research, empower drug discovery, support clinical decision-making, and pave the way for more effective treatment regimens. PGxDB is a freely accessible database available at https://pgx-db.org/
{"title":"PGxDB: an interactive web-platform for pharmacogenomics research","authors":"Trinh Trung Duong Nguyen, Ziaurrehman Tanoli, Saad Hassan, Umut Onur Özcan, Jimmy Caroli, Albert J Kooistra, David E Gloriam, Alexander S Hauser","doi":"10.1093/nar/gkae1127","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-11-20"}
The Proteome-Wide Association Study (PWAS) is a protein-based genetic association approach designed to complement traditional variant-based methods like GWAS. PWAS operates in two stages: first, machine learning models predict the impact of genetic variants on protein-coding genes, generating effect scores; second, these scores are aggregated into a gene-damaging score for each individual, which is then used in case-control statistical tests to identify significant links to specific phenotypes. PWAS Hub (v1.2) is a user-friendly platform that facilitates the exploration of gene-disease associations using clinical and genetic data from the UK Biobank (UKB), encompassing 500k individuals. PWAS Hub reports on 819 diseases and phenotypes determined by PheCode and ICD-10 clinical codes, each with a minimum of 400 affected individuals. PWAS-derived gene associations were reported for 72% of the tested phenotypes. The PWAS Hub also analyzes gene associations separately for males and females, considering sex-specific genetic effects, inheritance patterns (dominant and recessive), and gene pleiotropy. We illustrated the utility of the PWAS Hub for primary (essential) hypertension (I10), type 2 diabetes mellitus (E11), and specified haematuria (R31) that showed sex-dependent genetic signals. The PWAS Hub, available at pwas.huji.ac.il, is a valuable resource for studying genetic contributions to common diseases and sex-specific effects.
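The two-stage flow described above (per-variant effect scores, then a per-individual gene score, then a case-control test) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the PWAS implementation: the aggregation rule (probability that at least one variant is damaging, assuming independence), the permutation test, and all function names are hypothetical stand-ins chosen for simplicity.

```python
import random
from statistics import mean


def gene_damage_score(variant_damage_probs):
    """Combine per-variant damage probabilities into one gene-level score.

    Assumed aggregation (not from the paper): the probability that at least
    one variant damages the gene, treating variants as independent.
    """
    intact = 1.0
    for p in variant_damage_probs:
        intact *= 1.0 - p
    return 1.0 - intact


def permutation_p_value(case_scores, control_scores, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in mean gene scores."""
    rng = random.Random(seed)
    observed = abs(mean(case_scores) - mean(control_scores))
    pooled = list(case_scores) + list(control_scores)
    n_cases = len(case_scores)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign case/control labels at random
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Made-up gene scores for six cases and six controls.
cases = [0.90, 0.80, 0.85, 0.95, 0.90, 0.88]
controls = [0.10, 0.20, 0.15, 0.05, 0.10, 0.12]
p_value = permutation_p_value(cases, controls)
print(gene_damage_score([0.5, 0.5]), p_value)
```

A permutation test is used here only because it needs no external libraries; any standard two-sample test could take its place in the second stage.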
{"title":"PWAS Hub: exploring gene-based associations of complex diseases with sex dependency","authors":"Roei Zucker, Guy Kelman, Michal Linial","doi":"10.1093/nar/gkae1125","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-11-20"}
Ido Diamant, Daniel J B Clarke, John Erol Evangelista, Nathania Lingam, Avi Ma’ayan
By processing and abstracting diverse omics datasets into associations between genes and their attributes, the Harmonizome database enables researchers to explore and integrate knowledge about human genes from many central omics resources. Here, we introduce Harmonizome 3.0, a significant upgrade to the original Harmonizome database. The upgrade adds 26 datasets that contribute nearly 12 million associations between genes and various attribute types such as cells and tissues, diseases, and pathways. The upgrade has a dataset crossing feature to identify gene modules that are shared across datasets. To further explain significantly high gene set overlap between dataset pairs, a large language model (LLM) composes a paragraph that speculates about the reasons behind the high overlap. The upgrade also adds more data formats and visualization options. Datasets are downloadable as knowledge graph (KG) assertions and visualized with Uniform Manifold Approximation and Projection (UMAP) plots. The KG assertions can be explored via a user interface that visualizes gene–attribute associations as ball-and-stick diagrams. Overall, Harmonizome 3.0 is a rich resource of processed omics datasets that are provided in several AI-ready formats. Harmonizome 3.0 is available at https://maayanlab.cloud/Harmonizome/.
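The notion of a "significantly high gene set overlap" between two datasets can be made concrete with a small example. The abstract does not say which statistic Harmonizome uses, so the Jaccard index and the hypergeometric tail probability below are assumptions chosen as common ways to score set overlap; the function names and the toy gene sets are likewise illustrative.

```python
from math import comb


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_p_value(a, b, universe_size):
    """Probability of an overlap at least this large by chance.

    Hypergeometric null: set b is drawn at random from a universe of
    `universe_size` genes that contains every gene of set a.
    """
    a, b = set(a), set(b)
    k = len(a & b)
    n_a, n_b = len(a), len(b)
    total = comb(universe_size, n_b)
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return tail / total


# Toy gene sets drawn from a universe of 10 genes.
set_a = {"TP53", "BRCA1", "EGFR"}
set_b = {"BRCA1", "EGFR", "MYC"}
print(jaccard(set_a, set_b), overlap_p_value(set_a, set_b, universe_size=10))
```

With realistic universe sizes (tens of thousands of genes), even modest overlaps between small sets can yield very small tail probabilities, which is why a background size has to be chosen explicitly.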
{"title":"Harmonizome 3.0: integrated knowledge about genes and proteins from diverse multi-omics resources","authors":"Ido Diamant, Daniel J B Clarke, John Erol Evangelista, Nathania Lingam, Avi Ma’ayan","doi":"10.1093/nar/gkae1080","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-11-20"}