Pub Date: 2024-09-05; Epub Date: 2024-08-21; DOI: 10.1016/j.ajhg.2024.07.017
K Alaine Broadaway, Sarah M Brotman, Jonathan D Rosen, Kevin W Currin, Abdalla A Alkhawaja, Amy S Etheridge, Fred Wright, Paul Gallins, Dereje Jima, Yi-Hui Zhou, Michael I Love, Federico Innocenti, Karen L Mohlke
Understanding the molecular mechanisms of complex traits is essential for developing targeted interventions. We analyzed liver expression quantitative-trait locus (eQTL) meta-analysis data on 1,183 participants to identify conditionally distinct signals. We found 9,013 eQTL signals for 6,564 genes; 23% of eGenes had two signals, and 6% had three or more signals. We then integrated the eQTL results with data from 29 cardiometabolic genome-wide association study (GWAS) traits and identified 1,582 GWAS-eQTL colocalizations for 747 eGenes. Non-primary eQTL signals accounted for 17% of all colocalizations. Isolating signals by conditional analysis prior to coloc resulted in 37% more colocalizations than using marginal eQTL and GWAS data, highlighting the importance of signal isolation. Isolating signals also led to stronger evidence of colocalization: among 343 eQTL-GWAS signal pairs in multi-signal regions, analyses that isolated the signals of interest resulted in higher posterior probability of colocalization for 41% of tests. Leveraging allelic heterogeneity, we predicted causal effects of gene expression on liver traits for four genes. To predict functional variants and regulatory elements, we colocalized eQTL with liver chromatin accessibility QTL (caQTL) and found 391 colocalizations, including 73 with non-primary eQTL signals and 60 eQTL signals that colocalized with both a caQTL and a GWAS signal. Finally, we used publicly available massively parallel reporter assays in HepG2 to highlight 14 eQTL signals that include at least one expression-modulating variant. This multi-faceted approach to unraveling the genetic underpinnings of liver-related traits could lead to therapeutic development.
{"title":"Liver eQTL meta-analysis illuminates potential molecular mechanisms of cardiometabolic traits.","authors":"K Alaine Broadaway, Sarah M Brotman, Jonathan D Rosen, Kevin W Currin, Abdalla A Alkhawaja, Amy S Etheridge, Fred Wright, Paul Gallins, Dereje Jima, Yi-Hui Zhou, Michael I Love, Federico Innocenti, Karen L Mohlke","doi":"10.1016/j.ajhg.2024.07.017","DOIUrl":"10.1016/j.ajhg.2024.07.017","url":null,"PeriodicalId":7659,"journal":{"name":"American journal of human genetics","volume":" ","pages":"1899-1913"},"PeriodicalIF":8.1,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11393674/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142034892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-05; Epub Date: 2024-08-13; DOI: 10.1016/j.ajhg.2024.07.013
Michael T Parsons, Miguel de la Hoya, Marcy E Richardson, Emma Tudini, Michael Anderson, Windy Berkofsky-Fessler, Sandrine M Caputo, Raymond C Chan, Melissa S Cline, Bing-Jian Feng, Cristina Fortuno, Encarna Gomez-Garcia, Johanna Hadler, Susan Hiraki, Megan Holdren, Claude Houdayer, Kathleen Hruska, Paul James, Rachid Karam, Huei San Leong, Alexandra Martins, Arjen R Mensenkamp, Alvaro N Monteiro, Vaishnavi Nathan, Robert O'Connor, Inge Sokilde Pedersen, Tina Pesaran, Paolo Radice, Gunnar Schmidt, Melissa Southey, Sean Tavtigian, Bryony A Thompson, Amanda E Toland, Clare Turnbull, Maartje J Vogel, Jamie Weyandt, George A R Wiggins, Lauren Zec, Fergus J Couch, Logan C Walker, Maaike P G Vreeswijk, David E Goldgar, Amanda B Spurdle
The ENIGMA research consortium develops and applies methods to determine the clinical significance of variants in hereditary breast and ovarian cancer genes. An ENIGMA BRCA1/2 classification sub-group, formed in 2015 as a ClinGen external expert panel, evolved into a ClinGen internal Variant Curation Expert Panel (VCEP) to align with Food and Drug Administration-recognized processes for ClinVar contributions. The VCEP reviewed American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) classification criteria for relevance to interpreting BRCA1 and BRCA2 variants. Statistical methods were used to calibrate evidence strength for different data types. Pilot specifications were tested on 40 variants, and documentation was revised for clarity and ease of use. The original criterion descriptions for 13 evidence codes were considered non-applicable or overlapping with other criteria. Scenario of use was extended or re-purposed for eight codes. Extensive analysis and/or data review informed specification descriptions and weights for all codes. Specifications were applied to pilot variants with pre-existing ClinVar classification as follows: 13 uncertain significance or conflicting, 14 pathogenic and/or likely pathogenic, and 13 benign and/or likely benign. Review resolved classification for 11/13 uncertain significance or conflicting variants and retained or improved confidence in classification for the remaining variants. Alignment of pre-existing ENIGMA research classification processes with ACMG/AMP classification guidelines highlighted several gaps in the research processes and the baseline ACMG/AMP criteria. Calibration of evidence strength was key to justifying the utility and strength of different data types for gene-specific application. The gene-specific criteria demonstrated value for improving ACMG/AMP-aligned classification of BRCA1 and BRCA2 variants.
{"title":"Evidence-based recommendations for gene-specific ACMG/AMP variant classification from the ClinGen ENIGMA BRCA1 and BRCA2 Variant Curation Expert Panel.","authors":"Michael T Parsons, Miguel de la Hoya, Marcy E Richardson, Emma Tudini, Michael Anderson, Windy Berkofsky-Fessler, Sandrine M Caputo, Raymond C Chan, Melissa S Cline, Bing-Jian Feng, Cristina Fortuno, Encarna Gomez-Garcia, Johanna Hadler, Susan Hiraki, Megan Holdren, Claude Houdayer, Kathleen Hruska, Paul James, Rachid Karam, Huei San Leong, Alexandra Martins, Arjen R Mensenkamp, Alvaro N Monteiro, Vaishnavi Nathan, Robert O'Connor, Inge Sokilde Pedersen, Tina Pesaran, Paolo Radice, Gunnar Schmidt, Melissa Southey, Sean Tavtigian, Bryony A Thompson, Amanda E Toland, Clare Turnbull, Maartje J Vogel, Jamie Weyandt, George A R Wiggins, Lauren Zec, Fergus J Couch, Logan C Walker, Maaike P G Vreeswijk, David E Goldgar, Amanda B Spurdle","doi":"10.1016/j.ajhg.2024.07.013","DOIUrl":"10.1016/j.ajhg.2024.07.013","url":null,"PeriodicalId":7659,"journal":{"name":"American journal of human genetics","volume":" ","pages":"2044-2058"},"PeriodicalIF":8.1,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11393667/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141981497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-05; Epub Date: 2024-08-21; DOI: 10.1016/j.ajhg.2024.07.018
Malvika Tejura, Shawn Fayer, Abbye E McEwen, Jake Flynn, Lea M Starita, Douglas M Fowler
In silico variant effect predictions are available for nearly all missense variants but played a minimal role in clinical variant classification because they were deemed to provide only supporting evidence. Recently, the ClinGen Sequence Variant Interpretation (SVI) Working Group updated recommendations for variant effect prediction use. By analyzing control pathogenic and benign variants across all genes, they were able to compute evidence strength for predictor score intervals with some intervals generating moderate, strong, or even very strong evidence. However, this genome-wide approach could obscure heterogeneous predictor performance in different genes. We quantified the gene-by-gene performance of two top predictors, REVEL and BayesDel, by analyzing control variants in each predictor score interval in 3,668 disease-relevant genes. Approximately 10% of intervals had sufficient control variants for analysis, and ∼70% of these intervals exceeded the maximum number of incorrect predictions implied by the SVI recommendations. These trending discordant intervals arose owing to the divergence of the gene-specific distribution of predictions from the genome-wide distribution, suggesting that gene-specific calibration is needed in many cases. Approximately 22% of ClinVar missense variants of uncertain significance in genes we analyzed (REVEL = 100,629, BayesDel = 71,928) had predictions in trending discordant intervals. Thus, genome-wide calibrations could result in many variants receiving inappropriate evidence strength. To facilitate a review of the SVI's calibrations, we developed a web application enabling visualization of gene-specific predictions and trending concordant and discordant intervals.
{"title":"Calibration of variant effect predictors on genome-wide data masks heterogeneous performance across genes.","authors":"Malvika Tejura, Shawn Fayer, Abbye E McEwen, Jake Flynn, Lea M Starita, Douglas M Fowler","doi":"10.1016/j.ajhg.2024.07.018","DOIUrl":"10.1016/j.ajhg.2024.07.018","url":null,"PeriodicalId":7659,"journal":{"name":"American journal of human genetics","volume":" ","pages":"2031-2043"},"PeriodicalIF":8.1,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11393694/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142034891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-05; DOI: 10.1016/j.ajhg.2024.07.019
Courtney Thaxton, Leslie G Biesecker, Marina DiStefano, Melissa Haendel, Ada Hamosh, Emma Owens, Sharon E Plon, Heidi L Rehm, Jonathan S Berg
A core task when establishing the strength of evidence for a gene's role in a monogenic disorder is determining the appropriate disease entity to curate. Establishing this concept determines which evidence can be applied and quantified toward the final gene-disease validity, variant pathogenicity, or actionability classification. Genes with implications in more than one phenotype can necessitate a process of lumping and splitting, disease reorganization, and updates to disease nomenclature. Reappraisal of the names that are used as labels for disease entities is therefore a necessary and perpetual process. The Clinical Genome Resource (ClinGen), in collaboration with representatives from Monarch Disease Ontology (Mondo) and Online Mendelian Inheritance in Man (OMIM), formed the Disease Naming Advisory Committee (DNAC) to develop guidance for groups faced with the need to establish the "curated disease entity" for gene-phenotype validity and variant pathogenicity and to update disease names for clinical use when necessary. The objective of this group was to harmonize guidance for disease naming across these nosologic entities and among ClinGen curation groups in collaboration with other disease-related professional groups. Here, we present the initial guidance developed by the DNAC with representative examples provided by the ClinGen expert panels and working groups that warranted nomenclature updates. We also discuss the broader implications of these efforts and their benefits for harmonization of gene-disease validity curation. Overall, this work sheds light on current inconsistencies and/or discrepancies and is designed to engage the broader community on how ClinGen defines monogenic disorders using a consistent approach for disease naming.
{"title":"Implementation of a dyadic nomenclature for monogenic diseases.","authors":"Courtney Thaxton, Leslie G Biesecker, Marina DiStefano, Melissa Haendel, Ada Hamosh, Emma Owens, Sharon E Plon, Heidi L Rehm, Jonathan S Berg","doi":"10.1016/j.ajhg.2024.07.019","DOIUrl":"10.1016/j.ajhg.2024.07.019","url":null,"PeriodicalId":7659,"journal":{"name":"American journal of human genetics","volume":"111 9","pages":"1810-1818"},"PeriodicalIF":8.1,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11393707/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142144959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-05; Epub Date: 2024-08-07; DOI: 10.1016/j.ajhg.2024.07.006
Katherine A Wood, R Spencer Tong, Marialetizia Motta, Viviana Cordeddu, Eleanor R Scimone, Stephen J Bush, Dale W Maxwell, Eleni Giannoulatou, Viviana Caputo, Alice Traversa, Cecilia Mancini, Giovanni B Ferrero, Francesco Benedicenti, Paola Grammatico, Daniela Melis, Katharina Steindl, Nicola Brunetti-Pierri, Eva Trevisson, Andrew O M Wilkie, Angela E Lin, Valerie Cormier-Daire, Stephen R F Twigg, Marco Tartaglia, Anne Goriely
While it is widely thought that de novo mutations (DNMs) occur randomly, we previously showed that some DNMs are enriched because they are positively selected in the testes of aging men. These "selfish" mutations cause disorders with a shared presentation of features, including exclusive paternal origin, a significant increase in the father's age, and a high apparent germline mutation rate. To date, all known selfish mutations cluster within the components of the RTK-RAS-MAPK signaling pathway, a critical modulator of testicular homeostasis. Here, we demonstrate the selfish nature of the SMAD4 DNMs causing Myhre syndrome (MYHRS). By analyzing 16 informative trios, we show that MYHRS-causing DNMs originated on the paternally derived allele in all cases. We document a statistically significant epidemiological paternal age effect of 6.3 years excess for fathers of MYHRS probands. We developed an ultra-sensitive assay to quantify spontaneous MYHRS-causing SMAD4 variants in sperm and show that pathogenic variants at codon 500 are found at elevated levels in the sperm of most men and exhibit a strong positive correlation with the donor's age, indicative of a high apparent germline mutation rate. Finally, we performed in vitro assays to validate the peculiar functional behavior of the clonally selected DNMs and explored the basis of the pathophysiology of the different SMAD4 sperm-enriched variants. Taken together, these data provide compelling evidence that SMAD4, a gene operating outside the canonical RAS-MAPK signaling pathway, is associated with selfish spermatogonial selection and raise the possibility that other genes/pathways are under positive selection in the aging human testis.
{"title":"SMAD4 mutations causing Myhre syndrome are under positive selection in the male germline.","authors":"Katherine A Wood, R Spencer Tong, Marialetizia Motta, Viviana Cordeddu, Eleanor R Scimone, Stephen J Bush, Dale W Maxwell, Eleni Giannoulatou, Viviana Caputo, Alice Traversa, Cecilia Mancini, Giovanni B Ferrero, Francesco Benedicenti, Paola Grammatico, Daniela Melis, Katharina Steindl, Nicola Brunetti-Pierri, Eva Trevisson, Andrew O M Wilkie, Angela E Lin, Valerie Cormier-Daire, Stephen R F Twigg, Marco Tartaglia, Anne Goriely","doi":"10.1016/j.ajhg.2024.07.006","DOIUrl":"10.1016/j.ajhg.2024.07.006","url":null,"PeriodicalId":7659,"journal":{"name":"American journal of human genetics","volume":" ","pages":"1953-1969"},"PeriodicalIF":8.1,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11444041/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141905611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-05; Epub Date: 2024-08-12; DOI: 10.1016/j.ajhg.2024.07.010
Christopher J Shore, Sergio Villicaña, Julia S El-Sayed Moustafa, Amy L Roberts, David A Gunn, Veronique Bataille, Panos Deloukas, Tim D Spector, Kerrin S Small, Jordana T Bell
Whole-skin DNA methylation variation has been implicated in several diseases, including melanoma, but its genetic basis has not yet been fully characterized. Using bulk skin tissue samples from 414 healthy female UK twins, we performed twin-based heritability and methylation quantitative trait loci (meQTL) analyses for >400,000 DNA methylation sites. We find that the human skin DNA methylome is on average less heritable than previously estimated in blood and other tissues (mean heritability: 10.02%). meQTL analysis identified local genetic effects influencing DNA methylation at 18.8% (76,442) of tested CpG sites, as well as 1,775 CpG sites associated with at least one distal genetic variant. As a functional follow-up, we performed skin expression QTL (eQTL) analyses in a partially overlapping sample of 604 female twins. Colocalization analysis identified over 3,500 shared genetic effects affecting thousands of CpG sites (10,067) and genes (4,475). Mediation analysis of putative colocalized gene-CpG pairs identified 114 genes with evidence for eQTL effects being mediated by DNA methylation in skin, including genes implicated in skin disease such as ALOX12 and CSPG4. We further explored the relevance of skin meQTLs to skin disease and found that skin meQTLs and CpGs under genetic influence were enriched for multiple skin-related genome-wide and epigenome-wide association signals, including for melanoma and psoriasis. Our findings give insights into the regulatory landscape of epigenomic variation in skin.
{"title":"Genetic effects on the skin methylome in healthy older twins.","authors":"Christopher J Shore, Sergio Villicaña, Julia S El-Sayed Moustafa, Amy L Roberts, David A Gunn, Veronique Bataille, Panos Deloukas, Tim D Spector, Kerrin S Small, Jordana T Bell","doi":"10.1016/j.ajhg.2024.07.010","DOIUrl":"10.1016/j.ajhg.2024.07.010","url":null,"PeriodicalId":7659,"journal":{"name":"American journal of human genetics","volume":" ","pages":"1932-1952"},"PeriodicalIF":8.1,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11393713/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141974864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
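To make the twin-based heritability figure in the skin methylome abstract above more concrete, the sketch below uses Falconer's classical estimator, h² = 2 × (r_MZ − r_DZ), which compares trait correlations in monozygotic and dizygotic twin pairs. This is an illustrative stand-in only: the abstract does not say which twin model the authors fitted, and the function name and input correlations are hypothetical.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate h^2 = 2 * (r_MZ - r_DZ), clipped to [0, 1].

    r_mz and r_dz are the trait correlations (here, methylation at one CpG
    site) within monozygotic and dizygotic twin pairs. This is a textbook
    approximation, not the model used in the study.
    """
    for name, r in (("r_mz", r_mz), ("r_dz", r_dz)):
        if not -1.0 <= r <= 1.0:
            raise ValueError(f"{name} must be a correlation in [-1, 1], got {r}")
    # Clip because sampling noise can push the raw estimate outside [0, 1].
    return min(1.0, max(0.0, 2.0 * (r_mz - r_dz)))


# Hypothetical correlations for a single CpG site.
h2 = falconer_heritability(r_mz=0.30, r_dz=0.25)
print(f"Estimated heritability: {h2:.2%}")
```

A per-site estimate like this would be averaged over all tested CpG sites to obtain a methylome-wide mean comparable in spirit to the one the abstract reports.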
Pub Date: 2024-09-04; DOI: 10.1016/j.ajhg.2024.08.010
Junyoung Kim, Kai Wang, Chunhua Weng, Cong Liu
Phenotype-driven gene prioritization is fundamental to diagnosing rare genetic disorders. While traditional approaches rely on curated knowledge graphs with phenotype-gene relations, recent advancements in large language models (LLMs) promise a streamlined text-to-gene solution. In this study, we evaluated five LLMs, including two generative pre-trained transformer (GPT) series models and three Llama2 series models, assessing their performance across task completeness, gene prediction accuracy, and adherence to required output structures. We conducted experiments exploring various combinations of models, prompts, phenotypic input types, and task difficulty levels. Our findings revealed that the best-performing LLM, GPT-4, achieved an average accuracy of 17.0% in identifying diagnosed genes within the top 50 predictions, which still falls behind traditional tools. However, accuracy increased with model size. Consistent results were observed over time, as shown in the dataset curated after 2023. Advanced techniques such as retrieval-augmented generation (RAG) and few-shot learning did not improve accuracy. Sophisticated prompts were more likely to enhance task completeness, especially in smaller models. Conversely, complicated prompts tended to decrease the output structure compliance rate. LLMs also achieved better-than-random prediction accuracy with free-text input, though performance was slightly lower than with standardized concept input. Bias analysis showed that highly cited genes, such as BRCA1, TP53, and PTEN, are more likely to be predicted. Our study provides valuable insights into integrating LLMs with genomic analysis, contributing to the ongoing discussion on their utilization in clinical workflows.
Title: Assessing the utility of large language models for phenotype-driven gene prioritization in the diagnosis of rare genetic disease. Journal: American Journal of Human Genetics.
Pub Date : 2024-09-04 DOI: 10.1016/j.ajhg.2024.08.018
Derek Shyr,Rounak Dey,Xihao Li,Hufeng Zhou,Eric Boerwinkle,Steve Buyske,Mark Daly,Richard A Gibbs,Ira Hall,Tara Matise,Catherine Reeves,Nathan O Stitziel,Michael Zody,Benjamin M Neale,Xihong Lin
Large-scale, multi-ethnic whole-genome sequencing (WGS) studies, such as the National Human Genome Research Institute Genome Sequencing Program's Centers for Common Disease Genomics (CCDG), play an important role in increasing diversity for genetic research. Before performing association analyses, assessing Hardy-Weinberg equilibrium (HWE) is a crucial step in quality control procedures to remove low-quality variants and ensure valid downstream analyses. Diverse WGS studies contain ancestrally heterogeneous samples; however, commonly used HWE methods assume that the samples are homogeneous. Therefore, directly applying these to the whole dataset can yield statistically invalid results. To account for this heterogeneity, HWE can be tested on subsets of samples that have genetically homogeneous ancestries and the results aggregated at each variant. To facilitate valid HWE subset testing, we developed a semi-supervised learning approach that predicts homogeneous ancestries based on the genotype. This method provides a convenient tool for estimating HWE in the presence of population structure and missing self-reported race and ethnicity in diverse WGS studies. In addition, assessing HWE within the homogeneous ancestries provides reliable HWE estimates that will directly benefit downstream analyses, including association analyses in WGS studies. We applied our proposed method to the CCDG dataset, predicting homogeneous genetic ancestry groups for 60,545 multi-ethnic WGS samples to assess HWE within each group.
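Why pooled HWE testing can mislead in ancestrally heterogeneous samples can be shown with a small sketch. It uses the textbook one-degree-of-freedom chi-square goodness-of-fit test for a biallelic variant, which is not necessarily the test the authors used, and it does not reproduce their semi-supervised ancestry prediction or their aggregation step. The group labels and genotype counts are invented for illustration.

```python
import math
from typing import Dict, Tuple


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) HWE p-value from genotype counts AA, AB, BB."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic variant: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def hwe_by_group(counts: Dict[str, Tuple[int, int, int]]) -> Dict[str, float]:
    """Test HWE separately within each (hypothetical) ancestry group."""
    return {group: hwe_chi2_pvalue(*c) for group, c in counts.items()}


# Invented counts: each group is in HWE, but allele frequencies differ.
toy_counts = {"group_1": (810, 180, 10), "group_2": (10, 180, 810)}
per_group_p = hwe_by_group(toy_counts)

# Pooling the two groups creates a heterozygote deficit (Wahlund effect).
pooled = tuple(sum(c[i] for c in toy_counts.values()) for i in range(3))
pooled_p = hwe_chi2_pvalue(*pooled)
```

In this toy case each group yields a p-value near 1, while the pooled sample yields a vanishingly small p-value, so a variant of good quality would be flagged purely because of population structure.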
Title: Semi-supervised machine learning method for predicting homogeneous ancestry groups to assess Hardy-Weinberg equilibrium in diverse whole-genome sequencing studies. Journal: American Journal of Human Genetics.
Pub Date : 2024-09-03 DOI: 10.1016/j.ajhg.2024.08.009
Yuxi Liu,Cheng Peng,Ina S Brorson,Denise G O'Mahony,Rebecca L Kelly,Yujing J Heng,Gabrielle M Baker,Grethe I Grenaker Alnæs,Clara Bodelon,Daniel G Stover,Eliezer M Van Allen,A Heather Eliassen,Vessela N Kristensen,Rulla M Tamimi,Peter Kraft
The tumor immune microenvironment (TIME) plays key roles in tumor progression and response to immunotherapy. Previous studies have identified individual germline variants associated with differences in TIME. Here, we hypothesize that common variants associated with breast cancer risk or cancer-related traits, represented by polygenic risk scores (PRSs), may jointly influence immune features in TIME. We derived 154 immune traits from bulk gene expression profiles of 764 breast tumors and 598 adjacent normal tissue samples from 825 individuals with breast cancer in the Nurses' Health Study (NHS) and NHSII. Immunohistochemical staining of four immune cell markers was available for a subset of 205 individuals. Germline PRSs were calculated for 16 different traits including breast cancer, autoimmune diseases, type 2 diabetes, ages at menarche and menopause, body mass index (BMI), BMI-adjusted waist-to-hip ratio, alcohol intake, and tobacco smoking. Overall, we identified 44 associations between germline PRSs and immune traits at false discovery rate q < 0.25, including 3 associations with q < 0.05. We observed consistent inverse associations of inflammatory bowel disease (IBD) and Crohn disease (CD) PRSs with interferon signaling and STAT1 scores in breast tumor and adjacent normal tissue; these associations were replicated in a Norwegian cohort. Inverse associations were also consistently observed for IBD PRS and B cell abundance in normal tissue. We also observed positive associations between CD PRS and endothelial cell abundance in tumor. Our findings suggest that the genetic mechanisms that influence immune-related diseases are also associated with TIME in breast cancer.
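For readers unfamiliar with polygenic risk scores, a PRS is conventionally a weighted sum of an individual's risk-allele dosages. The sketch below shows only that general form; the abstract does not describe the authors' PRS construction, and the variant identifiers, weights, and the choice to skip variants missing from the genotype data are assumptions made for illustration.

```python
from typing import Dict


def polygenic_risk_score(
    weights: Dict[str, float],
    dosages: Dict[str, float],
) -> float:
    """Weighted sum of effect-allele dosages (0 to 2) across variants.

    Assumption: variants without a dosage for this individual are skipped.
    """
    score = 0.0
    for variant_id, weight in weights.items():
        if variant_id in dosages:
            score += weight * dosages[variant_id]
    return score


# Hypothetical variant IDs and per-allele weights, not taken from any study.
toy_weights = {"variant_1": 0.2, "variant_2": -0.1, "variant_3": 0.05}
toy_dosages = {"variant_1": 2.0, "variant_2": 1.0}  # variant_3 is missing
toy_score = polygenic_risk_score(toy_weights, toy_dosages)
```

A score computed this way for each individual and each trait is the kind of quantity that would then be tested for association with the immune traits.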
Title: Germline polygenic risk scores are associated with immune gene expression signature and immune cell infiltration in breast cancer. Journal: American Journal of Human Genetics.
Pub Date : 2024-08-29 DOI: 10.1016/j.ajhg.2024.08.007
Christa Ventresca,Daphne O Martschenko,Robbee Wedow,Mete Civelek,James Tabery,Jedidiah Carlson,Stephen C J Parker,Paula S Ramos
Same-sex sexual behavior has long interested genetics researchers in part because, while there is evidence of heritability, the trait as typically defined is associated with fewer offspring. Investigations of this phenomenon began in the 1990s with linkage studies and continue today with the advent of genome-wide association studies. As this body of research grows, so does critical scientific and ethical review of it. Here, we provide a targeted overview of existing genetics studies on same-sex sexual behavior, highlight the ethical and scientific considerations of this nascent field, and provide recommendations developed by the authors to enhance social and ethical responsibility.
Title: The methodological and ethical concerns of genetic studies of same-sex sexual behavior. Journal: American Journal of Human Genetics.