Pub Date: 2024-12-05; Epub Date: 2024-11-18; DOI: 10.1016/j.ajhg.2024.10.010
Iftikhar J Kullo, Matthew P Conomos, Sarah C Nelson, Sally N Adebamowo, Ananyo Choudhury, David Conti, Stephanie M Fullerton, Stephanie M Gogarten, Ben Heavner, Whitney E Hornsby, Eimear E Kenny, Alyna Khan, Amit V Khera, Yun Li, Iman Martin, Josep M Mercader, Maggie Ng, Laura M Raffield, Alex Reiner, Robb Rowley, Daniel Schaid, Adrienne Stilp, Ken Wiley, Riley Wilson, John S Witte, Pradeep Natarajan
By improving disease risk prediction, polygenic risk scores (PRSs) could have a significant impact on health promotion and disease prevention. Due to the historical oversampling of populations with European ancestry for genome-wide association studies, PRSs perform less well in other, understudied populations, leading to concerns that clinical use in their current forms could widen health care disparities. The PRIMED Consortium was established to develop methods to improve the performance of PRSs in global populations and individuals of diverse genetic ancestry. To this end, PRIMED is aggregating and harmonizing multiple phenotype and genotype datasets on AnVIL, an interoperable secure cloud-based platform, to perform individual- and summary-level analyses using population and statistical genetics approaches. Study sites, the coordinating center, and representatives from the NIH work alongside other NHGRI and global consortia to achieve these goals. PRIMED is also evaluating ethical and social implications of PRS implementation and investigating the joint modeling of social determinants of health and PRS in computing disease risk. The phenotypes of interest are primarily cardiometabolic diseases and cancer, the leading causes of death and disability worldwide. Early deliverables of the consortium include methods for data sharing on AnVIL, development of a common data model to harmonize phenotype and genotype data from cohort studies as well as electronic health records, adaptation of recent guidelines for population descriptors to global cohorts, and sharing of PRS methods/tools. As a multisite collaboration, PRIMED aims to foster equity in the development and use of polygenic risk assessment.
Title: The PRIMED Consortium: Reducing disparities in polygenic risk assessment. Journal: American Journal of Human Genetics, pages 2594-2606. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639095/pdf/
Pub Date: 2024-12-05; Epub Date: 2024-11-15; DOI: 10.1016/j.ajhg.2024.10.009
Marie Saitou, Andy Dahl, Qingbo Wang, Xuanyao Liu
Population-level genetic studies are overwhelmingly biased toward European ancestries. Transferring genetic predictions from European ancestries to other ancestries results in a substantial loss of accuracy. Yet, it remains unclear how much various genetic factors, such as causal effect differences, linkage disequilibrium (LD) differences, or allele frequency differences, contribute to the loss of prediction accuracy across ancestries. In this study, we used gene expression levels in lymphoblastoid cell lines to understand how much each genetic factor contributes to lowered portability of gene expression prediction from European to African ancestries. We found that cis-genetic effects on gene expression are highly similar between European and African individuals. However, we found that allele frequency differences of causal variants have a striking impact on prediction portability. For example, portability is reduced by more than 32% when the causal cis-variant is common (minor allele frequency, MAF >5%) in European samples (training population) but is rarer (MAF <5%) in African samples (prediction population). While large allele frequency differences can decrease portability through increasing LD differences, we also determined that causal allele frequency can significantly impact portability when the impact from LD is substantially controlled. This observation suggests that improving statistical fine-mapping alone does not overcome the loss of portability resulting from differences in causal allele frequency. We conclude that causal cis-eQTL effects are highly similar in European and African individuals, and allele frequency differences have a large impact on the accuracy of gene expression prediction.
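One way to see why causal allele frequency can matter even when effect sizes are shared is a standard quantitative-genetics identity, which is not taken from this abstract: under Hardy-Weinberg equilibrium and an additive model, a variant with per-allele effect beta and allele frequency p contributes 2p(1-p)beta^2 to trait variance. The sketch below is only an illustration under those assumptions. The effect size and both frequencies are hypothetical values, not study results.

```python
def variance_explained(beta: float, maf: float) -> float:
    """Additive variance contributed by one biallelic variant under HWE."""
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must be between 0 and 1")
    return 2.0 * maf * (1.0 - maf) * beta ** 2


# Hypothetical values for illustration only (not from the study).
beta = 0.5            # same per-allele effect assumed in both populations
maf_training = 0.20   # variant is common in the training population
maf_target = 0.02     # variant is rarer in the prediction population

v_train = variance_explained(beta, maf_training)
v_target = variance_explained(beta, maf_target)
ratio = v_target / v_train

print(f"training: {v_train:.4f}, target: {v_target:.4f}, ratio: {ratio:.4f}")
```

With an identical effect size, the rarer variant accounts for far less expression variance in the target population, so a predictor built on it has less signal to recover there. This toy calculation ignores LD and estimation error, both of which the study treats separately.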
Title: Allele frequency impacts the cross-ancestry portability of gene expression prediction in lymphoblastoid cell lines. Journal: American Journal of Human Genetics, pages 2814-2825. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639078/pdf/
Pub Date: 2024-12-05; Epub Date: 2024-11-22; DOI: 10.1016/j.ajhg.2024.10.017
Euan McDonnell, Sarah E Orr, Matthew J Barter, Danielle Rux, Abby Brumwell, Nicola Wrobel, Lee Murphy, Lynne M Overman, Antony K Sorial, David A Young, Jamie Soul, Sarah J Rice
Increasing evidence is emerging to link age-associated complex musculoskeletal diseases, including osteoarthritis (OA), to developmental factors. Multiple studies have shown a functional role for DNA methylation in the genetic mechanisms of OA risk using articular cartilage samples taken from aged individuals, yet knowledge of temporal changes to the methylome during human cartilage development is limited. We quantified DNA methylation at ∼700,000 individual CpGs across the epigenome of developing human chondrocytes in 72 samples ranging from 7 to 21 post-conception weeks. We identified significant changes in 3% of all CpGs and >8,200 developmental differentially methylated regions. We further identified 24 loci at which OA genetic variants colocalize with methylation quantitative trait loci. Through integrating developmental and mature human chondrocyte datasets, we find evidence for functional effects exerted solely in development or throughout the life course. This will have profound impacts on future approaches to translating genetic pathways for therapeutic intervention.
Title: The methylomic landscape of human articular cartilage development contains epigenetic signatures of osteoarthritis risk. Journal: American Journal of Human Genetics, pages 2756-2772. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639090/pdf/
Pub Date: 2024-12-05; Epub Date: 2024-11-18; DOI: 10.1016/j.ajhg.2024.10.014
Jérôme Delplanque, Lauriane Le Collen, Hélène Loiselle, Audrey Leloire, Bénédicte Toussaint, Emmanuel Vaillant, Guillaume Charpentier, Sylvia Franc, Beverley Balkau, Michel Marre, Emma Henriques, Emmanuel Buse Falay, Mehdi Derhourhi, Philippe Froguel, Amélie Bonnefond
Individuals with obesity caused by biallelic pathogenic LEPR (leptin receptor) variants can benefit from setmelanotide, the novel MC4R agonist. An ongoing phase 3 clinical trial (NCT05093634) includes individuals with obesity who carry a heterozygous LEPR variant, although the obesogenic impact of these variants remains incompletely evaluated. The aim of this study was to functionally assess heterozygous variants in LEPR and to evaluate their effect on obesity. We sequenced LEPR in ∼10,000 participants from the French RaDiO study. We found 86 rare heterozygous variants. Each identified variant was then investigated in vitro using luciferase and western blot assays. Using the criteria of the American College of Medical Genetics and Genomics (ACMG), including the strong criterion related to functional assays, we found 12 pathogenic LEPR variants. Most heterozygotes did not present with obesity, and we found no association between these pathogenic variants and body mass index (BMI). This lack of association between pathogenic LEPR variants and obesity risk or BMI was confirmed using exome data from 200,000 individuals in the UK Biobank. In the literature, among 55 reported heterozygotes for a rare pathogenic LEPR variant, only 27% had obesity. In conclusion, monoallelic pathogenic LEPR variants were functionally tested, and they do not elevate obesity risk or BMI. This raises questions about the use of setmelanotide, a costly drug with potential side effects, based solely on the presence of a heterozygous LEPR variant.
Title: Monoallelic pathogenic variants in LEPR do not cause obesity. Journal: American Journal of Human Genetics, pages 2668-2674. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639077/pdf/
Pub Date: 2024-12-05; Epub Date: 2024-11-19; DOI: 10.1016/j.ajhg.2024.10.018
Sanni Ruotsalainen, Juha Karjalainen, Mitja Kurki, Elisa Lahtela, Matti Pirinen, Juha Riikonen, Jarmo Ritari, Silja Tammi, Jukka Partanen, Hannele Laivuori, Aarno Palotie, Henrike Heyne, Mark Daly, Elisabeth Widen
Female infertility is a common and complex health problem affecting millions of women worldwide. While multiple factors can contribute to this condition, the underlying cause remains elusive in up to 15%-30% of affected individuals. In our large genome-wide association study (GWAS) of 22,849 women with infertility and 198,989 control individuals from the Finnish population cohort FinnGen, we unveil a landscape of genetic factors associated with the disorder. Our recessive analysis identified a low-frequency stop-gained mutation in TATA-box binding protein-like 2 (TBPL2; c.895A>T [p.Arg299Ter]; minor-allele frequency [MAF] = 1.2%) with an impact comparable to highly penetrant monogenic mutations (odds ratio [OR] = 650, p = 4.1 × 10⁻²⁵). While previous studies have linked the orthologous gene to anovulation and sterility in knockout mice, the severe consequence of the p.Arg299Ter variant was evidenced by individuals carrying two copies of that variant having significantly fewer offspring (average of 0.16) compared to women belonging to the other genotype groups (average of 1.75 offspring, p = 1.4 × 10⁻¹⁵). Notably, all homozygous women who had given birth had received infertility therapy. Moreover, our age-stratified analyses identified three additional genome-wide significant loci. Two loci were associated with early-onset infertility (diagnosed before age 30), located near CHEK2 and within the major histocompatibility complex (MHC) region. The third locus, associated with late-onset infertility, had its lead SNP located in an intron of a long non-coding RNA (lncRNA) gene. Taken together, our data highlight the significance of rare recessive alleles in shaping female infertility risk. The results further provide evidence supporting specific age-dependent mechanisms underlying this complex disorder.
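To give a sense of scale for the recessive analysis, the following sketch estimates how many homozygotes a 1.2% allele would be expected to produce in a cohort of this size. It assumes Hardy-Weinberg proportions and that the reported MAF applies to the combined sample. Neither assumption is stated in the abstract, so the result is a rough back-of-the-envelope figure, not a reported count.

```python
def expected_homozygotes(maf: float, n: int) -> float:
    """Expected number of minor-allele homozygotes under Hardy-Weinberg proportions."""
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must be between 0 and 1")
    return maf ** 2 * n


cases = 22_849        # women with infertility (from the abstract)
controls = 198_989    # control individuals (from the abstract)
maf = 0.012           # reported minor-allele frequency of p.Arg299Ter

n_total = cases + controls
hom_freq = maf ** 2                       # assumed HWE homozygote frequency
expected = expected_homozygotes(maf, n_total)

print(f"cohort size: {n_total}")
print(f"homozygote frequency under HWE: {hom_freq:.6f}")
print(f"expected homozygotes: {expected:.1f}")
```

Under these assumptions only a few dozen homozygotes are expected among more than 220,000 women, which suggests why a recessive signal of this kind needs a very large cohort to detect.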
Title: Inherited infertility: Mapping loci associated with impaired female reproduction. Journal: American Journal of Human Genetics, pages 2789-2798. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639076/pdf/
Pub Date: 2024-12-05; Epub Date: 2024-11-18; DOI: 10.1016/j.ajhg.2024.10.016
Sai Swaroop Chittoor, Simona Giunta
Secondary structures are non-canonical arrangements of nucleic acids due to intra-strand interactions, including base pairing, stacking, or other higher-order features that deviate from the standard double-helical conformation. While these structures are extensively studied in RNA, they can also form when DNA becomes single stranded, creating topological roadblocks that can impact essential DNA-based processes such as replication, transcription, and repair, ultimately affecting genome stability. The availability of a complete linear sequence of human genomes, including repetitive loci, enables the prediction and comparison of DNA secondary structures across various regions. Here, we evaluate the intrinsic properties of linear single-stranded DNA sequences sampled from specialized human loci such as centromeres, pericentromeres, ribosomal DNA (rDNA), and coding regions of the CHM13 genome. Our comparative analysis of predicted secondary structures across human chromosomes revealed the heightened presence, complexity, and instability of secondary structures within the centromere, which gradually decreased through the pericentromere and onto the chromosome arms and were, on average, lowest in coding regions. Notably, centromeric repeats exhibited the highest level of topological complexity within both the active and divergent domains, even when compared to other repetitive tandem satellites, such as rDNA in acrocentric chromosomes. Our findings provide evidence of the intrinsic self-hybridizing properties of centromere repeats, which are capable of generating complex topological structures that may functionally correlate with chromosome missegregation, especially when centromeric chromatin is disrupted. Processes such as long non-coding RNA transcription, recombination, and other mechanisms that dechromatinize and unwind stretches of linear DNA in these regions create in vivo opportunities for the DNA acrobatics hereby predicted.
Title: Comparative analysis of predicted DNA secondary structures infers complex human centromere topology. Journal: American Journal of Human Genetics, pages 2707-2719. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639080/pdf/
Pub Date: 2024-12-05; DOI: 10.1016/j.ajhg.2024.11.002
Teri A Manolio, Jahnavi Narula, Alauna Rupert, Carol J Bult, Rex L Chisholm, Geoffrey S Ginsburg, Eric D Green, Gillian Hooker, Gail P Jarvik, George A Mensah, Erin M Ramos, Dan M Roden, Robb Rowley, Casey Overby Taylor, Marc S Williams
Title: Genomic medicine year in review: 2024. Journal: American Journal of Human Genetics, volume 111, issue 12, pages 2585-2588. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639092/pdf/
Pub Date: 2024-12-05; DOI: 10.1016/j.ajhg.2024.10.021
Stephen F Kingsmore, Meredith Wright, Laurie D Smith, Yupu Liang, William R Mowrey, Liana Protopsaltis, Matthew Bainbridge, Mei Baker, Sergey Batalov, Eric Blincow, Bryant Cao, Sara Caylor, Christina Chambers, Katarzyna Ellsworth, Annette Feigenbaum, Erwin Frise, Lucia Guidugli, Kevin P Hall, Christian Hansen, Mark Kiel, Lucita Van Der Kraan, Chad Krilow, Hugh Kwon, Lakshminarasimha Madhavrao, Sebastien Lefebvre, Jeremy Leipzig, Rebecca Mardach, Barry Moore, Danny Oh, Lauren Olsen, Eric Ontiveros, Mallory J Owen, Rebecca Reimers, Gunter Scharer, Jennifer Schleit, Seth Shelnutt, Shyamal S Mehtalia, Albert Oriol, Erica Sanford, Steve Schwartz, Kristen Wigby, Mary J Willis, Mark Yandell, Chris M Kunard, Thomas Defay
Genome-sequence-based newborn screening (gNBS) has substantial potential to improve outcomes in hundreds of severe childhood genetic disorders (SCGDs). However, a major impediment to gNBS is imprecision due to variants classified as pathogenic (P) or likely pathogenic (LP) that are not SCGD causal. gNBS with 53,855 P/LP variants, 342 genes, 412 SCGDs, and 1,603 therapies was positive in 74% of UK Biobank (UKB470K) adults, suggesting 97% false positives. We used the phenomenon of purifying hyperselection, which acts to decrease the frequency of SCGD causal diplotypes, to reduce false positives. Training of gene-disease-inheritance mode-diplotype tetrads in 618,290 control and affected subjects identified 293 variants or haplotypes and seven genes with variable inheritance contributing higher positive diplotype counts than consistent with purifying hyperselection and with little or no evidence of SCGD causality. With these changes, 2.0% of UKB470K adults were positive. In contrast, gNBS was positive in 7.2% of 3,118 critically ill children with suspected SCGDs and 7.9% of 705 infant deaths. When compared with rapid diagnostic genome sequencing (RDGS), gNBS had 99.1% recall. In eight true-positive children, gNBS was projected to decrease time to diagnosis by a median of 121 days and avoid life-threatening disease presentations in four children, organ damage in six children, ∼$1.25 million in healthcare cost, and ten (1.4%) infant deaths. Federated training predicated on purifying hyperselection provides a general framework to attain high precision in population screening. Federated training across many biobanks and clinical trials can provide a privacy-preserving mechanism for qualification of gNBS in diverse genetic ancestries.
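The percentages in this abstract can be related with simple arithmetic. The sketch below shows one possible reading of the 97% false-positive figure: it assumes that only about 2.0% of adults, the positive rate after pruning, could plausibly be true positives. That reading is an assumption and not a derivation given in the abstract. The helper `false_positive_share` is a hypothetical name introduced for illustration. The last calculation checks the reported 1.4% against ten of 705 infant deaths.

```python
def false_positive_share(positive_rate: float, plausible_true_rate: float) -> float:
    """Share of positive screens that would be false if only
    `plausible_true_rate` of everyone screened were truly affected."""
    if positive_rate <= 0.0:
        raise ValueError("positive_rate must be greater than 0")
    return (positive_rate - plausible_true_rate) / positive_rate


initial_positive_rate = 0.74   # positive rate before pruning (from the abstract)
pruned_positive_rate = 0.020   # positive rate after pruning (from the abstract),
                               # used here as an assumed ceiling on true positives

fp_share = false_positive_share(initial_positive_rate, pruned_positive_rate)
deaths_avoided_share = 10 / 705   # ten of 705 infant deaths (from the abstract)

print(f"implied false-positive share: {fp_share:.1%}")
print(f"projected avoidable infant deaths: {deaths_avoided_share:.1%}")
```

Under that assumption the false-positive share rounds to 97%, matching the abstract, and ten of 705 rounds to the reported 1.4%.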
Title: Prequalification of genome-based newborn screening for severe childhood genetic diseases through federated training based on purifying hyperselection. Journal: American Journal of Human Genetics, volume 111(12), pages 2618-2642. Pub Date: 2024-12-05. DOI: 10.1016/j.ajhg.2024.10.021. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639087/pdf/
The search for prognostic biomarkers capable of predicting patient outcomes, by analyzing gene expression in tissue samples and other molecular profiles, remains largely focused on single-gene-based or global-gene-search approaches. Gene-centric approaches, while foundational, fail to capture the higher-order dependencies that reflect the activities of co-regulated processes, pathway alterations, and regulatory networks, all of which are crucial in determining patient outcomes in complex diseases such as cancer. Here, we introduce GPS-Net, a computational framework that fills the gap in efficiently identifying prognostic modules by incorporating holistic pathway structures and the network of gene interactions. By combining multiple kernel learning with network-based regularization, the proposed method not only enhances the accuracy of biomarker and pathway identification but also significantly reduces computational complexity, as demonstrated by extensive simulation studies. Applying GPS-Net, we identified key pathways that are predictive of patient outcomes in a cancer immunotherapy study. Overall, our approach provides a novel framework that renders genome-wide pathway-level prognostic analysis both feasible and scalable, combining mechanism-driven and data-driven methodologies for precision genomics.
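The two ingredients named in the abstract, multiple kernel learning and network-based regularization, can be sketched generically as follows. This is not the GPS-Net algorithm: the linear kernel, the fixed pathway weights, the edge-list representation, and all function names are assumptions used only to show the general shape of each technique.

```python
def linear_kernel(rows):
    """Sample-by-sample linear kernel for one pathway.

    rows -- list of per-sample expression vectors restricted to the pathway's genes
    """
    n_genes = len(rows[0])
    return [
        [sum(a * b for a, b in zip(r1, r2)) / n_genes for r2 in rows]
        for r1 in rows
    ]


def combine_kernels(kernels, weights):
    """Weighted sum of pathway kernels, with weights normalized to sum to 1."""
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    total = sum(weights)
    n = len(kernels[0])
    return [
        [sum(w / total * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


def network_penalty(beta, edges):
    """Graph-Laplacian quadratic form: sum of weight * (beta_i - beta_j)^2 over edges.

    The penalty is small when genes linked in the network receive similar coefficients.
    """
    return sum(w * (beta[i] - beta[j]) ** 2 for i, j, w in edges)


# Two samples, two hypothetical pathways with two genes each.
K1 = linear_kernel([[1.0, 0.0], [0.0, 2.0]])
K2 = linear_kernel([[2.0, 2.0], [0.0, 0.0]])
K = combine_kernels([K1, K2], weights=[1.0, 3.0])
penalty = network_penalty([1.0, 1.0, -1.0], edges=[(0, 1, 1.0), (1, 2, 0.5)])
print(K, penalty)
```

In a full method the pathway weights would be learned from outcome data rather than fixed, and the penalty would enter the training objective; both steps are omitted here.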
Title: GPS-Net: Discovering prognostic pathway modules based on network regularized kernel learning. Authors: Sijie Yao, Kaiqiao Li, Tingyi Li, Xiaoqing Yu, Pei Fen Kuan, Xuefeng Wang. Journal: American Journal of Human Genetics, pages 2826-2838. Pub Date: 2024-12-05. DOI: 10.1016/j.ajhg.2024.10.004. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639089/pdf/
Pub Date: 2024-12-05. Epub Date: 2024-11-14. DOI: 10.1016/j.ajhg.2024.10.012
Shiyang Ma, Fan Wang, Richard Border, Joseph Buxbaum, Noah Zaitlen, Iuliana Ionita-Laza
Local genetic correlation analysis is an important tool for identifying genetic loci with shared biology across traits. Recently, Border et al. have shown that the results of these analyses are confounded by cross-trait assortative mating (xAM), leading to many false-positive findings. Here, we describe LAVA-Knock, a local genetic correlation method that builds on an existing genetic correlation method, LAVA, and augments it by generating synthetic data in a way that preserves local and long-range linkage disequilibrium (LD), allowing us to reduce the confounding induced by xAM. We show in simulations based on a realistic xAM model and in genome-wide association study (GWAS) applications for 630 trait pairs that LAVA-Knock can greatly reduce the bias due to xAM relative to LAVA. Furthermore, we show a significant positive correlation between the reduction in local genetic correlations and estimates in the literature of cross-mate phenotype correlations; in particular, pairs of traits that are known to have high cross-mate phenotype correlation values show a significantly larger reduction in the number of local genetic correlations than other trait pairs. A few representative examples include education and intelligence, education and alcohol consumption, and attention-deficit hyperactivity disorder and depression. These results suggest that LAVA-Knock can reduce confounding due to both short-range LD and long-range LD induced by xAM.
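Two quantities underlie this kind of analysis: a local genetic correlation for a locus, and a contrast between a statistic computed on the real data and on synthetic (knockoff) data. The sketch below shows generic forms of both. It is not the LAVA-Knock implementation: the function names, the inputs, and the use of the maximum over several knockoff copies are assumptions made for illustration.

```python
import math


def local_genetic_correlation(cov_g12, var_g1, var_g2):
    """Genetic correlation at one locus from local genetic (co)variances.

    cov_g12 -- local genetic covariance between trait 1 and trait 2
    var_g1, var_g2 -- local genetic variances of each trait (must be positive)
    """
    if var_g1 <= 0 or var_g2 <= 0:
        raise ValueError("local genetic variances must be positive")
    return cov_g12 / math.sqrt(var_g1 * var_g2)


def knockoff_contrast(stat_original, stats_knockoff):
    """Knockoff-style feature statistic: original signal minus the strongest
    signal seen among the synthetic copies. Positive values favor a real effect."""
    return abs(stat_original) - max(abs(s) for s in stats_knockoff)


# Hypothetical values for a single locus.
rg = local_genetic_correlation(cov_g12=0.02, var_g1=0.04, var_g2=0.04)
w = knockoff_contrast(stat_original=3.0, stats_knockoff=[1.0, -2.5])
print(rg, w)
```

A locus whose test statistic is matched or exceeded by its knockoff copies yields a contrast at or below zero, which is the intuition behind using synthetic data to screen out correlations driven by confounding LD.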
Title: Local genetic correlation via knockoffs reduces confounding due to cross-trait assortative mating. Journal: American Journal of Human Genetics, pages 2839-2848. Pub Date: 2024-12-05. DOI: 10.1016/j.ajhg.2024.10.012. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639086/pdf/