Pub Date: 2025-04-01 | Epub Date: 2025-01-09 | DOI: 10.1007/s00439-024-02723-9
Mohammad Faraz Zafeer, Memoona Ramzan, Duygu Duman, Ahmet Mutlu, Serhat Seyhan, M Tayyar Kalcioglu, Suat Fitoz, Brooke A DeRosa, Shengru Guo, Derek M Dykxhoorn, Mustafa Tekin
Developmental anomalies of the hearing organ, the cochlea, are diagnosed in approximately one-fourth of individuals with congenital deafness. The majority of patients with cochlear malformations remain etiologically undiagnosed due to insufficient knowledge about underlying genes or the inability to make conclusive interpretations of identified genetic variants. We used exome sequencing for the genetic evaluation of hearing loss associated with cochlear malformations in three probands from unrelated families. We subsequently generated monoclonal induced pluripotent stem cell (iPSC) lines bearing patient-specific knock-ins and knockouts using CRISPR/Cas9 to assess the pathogenicity of candidate variants. We detected FGF3 (p.Arg165Gly) and GREB1L (p.Cys186Arg), variants of uncertain significance in two recognized genes for deafness, and PBXIP1 (p.Trp574*) in a candidate gene. Upon differentiation of iPSCs towards inner ear organoids, we observed developmental aberrations in knockout lines compared to their isogenic controls. Lines carrying patient-specific single nucleotide variants (SNVs) showed abnormalities similar to those of the knockout lines, functionally supporting the causality of these variants in the observed phenotype. Therefore, we present human inner ear organoids as a potential tool to validate the pathogenicity of DNA variants associated with cochlear malformations.
{"title":"Human organoids for rapid validation of gene variants linked to cochlear malformations.","doi":"10.1007/s00439-024-02723-9","journal":{"name":"Human Genetics","pages":"375-389"},"PeriodicalIF":3.8,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12003500/pdf/","platform":"Semanticscholar","paperid":"142948000"}
Pub Date: 2025-04-01 | Epub Date: 2025-03-17 | DOI: 10.1007/s00439-025-02735-z
Giacomo Francesco Ena, Aaron Giménez, Annabel Carballo-Mesa, Petra Lišková, Marcos Araújo Castro E Silva, David Comas
The Roma people have a complex demographic history shaped by their recent dispersal from a South Asian origin into Europe, accompanied by continuous population bottlenecks and gene flow. After settling in the Balkans around 1,000 years ago, the Roma gradually dispersed across Europe, and approximately 500 years ago, they established in the Iberian Peninsula what is now one of the largest Roma populations in Western Europe. Focusing specifically on the Iberian Roma, we conducted the most comprehensive genome-wide analysis of European Roma populations to date. Using allele frequency and haplotype-based methods, we analysed 181 individuals to investigate their genetic diversity, social dynamics, and migration histories at both continental and local scales. Our findings demonstrate significant gene flow from populations encountered during the Roma's dispersal and confirm their South Asian origins. We show that, between the 14th and 19th centuries, the Roma spread westward from the Balkans in various waves, with multiple admixture events. Furthermore, our findings refute previous hypotheses of a North African dispersal route into Iberia and genetic connections to Jewish populations. The Iberian Roma exhibit ten times greater genetic differentiation compared to non-Roma Iberians, indicating significant regional substructure. Additionally, we provide the first genetic evidence of assortative mating within Roma groups, highlighting distinct mating patterns and suggesting a gradual shift towards increased integration with non-Roma individuals. This study significantly enhances our understanding of how demographic history and complex genetic structure have shaped the genetic diversity of Roma populations, while also highlighting the influence of their evolving social dynamics.
{"title":"The genetic footprint of the European Roma diaspora: evidence from the Balkans to the Iberian Peninsula.","doi":"10.1007/s00439-025-02735-z","journal":{"name":"Human Genetics","pages":"463-479"},"PeriodicalIF":3.8,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12003505/pdf/","platform":"Semanticscholar","paperid":"143648425"}
Pub Date: 2025-04-01 | Epub Date: 2025-02-04 | DOI: 10.1007/s00439-025-02729-x
Arvind Srinivasan, Dorota Magner, Piotr Kozłowski, Anna Philips, Arkadiusz Kajdasz, Paweł Wojciechowski, Marzena Wojciechowska
Myotonic dystrophy type 1 (DM1) and type 2 (DM2) are autosomal dominant neuromuscular disorders associated with microsatellite expansions in DMPK and CNBP, respectively. Their pathogenesis is linked to the global aberrant alternative splicing (AAS) of many genes and affects mostly muscular and neuronal tissues, while blood is the least affected. Recent data from DM1 skeletal muscles indicated that abnormalities in RNA metabolism also include global upregulation of circular RNAs (circRNAs). CircRNAs are a heterogeneous group considered splicing errors and by-products of canonical splicing. To elucidate whether circRNA dysregulation is an inherent feature of the myotonic environment, we analyze circRNAs in the frontal cortex and whole blood of DM1 and DM2 patients. We find a global elevation of circRNAs in both tissues, and its magnitude is neither correlated with differences in parental gene expression nor associated with previously reported AAS. Aberrantly spliced cassette exons of linear transcripts affected in DM1 and DM2 are not among the circularized exons, which have unique genomic features that are a prerequisite for back-splicing. However, the blueprint of the AAS of linear RNAs is found in a variety of circRNA isoforms. The heterogeneity of circRNAs also originates from the utilization of exonic and intronic cryptic donors/acceptors in back-splice junctions, and intron-containing circRNAs are more characteristic of the blood. Overall, this study reveals circRNA dysregulation in various tissues from DM1 and DM2; however, circRNA levels do not correlate with the AAS in linear RNAs, suggesting a potential independent regulatory mechanism underlying circRNA upregulation in myotonic dystrophy.
{"title":"Global dysregulation of circular RNAs in frontal cortex and whole blood from DM1 and DM2.","doi":"10.1007/s00439-025-02729-x","journal":{"name":"Human Genetics","pages":"417-432"},"PeriodicalIF":3.8,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12003446/pdf/","platform":"Semanticscholar","paperid":"143189244"}
Pub Date: 2025-04-01 | Epub Date: 2025-01-22 | DOI: 10.1007/s00439-024-02725-7
Xiaoyu Wang, Wendu Pang, Xin Hu, Tao Shu, Yaxin Luo, Junhong Li, Lan Feng, Ke Qiu, Yufang Rao, Yao Song, Minzi Mao, Yuyang Zhang, Jianjun Ren, Yu Zhao
The genetic relationship between migraine and stroke remains underexplored, particularly in the context of druggable targets. Previous studies have been limited by small sample sizes and a lack of focus on genetically targeted therapies for these conditions. We analyzed the association and causality between migraine and stroke using multivariable logistic regression in the UK Biobank cohort and Mendelian randomization (MR) analyses based on genome-wide association study (GWAS) data. Integrating expression quantitative trait loci (eQTL) data from blood and brain regions, we explored the phenotypic and genetic links between migraine medications, drug targets, and stroke. Additionally, we explored novel druggable genes for migraine and evaluated their effects on migraine signaling molecules and stroke risk. Migraine was significantly associated with stroke, particularly ischemic stroke (IS) and intracerebral hemorrhage (ICH), with MR analysis confirming a causal link to ICH. HTR1A emerged as a potential link between antidepressants (preventive medications for migraine) and stroke. We identified 17 migraine-related druggable genes, with 5 genes (HMGCR, TGFB1, TGFB3, KCNK5, IMPDH2) associated with nine existing drugs. Further MR analysis identified a correlation of CELSR3 and IMPDH2 with the cGMP pathway marker PRKG1, and identified KCNK5, PLXNB1, and MDK as novel migraine-associated druggable genes significantly linked to stroke risk. These findings establish the phenotypic and genetic link between migraine, its medication, and stroke, identify potential targets for single and dual-purpose therapies for migraine and stroke, and emphasize the need for further research to validate these associations.
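As a rough illustration of how a two-sample MR estimate is formed from GWAS summary statistics, the sketch below computes per-variant Wald ratios and a fixed-effect inverse-variance weighted (IVW) estimate. The abstract does not state which MR estimators the authors used, so IVW is an assumption chosen because it is the most common default. The function names and the toy effect sizes are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from one variant: outcome effect / exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    Each variant's Wald ratio is weighted by beta_exposure**2 / se_outcome**2.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    numerator = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics for three instruments (not from the study).
toy_beta_exposure = [0.1, 0.2, 0.4]    # variant effect on the exposure
toy_beta_outcome = [0.05, 0.10, 0.20]  # variant effect on the outcome
toy_se_outcome = [0.01, 0.01, 0.01]    # standard error of the outcome effect

toy_estimate, toy_se = ivw_estimate(
    toy_beta_exposure, toy_beta_outcome, toy_se_outcome
)
```

In this toy input every variant has the same Wald ratio, so the IVW estimate equals that ratio. With real data the ratios differ, and heterogeneity between them is one of the things sensitivity analyses examine.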
{"title":"Conventional and genetic association between migraine and stroke with druggable genome-wide Mendelian randomization.","doi":"10.1007/s00439-024-02725-7","journal":{"name":"Human Genetics","pages":"391-404"},"PeriodicalIF":3.8,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"143004687"}
Pub Date: 2025-04-01 | Epub Date: 2025-04-09 | DOI: 10.1007/s00439-025-02736-y
Chanelle Warton, Danya F Vears
Background: The increasing integration of non-invasive prenatal testing (NIPT) into antenatal practice and public healthcare systems globally raises both significant challenges in standardising service delivery and important ethical questions around routinisation and reproductive autonomy. This systematic review aims to synthesise existing primary empirical research on healthcare professionals' views on and experiences with NIPT.
Methods: A systematic search was conducted across four major databases in September 2023 and repeated in December 2024. Studies that reported findings from primary empirical research, including quantitative, qualitative and mixed methods research were included.
Results: Searches returned 65 eligible articles, spanning 38 countries and 1 special administrative region and at least 12 professions. Views on who should have access to NIPT and which conditions should be screened for were influenced by perceived clinical utility. While healthcare professionals acknowledged NIPT as beneficial for supporting reproductive autonomy, concerns were raised about the amount and complexity of information to be conveyed during prenatal counseling and potential pressure to test. Cost was also identified as a significant barrier. Challenges reported during post-test counseling included communicating test failures and obtaining information from laboratories. Views on the implications of NIPT for decision-making around abortion and for people with disabilities varied.
Conclusions: Healthcare professionals play a critical role in facilitating the access to and decisions by pregnant people around prenatal genetic testing. Addressing barriers in clinical practice and increasing consistency across and access to clinical guidelines and education resources may support healthcare professionals in supporting reproductive autonomy.
{"title":"Healthcare professionals' perspectives on and experiences with non-invasive prenatal testing: a systematic review.","doi":"10.1007/s00439-025-02736-y","journal":{"name":"Human Genetics","volume":"144 4","pages":"343-374"},"PeriodicalIF":3.6,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12003526/pdf/","platform":"Semanticscholar","paperid":"143999424"}
Pub Date: 2025-03-01 | Epub Date: 2024-07-24 | DOI: 10.1007/s00439-024-02692-z
Selen Ozkan, Natàlia Padilla, Xavier de la Cruz
Next-generation sequencing (NGS) has revolutionized genetic diagnostics, yet its application in precision medicine remains incomplete, despite significant advances in computational tools for variant annotation. Many variants remain unannotated, and existing tools often fail to accurately predict the range of impacts that variants have on protein function. This limitation restricts their utility in relevant applications such as predicting disease severity and onset age. In response to these challenges, a new generation of computational models is emerging, aimed at producing quantitative predictions of genetic variant impacts. However, the field is still in its early stages, and several issues need to be addressed, including improved performance and better interpretability. This study introduces QAFI, a novel methodology that integrates protein-specific regression models within an ensemble learning framework, utilizing conservation-based and structure-related features derived from AlphaFold models. Our findings indicate that QAFI significantly enhances the accuracy of quantitative predictions across various proteins. The approach has been rigorously validated through its application in the CAGI6 contest, focusing on ARSA protein variants, and further tested on a comprehensive set of clinically labeled variants, demonstrating its generalizability and robust predictive power. The straightforward nature of our models may also contribute to better interpretability of the results.
{"title":"QAFI: a novel method for quantitative estimation of missense variant impact using protein-specific predictors and ensemble learning.","doi":"10.1007/s00439-024-02692-z","journal":{"name":"Human Genetics","pages":"191-208"},"PeriodicalIF":3.8,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11976337/pdf/","platform":"Semanticscholar","paperid":"141758427"}
Pub Date: 2025-03-01 | Epub Date: 2024-08-07 | DOI: 10.1007/s00439-024-02680-3
Jing Zhang, Lisa Kinch, Panagiotis Katsonis, Olivier Lichtarge, Milind Jagota, Yun S Song, Yuanfei Sun, Yang Shen, Nurdan Kuru, Onur Dereli, Ogun Adebali, Muttaqi Ahmad Alladin, Debnath Pal, Emidio Capriotti, Maria Paola Turina, Castrense Savojardo, Pier Luigi Martelli, Giulia Babbi, Rita Casadio, Fabrizio Pucci, Marianne Rooman, Gabriel Cia, Matsvei Tsishyn, Alexey Strokach, Zhiqiang Hu, Warren van Loggerenberg, Frederick P Roth, Predrag Radivojac, Steven E Brenner, Qian Cong, Nick V Grishin
This paper presents an evaluation of predictions submitted for the "HMBS" challenge, a component of the sixth round of the Critical Assessment of Genome Interpretation (CAGI6), held in 2021. The challenge required participants to predict the effects of missense variants of the human HMBS gene on yeast growth. The HMBS enzyme, critical for the biosynthesis of heme in eukaryotic cells, is highly conserved among eukaryotes. Despite the application of a variety of algorithms and methods, the performance of predictors was relatively similar, with Kendall's tau correlation coefficients between predictions and experimental scores around 0.3 for a majority of submissions. Notably, the median correlation (≥ 0.34) observed among these predictors, especially the top predictions from different groups, was greater than the correlation observed between their predictions and the actual experimental results. Most predictors were moderately successful in distinguishing between deleterious and benign variants, as evidenced by an area under the receiver operating characteristic (ROC) curve (AUC) of approximately 0.7. Compared with the two most recent rounds of CAGI competitions, more predictors outperformed the baseline predictor, which is based solely on amino acid frequencies. Nevertheless, the overall accuracy of predictions is still far short of the positive control, which is derived from experimental scores, indicating the necessity for considerable improvements in the field. The most inaccurately predicted variants in this round were associated with the insertion loop, which is absent in many orthologs, suggesting that the predictors still rely heavily on information from multiple sequence alignments.
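The two metrics quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the assessors' evaluation code: it uses the simple tau-a variant of Kendall's tau (the abstract does not say which variant was used) and computes AUC as the fraction of positive-negative pairs ranked correctly. All variant scores and the 0.5 labelling threshold below are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1  # tied pairs count as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def roc_auc(scores, labels):
    """AUC as P(positive scores higher than negative); ties count as 0.5."""
    positives = [s for s, lab in zip(scores, labels) if lab]
    negatives = [s for s, lab in zip(scores, labels) if not lab]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical fitness scores for five variants (higher = closer to wild type).
toy_experimental = [0.1, 0.4, 0.35, 0.8, 0.9]
toy_predicted = [0.2, 0.3, 0.5, 0.6, 0.95]
# Hypothetical labelling: a variant is "benign" if its measured score is >= 0.5.
toy_benign = [score >= 0.5 for score in toy_experimental]

toy_tau = kendall_tau_a(toy_experimental, toy_predicted)
toy_auc = roc_auc(toy_predicted, toy_benign)
```

Tau measures agreement across the whole ranking, while AUC measures only the separation of two classes. That is why a predictor can reach an AUC near 0.7 while its rank correlation stays near 0.3.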
{"title":"Assessing predictions on fitness effects of missense variants in HMBS in CAGI6.","doi":"10.1007/s00439-024-02680-3","journal":{"name":"Human Genetics","pages":"173-189"},"PeriodicalIF":3.8,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12085147/pdf/","platform":"Semanticscholar","paperid":"141897332"}
Pub Date: 2025-03-01; Epub Date: 2025-01-09; DOI: 10.1007/s00439-024-02722-w
Maria Cristina Aspromonte, Alessio Del Conte, Shaowen Zhu, Wuwei Tan, Yang Shen, Yexian Zhang, Qi Li, Maggie Haitian Wang, Giulia Babbi, Samuele Bovo, Pier Luigi Martelli, Rita Casadio, Azza Althagafi, Sumyyah Toonsi, Maxat Kulmanov, Robert Hoehndorf, Panagiotis Katsonis, Amanda Williams, Olivier Lichtarge, Su Xian, Wesley Surento, Vikas Pejaver, Sean D Mooney, Uma Sunderam, Rajgopal Srinivasan, Alessandra Murgia, Damiano Piovesan, Silvio C E Tosatto, Emanuela Leonardi
The Genetics of Neurodevelopmental Disorders Lab in Padua provided a new intellectual disability (ID) Panel challenge for computational methods to predict patient phenotypes and their causal variants in the context of the Critical Assessment of Genome Interpretation, 6th edition (CAGI6). Eight research teams submitted a total of 30 models to predict phenotypes based on the sequences of 74 genes (VCF format) in 415 pediatric patients affected by neurodevelopmental disorders (NDDs). NDDs are clinically and genetically heterogeneous conditions with onset in infancy. Here, we assess the ability and accuracy of computational methods to predict comorbid phenotypes, based on the clinical features described in each patient, and their causal variants. We also evaluate predictions of possible genetic causes in patients without a clear genetic diagnosis. As in the previous ID Panel challenge in CAGI5, seven clinical features (ID, ASD, ataxia, epilepsy, microcephaly, macrocephaly, hypotonia) and variants (Pathogenic/Likely Pathogenic, Variants of Uncertain Significance, and Risk Factors) were provided. The phenotypic traits and variant data of 150 patients from the CAGI5 ID Panel Challenge were provided as a training set for predictors. The CAGI6 challenge confirms the CAGI5 finding that predicting phenotypes from gene panel data is highly challenging, with AUC values close to random and no method able to predict relevant variants with both high accuracy and precision. However, a significant improvement is noted for the best method, with recall increasing from 66% to 82%. Several groups also successfully predicted difficult-to-detect variants, emphasizing the importance of variants initially excluded by the Padua NDD Lab.
{"title":"CAGI6 ID panel challenge: assessment of phenotype and variant predictions in 415 children with neurodevelopmental disorders (NDDs).","authors":"Maria Cristina Aspromonte, Alessio Del Conte, Shaowen Zhu, Wuwei Tan, Yang Shen, Yexian Zhang, Qi Li, Maggie Haitian Wang, Giulia Babbi, Samuele Bovo, Pier Luigi Martelli, Rita Casadio, Azza Althagafi, Sumyyah Toonsi, Maxat Kulmanov, Robert Hoehndorf, Panagiotis Katsonis, Amanda Williams, Olivier Lichtarge, Su Xian, Wesley Surento, Vikas Pejaver, Sean D Mooney, Uma Sunderam, Rajgopal Srinivasan, Alessandra Murgia, Damiano Piovesan, Silvio C E Tosatto, Emanuela Leonardi","doi":"10.1007/s00439-024-02722-w","DOIUrl":"10.1007/s00439-024-02722-w","url":null,"abstract":"<p><p>The Genetics of Neurodevelopmental Disorders Lab in Padua provided a new intellectual disability (ID) Panel challenge for computational methods to predict patient phenotypes and their causal variants in the context of the Critical Assessment of the Genome Interpretation, 6th edition (CAGI6). Eight research teams submitted a total of 30 models to predict phenotypes based on the sequences of 74 genes (VCF format) in 415 pediatric patients affected by Neurodevelopmental Disorders (NDDs). NDDs are clinically and genetically heterogeneous conditions, with onset in infant age. Here, we assess the ability and accuracy of computational methods to predict comorbid phenotypes based on clinical features described in each patient and their causal variants. We also evaluated predictions for possible genetic causes in patients without a clear genetic diagnosis. Like the previous ID Panel challenge in CAGI5, seven clinical features (ID, ASD, ataxia, epilepsy, microcephaly, macrocephaly, hypotonia), and variants (Pathogenic/Likely Pathogenic, Variants of Uncertain Significance and Risk Factors) were provided. The phenotypic traits and variant data of 150 patients from the CAGI5 ID Panel Challenge were provided as training set for predictors. 
The CAGI6 challenge confirms CAGI5 results that predicting phenotypes from gene panel data is highly challenging, with AUC values close to random, and no method able to predict relevant variants with both high accuracy and precision. However, a significant improvement is noted for the best method, with recall increasing from 66% to 82%. Several groups also successfully predicted difficult-to-detect variants, emphasizing the importance of variants initially excluded by the Padua NDD Lab.</p>","PeriodicalId":13175,"journal":{"name":"Human Genetics","volume":" ","pages":"227-242"},"PeriodicalIF":3.8,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11976362/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142947999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-01; Epub Date: 2025-02-20; DOI: 10.1007/s00439-024-02724-8
Paola Turina, Maria Petrosino, Carlos A Enriquez Sandoval, Leonore Novak, Alessandra Pasquo, Emil Alexov, Muttaqi Ahmad Alladin, David B Ascher, Giulia Babbi, Constantina Bakolitsa, Rita Casadio, Jianlin Cheng, Piero Fariselli, Lukas Folkman, Akash Kamandula, Panagiotis Katsonis, Minghui Li, Dong Li, Olivier Lichtarge, Sajid Mahmud, Pier Luigi Martelli, Debnath Pal, Shailesh Kumar Panday, Douglas E V Pires, Stephanie Portelli, Fabrizio Pucci, Carlos H M Rodrigues, Marianne Rooman, Castrense Savojardo, Martin Schwersensky, Yang Shen, Alexey V Strokach, Yuanfei Sun, Junwoo Woo, Predrag Radivojac, Steven E Brenner, Roberta Chiaraluce, Valerio Consalvi, Emidio Capriotti
New thermodynamic and functional studies have been recently conducted to evaluate the impact of amino acid substitutions on the Mitogen Activated Protein Kinases 1 and 3 (MAPK1/3). The Critical Assessment of Genome Interpretation (CAGI) data provider, at Sapienza University of Rome, measured the unfolding free energy and the enzymatic activity of a set of variants (MAPK challenge dataset). Thermodynamic measurements for the denaturant-induced equilibrium unfolding of the phosphorylated and unphosphorylated forms of the MAPKs were obtained by monitoring the far-UV circular dichroism and intrinsic fluorescence changes as a function of denaturant concentration. These values have been used to calculate the change in unfolding free energy between the variant and wild-type proteins at zero concentration of denaturant ($\Delta\Delta G^{\mathrm{H_2O}}$). The enzymatic activity of the phosphorylated MAPKs variants was also measured using Chelation-Enhanced Fluorescence to monitor the phosphorylation of a peptide substrate. The MAPK challenge dataset, composed of a total of 23 single amino acid substitutions (11 and 12 for MAPK1 and MAPK3, respectively), was used to assess the effectiveness of the computational methods in predicting the $\Delta\Delta G^{\mathrm{H_2O}}$ values associated with the variants and in categorizing them as destabilizing or not destabilizing. The data on the enzymatic activity of the MAPKs mutants were used to assess the performance of the methods for predicting the functional impact of the variants. For the sixth edition of CAGI, thirteen independent research groups from four continents (Asia, Australia, Europe and North America) submitted > 80 sets of predictions, obtained from different approaches. In this manuscript, we summarized the results of our assessment to highlight the possible limitations of the available algorithms.
{"title":"Assessing the predicted impact of single amino acid substitutions in MAPK proteins for CAGI6 challenges.","authors":"Paola Turina, Maria Petrosino, Carlos A Enriquez Sandoval, Leonore Novak, Alessandra Pasquo, Emil Alexov, Muttaqi Ahmad Alladin, David B Ascher, Giulia Babbi, Constantina Bakolitsa, Rita Casadio, Jianlin Cheng, Piero Fariselli, Lukas Folkman, Akash Kamandula, Panagiotis Katsonis, Minghui Li, Dong Li, Olivier Lichtarge, Sajid Mahmud, Pier Luigi Martelli, Debnath Pal, Shailesh Kumar Panday, Douglas E V Pires, Stephanie Portelli, Fabrizio Pucci, Carlos H M Rodrigues, Marianne Rooman, Castrense Savojardo, Martin Schwersensky, Yang Shen, Alexey V Strokach, Yuanfei Sun, Junwoo Woo, Predrag Radivojac, Steven E Brenner, Roberta Chiaraluce, Valerio Consalvi, Emidio Capriotti","doi":"10.1007/s00439-024-02724-8","DOIUrl":"10.1007/s00439-024-02724-8","url":null,"abstract":"<p><p>New thermodynamic and functional studies have been recently conducted to evaluate the impact of amino acid substitutions on the Mitogen Activated Protein Kinases 1 and 3 (MAPK1/3). The Critical Assessment of Genome Interpretation (CAGI) data provider, at Sapienza University of Rome, measured the unfolding free energy and the enzymatic activity of a set of variants (MAPK challenge dataset). Thermodynamic measurements for the denaturant-induced equilibrium unfolding of the phosphorylated and unphosphorylated forms of the MAPKs were obtained by monitoring the far-UV circular dichroism and intrinsic fluorescence changes as a function of denaturant concentration. These values have been used to calculate the change in unfolding free energy between the variant and wild-type proteins at zero concentration of denaturant ( <math><mrow><mi>Δ</mi> <mi>Δ</mi> <msup><mi>G</mi> <mrow><msub><mtext>H</mtext> <mn>2</mn></msub> <mtext>O</mtext></mrow> </msup> </mrow> </math> ). 
The enzymatic activity of the phosphorylated MAPKs variants was also measured using Chelation-Enhanced Fluorescence to monitor the phosphorylation of a peptide substrate. The MAPK challenge dataset, composed of a total of 23 single amino acid substitutions (11 and 12 for MAPK1 and MAPK3, respectively), was used to assess the effectiveness of the computational methods in predicting the <math><mrow><mi>Δ</mi> <mi>Δ</mi> <msup><mi>G</mi> <mrow><msub><mtext>H</mtext> <mn>2</mn></msub> <mtext>O</mtext></mrow> </msup> </mrow> </math> values, associated with the variants, and categorize them as destabilizing and not destabilizing. The data on the enzymatic activity of the MAPKs mutants were used to assess the performance of the methods for predicting the functional impact of the variants. For the sixth edition of CAGI, thirteen independent research groups from four continents (Asia, Australia, Europe and North America) submitted > 80 sets of predictions, obtained from different approaches. In this manuscript, we summarized the results of our assessment to highlight the possible limitations of the available algorithms.</p>","PeriodicalId":13175,"journal":{"name":"Human Genetics","volume":" ","pages":"265-280"},"PeriodicalIF":3.8,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11975483/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143457846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
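The MAPK abstract above describes the change in unfolding free energy between variant and wild type at zero denaturant concentration. One common way to obtain such a value is linear extrapolation of the unfolding free energy against denaturant concentration; the abstract does not say which procedure the data provider used, so the sketch below is an assumption. The sign convention (variant minus wild type, negative meaning less stable), the kcal/mol units, the -1.0 cutoff, and all numbers and function names are likewise hypothetical.

```python
def extrapolate_to_water(concentrations, dg_values):
    """Least-squares line through (denaturant concentration, unfolding dG) points.

    Returns the intercept, i.e. the estimated dG at zero denaturant.
    """
    n = len(concentrations)
    if n != len(dg_values) or n < 2:
        raise ValueError("need at least two (concentration, dG) points")
    mean_c = sum(concentrations) / n
    mean_g = sum(dg_values) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    if sxx == 0:
        raise ValueError("concentrations must not all be equal")
    sxy = sum((c - mean_c) * (g - mean_g)
              for c, g in zip(concentrations, dg_values))
    slope = sxy / sxx
    return mean_g - slope * mean_c


def delta_delta_g(dg_variant, dg_wild_type):
    """Assumed convention: variant minus wild type (negative = less stable)."""
    return dg_variant - dg_wild_type


def classify_stability(ddg, threshold=-1.0):
    """Hypothetical cutoff: at or below the threshold counts as destabilizing."""
    return "destabilizing" if ddg <= threshold else "not destabilizing"


# Made-up unfolding free energies (kcal/mol) at three denaturant concentrations (M).
denaturant = [1.0, 2.0, 3.0]
dg_wild_type = extrapolate_to_water(denaturant, [3.0, 1.0, -1.0])
dg_variant = extrapolate_to_water(denaturant, [1.5, -0.5, -2.5])

ddg = delta_delta_g(dg_variant, dg_wild_type)
print(ddg, classify_stability(ddg))
```

A prediction method would then be scored twice: on how closely its numeric values track the measured ones, and on whether it places each variant on the correct side of the chosen cutoff.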
Pub Date: 2025-03-01; Epub Date: 2025-03-21; DOI: 10.1007/s00439-025-02732-2
Ruchir Rastogi, Ryan Chung, Sindy Li, Chang Li, Kyoungyeul Lee, Junwoo Woo, Dong-Wook Kim, Changwon Keum, Giulia Babbi, Pier Luigi Martelli, Castrense Savojardo, Rita Casadio, Kirsley Chennen, Thomas Weber, Olivier Poch, François Ancien, Gabriel Cia, Fabrizio Pucci, Daniele Raimondi, Wim Vranken, Marianne Rooman, Céline Marquet, Tobias Olenyi, Burkhard Rost, Gaia Andreoletti, Akash Kamandula, Yisu Peng, Constantina Bakolitsa, Matthew Mort, David N Cooper, Timothy Bergquist, Vikas Pejaver, Xiaoming Liu, Predrag Radivojac, Steven E Brenner, Nilah M Ioannidis
Regular, systematic, and independent assessments of computational tools that are used to predict the pathogenicity of missense variants are necessary to evaluate their clinical and research utility and guide future improvements. The Critical Assessment of Genome Interpretation (CAGI) conducts the ongoing Annotate-All-Missense (Missense Marathon) challenge, in which missense variant effect predictors (also called variant impact predictors) are evaluated on missense variants added to disease-relevant databases following the prediction submission deadline. Here we assess predictors submitted to the CAGI 6 Annotate-All-Missense challenge, predictors commonly used in clinical genetics, and recently developed deep learning methods. We examine performance across a range of settings relevant for clinical and research applications, focusing on different subsets of the evaluation data as well as high-specificity and high-sensitivity regimes. Our evaluations reveal notable advances in current methods relative to older, well-cited tools in the field. While meta-predictors tend to outperform their constituent individual predictors, several newer individual predictors perform comparably to commonly used meta-predictors. Predictor performance varies between high-specificity and high-sensitivity regimes, highlighting that different methods may be optimal for different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors trained on pathogenicity labels from curated variant databases often inherit gene-level label imbalances. Our findings help illuminate the clinical and research utility of modern missense variant effect predictors and identify potential areas for future development.
{"title":"Critical assessment of missense variant effect predictors on disease-relevant variant data.","authors":"Ruchir Rastogi, Ryan Chung, Sindy Li, Chang Li, Kyoungyeul Lee, Junwoo Woo, Dong-Wook Kim, Changwon Keum, Giulia Babbi, Pier Luigi Martelli, Castrense Savojardo, Rita Casadio, Kirsley Chennen, Thomas Weber, Olivier Poch, François Ancien, Gabriel Cia, Fabrizio Pucci, Daniele Raimondi, Wim Vranken, Marianne Rooman, Céline Marquet, Tobias Olenyi, Burkhard Rost, Gaia Andreoletti, Akash Kamandula, Yisu Peng, Constantina Bakolitsa, Matthew Mort, David N Cooper, Timothy Bergquist, Vikas Pejaver, Xiaoming Liu, Predrag Radivojac, Steven E Brenner, Nilah M Ioannidis","doi":"10.1007/s00439-025-02732-2","DOIUrl":"10.1007/s00439-025-02732-2","url":null,"abstract":"<p><p>Regular, systematic, and independent assessments of computational tools that are used to predict the pathogenicity of missense variants are necessary to evaluate their clinical and research utility and guide future improvements. The Critical Assessment of Genome Interpretation (CAGI) conducts the ongoing Annotate-All-Missense (Missense Marathon) challenge, in which missense variant effect predictors (also called variant impact predictors) are evaluated on missense variants added to disease-relevant databases following the prediction submission deadline. Here we assess predictors submitted to the CAGI 6 Annotate-All-Missense challenge, predictors commonly used in clinical genetics, and recently developed deep learning methods. We examine performance across a range of settings relevant for clinical and research applications, focusing on different subsets of the evaluation data as well as high-specificity and high-sensitivity regimes. Our evaluations reveal notable advances in current methods relative to older, well-cited tools in the field. 
While meta-predictors tend to outperform their constituent individual predictors, several newer individual predictors perform comparably to commonly used meta-predictors. Predictor performance varies between high-specificity and high-sensitivity regimes, highlighting that different methods may be optimal for different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors trained on pathogenicity labels from curated variant databases often inherit gene-level label imbalances. Our findings help illuminate the clinical and research utility of modern missense variant effect predictors and identify potential areas for future development.</p>","PeriodicalId":13175,"journal":{"name":"Human Genetics","volume":" ","pages":"281-293"},"PeriodicalIF":3.8,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11976771/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143669683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
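The missense predictor assessment above distinguishes high-specificity from high-sensitivity regimes. One way to make that concrete is to fix a minimum specificity on benign variants and ask how much sensitivity a predictor retains on pathogenic ones. The sketch below does this in plain Python; the function name, the scores, the labels, and the specificity targets are all hypothetical, and the assessment itself may define its operating points differently.

```python
def sensitivity_at_specificity(scores, labels, min_specificity):
    """Best sensitivity over thresholds whose specificity meets the target.

    A variant is called pathogenic when its score is >= the threshold.
    Labels are truthy for pathogenic and falsy for benign.
    """
    pathogenic = [s for s, y in zip(scores, labels) if y]
    benign = [s for s, y in zip(scores, labels) if not y]
    if not pathogenic or not benign:
        raise ValueError("need at least one pathogenic and one benign variant")
    # 0.0 corresponds to a threshold above every score (nothing called pathogenic).
    best = 0.0
    for threshold in sorted(set(scores)):
        true_negatives = sum(1 for s in benign if s < threshold)
        specificity = true_negatives / len(benign)
        if specificity >= min_specificity:
            true_positives = sum(1 for s in pathogenic if s >= threshold)
            best = max(best, true_positives / len(pathogenic))
    return best


# Made-up predictor scores; 1 = pathogenic, 0 = benign.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

for target in (1.0, 0.75):
    print(target, sensitivity_at_specificity(scores, labels, target))
```

Swapping the roles of the two classes gives the mirror-image quantity, specificity at a fixed minimum sensitivity, which corresponds to the high-sensitivity regime.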