Human Genetics: Latest Articles

Critical assessment of missense variant effect predictors on disease-relevant variant data.
IF 3.8 | Zone 2, Biology | Q2 GENETICS & HEREDITY | Pub Date: 2025-03-01 | Epub Date: 2025-03-21 | DOI: 10.1007/s00439-025-02732-2
Ruchir Rastogi, Ryan Chung, Sindy Li, Chang Li, Kyoungyeul Lee, Junwoo Woo, Dong-Wook Kim, Changwon Keum, Giulia Babbi, Pier Luigi Martelli, Castrense Savojardo, Rita Casadio, Kirsley Chennen, Thomas Weber, Olivier Poch, François Ancien, Gabriel Cia, Fabrizio Pucci, Daniele Raimondi, Wim Vranken, Marianne Rooman, Céline Marquet, Tobias Olenyi, Burkhard Rost, Gaia Andreoletti, Akash Kamandula, Yisu Peng, Constantina Bakolitsa, Matthew Mort, David N Cooper, Timothy Bergquist, Vikas Pejaver, Xiaoming Liu, Predrag Radivojac, Steven E Brenner, Nilah M Ioannidis

Regular, systematic, and independent assessments of computational tools that are used to predict the pathogenicity of missense variants are necessary to evaluate their clinical and research utility and guide future improvements. The Critical Assessment of Genome Interpretation (CAGI) conducts the ongoing Annotate-All-Missense (Missense Marathon) challenge, in which missense variant effect predictors (also called variant impact predictors) are evaluated on missense variants added to disease-relevant databases following the prediction submission deadline. Here we assess predictors submitted to the CAGI 6 Annotate-All-Missense challenge, predictors commonly used in clinical genetics, and recently developed deep learning methods. We examine performance across a range of settings relevant for clinical and research applications, focusing on different subsets of the evaluation data as well as high-specificity and high-sensitivity regimes. Our evaluations reveal notable advances in current methods relative to older, well-cited tools in the field. While meta-predictors tend to outperform their constituent individual predictors, several newer individual predictors perform comparably to commonly used meta-predictors. Predictor performance varies between high-specificity and high-sensitivity regimes, highlighting that different methods may be optimal for different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors trained on pathogenicity labels from curated variant databases often inherit gene-level label imbalances. Our findings help illuminate the clinical and research utility of modern missense variant effect predictors and identify potential areas for future development.
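
The two regimes work as mirror images:

- **High-specificity regime:** a cutoff is chosen so that few benign variants are misclassified, and sensitivity is read off at that cutoff.
- **High-sensitivity regime:** a cutoff is chosen so that few pathogenic variants are missed, and specificity is read off instead.

The sketch below illustrates both. It assumes that higher scores mean "more likely pathogenic" and that labels are 1 (pathogenic) and 0 (benign). The exact cutoffs used in the CAGI assessment are not given in the abstract, and the function names are illustrative only.

```python
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate scores of pathogenic (label 1) and benign (label 0) variants."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    return pos, neg


def _thresholds(scores: Sequence[float]) -> List[float]:
    # Every observed score is a candidate cutoff; +inf means "call nothing pathogenic".
    return sorted(set(scores)) + [float("inf")]


def sensitivity_at_specificity(scores: Sequence[float], labels: Sequence[int],
                               min_specificity: float = 0.95) -> float:
    """High-specificity regime: best sensitivity while specificity >= min_specificity.

    A variant is called pathogenic when its score is >= the cutoff.
    """
    pos, neg = _split(scores, labels)
    best = 0.0
    for t in _thresholds(scores):
        specificity = sum(s < t for s in neg) / len(neg)
        if specificity >= min_specificity:
            best = max(best, sum(s >= t for s in pos) / len(pos))
    return best


def specificity_at_sensitivity(scores: Sequence[float], labels: Sequence[int],
                               min_sensitivity: float = 0.95) -> float:
    """High-sensitivity regime: best specificity while sensitivity >= min_sensitivity."""
    pos, neg = _split(scores, labels)
    best = 0.0
    for t in _thresholds(scores):
        sensitivity = sum(s >= t for s in pos) / len(pos)
        if sensitivity >= min_sensitivity:
            best = max(best, sum(s < t for s in neg) / len(neg))
    return best
```

Take scores [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1] with labels [1, 1, 0, 1, 1, 0, 0, 0]. The same predictor reaches 0.5 sensitivity at specificity 1.0, but 0.75 specificity at sensitivity 1.0. This is why the best method can differ between regimes.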

Pages: 281-293 | Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11976771/pdf/
An AI-based approach driven by genotypes and phenotypes to uplift the diagnostic yield of genetic diseases.
IF 3.8 | Zone 2, Biology | Q2 GENETICS & HEREDITY | Pub Date: 2025-03-01 | Epub Date: 2024-03-23 | DOI: 10.1007/s00439-023-02638-x
S Zucca, G Nicora, F De Paoli, M G Carta, R Bellazzi, P Magni, E Rizzo, I Limongelli

Identifying disease-causing variants in the genomes of patients with rare diseases is a challenging problem. To address it, we describe a machine learning framework, which we call "Suggested Diagnosis", that prioritizes the genetic variants in an exome or genome by their probability of being disease-causing. Our method combines several sources of evidence:

- the standard guidelines for germline variant interpretation defined by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP);
- inheritance information;
- phenotypic similarity;
- variant quality.

"Suggested Diagnosis" takes three inputs: (1) the VCF file containing the proband's variants, (2) the proband's phenotypes encoded as Human Phenotype Ontology terms, and optionally (3) information about family members. It then ranks all variants according to their machine learning prediction. By placing causative variants in the very first positions of the prioritized list, the method significantly reduces the number of variants that geneticists need to evaluate. Most importantly, our approach was among the top performers in the CAGI6 Rare Genome Project Challenge. There it ranked the true causative variant among the first positions and, uniquely among all challenge participants, increased the diagnostic yield by 12.5% by solving 2 undiagnosed cases.
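
The abstract does not spell out the ranking model itself. The sketch below covers only the final prioritization step. It assumes each variant already has a predicted probability of being causative; the variant IDs, probabilities and helper names are all hypothetical. It ranks the variants and checks where the known causative variant lands.

```python
from typing import List, Mapping


def rank_variants(probabilities: Mapping[str, float]) -> List[str]:
    """Variant IDs ordered from most to least likely disease-causing.

    sorted() stays stable even with reverse=True, so tied variants keep input order.
    """
    return sorted(probabilities, key=lambda v: probabilities[v], reverse=True)


def causative_rank(probabilities: Mapping[str, float], causative_id: str) -> int:
    """1-based position of the known causative variant in the prioritized list."""
    return rank_variants(probabilities).index(causative_id) + 1


def solved_in_top_k(probabilities: Mapping[str, float], causative_id: str, k: int = 10) -> bool:
    """True if the causative variant appears among the first k candidates."""
    return causative_rank(probabilities, causative_id) <= k
```

Reporting the rank of the true variant, rather than a plain yes/no, mirrors the abstract's emphasis on how many variants a geneticist must review before reaching the answer.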

Pages: 159-171 | Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11976766/pdf/
Impact of calmodulin missense variants associated with congenital arrhythmia on the thermal stability and the degree of unfolding.
IF 3.8 | Zone 2, Biology | Q2 GENETICS & HEREDITY | Pub Date: 2025-03-01 | Epub Date: 2023-12-28 | DOI: 10.1007/s00439-023-02629-y
Giuditta Dal Cortivo, Valerio Marino, Davide Zamboni, Daniele Dell'Orco

Thermal denaturation profiles of proteins that bind several ligands may deviate from a single transition, which makes their thermodynamic description challenging. We report an empirical method that estimates melting temperatures (Tm) from the multi-transition thermal denaturation profiles of 16 calmodulin (CaM) variants associated with congenital arrhythmia. For apo CaM variants, differences in Tm estimated by empirical fitting correlate with those obtained from thermodynamic models. Most CaM variants were more stable than the wild type (WT) in the absence of Ca²⁺ but less stable in its presence. In their apo form they displayed either WT-like or higher unfolding percentages, as evaluated by circular dichroism spectroscopy.
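
The abstract does not describe the authors' empirical fitting procedure. The sketch below substitutes a simpler, generic technique:

1. Min-max normalize the denaturation curve.
2. Read off the temperature where it first crosses a chosen fraction unfolded (0.5 by default), using linear interpolation.

It assumes flat baselines and a signal that increases on unfolding. For multi-transition profiles the result is only an apparent midpoint. `normalize` and `apparent_tm` are hypothetical helper names.

```python
from typing import List, Sequence


def normalize(signal: Sequence[float]) -> List[float]:
    """Min-max scale a denaturation signal to [0, 1] (0 = folded, 1 = unfolded).

    Crude: assumes flat baselines and a signal that increases on unfolding.
    """
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("flat signal: no transition to analyse")
    return [(s - lo) / (hi - lo) for s in signal]


def apparent_tm(temps: Sequence[float], frac: Sequence[float], level: float = 0.5) -> float:
    """First temperature at which the normalized profile crosses `level`.

    Uses linear interpolation between the two bracketing measurements.
    """
    points = list(zip(temps, frac))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 != f1 and (f0 - level) * (f1 - level) <= 0:
            return t0 + (level - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("profile never crosses the requested level")
```

A variant's ΔTm would then be `apparent_tm(temps, variant_frac) - apparent_tm(temps, wt_frac)`, with both profiles recorded under the same Ca²⁺ condition.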

Pages: 337-341 | Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12163107/pdf/
Genetic variants and phenotypic data curated for the CAGI6 intellectual disability panel challenge.
IF 3.8 | Zone 2, Biology | Q2 GENETICS & HEREDITY | Pub Date: 2025-03-01 | Epub Date: 2025-02-28 | DOI: 10.1007/s00439-025-02733-1
Maria Cristina Aspromonte, Alessio Del Conte, Roberta Polli, Demetrio Baldo, Francesco Benedicenti, Elisa Bettella, Stefania Bigoni, Stefania Boni, Claudia Ciaccio, Stefano D'Arrigo, Ilaria Donati, Elisa Granocchio, Isabella Mammi, Donatella Milani, Susanna Negrin, Margherita Nosadini, Fiorenza Soli, Franco Stanzial, Licia Turolla, Damiano Piovesan, Silvio C E Tosatto, Alessandra Murgia, Emanuela Leonardi

Neurodevelopmental disorders (NDDs) are common conditions that include clinically diverse and genetically heterogeneous diseases, such as intellectual disability, autism spectrum disorders, and epilepsy. Their multifaceted genetic architecture and heterogeneous clinical presentations make the genetic underpinnings of NDDs a formidable challenge to unravel. This work examines the interplay between genetic variants and phenotypic manifestations in NDDs and presents a dataset curated for the Critical Assessment of Genome Interpretation (CAGI6) ID Panel Challenge. The CAGI6 competition serves as a platform for evaluating how effectively computational methods predict phenotypic outcomes from genetic data.

In this study, targeted gene panel sequencing was used to investigate the genetic causes of NDDs in a cohort of 415 paediatric patients. We identified 60 pathogenic and 49 likely pathogenic variants in 102 individuals, accounting for 25% of NDD cases in the cohort. The most frequently mutated genes were ANKRD11, MECP2, ARID1B, ASH1L, CHD8, KDM5C, MED12 and PTCHD1. The majority of pathogenic variants were de novo, with some inherited from mildly affected parents. Loss-of-function variants were the most common type of pathogenic variant. In silico analysis tools were used to assess the potential impact of variants on splicing and the structural/functional effects of missense variants.

The study highlights the challenges of variant interpretation, especially in cases with atypical phenotypic manifestations. Overall, it provides valuable insights into the genetic causes of NDDs. It also emphasises the importance of understanding the underlying genetic factors for accurate diagnosis and for developing interventions for neurodevelopmental conditions.
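
Here is a quick arithmetic check of the reported 25% figure. It assumes the percentage is taken over all 415 patients in the cohort:

```latex
\text{diagnostic yield} = \frac{102}{415} \approx 0.246 \approx 25\%
```

So the figure in the abstract appears to be rounded to the nearest percent.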

Pages: 309-326 | Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11976335/pdf/
Exploring the effects of missense mutations on protein thermodynamics through structure-based approaches: findings from the CAGI6 challenges.
IF 3.8 | Zone 2, Biology | Q2 GENETICS & HEREDITY | Pub Date: 2025-03-01 | Epub Date: 2024-01-16 | DOI: 10.1007/s00439-023-02623-4
Carlos H M Rodrigues, Stephanie Portelli, David B Ascher

Missense mutations, which introduce subtle single-amino-acid changes into the resulting protein, are known contributors to diverse genetic disorders. Understanding their impact on protein stability and function is therefore crucial for unravelling disease mechanisms and developing targeted therapies. The Critical Assessment of Genome Interpretation (CAGI) provides a valuable platform for benchmarking state-of-the-art computational methods that predict how disease-related mutations affect protein thermodynamics. Here we report how our comprehensive platform of structure-based computational approaches performed on three CAGI6 challenges: Calmodulin, MAPK1 and MAPK3. When predicting ΔΔG, our stability predictors achieved:

- correlations of up to 0.74 for MAPK1;
- an AUC of 1 for MAPK3;
- an AUC of up to 0.75 in the Calmodulin challenge.

Overall, our study highlights the importance of structure-based approaches for understanding the effects of missense mutations on protein thermodynamics. The results from the CAGI6 challenges contribute to ongoing efforts to deepen our understanding of disease mechanisms and to facilitate the development of personalised medicine approaches.
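
This abstract reports two metrics. The sketch below computes both with the standard library:

- **Pearson correlation**, for predicted versus measured ΔΔG;
- **ROC AUC**, in its rank-based (Mann-Whitney) form.

Turning measured ΔΔG into binary labels for AUC requires a cutoff that the abstract does not state, so here the labels are assumed to be given.

```python
import math
from statistics import mean
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. between predicted and measured ΔΔG values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 1, as reported for MAPK3, means every positive variant scored above every negative one. This pairwise loop is O(n·m), which is fine for challenge-sized variant sets.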

Pages: 327-335 | Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11976750/pdf/
Structure-informed protein language models are robust predictors for variant effects.
IF 3.8 | Zone 2, Biology | Q2 GENETICS & HEREDITY | Pub Date: 2025-03-01 | Epub Date: 2024-08-08 | DOI: 10.1007/s00439-024-02695-w
Yuanfei Sun, Yang Shen

Protein language models (pLMs), an emerging class of variant effect predictors, learn the evolutionary distribution of functional sequences to capture the fitness landscape. Variant effects are manifested through biological contexts beyond sequence, such as structure. We therefore first assess how much structural context sequence-only pLMs learn and how this affects variant effect prediction. This establishes the need to inject protein structural context into pLMs purposefully and controllably. We then introduce a framework of structure-informed pLMs (SI-pLMs) that extends masked sequence denoising to cross-modality denoising of both sequence and structure.

Numerical results on deep mutational scanning benchmarks show that our SI-pLMs, even with smaller models and less data, are robustly top performers against competing methods, including other pLMs. This shows that introducing biological context can capture the fitness landscape more effectively than simply using larger models or more data. Case studies reveal two reasons why SI-pLMs can capture the fitness landscape better than sequence-only pLMs:

- (a) the learned embeddings of low- and high-fitness sequences can be more separable;
- (b) the learned amino-acid distributions of functionally and evolutionarily conserved residues can have much lower entropy, and are thus much more conserved, than those of other residues.

The SI-pLM framework can be applied to revise any sequence-only pLM through its model architecture and training objectives. SI-pLMs do not require structure data as model inputs for variant effect prediction; they use structures only as a context provider and model regularizer during training.
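
Point (b) refers to the entropy of a model's per-position amino-acid distribution. The sketch below computes two quantities from such a distribution:

- its Shannon entropy;
- a variant score, assuming the common log-odds (mutant versus wild type) way of scoring a substitution with a masked pLM.

The abstract does not state the exact scoring rule SI-pLMs use. Any probabilities passed in here are hypothetical, not model output.

```python
import math
from typing import Mapping


def entropy(dist: Mapping[str, float]) -> float:
    """Shannon entropy (in nats) of a per-position amino-acid distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def log_odds(dist: Mapping[str, float], wt: str, mt: str) -> float:
    """Variant score log p(mutant) - log p(wild type); more negative = less tolerated."""
    return math.log(dist[mt]) - math.log(dist[wt])
```

A uniform distribution over the 20 amino acids gives the maximum entropy, log(20) ≈ 3.0 nats. A residue the model treats as fully conserved (all probability on one amino acid) gives 0.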

Pages: 209-225 | Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12068927/pdf/
Evaluation of enzyme activity predictions for variants of unknown significance in Arylsulfatase A.
IF 3.8 | Zone 2, Biology | Q2 GENETICS & HEREDITY | Pub Date: 2025-03-01 | Epub Date: 2025-03-08 | DOI: 10.1007/s00439-025-02731-3
Shantanu Jain, Marena Trinidad, Thanh Binh Nguyen, Kaiya Jones, Santiago Diaz Neto, Fang Ge, Ailin Glagovsky, Cameron Jones, Giankaleb Moran, Boqi Wang, Kobra Rahimi, Sümeyra Zeynep Çalıcı, Luis R Cedillo, Silvia Berardelli, Buse Özden, Ken Chen, Panagiotis Katsonis, Amanda Williams, Olivier Lichtarge, Sadhna Rana, Swatantra Pradhan, Rajgopal Srinivasan, Rakshanda Sajeed, Dinesh Joshi, Eshel Faraggi, Robert Jernigan, Andrzej Kloczkowski, Jierui Xu, Zigang Song, Selen Özkan, Natàlia Padilla, Xavier de la Cruz, Rocio Acuna-Hidalgo, Andrea Grafmüller, Laura T Jiménez Barrón, Matteo Manfredi, Castrense Savojardo, Giulia Babbi, Pier Luigi Martelli, Rita Casadio, Yuanfei Sun, Shaowen Zhu, Yang Shen, Fabrizio Pucci, Marianne Rooman, Gabriel Cia, Daniele Raimondi, Pauline Hermans, Sofia Kwee, Ella Chen, Courtney Astore, Akash Kamandula, Vikas Pejaver, Rashika Ramola, Michelle Velyunskiy, Daniel Zeiberg, Reet Mishra, Teague Sterling, Jennifer L Goldstein, Jose Lugo-Martinez, Sufyan Kazi, Sindy Li, Kinsey Long, Steven E Brenner, Constantina Bakolitsa, Predrag Radivojac, Dean Suhr, Teryn Suhr, Wyatt T Clark

Continued advances in variant effect prediction are necessary to demonstrate that machine learning methods can accurately determine the clinical impact of variants of unknown significance (VUS). Towards this goal, the ARSA Critical Assessment of Genome Interpretation (CAGI) challenge was designed to characterize progress. It used 219 experimentally assayed missense VUS in the Arylsulfatase A (ARSA) gene to assess community-submitted predictions of variant functional effects. The challenge involved 15 teams and also evaluated predictions from established and recently released models. Notably, a model developed by participants of a genetics and coding bootcamp, trained with standard machine-learning tools in Python, demonstrated superior performance among the submissions. Furthermore, state-of-the-art deep learning methods provided a small but statistically significant improvement in predictive performance over less elaborate techniques. These findings underscore the utility of variant effect prediction and the potential of models trained with modest resources to accurately classify VUS in genetic and clinical research.
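
The abstract does not name the significance test used. One common choice for comparing two predictors on the same variants is a paired bootstrap. The sketch below assumes Pearson correlation with measured ARSA enzyme activity as the metric. It estimates how often predictor A fails to beat predictor B across resamples. The function name and defaults are hypothetical.

```python
import math
import random
from statistics import mean
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_bootstrap_pvalue(truth: Sequence[float], pred_a: Sequence[float],
                            pred_b: Sequence[float], n_boot: int = 2000,
                            seed: int = 0) -> float:
    """One-sided bootstrap estimate of P(predictor A is not better than B).

    Variants are resampled with replacement, and both predictors are scored on
    the same resample, so the comparison is paired.
    """
    rng = random.Random(seed)
    n = len(truth)
    not_better = used = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        t = [truth[i] for i in idx]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        # Skip degenerate resamples where a correlation would be undefined.
        if len(set(t)) < 2 or len(set(a)) < 2 or len(set(b)) < 2:
            continue
        used += 1
        if pearson(a, t) - pearson(b, t) <= 0:
            not_better += 1
    if used == 0:
        raise ValueError("no usable bootstrap resamples")
    return not_better / used
```

Pairing the resamples matters here: both predictors are scored on the same variants, so the comparison is not confounded by which variants happen to be drawn.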

Title: Evaluation of enzyme activity predictions for variants of unknown significance in Arylsulfatase A. Human Genetics, 2025-03-01, pp. 295-308. DOI: 10.1007/s00439-025-02731-3. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12122056/pdf/
Citations: 0
An augmented transformer model trained on protein family specific variant data leads to improved prediction of variants of uncertain significance.
IF 3.6 | Zone 2 (Biology) | Q2 GENETICS & HEREDITY | Pub Date: 2025-03-01 | Epub Date: 2025-01-27 | DOI: 10.1007/s00439-025-02727-z
Dinesh Joshi, Swatantra Pradhan, Rakshanda Sajeed, Rajgopal Srinivasan, Sadhna Rana

Variants of uncertain significance (VUS) lack sufficient evidence to be confidently associated with a disease and therefore pose a challenge in interpreting genetic testing results. Here we report an improved method for predicting the effects of VUS in the Arylsulfatase A (ARSA) gene, developed as part of the sixth Critical Assessment of Genome Interpretation (CAGI6) challenge. Our method uses transfer learning: it leverages a pre-trained protein language model to predict the impact of mutations on the activity of the ARSA enzyme, whose deficiency causes a rare genetic disorder, metachromatic leukodystrophy. Our framework combines zero-shot log odds scores and embeddings from ESM, an evolutionary scale model, as features for training a supervised model on variants of genes functionally related to ARSA. The zero-shot log odds score captures generic protein properties learned during ESM's pre-training on millions of UniProt sequences, while the ESM embeddings of proteins in the ARSA family capture family-specific features. We also tested our approach on another enzyme, N-acetyl-glucosaminidase (NAGLU), which belongs to the same superfamily as ARSA. The performance of our family models (augmented ESM models) is comparable to or better than that of the ESM models. On the area under the precision-recall curve (AUPRC), the ARSA model compares favorably with most state-of-the-art predictors, and the NAGLU model outperforms all pathogenicity predictors evaluated in this study. An improved AUPRC is relevant in diagnostic settings, where variant prioritization generally entails identifying a small number of pathogenic variants among a much larger number of benign variants.
Our results also indicate that, for genes with sparse or no experimental variant impact data, family variant data can serve as proxy training data for accurate predictions. Attention analysis of active sites and binding sites in the ARSA and NAGLU proteins sheds light on probable mechanisms of pathogenicity at highly attended positions.
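AUPRC is emphasized above because, when pathogenic variants are a small minority, a predictor is rewarded mainly for ranking those few variants above the many benign ones. The Python sketch below computes average precision, one standard step-wise estimate of AUPRC; the abstract does not specify which estimator was used, and the labels and scores in the example are invented for illustration.

```python
def average_precision(labels, scores):
    """Step-wise AUPRC estimate (average precision).

    labels: 1 = pathogenic (positive class), 0 = benign.
    scores: higher means predicted more damaging.
    Tied scores keep their input order (a simplification; libraries differ).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    # Rank variants from most to least damaging; sorted() is stable, so ties keep input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            total += true_pos / rank  # precision at the recall level just reached
    return total / n_pos


# Hypothetical toy data: 2 pathogenic variants among 6.
labels = [1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.1]
print(round(average_precision(labels, scores), 3))  # 0.833
```

A random ranking scores roughly the fraction of positives (here 2/6), especially on large variant sets, so AUPRC separates useful predictors from chance more sharply than ROC AUC when benign variants dominate.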

Human Genetics, 2025-03-01, pp. 143-158. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12166291/pdf/
Citations: 0
Assessing the predicted impact of single amino acid substitutions in calmodulin for CAGI6 challenges.
IF 3.8 | Zone 2 (Biology) | Q2 GENETICS & HEREDITY | Pub Date: 2025-03-01 | Epub Date: 2024-12-23 | DOI: 10.1007/s00439-024-02720-y
Paola Turina, Giuditta Dal Cortivo, Carlos A Enriquez Sandoval, Emil Alexov, David B Ascher, Giulia Babbi, Constantina Bakolitsa, Rita Casadio, Piero Fariselli, Lukas Folkman, Akash Kamandula, Panagiotis Katsonis, Dong Li, Olivier Lichtarge, Pier Luigi Martelli, Shailesh Kumar Panday, Douglas E V Pires, Stephanie Portelli, Fabrizio Pucci, Carlos H M Rodrigues, Marianne Rooman, Castrense Savojardo, Martin Schwersensky, Yang Shen, Alexey V Strokach, Yuanfei Sun, Junwoo Woo, Predrag Radivojac, Steven E Brenner, Daniele Dell'Orco, Emidio Capriotti

Recent thermodynamic and functional studies have evaluated the impact of amino acid substitutions on calmodulin (CaM). The Critical Assessment of Genome Interpretation (CAGI) data provider at the University of Verona (Italy) measured the melting temperature (Tm) and the percentage of unfolding (%unfold) of a set of CaM variants (the CaM challenge dataset). Thermodynamic measurements of the equilibrium unfolding of CaM were obtained by monitoring far-UV circular dichroism (CD) as a function of temperature, and were used to determine the Tm and the percentage of protein unfolded at the highest temperature. The CaM challenge dataset, comprising 15 single amino acid substitutions, was used to evaluate how well computational methods predict the Tm and unfolding percentage associated with each variant and classify variants as destabilizing or not. For the sixth edition of CAGI, nine independent research groups from four continents (Asia, Australia, Europe, and North America) submitted over 52 sets of predictions derived from various approaches. In this manuscript, we summarize the results of our assessment to highlight potential limitations of current algorithms and to provide insights for the development of more accurate prediction tools. By evaluating the thermodynamic stability of CaM variants, this study aims to improve our understanding of the relationship between amino acid substitutions and protein stability, ultimately contributing to more accurate predictions of the effects of genetic variants.
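As a rough illustration of how Tm and %unfold can be read off a thermal denaturation curve, the Python sketch below assumes a simple two-state model: the CD signal is normalized to a fraction unfolded using a folded-state signal and a fully unfolded reference signal, Tm is taken as the temperature at which half the protein is unfolded (by linear interpolation), and %unfold is the unfolded fraction at the highest temperature measured. The reference signals, function names and data are hypothetical; the data provider's actual fitting procedure is not described in the abstract.

```python
def fraction_unfolded(signal, s_folded, s_unfolded):
    """Normalize CD signals to a fraction unfolded (two-state assumption)."""
    return [(s - s_folded) / (s_unfolded - s_folded) for s in signal]


def melting_temperature(temps, frac):
    """Temperature where the fraction unfolded first reaches 0.5 (linear interpolation)."""
    points = list(zip(temps, frac))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 < 0.5 <= f1:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    return None  # the curve never crosses 50% unfolded in the measured range


# Hypothetical melt: temperatures in degrees C and made-up CD readings.
temps = [20.0, 40.0, 60.0, 80.0]
signal = [-30.0, -25.0, -15.0, -12.0]
frac = fraction_unfolded(signal, s_folded=-30.0, s_unfolded=-10.0)
print(melting_temperature(temps, frac))  # 50.0
print(round(100 * frac[-1], 1))          # %unfold at the highest temperature: 90.0
```

Real analyses usually fit a sigmoidal two-state model with sloping pre- and post-transition baselines; interpolation keeps this sketch dependency-free.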

Human Genetics, 2025-03-01, pp. 113-125. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11975486/pdf/
Citations: 0
Evaluating predictors of kinase activity of STK11 variants identified in primary human non-small cell lung cancers.
IF 3.8 | Zone 2 (Biology) | Q2 GENETICS & HEREDITY | Pub Date: 2025-03-01 | Epub Date: 2025-02-12 | DOI: 10.1007/s00439-025-02726-0
Yile Chen, Kyoungyeul Lee, Junwoo Woo, Dong-Wook Kim, Changwon Keum, Giulia Babbi, Rita Casadio, Pier Luigi Martelli, Castrense Savojardo, Matteo Manfredi, Yang Shen, Yuanfei Sun, Panagiotis Katsonis, Olivier Lichtarge, Vikas Pejaver, David J Seward, Akash Kamandula, Constantina Bakolitsa, Steven E Brenner, Predrag Radivojac, Anne O'Donnell-Luria, Sean D Mooney, Shantanu Jain

Critical evaluation of computational tools for predicting variant effects is important given their increasing use in disease diagnosis and in driving molecular discoveries. In the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, a dataset of 28 rare STK11 variants (27 missense, 1 single amino acid deletion) identified in primary non-small cell lung cancer biopsies was experimentally assayed and used to evaluate computational methods from four participating teams and five publicly available tools. Predictors performed well on the key evaluation metrics, which measured correlation with the assay outputs and separation of loss-of-function (LoF) variants from wildtype-like (WT-like) variants. The best participant model, 3Cnet, performed competitively with well-known tools. Unique to this challenge, the functional data were generated with both biological and technical replicates, allowing the assessors to realistically establish the maximum achievable predictive performance given experimental variability. Three of the five publicly available tools, as well as 3Cnet, approached the performance of the assay replicates in separating LoF variants from WT-like variants. Surprisingly, REVEL, a widely used model, achieved a correlation with the real-valued assay output comparable to that seen for the experimental replicates. Combining the new functional evidence with computational and population data evidence for variant interpretation led to 16 new variants receiving a clinically actionable classification of likely pathogenic (LP) or likely benign (LB). Overall, the STK11 challenge highlights the utility of variant effect predictors in the biomedical sciences and provides encouraging results for research in computational genome interpretation.
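Because the STK11 assay was run in replicate, agreement between replicates gives an empirical ceiling against which predictor correlations can be judged. The Python sketch below compares a predictor's mean Pearson correlation with each single replicate to the replicate-to-replicate correlation; this pairing scheme, the function names and the toy numbers are illustrative assumptions, not the assessors' exact procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def predictor_vs_ceiling(pred, rep1, rep2):
    """Return (predictor r, replicate ceiling r).

    Predictor r is averaged over its correlation with each single replicate,
    so both numbers are measured against equally noisy data.
    """
    pred_r = (pearson(pred, rep1) + pearson(pred, rep2)) / 2
    ceiling_r = pearson(rep1, rep2)
    return pred_r, ceiling_r


# Hypothetical kinase-activity scores for 5 variants (higher = more WT-like).
rep1 = [0.10, 0.35, 0.80, 0.95, 0.50]
rep2 = [0.15, 0.30, 0.85, 0.90, 0.45]
pred = [0.20, 0.40, 0.70, 0.90, 0.60]
pred_r, ceiling_r = predictor_vs_ceiling(pred, rep1, rep2)
print(f"predictor r = {pred_r:.2f}, replicate ceiling r = {ceiling_r:.2f}")
```

A predictor whose r approaches the replicate ceiling is, in this framing, about as consistent with the assay as the assay is with itself; note that a damage-oriented predictor would correlate negatively with activity and would need its sign flipped first.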

Human Genetics, 2025-03-01, pp. 127-142. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11976797/pdf/
Citations: 0