Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project

IF 3.8 3区 医学 Q2 GENETICS & HEREDITY Human Genomics Pub Date : 2024-04-29 DOI:10.1186/s40246-024-00604-w
Sarah L. Stenton, Melanie C. O’Leary, Gabrielle Lemire, Grace E. VanNoy, Stephanie DiTroia, Vijay S. Ganesh, Emily Groopman, Emily O’Heir, Brian Mangilog, Ikeoluwa Osei-Owusu, Lynn S. Pais, Jillian Serrano, Moriel Singer-Berk, Ben Weisburd, Michael W. Wilson, Christina Austin-Tse, Marwa Abdelhakim, Azza Althagafi, Giulia Babbi, Riccardo Bellazzi, Samuele Bovo, Maria Giulia Carta, Rita Casadio, Pieter-Jan Coenen, Federica De Paoli, Matteo Floris, Manavalan Gajapathy, Robert Hoehndorf, Julius O. B. Jacobsen, Thomas Joseph, Akash Kamandula, Panagiotis Katsonis, Cyrielle Kint, Olivier Lichtarge, Ivan Limongelli, Yulan Lu, Paolo Magni, Tarun Karthik Kumar Mamidi, Pier Luigi Martelli, Marta Mulargia, Giovanna Nicora, Keith Nykamp, Vikas Pejaver, Yisu Peng, Thi Hong Cam Pham, Maurizio S. Podda, Aditya Rao, Ettore Rizzo, Vangala G. Saipradeep, Castrense Savojardo, Peter Schols, Yang Shen, Naveen Sivadasan, Damian Smedley, Dorian Soru, Rajgopal Srinivasan, Yuanfei Sun, Uma Sunderam, W..
{"title":"Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project","authors":"Sarah L. Stenton, Melanie C. O’Leary, Gabrielle Lemire, Grace E. VanNoy, Stephanie DiTroia, Vijay S. Ganesh, Emily Groopman, Emily O’Heir, Brian Mangilog, Ikeoluwa Osei-Owusu, Lynn S. Pais, Jillian Serrano, Moriel Singer-Berk, Ben Weisburd, Michael W. Wilson, Christina Austin-Tse, Marwa Abdelhakim, Azza Althagafi, Giulia Babbi, Riccardo Bellazzi, Samuele Bovo, Maria Giulia Carta, Rita Casadio, Pieter-Jan Coenen, Federica De Paoli, Matteo Floris, Manavalan Gajapathy, Robert Hoehndorf, Julius O. B. Jacobsen, Thomas Joseph, Akash Kamandula, Panagiotis Katsonis, Cyrielle Kint, Olivier Lichtarge, Ivan Limongelli, Yulan Lu, Paolo Magni, Tarun Karthik Kumar Mamidi, Pier Luigi Martelli, Marta Mulargia, Giovanna Nicora, Keith Nykamp, Vikas Pejaver, Yisu Peng, Thi Hong Cam Pham, Maurizio S. Podda, Aditya Rao, Ettore Rizzo, Vangala G. Saipradeep, Castrense Savojardo, Peter Schols, Yang Shen, Naveen Sivadasan, Damian Smedley, Dorian Soru, Rajgopal Srinivasan, Yuanfei Sun, Uma Sunderam, W..","doi":"10.1186/s40246-024-00604-w","DOIUrl":null,"url":null,"abstract":"A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average \"diagnostic odyssey\" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Knowing which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. Model methodology and performance was highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria is needed.","PeriodicalId":13183,"journal":{"name":"Human Genomics","volume":"33 1","pages":""},"PeriodicalIF":3.8000,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Human Genomics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s40246-024-00604-w","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"GENETICS & HEREDITY","Score":null,"Total":0}
引用次数: 0

Abstract

A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Knowing which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. Model methodology and performance was highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria is needed.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
对罕见基因组项目中用于罕见病诊断的变异体优先排序方法进行严格评估
罕见病家庭面临的一个主要障碍是获得基因诊断。平均 "诊断之旅 "要持续 5 年以上,即使在全基因组范围内捕捉变异,也只有不到 50% 的病因变异能被识别出来。为了帮助解释和优先处理检测到的大量变异,计算方法层出不穷。但哪些工具最有效仍不清楚。为了评估计算方法的性能并鼓励方法开发的创新,我们设计了基因组解读关键评估(CAGI)社区挑战赛,在真实的临床诊断环境中对变异优先级模型进行正面交锋。我们利用的基因组测序(GS)数据来自罕见基因组计划(RGP)中的测序家系,该计划是一项直接面向参与者的研究,旨在了解基因组测序在罕见病诊断和基因发现中的应用。我们向挑战预测者提供了来自 175 个 RGP 个体(65 个家系)的变异调用和表型术语数据集,其中包括 35 个指定了因果变异的已解决训练集家系和 30 个未标记测试集家系(14 个已解决,16 个未解决)。我们要求各小组在尽可能多的家系中找出因果变异体。预测者提交带有估计因果关系概率 (EPCR) 值的变异预测。模型性能由两个指标决定,一个是基于因果变异体排名位置的加权得分,另一个是基于所有 EPCR 值中因果变异体的精确度和召回率的最大 F 度量。16 个团队提交了 52 个模型的预测结果,其中一些模型还加入了人工审核。在排名前 5 位的变异中,表现最出色的团队召回了 14 个已解家系中多达 13 个家系的因果变异。在对 RNA 测序进行确证后,新发现的诊断变体被送回两个以前未解决的家系,两个新的候选疾病基因被输入到 Matchmaker Exchange 中。其中一个例子是,RNA 测序显示 ASNS 中的一个深内含子缺失导致剪接异常,在一个表型与天冬酰胺合成酶缺乏症一致的未解谜探查者中与一个框架移位变异反式确定。模型方法和性能差异很大。权衡调用质量、等位基因频率、预测缺失性、分离和表型的模型能有效地识别因果变异,而对表型扩展和非编码变异开放的模型则能捕捉更困难的诊断并发现新的诊断。总之,计算模型能极大地帮助确定变异的优先次序。在诊断中使用时,需要根据既定标准对优先级变异进行详细审查和保守评估。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Human Genomics
Human Genomics GENETICS & HEREDITY-
CiteScore
6.00
自引率
2.20%
发文量
55
审稿时长
11 weeks
期刊介绍: Human Genomics is a peer-reviewed, open access, online journal that focuses on the application of genomic analysis in all aspects of human health and disease, as well as genomic analysis of drug efficacy and safety, and comparative genomics. Topics covered by the journal include, but are not limited to: pharmacogenomics, genome-wide association studies, genome-wide sequencing, exome sequencing, next-generation deep-sequencing, functional genomics, epigenomics, translational genomics, expression profiling, proteomics, bioinformatics, animal models, statistical genetics, genetic epidemiology, human population genetics and comparative genomics.
期刊最新文献
Integration of single-cell sequencing and drug sensitivity profiling reveals an 11-gene prognostic model for liver cancer. SCAN: a nanopore-based, cost effective decision-supporting tool for mass screening of aneuploidies. Advancing understanding of human variability through toxicokinetic modeling, in vitro-in vivo extrapolation, and new approach methodologies. Genetic heterogeneity in familial forms of genetic generalized epilepsy: from mono- to oligogenism. Analysis of public perceptions on the use of artificial intelligence in genomic medicine.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1