Identification of influential rare variants in aggregate testing using random forest importance measures

Annals of Human Genetics (IF 1; Zone 4, Biology; JCR Q4, Genetics & Heredity). Pub Date: 2023-05-23. DOI: 10.1111/ahg.12509
Rachel Z. Blumhagen, David A. Schwartz, Carl D. Langefeld, Tasha E. Fingerlin
Annals of Human Genetics, vol. 87, no. 4, pp. 184-195 (2023). Full text: https://onlinelibrary.wiley.com/doi/10.1111/ahg.12509

Abstract

Aggregate tests of rare variants are often used to identify associated regions, as an alternative to testing each variant individually. When an aggregate test is significant, it is of interest to identify which rare variants are “driving” the association. We recently developed the rare variant influential filtering tool (RIFT) to identify influential rare variants and showed that RIFT had higher true positive rates than other published methods. Here we use importance measures from the standard random forest (RF) and the variable importance weighted RF (vi-RF) to identify influential variants. For very rare variants (minor allele frequency [MAF] < 0.001), the vi-RF:Accuracy method had the highest median true positive rate (TPR = 0.24; interquartile range [IQR]: 0.13, 0.42), followed by the RF:Accuracy method (TPR = 0.16; IQR: 0.07, 0.33); both were superior to RIFT (TPR = 0.05; IQR: 0.02, 0.15). Among uncommon variants (0.001 < MAF < 0.03), the RF methods had higher true positive rates than RIFT, with comparable false positive rates. Finally, we applied the RF methods to a targeted resequencing study in idiopathic pulmonary fibrosis (IPF), in which the vi-RF approach identified eight variants in TERT and seven in FAM13A. In summary, the vi-RF provides an improved, objective approach to identifying influential variants following a significant aggregate test. We have expanded our previously developed R package RIFT to include the random forest methods.
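The abstract summarises performance as a median true positive rate with an interquartile range. The sketch below illustrates how such a summary could be computed across simulation replicates. It is not the authors' code or the RIFT package API: the function names, the toy variant labels, and the choice of the "inclusive" quantile method are all assumptions made for illustration.

```python
from statistics import median, quantiles


def tpr_fpr(flagged, causal, all_variants):
    """Return (TPR, FPR) for one replicate.

    flagged:      variants a method labelled as influential
    causal:       variants that truly drive the association (known in simulation)
    all_variants: every variant included in the aggregate test
    """
    flagged, causal = set(flagged), set(causal)
    non_causal = set(all_variants) - causal
    tp = len(flagged & causal)       # causal variants correctly flagged
    fp = len(flagged & non_causal)   # non-causal variants wrongly flagged
    tpr = tp / len(causal) if causal else float("nan")
    fpr = fp / len(non_causal) if non_causal else float("nan")
    return tpr, fpr


def median_iqr(values):
    """Return (median, (Q1, Q3)) of a list of per-replicate rates."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)


# Hypothetical toy data: 10 variants, 4 of them causal, 5 replicates.
variants = [f"v{i}" for i in range(10)]
causal = ["v0", "v1", "v2", "v3"]
flagged_per_replicate = [
    ["v0", "v5"],
    ["v0", "v1"],
    ["v2"],
    ["v0", "v1", "v2", "v7"],
    [],
]

tprs = [tpr_fpr(f, causal, variants)[0] for f in flagged_per_replicate]
tpr_median, (tpr_q1, tpr_q3) = median_iqr(tprs)
print(f"TPR = {tpr_median:.2f}; IQR: {tpr_q1:.2f}, {tpr_q3:.2f}")
```

In a real comparison, the flagged sets would come from each method (for example RIFT, or an RF importance ranking with some threshold), and the same summary would be computed separately within each MAF bin.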
