A scalable approach for genome-wide inference of ancestral recombination graphs

Árni Freyr Gunnarsson, Jiazheng Zhu, Brian C. Zhang, Zoi Tsangalidou, Alex Allmont, Pier Francesco Palamara
bioRxiv - Genetics, published 2024-09-02. DOI: 10.1101/2024.08.31.610248 (https://doi.org/10.1101/2024.08.31.610248)

Abstract

The ancestral recombination graph (ARG) is a graph-like structure that encodes a detailed genealogical history of a set of individuals along the genome. ARGs that are accurately reconstructed from genomic data have several downstream applications, but inference from data sets comprising millions of samples and variants remains computationally challenging. We introduce Threads, a threading-based method that significantly reduces the computational costs of ARG inference while retaining high accuracy. We apply Threads to infer the ARG of 487,409 genomes from the UK Biobank using ~10 million high-quality imputed variants, reconstructing a detailed genealogical history of the samples while compressing the input genotype data. Additionally, we develop ARG-based imputation strategies that increase genotype imputation accuracy for ultra-rare variants (MAC ≤ 10) from UK Biobank exome sequencing data by 5-10%. We leverage ARGs inferred by Threads to detect associations with 52 quantitative traits in non-European UK Biobank samples, identifying 22.5% more signals than ARG-Needle. These analyses underscore the value of using computationally efficient genealogical modeling to improve and complement genotype imputation in large-scale genomic studies.