Enhancing Variant Calling in Whole-Exome Sequencing Data Using Population-Matched Reference Genomes.

Shuming Guo, Zhuo Huang, Yanming Zhang, Yukun He, Xiangju Chen, Wenjuan Wang, Lansheng Li, Yu Kang, Zhancheng Gao, Jun Yu, Zhenglin Du, Yanan Chu
Genomics, Proteomics & Bioinformatics. Published 2024-10-08. DOI: 10.1093/gpbjnl/qzae070 (https://doi.org/10.1093/gpbjnl/qzae070)

Abstract

Whole-exome sequencing (WES) data are frequently used for cancer diagnosis and genome-wide association studies (GWAS), and their value depends on high-coverage read mapping, informative variant calling, and high-quality reference genomes. The central position of the currently used genome assembly, GRCh38, is now challenged by two newly published telomere-to-telomere (T2T) genomes, T2T-CHM13 and T2T-YAO, which makes a comparative study urgent: one that tests population specificity across the three reference genomes using real-case WES data. Here we report such an analysis for 19 tumor samples collected from Chinese patients. A primary comparison of the exon regions among the three references reveals that the sequences in up to ∼1% of target regions in T2T-YAO diverge substantially from GRCh38 and may lead to off-target sequence capture. Nevertheless, T2T-YAO still outperforms GRCh38, yielding 7.41% more mapped reads. Owing to more reliable read mapping and a closer phylogenetic relationship with the samples than GRCh38 has, T2T-YAO halves the number of variant calls of clinical significance, most of which are benign, while maintaining sensitivity in identifying pathogenic variants. T2T-YAO also outperforms T2T-CHM13 in reducing calls of Chinese-specific variants. Our findings highlight the critical need to employ population-specific reference genomes in genomic analysis to ensure accurate variant analysis, and the significant benefits of tailoring these approaches to the unique genetic background of each ethnic group.
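The mapped-read comparison in the abstract is a relative change against a baseline reference. A minimal Python sketch of that arithmetic follows, assuming mapped-read counts per reference have already been obtained from an aligner's summary output. The function names and the toy counts are hypothetical placeholders for illustration, not data or code from the study.

```python
def relative_gain_percent(baseline: int, candidate: int) -> float:
    """Percent change of `candidate` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline count must be positive")
    return 100.0 * (candidate - baseline) / baseline


def gains_over_baseline(mapped_reads: dict[str, int], baseline: str) -> dict[str, float]:
    """Relative gain in mapped reads for every reference except the baseline."""
    base = mapped_reads[baseline]
    return {
        name: relative_gain_percent(base, count)
        for name, count in mapped_reads.items()
        if name != baseline
    }


# Toy counts only (NOT values from the paper); chosen to keep the arithmetic obvious.
toy_mapped_reads = {"GRCh38": 1000, "T2T-CHM13": 1050, "T2T-YAO": 1100}
gains = gains_over_baseline(toy_mapped_reads, baseline="GRCh38")

for reference, gain in gains.items():
    print(f"{reference}: {gain:+.2f}% mapped reads vs GRCh38")
```

The same helper can express the reduction in variant calls: a negative result means the candidate reference produced fewer calls than the baseline.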
