Closing the gaps, and improving somatic structural variant analysis and benchmarking using CHM13-T2T

IF 5.5 | Zone 2, Biology | Q1 Biochemistry & Molecular Biology | Genome Research | Pub Date: 2025-03-17 | DOI: 10.1101/gr.279352.124
Luis F. Paulin, Jeremy Fan, Kieran O'Neill, Erin Pleasance, Vanessa L. Porter, Steven J.M. Jones, Fritz J. Sedlazeck
Citations: 0

Abstract

The complexities of cancer genomes are becoming more easily interpreted due to advancements in sequencing technologies and improved bioinformatic analysis. Structural variants (SVs) represent an important subset of somatic events in tumors. While the detection of SVs has been markedly improved by the development of long-read sequencing, somatic variant identification and annotation remain challenging. We hypothesized that the use of a completed human reference genome (CHM13-T2T) would improve somatic SV calling. Our findings in a tumor–normal matched benchmark sample and three patient samples show that the CHM13-T2T improves SV detection accuracy compared to GRCh38 with a notable reduction in false-positive calls, and thus supports improved prioritization. We also overcame the lack of annotation resources for CHM13-T2T by lifting over CHM13-T2T-aligned reads to the GRCh38 genome, therefore combining both improved alignment and advanced annotations. In this process, we assessed the current SV benchmark set for COLO829/COLO829BL across four replicates sequenced at different centers with different long-read technologies. We discovered instability of this cell line across these replicates; 346 SVs (1.13%) were only discoverable in a single replicate. We identify 54 somatic SVs, which appear to be stable as they are consistently present across the four replicates. As such, we propose this consensus set as an updated benchmark for somatic SV calling and include both GRCh38 and CHM13-T2T coordinates in our benchmark. Our work demonstrates new approaches to optimize somatic SV detection in cancer with potential improvements in other genetic diseases.
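The abstract does not say how calls were matched across the four replicates, so the following is only a minimal sketch of the general idea of a replicate consensus set, not the authors' method. The `SV` record, the function names, and the 500 bp matching tolerance are all hypothetical choices made for illustration; the toy coordinates are invented and are not COLO829 calls.

```python
from typing import List, NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal SV record (not a real VCF parser)."""
    chrom: str
    pos: int
    svtype: str  # e.g. "DEL", "INS", "DUP"


def same_event(a: SV, b: SV, tol: int = 500) -> bool:
    # Assumption: two calls describe the same event if they share
    # chromosome and type and their positions differ by at most `tol` bp.
    return a.chrom == b.chrom and a.svtype == b.svtype and abs(a.pos - b.pos) <= tol


def support_count(sv: SV, replicates: List[List[SV]], tol: int = 500) -> int:
    # Number of replicates containing at least one call matching `sv`.
    return sum(any(same_event(sv, other, tol) for other in rep) for rep in replicates)


def consensus(replicates: List[List[SV]], tol: int = 500) -> List[SV]:
    # Calls from the first replicate that are matched in every replicate.
    n = len(replicates)
    return [sv for sv in replicates[0] if support_count(sv, replicates, tol) == n]


def singletons(replicates: List[List[SV]], tol: int = 500) -> List[SV]:
    # Calls supported by exactly one replicate (their own).
    found = []
    for rep in replicates:
        found.extend(sv for sv in rep if support_count(sv, replicates, tol) == 1)
    return found


# Toy data: four replicates with invented coordinates.
rep1 = [SV("chr1", 1000, "DEL"), SV("chr2", 5000, "INS")]
rep2 = [SV("chr1", 1020, "DEL")]
rep3 = [SV("chr1", 990, "DEL"), SV("chr3", 700, "DUP")]
rep4 = [SV("chr1", 1005, "DEL")]
replicates = [rep1, rep2, rep3, rep4]

stable = consensus(replicates)
unstable = singletons(replicates)
print("consensus:", stable)
print("single-replicate:", unstable)
```

In this toy input the chr1 deletion is matched in all four replicates and lands in the consensus list, while the chr2 insertion and chr3 duplication are each seen in one replicate only. A real comparison would also need to handle breakpoint pairs, SV length, and translocations, which this sketch ignores.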
Source journal: Genome Research
Category: Biology, Biochemistry & Molecular Biology
CiteScore: 12.40
Self-citation rate: 1.40%
Articles published: 140
Review time: 6 months
Journal description: Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine. Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies. New data in these areas are published as research papers, or methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal's web site where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.