Multisample motif discovery and visualization for tandem repeats

Genome Research · IF 6.2 · Zone 2 (Biology) · JCR Q1 (Biochemistry & Molecular Biology) · Pub Date: 2024-11-13 · DOI: 10.1101/gr.279278.124
Yaran Zhang, Marc Hulsman, Alex Salazar, Niccoló Tesi, Lydian Knoop, Sven van der Lee, Sanduni Wijesekera, Jana Krizova, Erik-Jan Kamsteeg, Henne Holstege

Abstract

Tandem repeats (TRs) occupy a significant portion of the human genome and are a source of polymorphism because they vary in size and motif composition. Some of these variations have been associated with neuropathological disorders, highlighting the clinical importance of assessing the motif structure of TRs. Moreover, assessing TR motif variation can offer valuable insights into evolutionary dynamics and population structure. Previously, characterization of TRs has been limited by short-read sequencing technology, which cannot accurately capture full TR sequences. As long-read sequencing becomes more accessible and can capture the full complexity of TRs, there is now a need for tools to characterize and analyze TRs using long-read data across multiple samples. In this study, we present MotifScope, a novel algorithm for characterization and visualization of TRs based on a de novo k-mer approach for motif discovery. Comparative analysis against established tools reveals that MotifScope can identify a greater number of motifs and more accurately represent the underlying repeat sequence. Moreover, MotifScope has been specifically designed to enable motif composition comparisons across assemblies of different individuals, as well as across long-read sequencing reads within an individual, through combined motif discovery and sequence alignment. We showcase potential applications of MotifScope in diverse fields, including population genetics, clinical settings, and forensic analyses.
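As a rough illustration of what a k-mer view of motif composition across samples can look like, the sketch below counts k-mers that are immediately followed by a copy of themselves and groups rotations of the same motif together. It is a toy example under stated assumptions and is not MotifScope's algorithm: the function names, the fixed k, the rotation-based grouping, and the example sequences are all hypothetical choices made for illustration.

```python
from collections import Counter


def canonical_rotation(kmer: str) -> str:
    """Return the lexicographically smallest rotation of a k-mer.

    CAG, AGC and GCA describe the same repeat unit read from different
    offsets, so they are grouped under one representative.
    """
    return min(kmer[i:] + kmer[:i] for i in range(len(kmer)))


def count_tandem_kmers(seq: str, k: int) -> Counter:
    """Count positions where a k-mer is directly followed by an identical copy."""
    counts: Counter = Counter()
    for i in range(len(seq) - 2 * k + 1):
        kmer = seq[i:i + k]
        if seq[i + k:i + 2 * k] == kmer:
            counts[canonical_rotation(kmer)] += 1
    return counts


def motif_counts_per_sample(seqs: dict[str, str], k: int) -> dict[str, Counter]:
    """Apply the same counting to every sample so results are comparable."""
    return {name: count_tandem_kmers(seq, k) for name, seq in seqs.items()}


# Hypothetical sequences, not taken from the study.
samples = {
    "sample_a": "CAGCAGCAGCAGCAG",
    "sample_b": "CAGCAGCCGCCGCCGCCG",
}
per_sample = motif_counts_per_sample(samples, k=3)
for name, counts in per_sample.items():
    print(name, counts.most_common())
```

Here the two samples share a CAG-type unit (reported as its canonical rotation AGC), while only the second carries a CCG-type unit. A real tool would additionally need to handle variable motif lengths, sequencing errors, and alignment between samples, none of which this sketch attempts.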