Comparative genomics with succinct colored de Bruijn graphs

Acta Informatica; IF 0.4; Zone 4 (Computer Science); JCR Q4 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2024-11-21; DOI: 10.1007/s00236-024-00467-7
Lucas P. Ramos, Felipe A. Louza, Guilherme P. Telles
Citations: 0

Abstract

DNA technologies have evolved significantly in recent years, enabling the sequencing of a large number of genomes in a short time. Nevertheless, the underlying problem of assembling sequence fragments is computationally hard, and many technical factors and limitations complicate obtaining the complete sequence of a genome. Many genomes are left in a draft state, in which each chromosome is represented by a set of sequences with partial information on their relative order. Recently, some approaches have been proposed to compare draft genomes by comparing paths in de Bruijn graphs, which are constructed by many practical genome assemblers. In this article we describe in more detail a method, called \(\texttt{gcBB}\), for comparing genomes represented as succinct colored de Bruijn graphs directly and without resorting to sequence alignments; it evaluates the entropy and expectation measures based on the Burrows-Wheeler Similarity Distribution. We also introduce an improved version of \(\texttt{gcBB}\), called \(\texttt{multi-gcBB}\), that improves the time and space performance considerably through the selection of different data structures. We have compared phylogenies of 12 Drosophila species obtained by other methods to those obtained with \(\texttt{gcBB}\), achieving promising results.
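
The entropy and expectation measures mentioned in the abstract can be illustrated on plain strings. The sketch below is not \(\texttt{gcBB}\) itself, which works on succinct colored de Bruijn graphs; it is a minimal illustration that assumes the usual string formulation of the Burrows-Wheeler Similarity Distribution: sort the suffixes of two sequences together, record which sequence each suffix came from, and take the distribution of the run lengths of those origin labels. The function names `color_runs` and `bwsd_measures`, the terminator characters, and the naive suffix sorting are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import groupby


def color_runs(s1: str, s2: str) -> list[int]:
    """Return run lengths of origin labels after sorting all suffixes together.

    Assumption: the terminators "\x00" and "\x01" do not occur in the inputs.
    Naive suffix sorting is used for clarity; it is quadratic in memory and
    only suitable for short strings.
    """
    t1, t2 = s1 + "\x00", s2 + "\x01"
    suffixes = [(t1[i:], 0) for i in range(len(t1))]
    suffixes += [(t2[i:], 1) for i in range(len(t2))]
    suffixes.sort()
    colors = [color for _, color in suffixes]
    # Collapse the label sequence into lengths of maximal equal-label runs.
    return [len(list(group)) for _, group in groupby(colors)]


def bwsd_measures(runs: list[int]) -> tuple[float, float]:
    """Return (expectation - 1, Shannon entropy) of the run-length distribution."""
    total = len(runs)
    probs = {n: count / total for n, count in Counter(runs).items()}
    expectation = sum(n * p for n, p in probs.items())
    entropy = sum(p * math.log2(1 / p) for p in probs.values())
    return expectation - 1, entropy


if __name__ == "__main__":
    # Identical inputs interleave perfectly, so both measures are zero.
    print(bwsd_measures(color_runs("ab", "ab")))
    # Dissimilar inputs produce longer runs and larger values.
    print(bwsd_measures(color_runs("aa", "bb")))
```

Under these assumptions, two identical sequences give runs of length 1 only, so both values are 0, and the values grow as the suffixes of the two sequences mix less. A practical implementation would replace the naive sort with a suffix array or BWT construction.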

Source journal
Acta Informatica (Engineering and Technology; Computer Science: Information Systems)
CiteScore: 2.40
Self-citation rate: 16.70%
Articles published: 24
Review time: >12 weeks
Journal description: Acta Informatica provides international dissemination of articles on formal methods for the design and analysis of programs, computing systems and information structures, as well as related fields of Theoretical Computer Science such as Automata Theory, Logic in Computer Science, and Algorithmics. Topics of interest include:
• semantics of programming languages
• models and modeling languages for concurrent, distributed, reactive and mobile systems
• models and modeling languages for timed, hybrid and probabilistic systems
• specification, program analysis and verification
• model checking and theorem proving
• modal, temporal, first- and higher-order logics, and their variants
• constraint logic, SAT/SMT-solving techniques
• theoretical aspects of databases, semi-structured data and finite model theory
• theoretical aspects of artificial intelligence, knowledge representation, description logic
• automata theory, formal languages, term and graph rewriting
• game-based models, synthesis
• type theory, typed calculi
• algebraic, coalgebraic and categorical methods
• formal aspects of performance, dependability and reliability analysis
• foundations of information and network security
• parallel, distributed and randomized algorithms
• design and analysis of algorithms
• foundations of network and communication protocols