PHONI: Streamed Matching Statistics with Multi-Genome References.

Christina Boucher, Travis Gagie, I Tomohiro, Dominik Köppl, Ben Langmead, Giovanni Manzini, Gonzalo Navarro, Alejandro Pacheco, Massimiliano Rossi
{"title":"PHONI: Streamed Matching Statistics with Multi-Genome References.","authors":"Christina Boucher,&nbsp;Travis Gagie,&nbsp;I Tomohiro,&nbsp;Dominik Köppl,&nbsp;Ben Langmead,&nbsp;Giovanni Manzini,&nbsp;Gonzalo Navarro,&nbsp;Alejandro Pacheco,&nbsp;Massimiliano Rossi","doi":"10.1109/dcc50243.2021.00027","DOIUrl":null,"url":null,"abstract":"<p><p>Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this paper, we simplify their solution and make it streaming, at the cost of slowing it down slightly. This means that, first, we can compute the matching statistics of several long patterns (such as whole human chromosomes) in parallel while still using a reasonable amount of RAM; second, we can compute matching statistics online with low latency and thus quickly recognize when a pattern becomes incompressible relative to the database. Our code is available at https://github.com/koeppl/phoni.</p>","PeriodicalId":91161,"journal":{"name":"Proceedings. Data Compression Conference","volume":"2021 ","pages":"193-202"},"PeriodicalIF":0.0000,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/dcc50243.2021.00027","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. Data Compression Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/dcc50243.2021.00027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2021/5/10 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

Abstract

Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this paper, we simplify their solution and make it streaming, at the cost of slowing it down slightly. This means that, first, we can compute the matching statistics of several long patterns (such as whole human chromosomes) in parallel while still using a reasonable amount of RAM; second, we can compute matching statistics online with low latency and thus quickly recognize when a pattern becomes incompressible relative to the database. Our code is available at https://github.com/koeppl/phoni.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
PHONI:流式匹配统计与多基因组参考。
计算模式相对于文本的匹配统计数据是生物信息学中的一项基本任务,但当文本是一个高度压缩的基因组数据库时,这是一项艰巨的任务。Bannai等人针对这种情况给出了一个有效的解决方案,Rossi等人最近实现了该解决方案,但它在模式上使用了两次传递,并在第一次传递期间为每个字符缓冲一个指针。在本文中,我们简化了他们的解决方案,并使其流式传输,代价是稍微放慢速度。这意味着,首先,我们可以并行计算几个长模式(如整个人类染色体)的匹配统计数据,同时仍然使用合理数量的RAM;其次,我们可以以低延迟在线计算匹配统计信息,从而快速识别模式何时相对于数据库变得不可压缩。我们的代码可在https://github.com/koeppl/phoni.
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Faster Maximal Exact Matches with Lazy LCP Evaluation. Recursive Prefix-Free Parsing for Building Big BWTs. PHONI: Streamed Matching Statistics with Multi-Genome References. Client-Driven Transmission of JPEG2000 Image Sequences Using Motion Compensated Conditional Replenishment GeneComp, a new reference-based compressor for SAM files.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1