
Algorithms in bioinformatics : ... International Workshop, WABI ..., proceedings. WABI (Workshop): latest articles

Acceleration of FM-index Queries Through Prefix-free Parsing.
Aaron Hong, Marco Oliva, Dominik Köppl, Hideo Bannai, Christina Boucher, Travis Gagie

FM-indexes are a crucial data structure in, for example, DNA alignment, but searching with them usually takes at least one random access per character in the query pattern. Ferragina and Fischer [5] observed in 2007 that word-based indexes often use fewer random accesses than character-based indexes, and thus support faster searches. Since DNA lacks natural word boundaries, however, it must be parsed somehow before word-based FM-indexing can be applied. Last year, Deng et al. [3] proposed parsing genomic data by induced suffix sorting, and showed that the resulting word-based FM-indexes support faster counting queries than standard FM-indexes when patterns are a few thousand characters or longer. In this paper we show that using prefix-free parsing (which takes parameters that let us tune the average length of the phrases) instead of induced suffix sorting gives a significant speedup for patterns of only a few hundred characters. We implement our method and demonstrate that it is between 3 and 18 times faster than competing methods on queries to GRCh38, and consistently faster on queries made to 25,000, 50,000 and 100,000 SARS-CoV-2 genomes. Hence, our method accelerates count queries over the competing state-of-the-art methods at the cost of a minor increase in memory. The source code for PFP-FM is available at https://github.com/marco-oliva/afm.
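To make the "one random access per character" cost concrete, here is a minimal sketch of a counting query on a standard character-based FM-index using backward search. It is a toy illustration, not the PFP-FM implementation: the names `build_fm_index` and `count` are hypothetical, the suffix array is built naively, and the occurrence table is stored uncompressed. In a real index each `occ` lookup is a rank query at an unpredictable position in the BWT, which is where the per-character random accesses come from.

```python
def build_fm_index(text):
    """Build a toy FM-index (C table and occurrence table) for `text`."""
    text += "$"  # unique terminator, assumed smaller than every other symbol
    # Naive suffix array construction; acceptable only for small demo inputs.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the symbol preceding each sorted suffix (wraps around to "$").
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[c] = number of symbols in the text strictly smaller than c.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return c_table, occ, len(bwt)


def count(pattern, c_table, occ, n):
    """Count occurrences of `pattern` by backward search."""
    lo, hi = 0, n  # half-open range of suffix-array rows matching the suffix seen so far
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        # One step per pattern character: two lookups into the occurrence table.
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("GATTACATTAC")
print(count("TTA", *index))  # 2
```

The loop in `count` runs once per pattern character, so a pattern of a few hundred characters costs a few hundred such steps. A word-based index, as described in the abstract, aims to take one step per phrase instead.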

DOI: 10.4230/LIPIcs.WABI.2023.13. Volume 273. Published 2023-08-29. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12576618/pdf/
Citations: 0
Pangenomic Genotyping with the Marker Array.
Taher Mun, Naga Sai Kavya Vaddadi, Ben Langmead

We present a new method and software tool called rowbowt that applies a pangenome index to the problem of inferring genotypes from short-read sequencing data. The method uses a novel indexing structure called the marker array. Using the marker array, we can genotype variants with respect to large panels like the 1000 Genomes Project while avoiding the reference bias that results from aligning to a single linear reference. rowbowt can infer accurate genotypes in less time and memory than existing graph-based methods.

Volume 242. Published 2022-09-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9674407/pdf/
Citations: 0
Detection of Motifs (I)
DOI: 10.1002/9781119698005.ch8 (https://doi.org/10.1002/9781119698005.ch8). Volume 31 1. Published 2021-07-16.
Citations: 0
Markov Chains: Log Likelihood (II)
DOI: 10.1002/9781119698005.ch14 (https://doi.org/10.1002/9781119698005.ch14). Volume 27 1. Published 2021-07-16.
Citations: 0
Objective Digital Stains (III)
DOI: 10.1002/9781119698005.ch7 (https://doi.org/10.1002/9781119698005.ch7). Volume 25 1. Published 2021-07-16.
Citations: 0
Frequencies and Percentages (II)
DOI: 10.1002/9781119698005.ch6 (https://doi.org/10.1002/9781119698005.ch6). Volume 62 1. Published 2021-07-16.
Citations: 0
Markov Chains: The Machine (I)
DOI: 10.1002/9781119698005.ch13 (https://doi.org/10.1002/9781119698005.ch13). Volume 25 1. Published 2021-07-16.
Citations: 0
Representation of Motifs (II)
DOI: 10.1002/9781119698005.ch9 (https://doi.org/10.1002/9781119698005.ch9). Volume 52 1. Published 2021-07-16.
Citations: 0
Dynamic Backgrounds (V)
DOI: 10.1002/9781119698005.ch12 (https://doi.org/10.1002/9781119698005.ch12). Volume 5 1. Published 2021-07-16.
Citations: 0
Index
DOI: 10.1002/9781119698005.index (https://doi.org/10.1002/9781119698005.index). Volume 1 1. Published 2021-07-16.
Citations: 0