Algorithms in bioinformatics : ... International Workshop, WABI ..., proceedings. WABI (Workshop): Latest Articles

Acceleration of FM-index Queries Through Prefix-free Parsing.

Pub Date : 2023-08-29 DOI: 10.4230/LIPIcs.WABI.2023.13

Aaron Hong, Marco Oliva, Dominik Köppl, Hideo Bannai, Christina Boucher, Travis Gagie

FM-indexes are a crucial data structure in applications such as DNA alignment, but searching with them usually takes at least one random access per character in the query pattern. Ferragina and Fischer [5] observed in 2007 that word-based indexes often use fewer random accesses than character-based indexes, and thus support faster searches. Since DNA lacks natural word boundaries, however, it must be parsed somehow before word-based FM-indexing can be applied. Last year, Deng et al. [3] proposed parsing genomic data by induced suffix sorting, and showed that the resulting word-based FM-indexes support faster counting queries than standard FM-indexes when patterns are a few thousand characters or longer. In this paper we show that using prefix-free parsing (which takes parameters that let us tune the average length of the phrases) instead of induced suffix sorting gives a significant speedup for patterns of only a few hundred characters. We implement our method and demonstrate that it is between 3 and 18 times faster than competing methods on queries to GRCh38, and consistently faster on queries to 25,000, 50,000 and 100,000 SARS-CoV-2 genomes. Hence, our method accelerates count queries relative to the competing state-of-the-art methods, with a minor increase in memory. The source code for PFP-FM is available at https://github.com/marco-oliva/afm.
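To make the "one random access per character" cost concrete, here is a minimal sketch of character-based backward search on a toy FM-index. It is not the PFP-FM implementation: the function names build_fm_index and count are hypothetical, the suffix array is built by naive sorting, and the occurrence table is a plain uncompressed list rather than the compressed rank structure a real index would use. Each loop iteration in count corresponds to one pattern character and performs lookups at two unrelated positions of the occurrence table, which is the per-character cost that word-based indexing tries to reduce.

```python
def build_fm_index(text):
    """Build a toy FM-index: C array, occurrence table, and BWT length."""
    text += "$"  # sentinel; assumed to sort before every other character
    # Naive suffix array construction (fine for a short demo string only).
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" at i = 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_array[c] = number of characters in text strictly smaller than c.
    c_array, total = {}, 0
    for ch in alphabet:
        c_array[ch] = total
        total += text.count(ch)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return c_array, occ, len(bwt)


def count(pattern, c_array, occ, n):
    """Return (number of occurrences, number of backward-search steps)."""
    lo, hi = 0, n  # half-open range of suffix array rows matching the suffix so far
    steps = 0
    for ch in reversed(pattern):  # one step per pattern character
        if ch not in c_array:
            return 0, steps
        lo = c_array[ch] + occ[ch][lo]
        hi = c_array[ch] + occ[ch][hi]
        steps += 1
        if lo >= hi:
            return 0, steps
    return hi - lo, steps


index = build_fm_index("GATTACAGATTACA")
print(count("ATTA", *index))  # (2, 4): two occurrences, four steps
```

A word-based index runs the same loop over phrases instead of characters, so a pattern that parses into few phrases needs correspondingly fewer steps.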

Volume: 273. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12576618/pdf/

Citations: 0

Pangenomic Genotyping with the Marker Array.

Pub Date : 2022-09-01

Taher Mun, Naga Sai Kavya Vaddadi, Ben Langmead

We present a new method and software tool called rowbowt that applies a pangenome index to the problem of inferring genotypes from short-read sequencing data. The method uses a novel indexing structure called the marker array. Using the marker array, we can genotype variants with respect to large panels like the 1000 Genomes Project while avoiding the reference bias that results when aligning to a single linear reference. rowbowt can infer accurate genotypes using less time and memory than existing graph-based methods.

Citations: 0

Detection of Motifs (I)

Pub Date : 2021-07-16 DOI: 10.1002/9781119698005.ch8

Citations: 0

Markov Chains: Log Likelihood (II)

Pub Date : 2021-07-16 DOI: 10.1002/9781119698005.ch14

Citations: 0

Objective Digital Stains (III)

Pub Date : 2021-07-16 DOI: 10.1002/9781119698005.ch7

Citations: 0

Frequencies and Percentages (II)

Pub Date : 2021-07-16 DOI: 10.1002/9781119698005.ch6

Citations: 0

Markov Chains: The Machine (I)

Pub Date : 2021-07-16 DOI: 10.1002/9781119698005.ch13

Citations: 0

Representation of Motifs (II)

Pub Date : 2021-07-16 DOI: 10.1002/9781119698005.ch9

Citations: 0

Dynamic Backgrounds (V)

Pub Date : 2021-07-16 DOI: 10.1002/9781119698005.ch12

Citations: 0

Index

Pub Date : 2021-07-16 DOI: 10.1002/9781119698005.index

Citations: 0
