{"title":"A Numerical Method for Suffix Array Index Compression","authors":"Baomin Xu, Jie Huang, Yang Yang","doi":"10.14257/IJDTA.2017.10.1.19","DOIUrl":null,"url":null,"abstract":"Suffix arrays is versatile data structures playing a key role in numerous string processing applications such as the data structure can be used to represent the given DNA strings. However, the most serious drawback of suffix arrays is their size, namely space usage. In this paper, we propose a new suffix array compression technique, i.e., numerical method for suffix array index compression, for the problem. With the method, we will translate DNA bases characters ATGC to the corresponding integer number 1234. The experimental results show that the numerical method for suffix array index compression not only can greatly compress the memory space of suffix array, but also can retain the quick search characteristics of suffix array.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"48 1","pages":"207-212"},"PeriodicalIF":0.0000,"publicationDate":"2017-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of database theory and application","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14257/IJDTA.2017.10.1.19","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Suffix arrays are versatile data structures that play a key role in numerous string processing applications; for example, they can be used to represent DNA strings. However, their most serious drawback is their size, that is, their space usage. In this paper, we propose a new suffix array compression technique, a numerical method for suffix array index compression, to address this problem. The method translates the DNA base characters A, T, G, and C to the corresponding integers 1, 2, 3, and 4. The experimental results show that the numerical method for suffix array index compression not only greatly reduces the memory space occupied by the suffix array, but also retains the suffix array's fast search characteristics.
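The encoding step described in the abstract can be illustrated with a minimal Python sketch. Only the mapping A, T, G, C to 1, 2, 3, 4 comes from the abstract; the function names `encode`, `build_suffix_array`, and `find_occurrences` are hypothetical, and the naive sort-based construction is an assumption made for illustration. The sketch does not reproduce the paper's compression scheme, which the abstract does not detail; it only shows that a suffix array built over the integer codes still supports binary search.

```python
# Mapping stated in the abstract: A, T, G, C -> 1, 2, 3, 4.
BASE_TO_INT = {"A": 1, "T": 2, "G": 3, "C": 4}


def encode(dna):
    """Translate a DNA string into a list of integers in 1..4."""
    return [BASE_TO_INT[base] for base in dna]


def build_suffix_array(codes):
    """Naive construction (illustrative assumption, not the paper's method):
    sort the suffix start positions by the suffix each one begins."""
    return sorted(range(len(codes)), key=lambda i: codes[i:])


def find_occurrences(codes, suffix_array, pattern_codes):
    """Binary-search the suffix array for every start position of the pattern."""
    m = len(pattern_codes)

    # Lower bound: first suffix whose length-m prefix is >= the pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if codes[start:start + m] < pattern_codes:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose length-m prefix is > the pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if codes[start:start + m] <= pattern_codes:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


text_codes = encode("ATGCATGA")
suffix_array = build_suffix_array(text_codes)
matches = find_occurrences(text_codes, suffix_array, encode("ATG"))
print(matches)
```

Note that under this mapping suffixes are ordered A < T < G < C rather than alphabetically, so the resulting suffix array differs from one built directly on the characters, while lookups return the same positions.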