N. Beagley, Chad Scherrer, Yan Shi, B. Clowers, W. Danielson, A. Shah
{"title":"Increasing the Efficiency of Data Storage and Analysis Using Indexed Compression","authors":"N. Beagley, Chad Scherrer, Yan Shi, B. Clowers, W. Danielson, A. Shah","doi":"10.1109/e-Science.2009.18","DOIUrl":null,"url":null,"abstract":"The massive data sets produced by the high- throughput, multidimensional mass spectrometry instruments used in proteomics create challenges in data acquisition, storage and analysis. Data compression can help mitigate some of these problems but at the cost of less efficient data access, which directly impacts the computational time of data analysis. We have developed a compression methodology that 1) is optimized for a targeted mass spectrometry proteomics data set and 2) provides the benefits of size and speed from compression while increasing analysis efficiency by allowing extraction of segments of uncompressed data from a file without having to uncompress the entire file. This paper describes our compression algorithm, presents comparative metrics of compression size and speed, and explores approaches for applying the algorithm to a generalized data set.","PeriodicalId":325840,"journal":{"name":"2009 Fifth IEEE International Conference on e-Science","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 Fifth IEEE International Conference on e-Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/e-Science.2009.18","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
The massive data sets produced by the high- throughput, multidimensional mass spectrometry instruments used in proteomics create challenges in data acquisition, storage and analysis. Data compression can help mitigate some of these problems but at the cost of less efficient data access, which directly impacts the computational time of data analysis. We have developed a compression methodology that 1) is optimized for a targeted mass spectrometry proteomics data set and 2) provides the benefits of size and speed from compression while increasing analysis efficiency by allowing extraction of segments of uncompressed data from a file without having to uncompress the entire file. This paper describes our compression algorithm, presents comparative metrics of compression size and speed, and explores approaches for applying the algorithm to a generalized data set.