N. Beagley, Chad Scherrer, Yan Shi, B. Clowers, W. Danielson, A. Shah
2009 Fifth IEEE International Conference on e-Science. Published 2009-12-09. DOI: 10.1109/e-Science.2009.18 (https://doi.org/10.1109/e-Science.2009.18)
Increasing the Efficiency of Data Storage and Analysis Using Indexed Compression
The massive data sets produced by the high-throughput, multidimensional mass spectrometry instruments used in proteomics create challenges in data acquisition, storage, and analysis. Data compression can help mitigate some of these problems, but at the cost of less efficient data access, which directly impacts the computational time of data analysis. We have developed a compression methodology that 1) is optimized for a targeted mass spectrometry proteomics data set and 2) provides the benefits of size and speed from compression while increasing analysis efficiency by allowing extraction of segments of uncompressed data from a file without having to uncompress the entire file. This paper describes our compression algorithm, presents comparative metrics of compression size and speed, and explores approaches for applying the algorithm to a generalized data set.
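The abstract does not specify the algorithm, so the following is only a generic sketch of the idea of indexed compression, not the authors' method. It assumes fixed-size blocks, each compressed independently with zlib, plus an in-memory index of block offsets. The names `compress_indexed`, `read_range`, and `BLOCK_SIZE` are hypothetical and chosen for illustration.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical: bytes of uncompressed data per block


def compress_indexed(data: bytes, block_size: int = BLOCK_SIZE):
    """Compress data block by block and return (payload, index).

    index[i] holds the (offset, length) of compressed block i inside payload.
    """
    payload = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        block = zlib.compress(data[start:start + block_size])
        index.append((len(payload), len(block)))
        payload += block
    return bytes(payload), index


def read_range(payload: bytes, index, start: int, length: int,
               block_size: int = BLOCK_SIZE) -> bytes:
    """Return data[start:start + length], decompressing only overlapping blocks."""
    if start < 0:
        raise ValueError("start must be non-negative")
    if length <= 0:
        return b""
    first = start // block_size
    last = min((start + length - 1) // block_size, len(index) - 1)
    chunks = []
    for i in range(first, last + 1):
        offset, size = index[i]
        chunks.append(zlib.decompress(payload[offset:offset + size]))
    joined = b"".join(chunks)
    skip = start - first * block_size  # bytes to drop from the first block
    return joined[skip:skip + length]
```

Under these assumptions, a call such as `read_range(payload, index, 5000, 3000)` touches only the blocks that overlap the requested byte range. The trade-off is that smaller blocks give finer-grained access but usually a worse compression ratio, since each block is compressed without knowledge of the others.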