Increasing the Efficiency of Data Storage and Analysis Using Indexed Compression

2009 Fifth IEEE International Conference on e-Science. Published: 2009-12-09. DOI: 10.1109/e-Science.2009.18

N. Beagley, Chad Scherrer, Yan Shi, B. Clowers, W. Danielson, A. Shah

Citations: 3

Abstract

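The massive data sets produced by the high-throughput, multidimensional mass spectrometry instruments used in proteomics create challenges in data acquisition, storage and analysis. Data compression can help mitigate some of these problems, but at the cost of less efficient data access, which directly impacts the computational time of data analysis. We have developed a compression methodology that 1) is optimized for a targeted mass spectrometry proteomics data set and 2) provides the size and speed benefits of compression while increasing analysis efficiency by allowing segments of uncompressed data to be extracted from a file without having to uncompress the entire file. This paper describes our compression algorithm, presents comparative metrics of compression size and speed, and explores approaches for applying the algorithm to a generalized data set.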
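The abstract does not describe the authors' file format, and their algorithm is tuned to targeted mass spectrometry data. As a generic illustration of the indexed-compression idea only (not the paper's method), the sketch below compresses a byte stream in fixed-size, independently compressed blocks and keeps an index of block offsets, so that a byte range can be extracted by decompressing only the blocks that overlap it. The use of zlib, the block size, and the helper names `build_indexed` and `read_range` are all hypothetical choices for this example.

```python
import bisect
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size; trades compression ratio against access granularity


def build_indexed(data: bytes, block_size: int = BLOCK_SIZE):
    """Compress `data` as independent zlib blocks; return (payload, index).

    index[i] = (raw_offset, comp_offset, comp_length) for block i, where
    raw_offset is the block's position in the uncompressed data and
    comp_offset/comp_length locate its compressed bytes in `payload`.
    """
    payload = bytearray()
    index = []
    for raw_off in range(0, len(data), block_size):
        chunk = zlib.compress(data[raw_off:raw_off + block_size])
        index.append((raw_off, len(payload), len(chunk)))
        payload += chunk
    return bytes(payload), index


def read_range(payload: bytes, index, start: int, end: int) -> bytes:
    """Return data[start:end] (0 <= start) by decompressing only overlapping blocks."""
    if start >= end or not index:
        return b""
    raw_offsets = [entry[0] for entry in index]
    # The first block needed is the last one whose raw_offset <= start.
    first = max(bisect.bisect_right(raw_offsets, start) - 1, 0)
    out = bytearray()
    for raw_off, comp_off, comp_len in index[first:]:
        if raw_off >= end:
            break  # every remaining block lies past the requested range
        out += zlib.decompress(payload[comp_off:comp_off + comp_len])
    skip = start - index[first][0]  # bytes of the first block that precede `start`
    return bytes(out[skip:skip + (end - start)])
```

For example, `payload, index = build_indexed(data, block_size=4096)` followed by `read_range(payload, index, 10_000, 10_050)` decompresses a single 4 KiB block rather than the whole stream. Because each block is compressed with a fresh dictionary, smaller blocks give finer-grained access but usually a worse compression ratio; choosing the block boundaries to match how the data will be queried is where a domain-specific format could improve on this generic sketch.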