Increasing the Efficiency of Data Storage and Analysis Using Indexed Compression

N. Beagley, Chad Scherrer, Yan Shi, B. Clowers, W. Danielson, A. Shah
2009 Fifth IEEE International Conference on e-Science, published 2009-12-09. DOI: 10.1109/e-Science.2009.18
Citations: 3

Abstract

The massive data sets produced by the high-throughput, multidimensional mass spectrometry instruments used in proteomics create challenges in data acquisition, storage and analysis. Data compression can help mitigate some of these problems, but at the cost of less efficient data access, which directly increases the computational time of data analysis. We have developed a compression methodology that 1) is optimized for a targeted mass spectrometry proteomics data set and 2) provides the size and speed benefits of compression while increasing analysis efficiency, by allowing segments of uncompressed data to be extracted from a file without decompressing the entire file. This paper describes our compression algorithm, presents comparative metrics of compression size and speed, and explores approaches for applying the algorithm to a generalized data set.
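The abstract does not describe the algorithm's internals. The general idea of indexed compression (reading one segment without decompressing the whole file) can be sketched with a generic block-level scheme, which is an assumption and not the authors' mass-spectrometry-specific method: the data are split into fixed-size blocks, each block is compressed independently with zlib, and an index records where each compressed block begins. The names `compress_indexed` and `read_range` and the `block_size` parameter are hypothetical.

```python
import zlib
from typing import List, Tuple

# Index entry: (uncompressed_start, compressed_offset, compressed_length)
IndexEntry = Tuple[int, int, int]


def compress_indexed(data: bytes, block_size: int = 64 * 1024) -> Tuple[bytes, List[IndexEntry]]:
    """Compress `data` in independent fixed-size blocks (hypothetical scheme).

    Returns the concatenated compressed blob and an index locating each block.
    """
    blob = bytearray()
    index: List[IndexEntry] = []
    for start in range(0, len(data), block_size):
        chunk = zlib.compress(data[start:start + block_size])
        index.append((start, len(blob), len(chunk)))
        blob.extend(chunk)
    return bytes(blob), index


def read_range(blob: bytes, index: List[IndexEntry], block_size: int,
               start: int, end: int) -> bytes:
    """Return the original bytes [start, end) by decompressing only the
    blocks that overlap that range, never the whole blob."""
    first = start // block_size
    if start >= end or first >= len(index):
        return b""
    last = (end - 1) // block_size
    out = bytearray()
    for _, c_off, c_len in index[first:last + 1]:
        out.extend(zlib.decompress(blob[c_off:c_off + c_len]))
    base = index[first][0]  # uncompressed offset of the first block read
    return bytes(out[start - base:end - base])


if __name__ == "__main__":
    data = bytes(range(256)) * 1000
    blob, index = compress_indexed(data, block_size=4096)
    # Only the block overlapping bytes 10000..10100 is decompressed here.
    assert read_range(blob, index, 4096, 10000, 10100) == data[10000:10100]
    print(len(data), len(blob))
```

The block size is the main design trade-off in a scheme like this: smaller blocks make random reads cheaper, but they usually lower the compression ratio because each block is compressed without the context of its neighbours, and the index grows with the number of blocks.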