TERSE/PROLIX (TRPX) - a new algorithm for fast and lossless compression and decompression of diffraction and cryo-EM data.

IF 1.9 | Zone 4 (Materials Science) | Q3, CHEMISTRY, MULTIDISCIPLINARY | Acta Crystallographica Section A: Foundations and Advances | Pub Date: 2023-11-01 | Epub Date: 2023-09-25 | DOI: 10.1107/S205327332300760X
Senik Matinyan, Jan Pieter Abrahams
Pages: 536-541. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10626653/pdf/

Abstract

High-throughput data collection in crystallography poses significant challenges in handling massive amounts of data. This paper presents TERSE/PROLIX (TRPX for short), a novel lossless compression algorithm designed specifically for diffraction data. The algorithm is compared with established lossless compression algorithms implemented in gzip, bzip2, CBF (crystallographic binary file), Zstandard (zstd), LZ4, and HDF5 with gzip, LZF and bitshuffle+LZ4 filters, in terms of compression efficiency and speed, using continuous-rotation electron diffraction data of an inorganic compound and raw cryo-EM data. The results show that TRPX significantly outperforms all of these algorithms in both speed and compression rate. It was 60 times faster than bzip2 (which achieved a similar compression rate) and more than 3 times faster than LZ4, the runner-up in speed, which had a much worse compression rate. TRPX files are byte-order independent and, once compiled, the algorithm occupies very little memory. It can therefore be readily implemented in hardware. By providing a tailored solution for diffraction and raw cryo-EM data, TRPX facilitates more efficient data analysis and interpretation while mitigating storage and transmission concerns. The C++20 compression/decompression code, a custom TIFF library, and an ImageJ/Fiji Java plugin for reading TRPX files are open-sourced on GitHub under the permissive MIT license.
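The kind of comparison described in the abstract (compression rate and speed, with a lossless round-trip) can be sketched with codecs from the Python standard library. This is only an illustrative harness, not the authors' benchmark and not the TRPX code: `synthetic_frame`, `benchmark`, `run` and `CODECS` are hypothetical names, the test data is made up to resemble a sparse 16-bit detector frame, and only `zlib` (the DEFLATE method used by gzip) and `bz2` are measured because they ship with Python.

```python
import bz2
import random
import struct
import time
import zlib


def synthetic_frame(n_pixels=256 * 256, seed=0):
    """Return n_pixels unsigned 16-bit values as little-endian bytes.

    Made-up data: mostly weak background with a few strong pixels,
    standing in for a sparse diffraction frame (an assumption).
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_pixels):
        if rng.random() < 0.01:
            values.append(rng.randrange(100, 60000))  # rare strong pixel
        else:
            values.append(rng.randrange(0, 4))  # weak background
    return struct.pack(f"<{n_pixels}H", *values)


def benchmark(name, compress, decompress, raw):
    """Time one compress/decompress cycle and check that it is lossless."""
    t0 = time.perf_counter()
    packed = compress(raw)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    return {
        "name": name,
        "lossless": restored == raw,
        "ratio": len(raw) / len(packed),  # uncompressed size / compressed size
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


# Codecs to compare; further (name, compress, decompress) triples can be added.
CODECS = [
    ("zlib (DEFLATE, as used by gzip)", zlib.compress, zlib.decompress),
    ("bz2 (as used by bzip2)", bz2.compress, bz2.decompress),
]


def run(raw):
    """Benchmark every codec in CODECS on the same input bytes."""
    return [benchmark(name, c, d, raw) for name, c, d in CODECS]


if __name__ == "__main__":
    for r in run(synthetic_frame()):
        print(
            f"{r['name']}: ratio {r['ratio']:.2f}, "
            f"compress {r['compress_s'] * 1e3:.1f} ms, "
            f"decompress {r['decompress_s'] * 1e3:.1f} ms, "
            f"lossless={r['lossless']}"
        )
```

Timings from a single run on synthetic data say nothing about the figures reported in the paper; a meaningful comparison would repeat the measurement on real diffraction or cryo-EM frames and include the other codecs the authors tested.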

Source journal
Acta Crystallographica Section A: Foundations and Advances
Categories: CHEMISTRY, MULTIDISCIPLINARY; CRYSTALLOGRAPHY
CiteScore: 2.60
Self-citation rate: 11.10%
Articles published: 419
About the journal: Acta Crystallographica Section A: Foundations and Advances publishes articles reporting advances in the theory and practice of all areas of crystallography in the broadest sense. As well as traditional crystallography, this includes nanocrystals, metacrystals, amorphous materials, quasicrystals, synchrotron and XFEL studies, coherent scattering, diffraction imaging, time-resolved studies and the structure of strain and defects in materials. The journal has two parts, a rapid-publication Advances section and the traditional Foundations section. Articles for the Advances section are of particularly high value and impact. They receive expedited treatment and may be highlighted by an accompanying scientific commentary article and a press release. Further details are given in the November 2013 Editorial. The central themes of the journal are, on the one hand, experimental and theoretical studies of the properties and arrangements of atoms, ions and molecules in condensed matter, periodic, quasiperiodic or amorphous, ideal or real, and, on the other, the theoretical and experimental aspects of the various methods to determine these properties and arrangements.
Latest articles from this journal
Complete classification of six-dimensional iso-edge domains.
The general equation of δ direct methods and the novel SMAR algorithm residuals using the absolute value of ρ and the zero conversion of negative ripples.
Periodic graphs with coincident edges: folding-ladder and related graphs.
Influence of device configuration and noise on a machine learning predictor for the selection of nanoparticle small-angle X-ray scattering models.
An alternative method to the Takagi-Taupin equations for studying dark-field X-ray microscopy of deformed crystals.