Towards benchmarking erasure coding schemes in object storage system: A systematic review

Future Generation Computer Systems-The International Journal of Escience, Volume 163, Article 107522. Published: 2024-09-04. DOI: 10.1016/j.future.2024.107522
Journal metrics: IF 6.2; CAS Zone 2 (Computer Science); JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS)
Jannatun Noor, Rezuana Imtiaz Upoma, Md. Sadiqul Islam Sakif, A.B.M. Alim Al Islam
Citations: 0

Abstract

Erasure Coding (EC) in cloud storage reduces the need for full data replication: it stores parity fragments alongside data fragments and reconstructs lost data from the surviving fragments. This approach provides redundancy at a lower storage cost than replication while improving fault tolerance, which makes it more advantageous than replication in Object Storage Systems. EC preserves data integrity because the original data can be recovered losslessly from a sufficient subset of the coded pieces. As data volumes continue to grow rapidly, the time efficiency of EC becomes crucial for system performance. Several factors, including the algorithm employed, data size, number of storage nodes, hardware resources, and network conditions, influence the speed of EC operations. Although existing literature covers various aspects of EC, a research gap remains in understanding the I/O activity, time efficiency, and fault tolerance of EC in object storage systems. Our research addresses these challenges in cloud-based object storage systems. We analyze and benchmark the data storage I/O performance of OpenStack Swift, focusing on the time efficiency of the Reed–Solomon (RS) algorithm across two datasets. We also benchmark EC performance on both local and remote testbeds and use the SimEDC simulator for comprehensive efficiency and fault tolerance assessments. Moreover, we create a comprehensive dataset (MCSD-100) for benchmarking and conduct a systematic literature review. Finally, we identify and discuss future opportunities for enhancing EC in cloud-based object storage systems.
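The core EC mechanism described above, rebuilding lost data from the surviving data and parity fragments, can be illustrated with a toy Reed–Solomon code. This is a minimal sketch under stated assumptions, not the implementation benchmarked in the paper: it works over the prime field GF(257) so that plain integer arithmetic suffices, whereas production EC libraries typically operate over GF(2^8). The function names `encode`, `decode`, and `storage_overhead` are hypothetical. Any k of the n = k + m fragments recover the data, so an RS(k, m) layout tolerates m lost fragments at a storage cost of (k + m)/k, compared with 3x for three-way replication, which tolerates 2 lost copies.

```python
"""Toy systematic Reed-Solomon erasure code over the prime field GF(257).

Assumption: illustrative sketch only, not the OpenStack Swift code path
benchmarked in the paper. Production libraries typically use GF(2^8);
p = 257 is chosen here only to keep the arithmetic short.
"""

P = 257  # prime modulus; every byte value 0..255 is a valid field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) passing
    through the given (xi, yi) points, using Lagrange interpolation mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data, m):
    """Systematic encode: the k data symbols sit at x = 0..k-1, and the m
    parity symbols are the same polynomial evaluated at x = k..k+m-1."""
    k = len(data)
    if k + m > P:
        raise ValueError("k + m must not exceed the field size")
    points = list(enumerate(data))
    parity = [_interpolate(points, x) for x in range(k, k + m)]
    return list(data) + parity


def decode(fragments, k):
    """Recover the k data symbols from any k surviving fragments.
    `fragments` maps fragment index (x coordinate) -> symbol value."""
    if len(fragments) < k:
        raise ValueError("too many erasures: need at least k fragments")
    survivors = sorted(fragments.items())[:k]
    return [_interpolate(survivors, x) for x in range(k)]


def storage_overhead(k, m):
    """Raw symbols stored per symbol of user data for an RS(k, m) layout."""
    return (k + m) / k


if __name__ == "__main__":
    data = [72, 101, 108, 108]      # k = 4 data symbols (the bytes of "Hell")
    coded = encode(data, m=2)       # n = 6 fragments; any 2 may be lost
    # Simulate losing data fragments 0 and 3, then rebuild from the rest.
    survivors = {i: s for i, s in enumerate(coded) if i not in (0, 3)}
    assert decode(survivors, k=4) == data
    print(storage_overhead(10, 4))  # RS(10, 4) stores 1.4x; 3-way replication stores 3.0x
```

Placing the data symbols at x = 0..k-1 keeps the code systematic: a read that touches no failed fragment returns the data fragments directly, and interpolation is only needed on the degraded-read or repair path.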

Source journal metrics: CiteScore 19.90; self-citation rate 2.70%; articles published 376; review time 10.6 months
Journal description: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.
Latest articles in this journal:
- Identifying runtime libraries in statically linked linux binaries
- High throughput edit distance computation on FPGA-based accelerators using HLS
- In silico framework for genome analysis
- Adaptive ensemble optimization for memory-related hyperparameters in retraining DNN at edge
- Convergence-aware optimal checkpointing for exploratory deep learning training jobs