Towards benchmarking erasure coding schemes in object storage system: A systematic review
Jannatun Noor, Rezuana Imtiaz Upoma, Md. Sadiqul Islam Sakif, A.B.M. Alim Al Islam
Future Generation Computer Systems-The International Journal of Escience, volume 163, Article 107522. Journal article, published 2024-09-04.
DOI: 10.1016/j.future.2024.107522
https://www.sciencedirect.com/science/article/pii/S0167739X24004862
Abstract
Erasure Coding (EC) in cloud storage reduces the need for full data replication by allowing lost data to be reconstructed from parity fragments. This method enhances data redundancy and efficiency while reducing storage costs and improving fault tolerance. In object storage systems, it is more advantageous than replication. EC guarantees data integrity by ensuring lossless transmission of all coded pieces. As data volumes continue to increase rapidly, the time efficiency of the EC method becomes crucial in ensuring optimal system performance. Several factors, including the algorithm employed, data size, number of storage nodes, hardware resources, and network conditions, can influence the speed of EC operations. Although existing literature covers some of these aspects, there is still a research gap in understanding the I/O activities, time efficiency, and fault tolerance of EC in object storage systems. Hence, our research aims to address these challenges in cloud-based object storage systems. We analyze and benchmark the data storage I/O performance of OpenStack Swift, focusing on the time efficiency of the Reed–Solomon (RS) algorithm across two datasets. Additionally, our contributions include benchmarking EC performance in both local and remote testbeds, utilizing the SimEDC simulator for comprehensive efficiency and fault tolerance assessments. Moreover, we create a comprehensive dataset (MCSD-100) for benchmarking and conduct a systematic literature review. Finally, we identify and discuss future opportunities for enhancing EC in cloud-based object storage systems.
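To make the idea of rebuilding data from parity fragments concrete, here is a minimal sketch. It is not the authors' benchmark code and it is not Reed–Solomon: it uses a single XOR parity fragment as a much simpler stand-in, which tolerates only one lost fragment. The function names (`storage_overhead`, `encode`, `reconstruct`, `decode`) and the fragment counts are assumptions chosen for illustration.

```python
from functools import reduce
from typing import List, Optional


def storage_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data with k data and m parity fragments."""
    return (k + m) / k


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k zero-padded fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return frags + [reduce(xor_bytes, frags)]


def reconstruct(frags: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (None) by XOR-ing the survivors."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity tolerates only one lost fragment")
    repaired = list(frags)
    if missing:
        survivors = [f for f in frags if f is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


def decode(frags: List[bytes], k: int, original_len: int) -> bytes:
    """Concatenate the k data fragments and strip the zero padding."""
    return b"".join(frags[:k])[:original_len]


if __name__ == "__main__":
    payload = b"object storage payload"
    fragments = encode(payload, k=4)
    fragments[2] = None  # simulate one failed storage node
    restored = decode(reconstruct(fragments), k=4, original_len=len(payload))
    print(restored == payload)
    # Hypothetical comparison: 4+1 parity scheme versus keeping 3 full replicas.
    print(storage_overhead(4, 1), 3.0)
```

The overhead helper shows why EC is attractive for storage cost: a scheme with k data and m parity fragments stores (k + m) / k bytes per user byte, whereas n-way replication stores n. A real Reed–Solomon code generalizes the recovery step so that any m of the k + m fragments may be lost.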
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.