Correlation-Aware Stripe Organization for Efficient Writes in Erasure-Coded Storage Systems
Zhirong Shen, P. Lee, J. Shu, Wenzhong Guo
2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS), pp. 134-143, September 2017
DOI: 10.1109/SRDS.2017.18
Citations: 9
Abstract
Erasure coding has been extensively employed in production storage systems to protect data availability while keeping data redundancy low. However, mitigating the parity update overhead of partial stripe writes in erasure-coded storage systems remains a critical concern. In this paper, we reconsider this problem from two new perspectives (data correlation and stripe organization) and propose CASO, a correlation-aware stripe organization algorithm. CASO captures the data correlation in a data access stream. It packs correlated data into a small number of stripes to reduce the I/Os incurred by partial stripe writes, and organizes uncorrelated data into stripes so as to leverage spatial locality in later accesses. By differentiating correlated and uncorrelated data in stripe organization, we show via extensive trace-driven evaluation that CASO reduces parity updates by up to 25.1% and improves write speed by up to 28.4%.
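The abstract states the idea but not CASO's concrete correlation-mining or stripe-partitioning procedure. As a rough, hypothetical illustration only, the Python sketch below shows why packing co-written chunks into the same stripe can reduce parity updates. It assumes a simplified cost model (every write request updates all m parity chunks of each stripe it touches) and treats chunks written in the same request as correlated. The helper names (`parity_updates`, `sequential_layout`, `correlation_aware_layout`), the greedy pair-merging heuristic, and the first-fit packing are assumptions made for this sketch, not the paper's algorithm.

```python
from collections import Counter
from itertools import combinations


def parity_updates(requests, stripe_of, m):
    """Simplified cost model (assumption): a write request updates all m parity
    chunks of every stripe it touches, so cost = m * (distinct stripes touched)."""
    return sum(m * len({stripe_of[c] for c in req}) for req in requests)


def sequential_layout(chunks, k):
    """Baseline: group chunks into stripes of k data chunks in address order."""
    return {c: i // k for i, c in enumerate(sorted(chunks))}


def correlation_aware_layout(requests, chunks, k):
    """Hypothetical greedy sketch: chunks co-written in one request count as
    correlated; the most frequently co-written pairs are merged into groups of
    at most k chunks, groups are packed into stripes, and uncorrelated chunks
    fill further stripes in address order to keep spatial locality."""
    pair_counts = Counter()
    for req in requests:
        for pair in combinations(sorted(set(req)), 2):
            pair_counts[pair] += 1

    # Each chunk starts in its own group; members of a group share one list.
    group = {c: [c] for c in chunks}
    for (a, b), _ in pair_counts.most_common():
        ga, gb = group[a], group[b]
        if ga is not gb and len(ga) + len(gb) <= k:
            ga.extend(gb)
            for c in gb:
                group[c] = ga

    # Split the distinct groups into correlated (size > 1) and uncorrelated.
    seen, correlated, uncorrelated = set(), [], []
    for c in sorted(chunks):
        g = group[c]
        if id(g) not in seen:
            seen.add(id(g))
            if len(g) > 1:
                correlated.append(sorted(g))
            else:
                uncorrelated.append(c)

    # First-fit decreasing: pack correlated groups into stripes of capacity k.
    stripes = []
    for g in sorted(correlated, key=len, reverse=True):
        for s in stripes:
            if len(s) + len(g) <= k:
                s.extend(g)
                break
        else:
            stripes.append(list(g))

    # Uncorrelated chunks go into new stripes in address order.
    for i in range(0, len(uncorrelated), k):
        stripes.append(uncorrelated[i:i + k])

    return {c: sid for sid, s in enumerate(stripes) for c in s}


if __name__ == "__main__":
    k, m = 4, 2  # data and parity chunks per stripe (toy values)
    chunks = range(16)
    requests = [[0, 5, 10, 15], [0, 5, 10, 15], [2, 7], [2, 7], [3], [12, 13]]
    print("sequential:", parity_updates(requests, sequential_layout(chunks, k), m))
    print("correlation-aware:",
          parity_updates(requests, correlation_aware_layout(requests, chunks, k), m))
```

With k = 4 data chunks and m = 2 parity chunks per stripe, this made-up trace needs 28 parity updates under the address-order layout and 12 once the co-written chunks share stripes. These numbers come only from the toy trace and say nothing about the results reported in the paper.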