RMD: A Resemblance and Mergence Based Approach for High Performance Deduplication
Panfeng Zhang, Ping Huang, Xubin He, Hua Wang, Lingyu Yan, Ke Zhou
2016 45th International Conference on Parallel Processing (ICPP), 2016-08-01
DOI: 10.1109/ICPP.2016.68 (https://doi.org/10.1109/ICPP.2016.68)
Data deduplication, a data redundancy elimination technique, is used in a wide range of application environments to reduce storage space. However, one of the main challenges facing deduplication is providing a fast key-value fingerprint index for large datasets, because index performance is critical to overall deduplication performance. This paper proposes RMD, a resemblance and mergence based deduplication scheme that aims to answer fingerprint queries quickly. The key idea of RMD is to combine a Bloom filter array with a data resemblance algorithm to dramatically narrow the range that must be searched for each query. In addition, RMD uses a mergence-based approach to merge resembling segments into the relevant bins, and applies a frequency-based Fingerprint Retention Policy to reduce bin capacity, which improves both query throughput and the deduplication ratio. Extensive experiments with real-world datasets show that RMD achieves high query performance and outperforms several state-of-the-art deduplication schemes.
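The abstract gives no implementation details, so the following is only a rough sketch of the general idea of pairing a Bloom filter array with resemblance-based bin selection. Everything in it is an assumption made for illustration: the names `BloomFilter`, `Bin`, `BinIndex`, and `store_segment`, the filter sizes, the use of SHA-1 fingerprints, and the rule of picking the bin whose filter matches the most fingerprints. It is not RMD's actual algorithm, and the frequency-based Fingerprint Retention Policy is left out.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over byte strings (sizes are illustrative)."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive k bit positions by salting SHA-1 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class Bin:
    """A group of fingerprints guarded by its own Bloom filter."""

    def __init__(self):
        self.filter = BloomFilter()
        self.fingerprints = set()

    def add(self, fp):
        self.filter.add(fp)
        self.fingerprints.add(fp)


class BinIndex:
    """Hypothetical index: one Bloom filter per bin, queried as an array."""

    def __init__(self):
        self.bins = []

    def _most_similar_bin(self, fps):
        # Assumed resemblance measure: count of probable filter hits.
        best, best_hits = None, 0
        for b in self.bins:
            hits = sum(1 for fp in fps if fp in b.filter)
            if hits > best_hits:
                best, best_hits = b, hits
        return best

    def store_segment(self, chunks):
        """Deduplicate a segment against one bin; return the count of new chunks."""
        fps = [hashlib.sha1(c).digest() for c in chunks]
        target = self._most_similar_bin(fps)
        if target is None:
            # No bin resembles this segment, so open a new one.
            target = Bin()
            self.bins.append(target)
        new = 0
        for fp in fps:
            if fp not in target.fingerprints:
                target.add(fp)  # merge the segment's fingerprints into the bin
                new += 1
        return new
```

In this sketch a segment is compared only against the single bin it most resembles, so the exact lookup touches one bin rather than the whole index. The trade-off is that duplicates living in other bins go undetected.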