{"title":"Data De-duplication on Similar File Detection","authors":"Yueguang Zhu, Xingjun Zhang, Runting Zhao, Xiaoshe Dong","doi":"10.1109/IMIS.2014.9","DOIUrl":null,"url":null,"abstract":"At present, there exist many bottlenecks in block level data de-duplication on the metadata management and read/write rate. In order to achieve higher de-duplication elimination ratio, the traditional way is to expand the range of data for data de-duplication, but that would make metadata fields longer and increase the number of metadata entries. When detecting the redundant data, metadata needs to be constantly imported and exported into the memory and access bottleneck will be produced. So it is necessary to detect similar documents to classify valuable data for de-duplication. In this paper, we propose a new method of block-level data de-duplication combined with similar file detection. At the time of guaranteeing the de-duplication elimination ratio, we narrow the range of data to reduce the metadata and eliminate performance bottlenecks. 
We present a detailed evaluation of our method and other existing data deduplication methods, and we show that our method meets its design goals as it improves the de-duplication ratio while reducing overhead costs.","PeriodicalId":345694,"journal":{"name":"2014 Eighth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 Eighth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IMIS.2014.9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
Block-level data de-duplication currently faces bottlenecks in metadata management and read/write throughput. The traditional way to achieve a higher de-duplication elimination ratio is to widen the range of data considered for de-duplication, but this lengthens metadata fields and increases the number of metadata entries. During redundancy detection, that metadata must then be constantly swapped in and out of memory, producing an access bottleneck. It is therefore necessary to detect similar files so that the data most valuable for de-duplication can be singled out. In this paper, we propose a new block-level data de-duplication method combined with similar file detection. While preserving the de-duplication elimination ratio, it narrows the range of data examined, reducing metadata and eliminating the performance bottleneck. We present a detailed evaluation of our method against other existing data de-duplication methods and show that it meets its design goals: it improves the de-duplication ratio while reducing overhead costs.
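The core idea — group similar files first, then run block-level de-duplication only within each group so the fingerprint index stays small — can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the coarse min-hash-style signature, the tiny 4-byte block size, and the per-group index layout are all simplifying assumptions made for demonstration.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; tiny for demonstration (real systems use KiB-scale blocks)

def similarity_signature(data: bytes, k: int = 3, n: int = 4) -> tuple:
    """Coarse min-hash-style signature over k-byte shingles (hypothetical
    stand-in for a real similar-file detector). Files with equal signatures
    are treated as 'similar' and share one per-group block index."""
    shingles = {data[i:i + k] for i in range(max(1, len(data) - k + 1))}
    hashes = sorted(int(hashlib.md5(s).hexdigest(), 16) for s in shingles)
    return tuple(hashes[:n])

def dedup(files: dict) -> dict:
    """Block-level de-duplication restricted to groups of similar files.
    Returns {group_signature: {block_hash: block}}: fingerprint metadata
    is kept per group instead of in one global index, so each lookup
    touches a much smaller table."""
    groups = {}
    for name, data in files.items():
        sig = similarity_signature(data)
        index = groups.setdefault(sig, {})       # per-group metadata table
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            # Store a block only if its fingerprint is new within the group.
            index.setdefault(hashlib.sha1(block).hexdigest(), block)
    return groups

# Two identical files collapse into one group holding one unique block.
groups = dedup({"a.txt": b"abcdabcdabcd", "b.txt": b"abcdabcdabcd"})
stored = sum(len(ix) for ix in groups.values())
```

Because redundant blocks overwhelmingly occur between similar files, restricting the lookup to one group's index approximates the global elimination ratio while keeping each metadata table small enough to stay memory-resident.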