{"title":"Whole-File Chunk-Based Deduplication Using Reinforcement Learning for Cloud Storage","authors":"Xincheng Yuan, M. Moh, Teng-Sheng Moh","doi":"10.1109/ASONAM55673.2022.10068661","DOIUrl":null,"url":null,"abstract":"Deduplication is the process of removing replicated data content from storage facilities like online databases, cloud datastore, local file systems, etc. It is commonly performed as part of data preprocessing to eliminate redundant data that requires extra storage spaces and computing power and is crucial for data storage management in cloud computing. Deduplication is essential for file backup systems since duplicated files will presumably consume more storage space, especially with a short backup period such as daily. A common technique in this field involves splitting files into chunks whose hashes can be compared using data structures or techniques like clustering. This paper explores the possibility of performing such file chunk deduplication leveraging an innovative reinforcement learning approach to achieve a high deduplication ratio. The proposed system is named SegDup, which achieves 13% higher deduplication ratio than Extreme Binning, a state-of-the art deduplication algorithm.","PeriodicalId":423113,"journal":{"name":"2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASONAM55673.2022.10068661","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Deduplication is the process of removing replicated data content from storage facilities such as online databases, cloud datastores, and local file systems. It is commonly performed during data preprocessing to eliminate redundant data that would otherwise consume extra storage space and computing power, and it is crucial for data storage management in cloud computing. Deduplication is especially important for file backup systems, where duplicated files accumulate quickly when the backup interval is short (e.g., daily). A common technique in this field splits files into chunks whose hashes can be compared using data structures or techniques such as clustering. This paper explores performing such file-chunk deduplication with an innovative reinforcement learning approach to achieve a high deduplication ratio. The proposed system, named SegDup, achieves a 13% higher deduplication ratio than Extreme Binning, a state-of-the-art deduplication algorithm.
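As a rough illustration of the chunk-and-hash idea the abstract refers to (not the paper's SegDup or Extreme Binning algorithms), the Python sketch below splits files into fixed-size chunks, hashes each chunk, and stores a chunk only if its hash has not been seen before. The fixed chunk size, SHA-256 hashing, and in-memory store are all assumptions for illustration; production systems typically use content-defined chunking and on-disk indexes.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 8 * 1024  # assumed fixed chunk size; real systems often use content-defined chunking


class ChunkStore:
    """Toy chunk-based deduplication store keyed by chunk hash (illustrative only)."""

    def __init__(self) -> None:
        self.chunks = {}  # chunk hash -> chunk bytes, stored once
        self.files = {}   # file path -> ordered list of chunk hashes (the file "recipe")

    def add_file(self, path: Path) -> int:
        """Ingest a file, storing only previously unseen chunks; returns logical bytes read."""
        hashes, logical = [], 0
        with path.open("rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                logical += len(chunk)
                digest = hashlib.sha256(chunk).hexdigest()
                # Duplicate chunks are detected by hash and not stored again.
                self.chunks.setdefault(digest, chunk)
                hashes.append(digest)
        self.files[str(path)] = hashes
        return logical

    def dedup_ratio(self, total_logical_bytes: int) -> float:
        """Logical bytes ingested divided by unique bytes actually stored."""
        stored = sum(len(c) for c in self.chunks.values())
        return total_logical_bytes / stored if stored else 1.0
```

A backup run would call `add_file` for each file and accumulate the returned logical sizes; the ratio of logical to stored bytes is one common way to report a deduplication ratio, though the paper may define its metric differently.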