Roei Kisous, Ariel Kolikant, Abhinav Duggal, Sarai Sheinvald, Gala Yadgar
{"title":"The what, The from, and The to: The Migration Games in Deduplicated Systems","authors":"Roei Kisous, Ariel Kolikant, Abhinav Duggal, Sarai Sheinvald, Gala Yadgar","doi":"https://dl.acm.org/doi/10.1145/3565025","DOIUrl":null,"url":null,"abstract":"<p>Deduplication reduces the size of the data stored in large-scale storage systems by replacing duplicate data blocks with references to their unique copies. This creates dependencies between files that contain similar content and complicates the management of data in the system. In this article, we address the problem of data migration, in which files are remapped between different volumes as a result of system expansion or maintenance. The challenge of determining which files and blocks to migrate has been studied extensively for systems without deduplication. In the context of deduplicated storage, however, only simplified migration scenarios have been considered.</p><p>In this article, we formulate the general migration problem for deduplicated systems as an optimization problem whose objective is to minimize the system’s size while ensuring that the storage load is evenly distributed between the system’s volumes and that the network traffic required for the migration does not exceed its allocation.</p><p>We then present three algorithms for generating effective migration plans, each based on a different approach and representing a different trade-off between computation time and migration efficiency. Our <i>greedy algorithm</i> provides modest space savings but is appealing thanks to its exceptionally short runtime. Its results can be improved by using larger system representations. Our <i>theoretically optimal algorithm</i> formulates the migration problem as an integer linear programming (ILP) instance. Its migration plans consistently result in smaller and more balanced systems than those of the greedy approach, although its runtime is long and, as a result, the theoretical optimum is not always found. Our <i>clustering algorithm</i> enjoys the best of both worlds: its migration plans are comparable to those generated by the ILP-based algorithm, but its runtime is shorter, sometimes by an order of magnitude. It can be further accelerated at a modest cost in the quality of its results.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"44 1","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2022-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Storage","FirstCategoryId":"94","ListUrlMain":"https://doi.org/https://dl.acm.org/doi/10.1145/3565025","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
Abstract
Deduplication reduces the size of the data stored in large-scale storage systems by replacing duplicate data blocks with references to their unique copies. This creates dependencies between files that contain similar content and complicates the management of data in the system. In this article, we address the problem of data migration, in which files are remapped between different volumes as a result of system expansion or maintenance. The challenge of determining which files and blocks to migrate has been studied extensively for systems without deduplication. In the context of deduplicated storage, however, only simplified migration scenarios have been considered.
In this article, we formulate the general migration problem for deduplicated systems as an optimization problem whose objective is to minimize the system’s size while ensuring that the storage load is evenly distributed between the system’s volumes and that the network traffic required for the migration does not exceed its allocation.
We then present three algorithms for generating effective migration plans, each based on a different approach and representing a different trade-off between computation time and migration efficiency. Our greedy algorithm provides modest space savings but is appealing thanks to its exceptionally short runtime. Its results can be improved by using larger system representations. Our theoretically optimal algorithm formulates the migration problem as an integer linear programming (ILP) instance. Its migration plans consistently result in smaller and more balanced systems than those of the greedy approach, although its runtime is long and, as a result, the theoretical optimum is not always found. Our clustering algorithm enjoys the best of both worlds: its migration plans are comparable to those generated by the ILP-based algorithm, but its runtime is shorter, sometimes by an order of magnitude. It can be further accelerated at a modest cost in the quality of its results.
期刊介绍:
The ACM Transactions on Storage (TOS) is a new journal with an intent to publish original archival papers in the area of storage and closely related disciplines. Articles that appear in TOS will tend either to present new techniques and concepts or to report novel experiences and experiments with practical systems. Storage is a broad and multidisciplinary area that comprises of network protocols, resource management, data backup, replication, recovery, devices, security, and theory of data coding, densities, and low-power. Potential synergies among these fields are expected to open up new research directions.