内容定义的默克尔树，用于高效的集装箱交付

2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC) Pub Date : 2020-12-01 DOI:10.1109/HiPC50609.2020.00026

Yuta Nakamura, Raza Ahmad, T. Malik

{"title":"内容定义的默克尔树，用于高效的集装箱交付","authors":"Yuta Nakamura, Raza Ahmad, T. Malik","doi":"10.1109/HiPC50609.2020.00026","DOIUrl":null,"url":null,"abstract":"Containerization simplifies the sharing and deployment of applications when environments change in the software delivery chain. To deploy an application, container delivery methods push and pull container images. These methods operate on file and layer (set of files) granularity, and introduce redundant data within a container. Several container operations such as upgrading, installing, and maintaining become inefficient, because of copying and provisioning of redundant data. In this paper, we reestablish recent results that block-level deduplication reduces the size of individual containers, by verifying the result using content-defined chunking. Block-level deduplication, however, does not improve the efficiency of push/pull operations which must determine the specific blocks to transfer. We introduce a content-defined Merkle Tree (CDMT) over deduplicated storage in a container. CDMT indexes deduplicated blocks and determines changes to blocks in logarithmic time on the client. CDMT efficiently pushes and pulls container images from a registry, especially as containers are upgraded and (re-)provisioned on a client. We also describe how a registry can efficiently maintain the CDMT index as new image versions are pushed. We show the scalability of CDMT over Merkle Trees in terms of disk and network I/O savings using 15 container images and 233 image versions from Docker Hub.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Content-defined Merkle Trees for Efficient Container Delivery\",\"authors\":\"Yuta Nakamura, Raza Ahmad, T. Malik\",\"doi\":\"10.1109/HiPC50609.2020.00026\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Containerization simplifies the sharing and deployment of applications when environments change in the software delivery chain. To deploy an application, container delivery methods push and pull container images. These methods operate on file and layer (set of files) granularity, and introduce redundant data within a container. Several container operations such as upgrading, installing, and maintaining become inefficient, because of copying and provisioning of redundant data. In this paper, we reestablish recent results that block-level deduplication reduces the size of individual containers, by verifying the result using content-defined chunking. Block-level deduplication, however, does not improve the efficiency of push/pull operations which must determine the specific blocks to transfer. We introduce a content-defined Merkle Tree (CDMT) over deduplicated storage in a container. CDMT indexes deduplicated blocks and determines changes to blocks in logarithmic time on the client. CDMT efficiently pushes and pulls container images from a registry, especially as containers are upgraded and (re-)provisioned on a client. We also describe how a registry can efficiently maintain the CDMT index as new image versions are pushed. We show the scalability of CDMT over Merkle Trees in terms of disk and network I/O savings using 15 container images and 233 image versions from Docker Hub.\",\"PeriodicalId\":375004,\"journal\":{\"name\":\"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HiPC50609.2020.00026\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HiPC50609.2020.00026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

当软件交付链中的环境发生变化时，容器化简化了应用程序的共享和部署。要部署应用程序，容器交付方法可以推送和拉取容器映像。这些方法对文件和层(文件集)粒度进行操作，并在容器中引入冗余数据。由于复制和提供冗余数据，一些容器操作(如升级、安装和维护)变得效率低下。在本文中，我们通过使用内容定义的分块验证结果，重新建立了最近的结果，即块级重复数据删除减少了单个容器的大小。然而，块级重复数据删除并不能提高推/拉操作的效率，因为推/拉操作必须确定要传输的特定块。我们在容器中的重复数据删除存储上引入了内容定义的Merkle树(CDMT)。CDMT对重复数据删除块进行索引，并在客户端上以对数时间确定对块的更改。CDMT有效地从注册中心推送和提取容器映像，特别是在客户机上升级和(重新)供应容器时。我们还描述了注册中心如何在推出新映像版本时有效地维护CDMT索引。我们使用Docker Hub的15个容器映像和233个映像版本，从磁盘和网络I/O节省方面展示了CDMT在Merkle树上的可伸缩性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Content-defined Merkle Trees for Efficient Container Delivery

Containerization simplifies the sharing and deployment of applications when environments change in the software delivery chain. To deploy an application, container delivery methods push and pull container images. These methods operate on file and layer (set of files) granularity, and introduce redundant data within a container. Several container operations such as upgrading, installing, and maintaining become inefficient, because of copying and provisioning of redundant data. In this paper, we reestablish recent results that block-level deduplication reduces the size of individual containers, by verifying the result using content-defined chunking. Block-level deduplication, however, does not improve the efficiency of push/pull operations which must determine the specific blocks to transfer. We introduce a content-defined Merkle Tree (CDMT) over deduplicated storage in a container. CDMT indexes deduplicated blocks and determines changes to blocks in logarithmic time on the client. CDMT efficiently pushes and pulls container images from a registry, especially as containers are upgraded and (re-)provisioned on a client. We also describe how a registry can efficiently maintain the CDMT index as new image versions are pushed. We show the scalability of CDMT over Merkle Trees in terms of disk and network I/O savings using 15 container images and 233 image versions from Docker Hub.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)

自引率

0.00%

发文量

期刊最新文献

HiPC 2020 ORGANIZATION HiPC 2020 Industry Sponsors PufferFish: NUMA-Aware Work-stealing Library using Elastic Tasks Algorithms for Preemptive Co-scheduling of Kernels on GPUs 27th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2020) Technical program