
ACM Transactions on Storage (TOS): Latest Publications

DEFUSE: An Interface for Fast and Correct User Space File System Access
Pub Date : 2022-08-30 DOI: 10.1145/3494556
James Lembke, Pierre-Louis Roman, P. Eugster
Traditionally, the only option for developers was to implement file systems (FSs) via drivers within the operating system kernel. However, there is a growing number of FSs, notably distributed FSs for the cloud, whose interfaces are implemented solely in user space to (i) isolate FS logic, (ii) take advantage of user space libraries, and/or (iii) enable rapid FS prototyping. Common interfaces for implementing FSs in user space exist, but they do not guarantee POSIX compliance in all cases, or suffer from considerable performance penalties due to large numbers of wait context switches between kernel and user space processes. We propose DEFUSE, an interface for user space FSs that provides fast accesses while ensuring access correctness and requiring no modifications to applications. DEFUSE achieves significant performance improvements over existing user space FS interfaces thanks to its novel design that drastically reduces the number of wait context switches for FS accesses. Additionally, to ensure access correctness, DEFUSE maintains POSIX compliance for FS accesses thanks to three novel concepts: bypassed file descriptor (FD) lookup, FD stashing, and user space paging. Our evaluation spanning a variety of workloads shows that by reducing the number of wait context switches per workload from as many as 16,000 or 41,000 with Filesystem in Userspace (FUSE) down to 9 on average, DEFUSE increases performance by 2× over existing interfaces for typical workloads and by as much as 10× in certain instances.
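The FD-handling ideas are easiest to see in a toy form. The Python sketch below only illustrates the general notion of stashing user space FS state behind ordinary file descriptors so reads and writes can be served without crossing into a kernel FS path; the names (FDTable, open_userfs, UserFSHandle) are invented for illustration and do not reflect DEFUSE's actual interface or its kernel cooperation.

```python
import os

class UserFSHandle:
    """In-memory stand-in for an object managed by a user space FS (hypothetical)."""
    def __init__(self, data=b""):
        self.data = bytearray(data)
        self.pos = 0

class FDTable:
    """Toy FD-stashing table: descriptors owned by the user space FS are looked up
    here and served entirely in user space; any other descriptor falls through to
    the ordinary OS calls."""
    def __init__(self):
        self.stash = {}                       # fd -> UserFSHandle

    def open_userfs(self, data=b""):
        # Reserve a real kernel fd number so the application holds a valid fd,
        # but keep the file state in user space.
        fd = os.open(os.devnull, os.O_RDONLY)
        self.stash[fd] = UserFSHandle(data)
        return fd

    def read(self, fd, n):
        h = self.stash.get(fd)
        if h is None:                         # not ours: normal kernel path
            return os.read(fd, n)
        out = bytes(h.data[h.pos:h.pos + n])  # bypassed lookup: no kernel FS involved
        h.pos += len(out)
        return out

    def write(self, fd, buf):
        h = self.stash.get(fd)
        if h is None:
            return os.write(fd, buf)
        h.data[h.pos:h.pos + len(buf)] = buf
        h.pos += len(buf)
        return len(buf)

    def seek(self, fd, pos):
        if fd in self.stash:
            self.stash[fd].pos = pos
        else:
            os.lseek(fd, pos, os.SEEK_SET)

if __name__ == "__main__":
    t = FDTable()
    fd = t.open_userfs()
    t.write(fd, b"hello")
    t.seek(fd, 0)
    print(t.read(fd, 5))                      # b'hello'
```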
Citations: 1
WebAssembly-based Delta Sync for Cloud Storage Services
Pub Date : 2022-08-30 DOI: 10.1145/3502847
Jianwei Zheng, Zhenhua Li, Yuanhui Qiu, Hao Lin, HE Xiao, Yang Li, Yun-Fei Liu
Delta synchronization (sync) is crucial to the network-level efficiency of cloud storage services, especially when handling large files with small increments. Practical delta sync techniques are, however, only available for PC clients and mobile apps, but not web browsers—the most pervasive and OS-independent access method. To bridge this gap, prior work concentrates on either reversing the delta sync protocol or utilizing the native client, all striving around the tradeoffs among efficiency, applicability, and usability and thus forming an “impossible triangle.” Recently, we note the advent of WebAssembly (WASM), a portable binary instruction format that is efficient in both encoding size and load time. In principle, the unique advantages of WASM can make web-based applications enjoy near-native runtime speed without significant cloud-side or client-side changes. Thus, we implement a straightforward WASM-based delta sync solution, WASMrsync, finding that its quasi-asynchronous working manner and conventional In-situ Separate Memory Allocation greatly increase sync time and memory usage. To address these issues, we strategically devise sync-async code decoupling and streaming compilation, together with Informed In-place File Construction. The resulting solution, WASMrsync+, achieves comparable sync time to the state-of-the-art (most efficient) solution with nearly only half the memory usage, letting the “impossible triangle” reach a reconciliation.
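As background for the delta sync problem the paper targets, here is a minimal block-based delta sketch in Python. It only matches whole blocks by strong hash, whereas rsync-style delta sync (which WASMrsync implements in WASM) also uses a weak rolling checksum to match at arbitrary byte offsets; the function names and the 4 KB block size are illustrative choices, not the paper's.

```python
import hashlib

BLOCK = 4096  # illustrative block size

def signature(old: bytes):
    """Strong hash of every block of the old file (what the receiver already has)."""
    return {hashlib.md5(old[i:i + BLOCK]).hexdigest(): i
            for i in range(0, len(old), BLOCK)}

def delta(new: bytes, sig):
    """Emit ('copy', old_offset) for blocks the receiver has, ('literal', data)
    otherwise. Matching only on block boundaries keeps this sketch short."""
    ops = []
    for i in range(0, len(new), BLOCK):
        blk = new[i:i + BLOCK]
        off = sig.get(hashlib.md5(blk).hexdigest())
        ops.append(("copy", off) if off is not None else ("literal", blk))
    return ops

def patch(old: bytes, ops):
    """Reconstruct the new file from the old file plus the delta."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg:arg + BLOCK] if kind == "copy" else arg
    return bytes(out)

if __name__ == "__main__":
    old = b"A" * 8192 + b"B" * 4096
    new = b"A" * 8192 + b"C" * 100 + b"B" * 4096
    ops = delta(new, signature(old))
    assert patch(old, ops) == new
    print(sum(len(a) for k, a in ops if k == "literal"), "literal bytes sent")
```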
Citations: 0
Donag: Generating Efficient Patches and Diffs for Compressed Archives
Pub Date : 2022-07-27 DOI: 10.1145/3507919
Michael J. May
Differencing between compressed archives is a common task in file management and synchronization. Applications include source code distribution, application updates, and document synchronization. General-purpose binary differencing tools can create and apply patches to compressed archives, but they do not consider the internal structure of the compressed archive or the file lifecycle. Therefore, they miss opportunities to save space based on the archive’s internal structure and metadata. To address this gap, we develop a content-aware, format-independent theory for differencing on compressed archives and propose a canonical form and digest for compressed archives. Based on them, we present Donag, a content-aware differencing and patching algorithm that produces smaller patches than general-purpose binary differencing tools on versioned archives by exploiting the compressed archives’ internal structure. Donag uses the VCDiff and BSDiff engines internally. We compare Donag’s patches to those produced by bsdiff, xdelta3, and Delta++ on three classes of compressed archives: open-source code repositories, large and small applications, and office productivity documents (DOCX, XLSX, PPTX). Donag’s patches are typically 10% to 89% smaller than those produced by bsdiff, xdelta3, and Delta++, with reasonable memory overhead and throughput on commodity hardware. In the worst case, Donag’s patches are negligibly larger.
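To make "content-aware differencing" concrete, the sketch below diffs two ZIP archives member by member instead of diffing their compressed byte streams. It stores changed members whole and lists deletions; Donag itself exploits the internal structure further and uses the VCDiff and BSDiff engines internally. All function names here are hypothetical, not Donag's API.

```python
import io, zipfile, hashlib

def members(archive):
    """Map member name -> uncompressed bytes for a ZIP archive (path or file-like)."""
    with zipfile.ZipFile(archive) as z:
        return {n: z.read(n) for n in z.namelist()}

def make_patch(old_zip, new_zip):
    """Content-aware patch: compare archives member by member, not as raw
    compressed byte streams (which recompression makes incomparable)."""
    old, new = members(old_zip), members(new_zip)
    changed = {n: d for n, d in new.items()
               if hashlib.sha256(d).digest() !=
                  hashlib.sha256(old.get(n, b"")).digest()}
    deleted = [n for n in old if n not in new]
    return {"changed": changed, "deleted": deleted}

def apply_patch(old_zip, patch, out_zip):
    """Rebuild the new archive from the old archive plus the patch."""
    files = members(old_zip)
    files.update(patch["changed"])
    for n in patch["deleted"]:
        files.pop(n, None)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as z:
        for n, d in sorted(files.items()):
            z.writestr(n, d)

if __name__ == "__main__":
    a, b = io.BytesIO(), io.BytesIO()
    with zipfile.ZipFile(a, "w") as z:
        z.writestr("keep.txt", "same"); z.writestr("edit.txt", "v1")
    with zipfile.ZipFile(b, "w") as z:
        z.writestr("keep.txt", "same"); z.writestr("edit.txt", "v2")
    p = make_patch(a, b)
    print(sorted(p["changed"]))   # ['edit.txt']
```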
Citations: 0
Building GC-free Key-value Store on HM-SMR Drives with ZoneFS
Pub Date : 2022-07-22 DOI: 10.1145/3502846
Yiwen Zhang, Ting Yao, Ji-guang Wan, Changsheng Xie
Host-managed shingled magnetic recording (HM-SMR) drives have a capacity advantage for harnessing the explosive growth of data. For key-value (KV) stores based on log-structured merge trees (LSM-trees), the HM-SMR drive is an ideal solution owing to its capacity, predictable performance, and economical cost. However, building an LSM-tree-based KV store on HM-SMR drives presents severe challenges in maintaining performance and space utilization efficiency due to the redundant cleaning processes for applications and storage devices (i.e., compaction and garbage collection). To eliminate the overhead of on-disk garbage collection (GC) and improve compaction efficiency, this article presents GearDB, a GC-free KV store tailored for HM-SMR drives. GearDB improves the write performance and space efficiency through three new techniques: a new on-disk data layout, compaction windows, and a novel gear compaction algorithm. We further augment the read performance of GearDB with a new SSTable layout and a read-ahead mechanism. We implement GearDB with LevelDB, and use zonefs to access a real HM-SMR drive. Our extensive experiments confirm that GearDB achieves both high performance and space efficiency, i.e., on average 1.7× and 1.5× better than LevelDB in random write and read, respectively, with up to 86.9% space efficiency.
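The following toy sketch shows the layout principle that makes on-disk GC unnecessary: a level's SSTables are appended into zones dedicated to that level, so once the level is compacted away, its zones contain only stale data and can be reset wholesale. This is not GearDB's gear compaction algorithm; the class names, the zone size, and the merge step are placeholders.

```python
ZONE_SIZE = 4  # sstables per zone (toy value)

class Zone:
    def __init__(self, zid):
        self.zid, self.tables = zid, []

class ZonedLevel:
    """One LSM level whose SSTables live only in zones owned by this level."""
    def __init__(self, name):
        self.name, self.zones, self.next_zid = name, [], 0

    def append(self, sstable):
        # Sequentially append into the level's current zone; open a new zone when full.
        if not self.zones or len(self.zones[-1].tables) == ZONE_SIZE:
            self.zones.append(Zone(self.next_zid))
            self.next_zid += 1
        self.zones[-1].tables.append(sstable)

    def compact_into(self, lower):
        """Merge every table of this level into the lower level, then reset all
        of this level's zones at once; no live data is copied, so no on-disk GC."""
        for z in self.zones:
            for t in z.tables:
                lower.append(t)          # stand-in for a real sorted merge
        freed = [z.zid for z in self.zones]
        self.zones = []                  # whole zones reset
        return freed

if __name__ == "__main__":
    l0, l1 = ZonedLevel("L0"), ZonedLevel("L1")
    for i in range(6):
        l0.append(f"sst-{i}")
    print("zones reset:", l0.compact_into(l1))
```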
Citations: 5
Kangaroo: Theory and Practice of Caching Billions of Tiny Objects on Flash
Pub Date : 2022-06-13 DOI: 10.1145/3542928
Sara McAllister, Benjamin Berg, Julian Tutuncu-Macias, Juncheng Yang, S. Gunasekar, Jimmy Lu, Daniel S. Berger, Nathan Beckmann, G. Ganger
Many social-media and IoT services have very large working sets consisting of billions of tiny (≈100 B) objects. Large, flash-based caches are important to serving these working sets at acceptable monetary cost. However, caching tiny objects on flash is challenging for two reasons: (i) SSDs can read/write data only in multi-KB “pages” that are much larger than a single object, stressing the limited number of times flash can be written; and (ii) very few bits per cached object can be kept in DRAM without losing flash’s cost advantage. Unfortunately, existing flash-cache designs fall short of addressing these challenges: write-optimized designs require too much DRAM, and DRAM-optimized designs require too many flash writes. We present Kangaroo, a new flash-cache design that optimizes both DRAM usage and flash writes to maximize cache performance while minimizing cost. Kangaroo combines a large, set-associative cache with a small, log-structured cache. The set-associative cache requires minimal DRAM, while the log-structured cache minimizes Kangaroo’s flash writes. Experiments using traces from Meta and Twitter show that Kangaroo achieves DRAM usage close to the best prior DRAM-optimized design, flash writes close to the best prior write-optimized design, and miss ratios better than both. Kangaroo’s design is Pareto-optimal across a range of allowed write rates, DRAM sizes, and flash sizes, reducing misses by 29% over the state of the art. These results are corroborated by analytical models presented herein and with a test deployment of Kangaroo in a production flash cache at Meta.
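A minimal sketch of the two-tier idea follows: a small log-structured buffer absorbs tiny writes and flushes them in batches into hashed, set-associative "pages", so one page write covers many objects and the per-object DRAM footprint stays tiny. The sizes, names, and eviction policy below are invented for illustration and do not match Kangaroo's actual design parameters.

```python
import hashlib
from collections import OrderedDict

PAGE_OBJS = 8      # objects that fit in one flash "page" (toy value)
NUM_SETS  = 1024
LOG_CAP   = 64     # objects buffered in the small log-structured cache

class TwoTierFlashCache:
    """Toy Kangaroo-like cache: log-structured front tier plus set-associative tier."""
    def __init__(self):
        self.log = OrderedDict()                      # small, sequentially written log
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]
        self.flash_page_writes = 0                    # rough proxy for flash wear

    def _set_of(self, key):
        h = int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")
        return self.sets[h % NUM_SETS]                # no per-object DRAM index needed

    def put(self, key, value):
        self.log[key] = value
        if len(self.log) >= LOG_CAP:
            self._flush()

    def _flush(self):
        # Move buffered objects to their home sets; each touched set costs one page write.
        touched = set()
        for key, value in self.log.items():
            s = self._set_of(key)
            s[key] = value
            if len(s) > PAGE_OBJS:
                s.popitem(last=False)                 # evict the oldest object in the set
            touched.add(id(s))
        self.flash_page_writes += len(touched)
        self.log.clear()

    def get(self, key):
        if key in self.log:
            return self.log[key]
        return self._set_of(key).get(key)

if __name__ == "__main__":
    c = TwoTierFlashCache()
    for i in range(200):
        c.put(f"obj{i}", b"x" * 100)
    print(c.get("obj42") is not None, "page writes:", c.flash_page_writes)
```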
Citations: 4
Exploiting Nil-external Interfaces for Fast Replicated Storage
Pub Date : 2022-06-06 DOI: 10.1145/3542821
Aishwarya Ganesan, R. Alagappan, Anthony Rebello, A. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Do some storage interfaces enable higher performance than others? Can one identify and exploit such interfaces to realize high performance in storage systems? This article answers these questions in the affirmative by identifying nil-externality, a property of storage interfaces. A nil-externalizing (nilext) interface may modify state within a storage system but does not externalize its effects or system state immediately to the outside world. As a result, a storage system can apply nilext operations lazily, improving performance. In this article, we take advantage of nilext interfaces to build high-performance replicated storage. We implement Skyros, a nilext-aware replication protocol that offers high performance by deferring ordering and executing operations until their effects are externalized. We show that exploiting nil-externality offers significant benefit: For many workloads, Skyros provides higher performance than standard consensus-based replication. For example, Skyros offers 3× lower latency while providing the same high throughput offered by throughput-optimized Paxos.
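The nil-externality intuition can be sketched as follows: a put (which externalizes nothing) completes once a quorum has durably logged it, and ordering plus execution are deferred until a get must expose the state. This is a single-process toy, not Skyros' protocol; recovery, leader logic, and conflict handling are omitted, and all names are hypothetical.

```python
class Replica:
    def __init__(self):
        self.durable = []                 # durably logged, still unordered nilext ops
        self.state = {}                   # ordered, applied state

    def log_nilext(self, op):
        self.durable.append(op)           # fast path: log only, no ordering or execution

    def sync(self, order):
        """Lazily apply pending nilext ops in the agreed order. A real protocol
        would fetch ops this replica missed from the quorum; skipped here."""
        for seq in order:
            if seq < len(self.durable):
                key, value = self.durable[seq]
                self.state[key] = value

class NilextClient:
    def __init__(self, replicas, quorum):
        self.replicas, self.quorum, self.count = replicas, quorum, 0

    def put(self, key, value):
        # put() is nilext: it exposes no system state, so it completes as soon as
        # a quorum has logged it durably; no ordering round is needed.
        acks = 0
        for r in self.replicas:
            r.log_nilext((key, value))
            acks += 1
            if acks >= self.quorum:
                break
        self.count += 1

    def get(self, key):
        # get() externalizes state, so ordering and execution happen now.
        order = list(range(self.count))   # one fixed order; real systems agree on it
        for r in self.replicas:
            r.sync(order)
        return self.replicas[0].state.get(key)

if __name__ == "__main__":
    cluster = [Replica() for _ in range(3)]
    client = NilextClient(cluster, quorum=2)
    client.put("a", 1)
    client.put("a", 2)
    print(client.get("a"))                # 2
```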
Citations: 0
From Hyper-dimensional Structures to Linear Structures: Maintaining Deduplicated Data’s Locality
Pub Date : 2022-06-02 DOI: 10.1145/3507921
Xiangyu Zou, Jingsong Yuan, Philip Shilane, Wen Xia, Haijun Zhang, Xuan Wang
Data deduplication is widely used to reduce the size of backup workloads, but it has the known disadvantage of causing poor data locality, also referred to as the fragmentation problem. This results from the gap between the hyper-dimensional structure of deduplicated data and the sequential nature of many storage devices, and this leads to poor restore and garbage collection (GC) performance. Current research has considered writing duplicates to maintain locality (e.g., rewriting) or caching data in memory or SSD, but fragmentation continues to lower restore and GC performance. Investigating the locality issue, we design a method to flatten the hyper-dimensional structured deduplicated data to a one-dimensional format, which is based on classification of each chunk’s lifecycle, and this creates our proposed data layout. Furthermore, we present a novel management-friendly deduplication framework, called MFDedup, that applies our data layout and maintains locality as much as possible. Specifically, we use two key techniques in MFDedup: Neighbor-duplicate-focus indexing (NDF) and Across-version-aware Reorganization scheme (AVAR). NDF performs duplicate detection against a previous backup, then AVAR rearranges chunks with an offline and iterative algorithm into a compact, sequential layout, which nearly eliminates random I/O during file restores after deduplication. Evaluation results with five backup datasets demonstrate that, compared with state-of-the-art techniques, MFDedup achieves deduplication ratios that are 1.12× to 2.19× higher and restore throughputs that are 1.92× to 10.02× faster due to the improved data layout. While the rearranging stage introduces overheads, it is more than offset by a nearly-zero overhead GC process. Moreover, the NDF index only requires indices for two backup versions, while the traditional index grows with the number of versions retained.
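A tiny sketch of the neighbor-duplicate-focus idea: each backup is deduplicated only against the fingerprints of the immediately previous version, so the index covers two versions regardless of how many are retained. The fixed-size chunking, the dictionary index, and the function names are illustrative; MFDedup's lifecycle classification and AVAR reorganization are not modeled.

```python
import hashlib

CHUNK = 4096  # illustrative fixed-size chunking

def chunks(data: bytes):
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

def backup(version_data: bytes, prev_index):
    """Deduplicate one backup version against only the previous version's
    fingerprint index (the neighbor-duplicate-focus idea). Returns the index of
    this version and the chunks that must actually be written."""
    new_index, stored = {}, []
    for c in chunks(version_data):
        fp = hashlib.sha256(c).digest()
        if fp not in prev_index and fp not in new_index:
            stored.append(c)              # not held by the neighbor version: write it
        new_index[fp] = True
    return new_index, stored

if __name__ == "__main__":
    v1 = b"A" * CHUNK * 3                     # three identical chunks
    v2 = b"A" * CHUNK * 3 + b"B" * CHUNK      # one genuinely new chunk
    idx1, stored1 = backup(v1, {})
    idx2, stored2 = backup(v2, idx1)
    print(len(stored1), len(stored2))         # 1 1: only unseen chunks are written
```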
Citations: 0
RACE: One-sided RDMA-conscious Extendible Hashing
Pub Date : 2022-04-28 DOI: 10.1145/3511895
Pengfei Zuo, Qihui Zhou, Jiazhao Sun, Liu Yang, Shuangwu Zhang, Yu Hua, James Cheng, Rongfeng He, Huabing Yan
Memory disaggregation is a promising technique in datacenters with the benefit of improving resource utilization, failure isolation, and elasticity. Hashing indexes have been widely used to provide fast lookup services in distributed memory systems. However, traditional hashing indexes become inefficient for disaggregated memory, since the computing power in the memory pool is too weak to execute complex index requests. To provide efficient indexing services in disaggregated memory scenarios, this article proposes RACE hashing, a one-sided RDMA-Conscious Extendible hashing index with lock-free remote concurrency control and efficient remote resizing. RACE hashing enables all index operations to be efficiently executed by using only one-sided RDMA verbs without involving any compute resource in the memory pool. To support remote concurrent access with high performance, RACE hashing leverages a lock-free remote concurrency control scheme to enable different clients to concurrently operate the same hashing index in the memory pool in a lock-free manner. To resize the hash table with low overheads, RACE hashing leverages an extendible remote resizing scheme to reduce extra RDMA accesses caused by extendible resizing and allow concurrent request execution during resizing. Extensive experimental results demonstrate that RACE hashing outperforms state-of-the-art distributed in-memory hashing indexes by 1.4–13.7× in YCSB hybrid workloads.
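For readers unfamiliar with the base data structure, here is a plain extendible hashing sketch (directory doubling plus bucket splits). It models none of RACE's actual contribution, namely performing these operations with one-sided RDMA verbs, lock-free remote concurrency control, and concurrent resizing; the bucket capacity and names are arbitrary.

```python
class Bucket:
    def __init__(self, depth):
        self.depth, self.items = depth, {}   # local depth and key/value slots

class ExtendibleHash:
    """Plain extendible hashing: a directory of 2^global_depth bucket pointers,
    doubled only when a bucket at the maximum local depth overflows."""
    BUCKET_CAP = 4                           # arbitrary toy capacity

    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _bucket(self, key):
        return self.directory[hash(key) & ((1 << self.global_depth) - 1)]

    def get(self, key):
        return self._bucket(key).items.get(key)

    def put(self, key, value):
        b = self._bucket(key)
        if key in b.items or len(b.items) < self.BUCKET_CAP:
            b.items[key] = value
            return
        self._split(b)
        self.put(key, value)                 # retry once room exists

    def _split(self, b):
        if b.depth == self.global_depth:     # extendible resize: double the directory
            self.directory += self.directory
            self.global_depth += 1
        b.depth += 1
        twin = Bucket(b.depth)
        bit = 1 << (b.depth - 1)             # bit distinguishing the two halves
        for i, entry in enumerate(self.directory):
            if entry is b and (i & bit):
                self.directory[i] = twin     # re-point half of b's directory slots
        for k in list(b.items):
            if hash(k) & bit:
                twin.items[k] = b.items.pop(k)

if __name__ == "__main__":
    h = ExtendibleHash()
    for i in range(100):
        h.put(f"k{i}", i)
    print(h.get("k42"), "global depth =", h.global_depth)
```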
Citations: 4
Introduction to the Special Section on USENIX ATC 2021
Pub Date : 2022-04-18 DOI: 10.1145/3519550
I. Calciu, G. Kuenning
This special section of the ACM Transactions on Storage presents some highlights from the storage-related papers published at the USENIX Annual Technical Conference (ATC’21). Although ATC is a broad conference that covers all practical aspects of systems software, a large proportion of its papers have traditionally been related to storage in some way. ATC’21 has continued this trend. Out of 341 submissions, the authors tagged 121 (35%) with one or more topic labels of “Storage,” “File Systems,” or “Databases and Transactions.” The conference accepted 64 papers (19%), of which 21 (33%) were storage-related. As conference co-chairs, we selected three storage papers to be highlighted in this special section. All three were expanded and retitled by their authors and re-reviewed in fast-track mode by several of their original ATC’21 reviewers. We summarize them here in no particular order. The first article is “RACE: One-sided RDMA-conscious Extendible Hashing” by Pengfei Zuo, Qihui Zhou, Jiazhao Sun, Liu Yang, Shuangwu Zhang, Yu Hua, James Cheng, Rongfeng He, and Huabing Yan (titled “One-sided RDMA-conscious Extendible Hashing for Disaggregated Memory” in ATC’21). RACE is a client-centric RDMA hash table designed for disaggregated memory running on low-power CPUs. RACE completely bypasses the remote CPU for all key-value store operations and allows the hash table to be resized without impacting the concurrent foreground traffic. The second article, “SmartFVM: A Fast, Flexible, and Scalable Hardware-based Virtualization for Commodity Storage Devices” (originally “A Fast and Flexible Hardware-based Virtualization Mechanism for Computational Storage Devices”) is by Dongup Kwon, Wonsik Lee, Dongryeong Kim, Junehyuk Boo, and Jangwoo Kim. This article introduces a practical and low-overhead solution to virtualize computational storage devices that uses an FPGA with direct access to an SSD through NVMe. SmartFVM uses hardware-assisted virtualization to remove software-stack overheads while still maintaining isolation, and a hardware-level orchestration mechanism between the FPGA and the SSD. The final article is “Power Optimized Deployment of Key-value Stores Using Storage Class Memory” by Hiwot Tadese Kassa, Jason Akers, Mrinmoy Ghosh, Zhichao Cao, Vaibhav Gogte, and Ronald Dreslinski (previously “Improving Performance of Flash-based Key-value Stores Using Storage Class Memory as a Volatile Memory Extension”). It optimizes RocksDB by introducing a second layer of block cache using storage class memory. The article shows that adding storage class memory to a smaller, single-socket server results in significant performance improvements for RocksDB in production deployments at Facebook, while reducing cost compared to large two-socket servers with DRAM only.
Citations: 0
SmartFVM: A Fast, Flexible, and Scalable Hardware-based Virtualization for Commodity Storage Devices
Pub Date : 2022-04-12 DOI: 10.1145/3511213
Dongup Kwon, Wonsik Lee, Dongryeong Kim, Junehyuk Boo, Jangwoo Kim
A computational storage device incorporating a computation unit inside or near its storage unit is a highly promising technology to maximize a storage server’s performance. However, to apply such computational storage devices and exploit their full potential in virtualized environments, server architects must resolve a fundamental challenge: cost-effective virtualization. This critical challenge can be directly addressed by the following questions: (1) how to virtualize two different hardware units (i.e., computation and storage), (2) how to integrate them to construct virtual computational storage devices, and (3) how to provide them to users. However, the existing methods for computational storage virtualization suffer severely from low performance and high costs due to the lack of hardware-assisted virtualization support. In this work, we propose SmartFVM-Engine, an FPGA card designed to maximize the performance and cost-effectiveness of computational storage virtualization. SmartFVM-Engine introduces three key ideas to achieve the design goals. First, it achieves high virtualization performance by applying hardware-assisted virtualization to both computation and storage units. Second, it further improves the performance by applying hardware-assisted resource orchestration for the virtualized units. Third, it achieves high cost-effectiveness by dynamically constructing and scheduling virtual computational storage devices. To the best of our knowledge, this is the first work to implement a hardware-assisted virtualization mechanism for modern computational storage devices.
Citations: 1