
Latest Publications: 2021 IEEE International Conference on Networking, Architecture and Storage (NAS)

Improving Relational Database Upon the Arrival of Storage Hardware with Built-in Transparent Compression
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605481
Yifan Qiao, Xubin Chen, Jingpeng Hao, Jiangpeng Li, Qi Wu, Jingqiang Wang, Yang Liu, Tong Zhang
This paper presents an approach to enable relational databases to take full advantage of modern storage hardware with built-in transparent compression. Advanced storage appliances (e.g., all-flash arrays) and some of the latest SSDs (solid-state drives) can perform hardware-based data compression transparently to the OS and applications. Moreover, the growing deployment of hardware-based compression capability in cloud storage infrastructure points to the imminent arrival of cloud-based storage hardware with built-in transparent compression. To make relational databases better leverage modern storage hardware, we propose to deploy a dual in-memory vs. on-storage page format: while pages in database cache memory retain the conventional row-based format, each page on storage devices has a column-based format so that it can be better compressed by the storage hardware. We present design techniques that can further improve the on-storage page data compressibility through additional lightweight column data transformation. We discuss the impact of compression algorithms on the selection of column data transformation techniques. We integrated the design techniques into MySQL/InnoDB by adding only about 600 lines of code, and ran Sysbench OLTP workloads on a commercial SSD with built-in transparent compression. The results show that the proposed solution can bring up to 45% additional reduction in storage cost at the price of only a few percent performance degradation.
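To make the dual-format idea concrete, here is a minimal sketch (not the authors' implementation) of transposing a row-based page into a column-grouped byte layout before it is written out, so that the drive's compressor sees long runs of similar bytes. The two-integer/one-string schema, the 128-row page, and the use of zlib as a stand-in for the hardware compressor are all illustrative assumptions:

```python
# Sketch of the dual in-memory vs. on-storage page format: pages stay
# row-based in the buffer pool, but are transposed into a column-based
# layout right before being written to storage.
import struct
import zlib  # stands in for the drive's built-in compressor

ROW_FMT = "<q q 16s"  # assumed schema: two int64 columns, one 16-byte string

def rows_to_column_page(rows):
    """Transpose row tuples into a column-grouped byte layout."""
    ids, counts, names = zip(*rows)
    return b"".join([
        b"".join(struct.pack("<q", v) for v in ids),      # column 1 together
        b"".join(struct.pack("<q", v) for v in counts),   # column 2 together
        b"".join(names),                                  # column 3 together
    ])

rows = [(i, i % 10, b"user".ljust(16, b"\0")) for i in range(128)]
row_page = b"".join(struct.pack(ROW_FMT, *r) for r in rows)
col_page = rows_to_column_page(rows)

# Column grouping typically compresses better than the interleaved rows.
print(len(zlib.compress(row_page)), len(zlib.compress(col_page)))
```

On this synthetic page the column-grouped layout compresses noticeably smaller, which is exactly the effect the on-storage format aims for.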
Citations: 2
Terrestrial and Space-based Cloud Computing with Scalable, Responsible and Explainable Artificial Intelligence - A Position Paper
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605446
D. Martizzi, P. Ray
Adoption of cloud computing and storage is becoming ubiquitous across industries and human endeavors. The cloud computing market is expected to evolve significantly during the next decade. In particular, in order to enhance security and remote accessibility, several new architectures have been proposed that move a significant part of the cloud stack to satellites in space. These technologies are expected to become more prominent in the coming years. Despite the significant improvements hybrid terrestrial and space-based cloud architectures would bring, the growth in size of both the infrastructures and the distributed compute and storage tasks poses significant challenges for organizations interested in deploying their software stack to the cloud. In this position paper, we provide a series of basic principles for developing a scalable, responsible and explainable Artificial Intelligence platform that will assist experts in enhancing the efficiency of cloud deployments.
Citations: 0
Migrating Software from x86 to ARM Architecture: An Instruction Prediction Approach
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605443
B. W. Ford, Apan Qasem, Jelena Tešić, Ziliang Zong
For decades, the x86 architecture supported by Intel and AMD has been the dominant target for software development. Recently, ARM has solidified itself as a highly competitive and promising CPU architecture by exhibiting high performance and low power consumption simultaneously. In the foreseeable future, a copious amount of software will be fully migrated to the ARM architecture or will support both x86 and ARM simultaneously. Nevertheless, software ports from x86 to ARM are not trivial, for a number of reasons. First, it is time consuming to write code that resolves all compatibility issues for a new architecture. Second, specific hardware (e.g., ARM chips) and supporting toolkits (e.g., libraries and compilers) may not be readily available to developers, which delays the porting process. Third, it is hard to predict the performance of software before testing it on production chips. In this paper, we strive to tackle these challenges by proposing an instruction prediction method that can automatically generate AARCH64 code from existing x86-64 executables. Although the generated code might not be directly executable, it provides a cheap and efficient way for developers to estimate certain runtime metrics before actually building, deploying and testing code on an ARM-based CPU. Our experimental results show that AARCH64 instructions derived using prediction can achieve a high Bilingual Evaluation Understudy (BLEU) score, indicating a close quality match between the generated executables and natively ported AARCH64 software.
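As a rough illustration of the evaluation metric, the sketch below scores a predicted AARCH64 instruction sequence against a natively compiled reference with BLEU. The tokenization (whitespace-split mnemonics and operands) and add-one smoothing are assumptions, not the paper's exact pipeline:

```python
# Minimal BLEU over instruction token sequences: geometric mean of
# smoothed n-gram precisions times a brevity penalty.
import math
from collections import Counter

def ngram_precision(candidate, reference, n):
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    total = max(sum(cand.values()), 1)
    return (overlap + 1) / (total + 1)  # add-one smoothing for short sequences

def bleu(candidate, reference, max_n=4):
    log_p = sum(math.log(ngram_precision(candidate, reference, n))
                for n in range(1, max_n + 1)) / max_n
    brevity = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return brevity * math.exp(log_p)

predicted = "ldr x0, [sp] ; add x0, x0, #1 ; str x0, [sp]".split()
native    = "ldr x0, [sp] ; add x0, x0, #1 ; str x0, [sp]".split()
print(round(bleu(predicted, native), 3))  # 1.0 for an exact match
```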
Citations: 2
Fast Variable-Grained Resemblance Data Deduplication For Cloud Storage
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605398
Xuming Ye, Jia Tang, Wenlong Tian, Ruixuan Li, Weijun Xiao, Yuqing Geng, Zhiyong Xu
With the prevalence of cloud storage, data deduplication has become a widely used technology for removing duplicate data across users and saving network bandwidth. Nevertheless, traditional data deduplication can hardly detect duplicate data among resemblance chunks. Recently, a resemblance data deduplication scheme called Finesse was proposed to detect and remove duplicate data among similar chunks efficiently. However, we observe that the chunks following a similar chunk have a high chance of also being similar, a data locality property (and vice versa). Processing these adjacent similar chunks at a small average chunk size increases the metadata volume, which degrades deduplication system performance. Moreover, existing resemblance data deduplication schemes ignore the performance impact of metadata. Therefore, we propose a fast variable-grained resemblance data deduplication scheme for cloud storage. It dynamically combines adjacent resemblance chunks or unique chunks, or breaks chunks located at the transition region between resemblance chunks and unique chunks. Finally, we implement a prototype and conduct a series of experiments on real-world datasets. The results show that our method dramatically reduces the metadata size while achieving a high deduplication ratio.
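The sketch below illustrates the two ingredients under discussion, with assumed parameters: Finesse-style similarity features (min-hashes over sub-regions of a chunk) and the variable-grained step of coalescing runs of adjacent similar chunks so that a run carries one metadata entry instead of many:

```python
# Sketch: feature-based resemblance detection plus adjacent-chunk merging.
import hashlib

def features(chunk: bytes, num_features: int = 3, window: int = 8):
    """One min-hash feature per equal-length sub-region of the chunk."""
    region = max(len(chunk) // num_features, window)
    feats = []
    for r in range(num_features):
        sub = chunk[r * region:(r + 1) * region]
        hashes = (int.from_bytes(
                      hashlib.blake2b(sub[i:i + window], digest_size=4).digest(), "big")
                  for i in range(max(len(sub) - window + 1, 1)))
        feats.append(min(hashes))
    return tuple(feats)

def merge_adjacent_similar(chunks):
    """Coalesce runs of adjacent chunks that share a similarity feature."""
    groups, run = [], [chunks[0]]
    for prev, cur in zip(chunks, chunks[1:]):
        if set(features(prev)) & set(features(cur)):  # any feature in common
            run.append(cur)
        else:
            groups.append(b"".join(run))
            run = [cur]
    groups.append(b"".join(run))
    return groups  # fewer units => fewer metadata entries

chunks = [b"A" * 4096, b"A" * 4000 + b"B" * 96, b"C" * 4096]
print(len(merge_adjacent_similar(chunks)))  # the two similar chunks merge -> 2
```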
Citations: 2
An Incast-Coflow-Aware Minimum-Rate-Guaranteed Congestion Control Protocol for Datacenter Applications
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605478
Zhijun Wang, Yunxiang Wu, Stoddard Rosenkrantz, Ning Li, Minh Nguyen, Hao Che
Today's datacenters need to meet service level objectives (SLOs) for applications, which can be translated into deadlines for (co)flows running between job execution stages. As a result, meeting (co)flow deadlines with high probability is essential to attract and retain customers and hence generate high revenue. To fill the lack of a transport protocol that can deliver a low (co)flow deadline miss rate, especially in the face of incast congestion, in this paper we propose DCMRG, an incast-coflow-aware, ECN-based soft minimum-rate-guaranteed congestion control protocol for datacenter applications. DCMRG is composed of two major components: a congestion controller running on the send host and an incast congestion controller running on the receive host. DCMRG possesses three salient features. First, it is the first congestion control protocol that integrates congestion control with coflow-aware incast control while providing a soft minimum flow rate guarantee. Second, DCMRG is readily deployable in datacenter networks: it only requires a software upgrade in the hosts and minimum assistance (i.e., ECN) from in-network nodes. Third, DCMRG is backward compatible with, and by design friendly to, the widely deployed, standards-based transport protocols such as DCTCP. The results from large-scale datacenter network simulation demonstrate that in the absence of incast congestion, DCMRG can reduce flow deadline miss rates by 3x and 1.6x compared to D2TCP and MRG, respectively. Moreover, in the face of incast congestion, DCMRG further reduces the coflow deadline miss rate by more than 40% and 60% and lowers the packet drop probability by 60% and 80%, compared to D2TCP with ICTCP and MRG with ICTCP, respectively.
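The following sketch illustrates the "soft minimum-rate guarantee" concept with a DCTCP-style sender that scales its window by the observed ECN-mark fraction but never below the window needed to sustain an assumed per-flow minimum rate. It is an illustration of the idea only, not DCMRG's actual control law:

```python
# Sketch: ECN-driven window control with a minimum-rate floor.
class MinRateGuaranteedSender:
    def __init__(self, min_rate_bps, rtt_s, mss=1460, gain=1.0 / 16):
        self.alpha = 0.0            # EWMA of the ECN-marked fraction
        self.gain = gain
        self.cwnd = 10.0            # in packets
        # Floor: packets in flight needed to sustain min_rate over one RTT.
        self.cwnd_floor = min_rate_bps * rtt_s / (8 * mss)

    def on_ack_window(self, acked, marked):
        """Called once per window with ACK counts; returns the new cwnd."""
        frac = marked / max(acked, 1)
        self.alpha = (1 - self.gain) * self.alpha + self.gain * frac
        if marked:
            self.cwnd *= 1 - self.alpha / 2   # DCTCP-style backoff
        else:
            self.cwnd += 1                    # additive increase
        self.cwnd = max(self.cwnd, self.cwnd_floor)  # the soft guarantee
        return self.cwnd

sender = MinRateGuaranteedSender(min_rate_bps=1e9, rtt_s=100e-6)
for _ in range(50):                  # persistent heavy marking...
    sender.on_ack_window(acked=10, marked=9)
print(sender.cwnd >= sender.cwnd_floor)  # ...never drops below the floor
```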
Citations: 0
Cache Compression with Efficient in-SRAM Data Comparison
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605440
Xiaowei Wang, C. Augustine, E. Nurvitadhi, R. Iyer, Li Zhao, R. Das
We present a novel cache compression method that leverages the fine-grained data duplication across cache lines. We use the XOR operation of the in-SRAM bit-line computing peripherals to search for compressible data over a wide range of data locations in the cache, reducing the data movement requirements. To reduce the decompression latency, we design specialized compression schemes that fetch the data with the same parallelism as the original cache, according to the architecture of the last-level cache slice. The proposed compression method achieves a 2.05× compression ratio on average (up to 67×) and a 4.73% speedup on average (up to 29%) over the SPEC2006 benchmarks.
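As an illustration of the duplicate search, the sketch below XORs a new cache line against stored candidates and keeps only a reference plus a sparse diff when the XOR result is mostly zero. In the proposed design this comparison happens on the SRAM bit-lines; here it is emulated byte-wise, and the diff-size threshold is an assumption:

```python
# Sketch: XOR-based near-duplicate search across 64-byte cache lines.
def xor_diff(a: bytes, b: bytes):
    return bytes(x ^ y for x, y in zip(a, b))

def compress_line(new_line: bytes, stored_lines):
    """Return (ref_index, sparse_diff) if a near-duplicate exists, else None."""
    best = None
    for idx, line in enumerate(stored_lines):
        diff = xor_diff(new_line, line)
        nonzero = [(i, v) for i, v in enumerate(diff) if v]
        if best is None or len(nonzero) < len(best[1]):
            best = (idx, nonzero)
    # Only worth compressing if the diff is small (threshold assumed).
    if best and len(best[1]) <= len(new_line) // 8:
        return best                  # store reference + few (pos, byte) pairs
    return None                      # fall back to storing the raw line

cache = [bytes(64), bytes(range(64))]
line = bytearray(range(64)); line[5] ^= 0xFF   # near-duplicate of cache[1]
print(compress_line(bytes(line), cache))       # -> (1, [(5, 255)])
```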
Citations: 0
A Case Study of Migrating RocksDB on Intel Optane Persistent Memory
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605438
Ziyi Lu, Q. Cao
The availability of product-level persistent memory (PM) presents a great opportunity for key-value stores. However, PM devices differ significantly from traditional block-based storage devices such as HDDs and SSDs in terms of IO characteristics and access methods. To reveal how well existing persistent key-value stores adapt to PM and to explore the potential optimization space of PM-based key-value stores, we migrate one of the most widely used persistent key-value stores, RocksDB, to a PM device and evaluate its performance. The results show that the performance of RocksDB is limited by the traditional IO stacks, which are optimized for fast SSDs, when running on PM devices. We then perform further experimental analysis of the IO methods used for the two main file types in RocksDB, log and SST. Based on the results, we propose a set of optimized IO configurations for each of the two file types. These configurations improve the read and write performance of RocksDB by up to 3× and 2×, respectively, over the default configurations on an Intel Optane Persistent Memory.
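A minimal sketch of the kind of IO-method probing behind such tuning: appending log-like records on a PM-backed (DAX-mounted) path via buffered write-plus-fsync versus mmap-plus-store. The /mnt/pmem mount point is an assumption, and this is not RocksDB's actual code path:

```python
# Sketch: compare two write paths for log-style appends on a PM path.
import mmap, os, time

PATH = "/mnt/pmem/io_probe.bin"      # assumed DAX mount point
RECORD = b"x" * 256
COUNT = 1_000

def bench_write_fsync():
    fd = os.open(PATH, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    t0 = time.perf_counter()
    for _ in range(COUNT):
        os.write(fd, RECORD)
        os.fsync(fd)                 # persist each record: syscall-heavy
    os.close(fd)
    return time.perf_counter() - t0

def bench_mmap():
    size = len(RECORD) * COUNT
    fd = os.open(PATH, os.O_CREAT | os.O_RDWR | os.O_TRUNC, 0o644)
    os.ftruncate(fd, size)
    m = mmap.mmap(fd, size)
    t0 = time.perf_counter()
    for i in range(COUNT):
        off = i * len(RECORD)
        m[off:off + len(RECORD)] = RECORD   # plain stores, no syscall per record
    m.flush()                        # one msync at the end
    elapsed = time.perf_counter() - t0
    m.close(); os.close(fd)
    return elapsed

print("write+fsync:", bench_write_fsync())
print("mmap       :", bench_mmap())
```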
Citations: 3
Constant Time Garbage Collection in SSDs
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605386
Reza Salkhordeh, Kevin Kremer, Lars Nagel, Dennis Maisenbacher, Hans Holmberg, Matias Bjørling, A. Brinkmann
The Flash Translation Layer (FTL) plays a crucial role in the performance and lifetime of SSDs. It has been difficult to evaluate different FTL strategies in real SSDs in the past, as the FTL has been deeply embedded in the SSD hardware. Recent host-based FTL architectures like ZNS now enable researchers to implement and evaluate new FTL strategies. In this paper, we evaluate the overhead of various garbage collection strategies using a host-side FTL, and show their performance limitations when scaling the SSD size or the number of outstanding requests. To address these limitations, we propose a constant cost-benefit policy, which removes the scalability limitations of previous policies and can be efficiently deployed on host-based architectures. The experimental results show that our proposed policy significantly reduces the CPU overhead while achieving write amplification comparable to the best previous policies.
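To show why constant-cost victim selection scales, the sketch below keeps segments in buckets keyed by valid-page count, so invalidating a page and picking a GC victim are both independent of the number of segments. This illustrates the general scalability idea, not the paper's exact policy:

```python
# Sketch: bucketed GC victim selection (O(1) in the number of segments).
class BucketedGC:
    def __init__(self, pages_per_segment):
        # bucket[v] holds the segments with exactly v valid pages
        self.buckets = [set() for _ in range(pages_per_segment + 1)]
        self.valid = {}              # segment id -> valid-page count

    def add_segment(self, seg, valid_pages):
        self.valid[seg] = valid_pages
        self.buckets[valid_pages].add(seg)

    def invalidate_page(self, seg):
        v = self.valid[seg]          # O(1) move between adjacent buckets
        self.buckets[v].discard(seg)
        self.valid[seg] = v - 1
        self.buckets[v - 1].add(seg)

    def pick_victim(self):
        for bucket in self.buckets:  # bounded by pages_per_segment, not #segments
            if bucket:
                return bucket.pop()
        return None

gc = BucketedGC(pages_per_segment=4)
gc.add_segment("s0", 4); gc.add_segment("s1", 4)
for _ in range(3):
    gc.invalidate_page("s1")
print(gc.pick_victim())              # -> "s1", the emptiest segment
```

The classic cost-benefit policy rescans and rescores every segment per GC pass; bucketing trades that scan for constant-time bookkeeping on each page invalidation.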
Citations: 3
Towards a Proactive Lightweight Serverless Edge Cloud for Internet-of-Things Applications
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605384
Ian Wang, Shixiong Qi, Elizabeth Liri, K. Ramakrishnan
Edge cloud solutions that bring the cloud closer to the sensors can be very useful for meeting the low-latency requirements of many Internet-of-Things (IoT) applications. However, IoT traffic can also be intermittent, so running applications constantly can be wasteful. Therefore, having a serverless edge cloud that is responsive and provides low-latency features is a very attractive option for a resource- and cost-efficient IoT application environment. In this paper, we discuss the key components needed to support IoT traffic in the serverless edge cloud and identify the critical challenges that make it difficult to directly use existing serverless solutions, such as Knative, for IoT applications. These include the overhead of heavyweight components for managing the overall system and of software adaptors for communication protocol translation, both found in off-the-shelf serverless platforms designed for large-scale centralized clouds. The latency imposed by 'cold start' is a further deterrent. To address these challenges, we redesign several components of the Knative serverless framework. We use a streamlined protocol adaptor to leverage the MQTT IoT protocol in our serverless framework for IoT event processing. We also create a novel, event-driven proxy based on the extended Berkeley Packet Filter (eBPF) to replace the regular heavyweight Knative queue proxy. Our preliminary experimental results show that the event-driven proxy is a suitable replacement for the queue proxy in an IoT serverless environment, resulting in lower CPU usage and higher request throughput.
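A minimal sketch of the protocol-adaptor role described above: subscribe to an MQTT topic and forward each event to a function's HTTP endpoint. The broker address, topic pattern, and function URL are placeholder assumptions, and the real system folds this into Knative's eventing path rather than a standalone script:

```python
# Sketch: MQTT-to-HTTP adaptor for serverless IoT event processing.
import paho.mqtt.client as mqtt   # pip install paho-mqtt (1.x callback API assumed)
import requests

BROKER = "edge-broker.local"                     # assumed MQTT broker
TOPIC = "sensors/+/telemetry"                    # assumed topic pattern
FUNCTION_URL = "http://fn.default.svc/events"    # assumed function endpoint

def on_connect(client, userdata, flags, rc):
    client.subscribe(TOPIC)                      # (re)subscribe on connect

def on_message(client, userdata, msg):
    # Translate the MQTT publish into an HTTP POST the function understands.
    requests.post(FUNCTION_URL,
                  data=msg.payload,
                  headers={"X-MQTT-Topic": msg.topic},
                  timeout=2)

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect(BROKER, 1883)
client.loop_forever()                            # block, dispatching events
```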
Citations: 2
Vulkan vs OpenGL ES: Performance and Energy Efficiency Comparison on the big.LITTLE Architecture
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605447
Michael Lujan, Michael McCrary, B. W. Ford, Ziliang Zong
Mobile apps such as games and virtual reality (VR) are getting increasingly popular, but they drain the battery quickly due to the heavy graphics rendering process. Currently, the Open Graphics Library for Embedded Systems (OpenGL ES) is the dominant API for rendering advanced graphics on embedded and mobile systems. Despite the attractive usability of OpenGL ES, its lack of multi-threading support limits its performance and power efficiency on modern multicore mobile chips, especially now that the big.LITTLE architecture has become the de facto industry standard for mobile phones. Vulkan was recently proposed to address the weaknesses of OpenGL, but its performance and energy efficiency on the big.LITTLE architecture have not been fully explored yet. This paper conducts a comprehensive study comparing the performance and energy efficiency of Vulkan versus OpenGL ES on an ARM processor with both high-performance cores (i.e., big cores) and low-power cores (i.e., LITTLE cores). Our experimental results show that 1) Vulkan can save up to 24% of energy by leveraging multi-threading and parallel execution on LITTLE cores for heavy workloads; and 2) Vulkan can render at a much higher frame rate where OpenGL ES has reached its full capability. Meanwhile, writing efficient Vulkan code is not trivial, and the performance/energy gains are negligible for light workloads. The tradeoff between manually optimizing verbose Vulkan code and the potential performance or energy efficiency benefits should be carefully considered.
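As a sketch of the measurement methodology implied here, the snippet below pins a stand-in workload to the big or the LITTLE cluster via CPU affinity and times it, which is how per-cluster performance (and, with a power meter, energy) can be attributed. The core-ID layout is an assumption that varies by SoC, and os.sched_setaffinity is Linux-only:

```python
# Sketch: attribute runtime to a big.LITTLE cluster via CPU affinity.
import os, time

LITTLE_CORES = {0, 1, 2, 3}   # assumed cluster layout (varies by SoC)
BIG_CORES = {4, 5, 6, 7}

def render_workload(frames=200):
    # Stand-in for a frame loop; a real test would drive Vulkan/GLES.
    acc = 0
    for _ in range(frames):
        acc += sum(i * i for i in range(20_000))
    return acc

def timed_on(cores):
    os.sched_setaffinity(0, cores)    # restrict this process to one cluster
    t0 = time.perf_counter()
    render_workload()
    return time.perf_counter() - t0

print("LITTLE:", timed_on(LITTLE_CORES))
print("big   :", timed_on(BIG_CORES))
```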
Citations: 1