
Latest Publications: 2021 IEEE International Conference on Networking, Architecture and Storage (NAS)

PowerPrep: A power management proposal for user-facing datacenter workloads
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605364
V. Govindaraj, Sumitha George, M. Kandemir, J. Sampson, N. Vijaykrishnan
Modern data center applications are user-facing and latency-critical. Our work analyzes the characteristics of such applications: high idleness, unpredictable CPU usage, and high sensitivity to CPU performance. In spite of these execution characteristics, datacenter operators disable sleep states to optimize performance. Deep-sleep states hurt performance mainly due to: a) high wake latency and b) cache warm-up after exiting deep sleep. To address these challenges, we quantify three characteristics necessary to realize deep-sleep states in datacenter applications: a) low wake latency, b) low resident power, and c) selective retention of cache state. Using these observations, we show how emerging technological advances can be leveraged to improve the energy efficiency of latency-critical datacenter workloads.
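As a back-of-the-envelope illustration of why high wake latency and resident power make deep sleep unattractive for idle-heavy workloads, the sketch below computes the break-even idle time for entering a sleep state. All parameter values are illustrative assumptions, not measurements from the paper.

```python
# Break-even analysis for entering a sleep state: a minimal sketch with
# illustrative parameters (not the paper's measured values).

def breakeven_idle_time(p_active_idle, p_sleep, wake_latency, wake_energy):
    """Shortest idle period (s) for which entering the sleep state saves energy.

    Entering sleep costs wake_energy (J) plus wake_latency (s) of lost time;
    it saves (p_active_idle - p_sleep) W while resident in the sleep state.
    """
    power_saved = p_active_idle - p_sleep
    return wake_energy / power_saved + wake_latency

# Illustrative numbers only: shallow vs. deep sleep on a hypothetical core.
shallow = breakeven_idle_time(p_active_idle=10.0, p_sleep=3.0,
                              wake_latency=5e-6, wake_energy=1e-4)
deep = breakeven_idle_time(p_active_idle=10.0, p_sleep=0.5,
                           wake_latency=1e-3, wake_energy=5e-3)
print(f"shallow sleep pays off for idle periods > {shallow*1e6:.1f} us")
print(f"deep sleep pays off for idle periods > {deep*1e6:.1f} us")
```

With these assumed numbers, deep sleep only pays off for idle periods roughly 80 times longer than shallow sleep does, which is why low wake latency and low resident power are prerequisites for using deep-sleep states under unpredictable idleness.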
Citations: 0
Message from Program Chairs
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605416
{"title":"Message from Program Chairs","authors":"","doi":"10.1109/nas51552.2021.9605416","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605416","url":null,"abstract":"","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126698001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MLPP: Exploring Transfer Learning and Model Distillation for Predicting Application Performance
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605431
J. Gunasekaran, Cyan Subhra Mishra
Performance prediction for applications is quintessential for detecting malicious hardware and software vulnerabilities. Typically, application performance is predicted using profiling data generated by hardware tools such as Linux perf. Leveraging such data, prediction models, both machine-learning (ML) based and non-ML-based, have been proposed. However, a majority of these models suffer from loss in prediction accuracy, very large model sizes, and/or lack of general applicability across hardware types such as wearables, handhelds, and desktops. To address these inefficiencies, in this paper we propose MLPP, a machine-learning-based performance prediction model that can accurately predict application performance and, at the same time, be easily transferred across a wide range of mobile and desktop hardware platforms by leveraging transfer learning. Furthermore, MLPP incorporates model distillation techniques to significantly reduce the model size. Through extensive experimentation and evaluation we show that MLPP can achieve up to 92.5% prediction accuracy while reducing the model size by up to 3.5×.
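A minimal sketch of the distillation step described above, using scikit-learn on synthetic data. MLPP's actual features, architectures, and transfer-learning procedure are not reproduced here; the shapes and hyperparameters below are illustrative assumptions.

```python
# Teacher-student model distillation for performance prediction:
# a toy sketch on synthetic data, not MLPP's actual pipeline.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((2000, 8))                                  # stand-in perf-counter profiles
y = X @ rng.random(8) + 0.1 * rng.standard_normal(2000)    # stand-in runtimes

# Large "teacher" trained on ground-truth measurements.
teacher = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=500,
                       random_state=0).fit(X, y)

# Small "student" trained to mimic the teacher's predictions,
# shrinking the model while retaining most of its accuracy.
student = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500,
                       random_state=0).fit(X, teacher.predict(X))

X_test = rng.random((200, 8))
print("mean teacher-student divergence:",
      np.mean(np.abs(teacher.predict(X_test) - student.predict(X_test))))
```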
Citations: 0
Leveraging Network Delay Variability to Improve QoE of Latency Critical Services
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605367
S. Shukla, M. Farrens
Even as cloud providers offer strict guarantees on the intra-cloud delay of requests for Latency-Critical (LC) services, a high external network delay can result in a large end-to-end delay, causing a low user Quality of Experience (QoE). Furthermore, due to the variability of the external network delay, there is a disconnect between the user's QoE and the cloud-guaranteed service level objective (SLO). Specifically, a request that meets the SLO can have a high or low QoE depending on the external network delay. In this work we propose a user-centric End-to-end Service Level Objective (ESLO), an extension of the traditional cloud-centric SLO, that guarantees stricter bounds on end-to-end delay and thereby achieves a higher QoE. We show how the variability in the external network delay can be both addressed and leveraged to meet the ESLO and improve server utilization. We propose ESLO-aware extensions to the Kubernetes infrastructure that use information about the external network delay and its distribution (a) to reduce the number of QoE-violating responses by using deadline-based scheduling at the service instances, and (b) to appropriately scale service instances with load. We implement the ESLO-aware framework on the NSF Chameleon cloud testbed and present experimental results demonstrating the benefit of the proposed paradigm.
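The deadline-based scheduling idea can be sketched as an earliest-deadline-first queue whose per-request deadline is the end-to-end budget minus the observed external network delay. This is a toy illustration, not the paper's Kubernetes extensions; the 100 ms budget and request names are assumed values.

```python
# Earliest-deadline-first serving under an end-to-end SLO: a minimal sketch.
import heapq
import itertools

ESLO_MS = 100.0  # hypothetical end-to-end budget per request

class EDFQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker for equal deadlines

    def submit(self, req_id, external_delay_ms):
        # Remaining internal budget: requests from far-away users have
        # already spent more of their end-to-end budget on the network.
        internal_deadline = ESLO_MS - external_delay_ms
        heapq.heappush(self._heap, (internal_deadline, next(self._seq), req_id))

    def next_request(self):
        deadline, _, req_id = heapq.heappop(self._heap)
        return req_id, deadline

q = EDFQueue()
q.submit("far-user", external_delay_ms=60.0)   # only 40 ms of budget left
q.submit("near-user", external_delay_ms=5.0)   # 95 ms of budget left
print(q.next_request())  # ('far-user', 40.0) is served first
```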
Citations: 1
A Comprehensive Empirical Study of File Systems on Optane Persistent Memory
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605448
Yang Yang, Qian Cao, Shengjue Wang
Emerging byte-addressable non-volatile memories (NVMs) are a promising technology for memory-like storage. Researchers have developed many NVM-aware file systems to exploit the benefits of NVM. However, many early file systems were evaluated using DRAM-based simulations or emulations. Their experimental results cannot reflect actual behavior on real NVM devices, since the devices do not perform like slowed-down DRAM as expected. In this paper, we provide a comprehensive empirical study of NVM-aware file systems on the first commercially available byte-addressable NVM, the Intel Optane DC Persistent Memory Module (PMM). We evaluate and analyze the performance of kernel-level file systems (XFS-DAX, Ext4-DAX, PMFS, and NOVA) and user-space file systems (Strata and Libnvmmio) on PMM with various synthetic and real-world benchmarks (FIO, Filebench, FXmark, Redis, etc.). We also employ different file system configurations and different PMM configurations to evaluate their performance impact. We believe the experimental results and performance analysis will help developers of applications and storage systems reap the full characteristics of NVMs.
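In the spirit of the study's methodology, the toy microbenchmark below times 4 KiB random writes to a file on a DAX-mounted file system. The mount point is an assumption, and the paper's evaluation relies on FIO, Filebench, FXmark, and Redis rather than a loop like this.

```python
# Toy random-write microbenchmark for a DAX-mounted file system.
# PATH is a hypothetical Ext4-DAX or XFS-DAX mount point on PMM.
import os
import random
import time

PATH = "/mnt/pmem0/bench.dat"   # assumed mount point
SIZE, BLOCK, OPS = 1 << 28, 4096, 10000
buf = os.urandom(BLOCK)

fd = os.open(PATH, os.O_RDWR | os.O_CREAT, 0o600)
os.ftruncate(fd, SIZE)
# Pre-generate block-aligned random offsets.
offsets = [random.randrange(0, SIZE - BLOCK) & ~(BLOCK - 1) for _ in range(OPS)]

start = time.perf_counter()
for off in offsets:
    os.pwrite(fd, buf, off)
os.fsync(fd)
elapsed = time.perf_counter() - start
os.close(fd)
print(f"{OPS} random {BLOCK}-byte writes: {OPS / elapsed:.0f} IOPS")
```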
Citations: 2
Cached Mapping Table Prefetching for Random Reads in Solid-State Drives
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605397
X. Ruan, Xunfei Jiang, Haiquan Chen
Data caching strategies and garbage collection on SSDs have been extensively explored in past years. However, the performance of the Cached Mapping Table (CMT) has not been well studied. The mapping table provides page translation information to the Flash Translation Layer (FTL) in order to translate a Logical Page Address (LPA) to a Physical Page Address (PPA). A miss in the mapping table cache causes an extra read transaction to flash storage, which stalls I/O request processing in the SSD. Random read requests are affected more than random write requests, since write requests can be handled effectively by the write cache. In this paper, we analyze the impact of the CMT on different random read requests and present a Cached Mapping Table prefetching approach that fetches logical-to-physical page translation information in order to mitigate stalls in processing random read requests. Our experimental results show an improvement in average request waiting time of up to 13%.
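A minimal model of the prefetching idea: on a demand miss the FTL also fetches the adjacent translation page in the background, so later reads with spatial locality find their translations already cached and do not stall. The next-page policy and all constants below are illustrative, not necessarily the paper's.

```python
# Cached Mapping Table (CMT) with a simple next-page prefetch: a toy model.
from collections import OrderedDict

ENTRIES_PER_PAGE = 512  # mapping entries per translation page (assumed)

class CMT:
    def __init__(self, capacity_pages=64):
        self.cache = OrderedDict()   # translation-page id, LRU-ordered
        self.capacity = capacity_pages
        self.demand_stalls = 0       # demand misses that stall the request

    def _load(self, tpage, demand):
        if tpage in self.cache:
            self.cache.move_to_end(tpage)    # refresh LRU position
            return
        if demand:
            self.demand_stalls += 1          # request waits for the flash read
        self.cache[tpage] = True             # page contents elided in this sketch
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least-recently-used page

    def translate(self, lpa, prefetch=True):
        tpage = lpa // ENTRIES_PER_PAGE
        self._load(tpage, demand=True)
        if prefetch:
            self._load(tpage + 1, demand=False)  # fetched ahead of demand
        return lpa + 0x100000                    # stand-in LPA -> PPA mapping

for use_prefetch in (False, True):
    cmt = CMT()
    for lpa in range(0, 8 * ENTRIES_PER_PAGE, 64):  # reads with spatial locality
        cmt.translate(lpa, prefetch=use_prefetch)
    print(f"prefetch={use_prefetch}: {cmt.demand_stalls} demand stalls")
```

Note the prefetch does not reduce total flash reads here; it hides their latency, turning would-be demand misses into hits (8 stalls without prefetch vs. 1 with it in this toy run).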
Citations: 1
Towards Energy-Efficient and Real-Time Cloud Computing
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605453
T. Tekreeti, T. Cao, Xiaopu Peng, T. Bhattacharya, Jianzhou Mao, X. Qin, Wei-Shinn Ku
In modern cloud computing environments, there is tremendous growth in the data to be stored and managed in data centers. Large-scale data centers demand high utilization of computing and storage resources, which leads to expensive operational costs for energy usage. Evidence shows that consolidating virtual machines (VMs) through VM migration can conserve energy in clouds. VM-consolidation techniques, however, inevitably impose a performance burden. To address this issue, we propose a holistic solution, EGRET, that boosts the energy efficiency of cloud computing platforms by seamlessly integrating the DVFS scheme with the VM-consolidation technique. EGRET dynamically determines the most energy-efficient strategy by issuing a command to either scale CPU frequencies on a VM or mark the VM as underutilized. We conduct extensive experiments to evaluate the performance of EGRET. The experimental results show that EGRET substantially improves the energy efficiency of cloud computing platforms.
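A sketch of the kind of per-VM decision such a scheme makes, choosing between marking a VM underutilized (a consolidation/migration candidate) and scaling its CPU frequency to match load. The threshold and frequency ladder are invented for illustration, not EGRET's actual policy parameters.

```python
# DVFS-vs-consolidation decision step: a toy sketch with assumed parameters.
FREQ_STEPS_GHZ = [1.2, 1.8, 2.4, 3.0]   # hypothetical DVFS ladder
UNDERUTIL_THRESHOLD = 0.15              # assumed utilization cutoff

def decide(cpu_util):
    """Return ('consolidate', None) or ('dvfs', freq_ghz) for one VM."""
    if cpu_util < UNDERUTIL_THRESHOLD:
        return ("consolidate", None)     # candidate for VM migration
    # Pick the lowest frequency that still covers the demand measured at
    # full speed, with a little headroom to protect latency.
    demand_ghz = cpu_util * FREQ_STEPS_GHZ[-1] * 1.1
    for f in FREQ_STEPS_GHZ:
        if f >= demand_ghz:
            return ("dvfs", f)
    return ("dvfs", FREQ_STEPS_GHZ[-1])

for util in (0.05, 0.4, 0.9):
    print(f"util={util}: {decide(util)}")
```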
Citations: 0
Efficient NVM Crash Consistency by Mitigating Resource Contention
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605429
Zhiyuan Lu, Jianhui Yue, Yifu Deng, Yifeng Zhu
Logging is widely adopted to ensure crash consistency for Non-Volatile Memory (NVM) systems. However, logging imposes significant performance overhead due to the extra log operations and the ordering constraints between logging and in-place updates, degrading system performance. There have been several research efforts to reduce this logging overhead. Recently, LAD proposed that exploiting the non-volatility of the Asynchronous DRAM Refresh (ADR) buffer can remove log operations for any transaction whose total amount of updated cachelines is smaller than the buffer capacity, while still ensuring crash consistency. However, on multi-core systems, concurrent transactions contend for the scarce ADR buffer and frequently cause it to overflow. Upon buffer overflow, LAD resorts to logging operations for in-flight transactions, degrading system performance. Our experiments show that LAD produces a significant number of log operations when multiple transactions run concurrently. To decrease the log operations caused by LAD, this paper presents a new transaction execution scheme, called two-stage transaction execution (TSTE), which allows the write requests of a transaction to reside in both the ADR buffer and a staging SRAM buffer. Our scheme performs log operations for a transaction's write requests in the SRAM buffer and executes in-place update operations for the transaction's write requests in the ADR buffer. The introduced SRAM buffer lets the ADR buffer serve more update requests, reducing log operations. The evaluation results demonstrate that our proposed schemes can reduce log operations by up to 39.29% and improve transaction throughput by up to 28.22%.
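The two-stage idea can be sketched as follows: writes absorbed by the non-volatile ADR buffer are applied in place without logging, while overflow writes staged in volatile SRAM must be logged for crash consistency. The buffer size and interface below are illustrative assumptions, not the paper's hardware parameters.

```python
# Two-stage transaction execution (TSTE): a toy model of where writes land.
ADR_CAPACITY = 4  # cachelines; real ADR buffers are also small (assumed value)

class TSTETransaction:
    def __init__(self):
        self.adr = {}    # non-volatile ADR buffer: in-place, log-free updates
        self.sram = {}   # volatile SRAM staging buffer: updates must be logged
        self.log = []    # log operations issued for SRAM-staged writes

    def write(self, cacheline, value):
        if cacheline in self.adr or len(self.adr) < ADR_CAPACITY:
            self.adr[cacheline] = value          # durable, no log needed
        else:
            self.log.append((cacheline, value))  # log op for crash consistency
            self.sram[cacheline] = value

tx = TSTETransaction()
for i in range(6):                 # 6 cacheline writes, ADR holds only 4
    tx.write(cacheline=i, value=i * i)
print("log operations:", len(tx.log))  # only the 2 overflow writes are logged
```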
Citations: 0
Egress Engineering over BGP Label Unicast in MPLS-based Networks
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605412
Sundaram Tirunelveli Radhakrishnan, S. Mohanty
Egress Peer Engineering (EPE) is a current technology commonly used to steer egress traffic from an Autonomous System (AS) to external peers via a pre-determined set of links. The steering is generally achieved through a controller that uses Border Gateway Protocol Link-State (BGP-LS) and Segment Routing traffic engineering (SRTE) policies to program the forwarding at the ingress router with a chosen set of MPLS labels that determine the egress links at the Autonomous System Boundary Routers (ASBRs). An alternative solution is to use BGP Labeled Unicast (BGP-LU) to distribute the egress engineering labels. We highlight two key limitations of the proposed BGP-LU use case and provide a solution that mitigates these problems. Our proposed solution is compatible with legacy routers currently deployed in production.
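A control-plane sketch of how label-based EPE steering works in general: the ingress pushes a transport label that reaches the chosen ASBR, plus the EPE label that selects the peer link at that ASBR. All identifiers and label values below are made up for illustration and do not come from the paper.

```python
# EPE label programming: a toy control-plane model with invented values.
EPE_LABELS = {                         # (asbr, peer_link) -> per-link EPE label
    ("asbr1", "peer-A-link1"): 24001,
    ("asbr1", "peer-A-link2"): 24002,
    ("asbr2", "peer-B-link1"): 24003,
}
TRANSPORT_LABELS = {"asbr1": 16001, "asbr2": 16002}  # labels to reach each ASBR

def ingress_label_stack(asbr, peer_link):
    """Label stack the ingress pushes to force traffic out a chosen peer link.

    Outer label carries the packet across the AS to the ASBR; the inner EPE
    label tells the ASBR which external peer link to use, bypassing its own
    BGP best-path choice.
    """
    return [TRANSPORT_LABELS[asbr], EPE_LABELS[(asbr, peer_link)]]

print(ingress_label_stack("asbr1", "peer-A-link2"))  # [16001, 24002]
```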
Citations: 0
GPU-Assisted Memory Expansion
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605372
Pisacha Srinuan, Purushottam Sigdel, Xu Yuan, Lu Peng, Paul Darby, Christopher Aucoin, N. Tzeng
Recent graphics processing units (GPUs) often come with large on-board physical memory to accelerate diverse parallel programs that operate on big datasets with regular access patterns, including machine learning (ML) and data mining (DM). Such a GPU may underutilize its physical memory during lengthy ML model training or DM, making it possible to lend otherwise unused GPU memory to applications executing concurrently on the host machine. This work explores an effective approach, called GPU-assisted memory expansion (GAME), that lets memory-intensive applications run on the host CPU with their memory expanded dynamically onto available GPU on-board DRAM. Targeting computer systems equipped with recent GPUs, our GAME approach permits speedy execution of CPU workloads with large memory footprints by harvesting unused GPU on-board memory on demand for swapping, far surpassing competitive GPU executions. Implemented in user space, our GAME prototype lets GPU memory house swapped-out memory pages transparently, without code modifications, for high usability and portability. Evaluation with NAS-NPB benchmark applications demonstrates that GAME expedites monotasking (or multitasking) executions considerably, by up to 2.1× (or 3.1×), when memory footprints exceed the CPU DRAM size and an equipped GPU has unused VDRAM available for swapping.
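A user-space sketch of the underlying idea, parking evicted pages in GPU DRAM and copying them back on access. It assumes CuPy is available for host-to-GPU transfers; note that the actual GAME prototype is transparent and needs no explicit API like this one.

```python
# GPU DRAM as a swap target: a toy explicit-API sketch (assumes CuPy).
import numpy as np
import cupy as cp

PAGE_BYTES = 4096

class GpuSwap:
    def __init__(self):
        self._store = {}  # page id -> GPU-resident copy of the page

    def swap_out(self, page_id, page: np.ndarray):
        self._store[page_id] = cp.asarray(page)       # host -> GPU DRAM

    def swap_in(self, page_id) -> np.ndarray:
        return cp.asnumpy(self._store.pop(page_id))   # GPU DRAM -> host

swap = GpuSwap()
page = np.frombuffer(np.random.bytes(PAGE_BYTES), dtype=np.uint8)
swap.swap_out(42, page)                 # evict a page into unused GPU memory
assert np.array_equal(swap.swap_in(42), page)  # fault it back in intact
```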
Citations: 0