
Latest Publications: 2021 IEEE International Conference on Networking, Architecture and Storage (NAS)

Deflection-Aware Routing Algorithm in Network on Chip against Soft Errors and Crosstalk Faults
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605392
Hadi Zamani, Z. Shirmohammadi, Ali Jahanshahi
Marching into nano-scale technology, the probability of soft errors and crosstalk faults has increased by about 6-7 times. Since buffers occupy about 40-90% of the switch area, the probability of soft errors in switches is significant. We propose a deflection-aware routing algorithm (DAR) combined with an information redundancy technique to cover soft errors and crosstalk faults in the header flow control units (FLITs). We also introduce an interleaving method along with a simple Hamming code to tolerate errors in the data and tail FLITs. The proposed methods have been evaluated at both the circuit and simulation levels through a simulator written in C++, BookSim 2, and Synopsys Design Compiler. The evaluation results show that we can cover soft errors and crosstalk faults with reasonable power and performance overheads of 3% and 6.5%, respectively.
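The data/tail FLIT protection described above, a simple Hamming code combined with interleaving, can be sketched generically. The following is not the authors' circuit, just a standard Hamming(7,4) encoder/decoder plus bit interleaving, showing how a two-bit burst (as crosstalk between adjacent wires can cause) is split across codewords so each codeword sees at most one correctable error:

```python
def hamming74_encode(d):
    # d: 4 data bits -> 7-bit codeword laid out [p1, p2, d1, p3, d2, d3, d4]
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    # Syndrome bits locate any single-bit error (1-indexed position; 0 = clean).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3
    c = c[:]
    if pos:
        c[pos - 1] ^= 1          # correct the flipped bit
    return [c[2], c[4], c[5], c[6]]  # recovered data bits

def interleave(words):
    # Transmit bit i of every codeword before bit i+1 of any codeword,
    # so physically adjacent wires carry bits of different codewords.
    return [words[r][i] for i in range(7) for r in range(len(words))]

def deinterleave(stream, n):
    words = [[0] * 7 for _ in range(n)]
    k = 0
    for i in range(7):
        for r in range(n):
            words[r][i] = stream[k]
            k += 1
    return words
```

With two interleaved codewords, any burst of two adjacent stream bits lands one error in each codeword, which Hamming(7,4) then corrects independently.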
Citations: 0
GO: Out-Of-Core Partitioning of Large Irregular Graphs
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605433
Gurneet Kaur, Rajesh K. Gupta
Single-PC, disk-based processing of large irregular graphs has recently gained much popularity. At the core of a disk-based system is a static graph partitioning that must be created before processing starts. By handling one partition at a time, graphs that do not fit in memory can be processed on a single machine. However, the multilevel graph partitioning algorithms used by the most sophisticated partitioners cannot be run on the same machine, as their memory requirements far exceed the size of the graph. The popular memory-efficient Mt-Metis graph partitioner requires 4.8× to 13.8× the memory needed to hold the entire graph in memory. To overcome this problem, we present the GO out-of-core graph partitioner, which can successfully partition large graphs on a single machine. GO performs just two passes over the entire input graph: a partition creation pass that creates balanced partitions and a partition refinement pass that reduces edge cuts. Both passes operate in a memory-constrained manner via disk-based processing. GO successfully partitions large graphs for which Mt-Metis runs out of memory. For graphs that can be successfully partitioned by Mt-Metis on a single machine, GO produces balanced 8-way partitions with 11.8× to 76.2× fewer edge cuts using 1.9× to 8.3× less memory in comparable runtime.
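The two-pass structure (a balanced creation pass followed by an edge-cut-reducing refinement pass) can be sketched in miniature. This is not GO's disk-based implementation; it is an in-memory toy with a greedy LDG-style creation pass and a single refinement pass that moves a vertex only when the move strictly reduces the edge cut and capacity allows:

```python
from collections import defaultdict

def create_partitions(adj, n_vertices, k):
    """Pass 1 (creation): place each vertex in the non-full partition holding
    most of its already-placed neighbors, discounted by fullness."""
    cap = n_vertices / k + 1
    part, sizes = {}, [0] * k

    for v in range(n_vertices):
        def score(p):
            placed = sum(1 for u in adj[v] if part.get(u) == p)
            return placed * (1 - sizes[p] / cap)
        candidates = [p for p in range(k) if sizes[p] < cap]
        p = max(candidates, key=score)
        part[v] = p
        sizes[p] += 1
    return part

def refine_partitions(part, adj, k, cap):
    """Pass 2 (refinement): move a vertex to the neighboring partition with
    the larger gain; each accepted move strictly reduces the total edge cut."""
    sizes = defaultdict(int)
    for p in part.values():
        sizes[p] += 1
    for v in list(part):
        gains = defaultdict(int)
        for u in adj[v]:
            gains[part[u]] += 1
        best = max(gains, key=gains.get, default=part[v])
        if best != part[v] and gains[best] > gains[part[v]] and sizes[best] < cap:
            sizes[part[v]] -= 1
            sizes[best] += 1
            part[v] = best
    return part

def edge_cut(part, edges):
    return sum(1 for u, v in edges if part[u] != part[v])
```

On a graph of two triangles joined by one edge, the creation pass fills one side and the refinement pass moves the bridging vertex, reaching the optimal cut of one edge.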
Citations: 2
A New PIS Accelerator for Text Searching
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605387
Yunxin Huang, Aiguo Song, Yafei Yang
We propose a new design of a hardware accelerator for processing regular expressions to speed up text search inside SSD storage (Processing in Storage: PIS). Its unique features include parallel processing of 32 streams to quickly identify the first matched character under scan mode, and matching four characters concurrently under matching mode. In addition, we present a new approach that combines forward and backward scans to accomplish the first-character search efficiently. Our experimental results show that the new parallel algorithm reduces the depth of the logic circuit, and that the hybrid architecture performs as well as the Linux grep algorithm.
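The flavor of matching several pattern positions per cycle can be illustrated in software with the classic bit-parallel Shift-And algorithm. This is an illustrative stand-in, not the paper's circuit: one machine word holds every active prefix match, and each input character updates all of them in a single operation, much as hardware advances multiple match candidates per clock:

```python
def shift_and_search(text, pattern):
    """Bit-parallel exact string search (Shift-And). Bit i of `state` is set
    iff pattern[0..i] matches the text ending at the current position."""
    m = len(pattern)
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    state = 0
    accept = 1 << (m - 1)
    for pos, ch in enumerate(text):
        # Extend every active prefix (and start a new one) in one step.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            return pos - m + 1   # start index of the first match
    return -1                    # no match
```

For example, `shift_and_search("hello world", "world")` returns 6, the start of the first (and only) occurrence.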
Citations: 0
Design of A Multi-Path Reconfigurable Traffic Monitoring System
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605385
Liang-Min Wang, Timothy Miskell, J. Morgan, Edwin Verplanke
As network bandwidth consumption continues to grow exponentially, real-time traffic data analysis becomes increasingly challenging and expensive. In many cases, network traffic monitoring can only be achieved via hardware Test Access Point (TAP) devices. Due to the intrusiveness and inflexibility of deploying hardware devices, this approach is intractable within an SDN environment where dynamic network resource allocation is key to the orchestration of network services. This paper presents a novel mirror tunnel design to achieve near hardware level TAP-as-a-Service (TaaS) performance through network device mirror offloading, while retaining resource reconfigurability. Mirror tunneling is a hybrid approach whereby a software TAP transports traffic from a source device to a mirror tunnel device. Traffic is then mirrored in place and sent to the destination device. The combination of a software TAP with the mirroring capabilities of the underlying hardware empowers system administrators to create a dynamically reconfigurable multi-path traffic mirroring system. As demonstrated in the benchmark results, this approach is efficient in terms of network bandwidth consumption and computational resources. In addition, this methodology is designed to mirror traffic in high-throughput environments with minimal to no impact on the source Virtual Network Functions (VNFs).
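The mirror-tunnel data path described above (a software TAP feeding a tunnel device, which mirrors each packet in place and delivers the original to the destination and the copy to the monitor) can be modeled as a toy queue pipeline. Class and method names here are illustrative, not the paper's implementation:

```python
from collections import deque

class MirrorTunnel:
    """Toy model of the mirror-tunnel data path: tap_send() plays the role of
    the software TAP (source -> tunnel), drain() plays the role of the tunnel
    device that mirrors in place and delivers to destination and monitor."""
    def __init__(self):
        self.dst = []        # destination device's receive queue
        self.mon = []        # monitoring device's receive queue
        self._q = deque()    # tunnel device's ingress queue

    def tap_send(self, pkt):
        # Software TAP: transport the packet from source to the tunnel device.
        self._q.append(pkt)

    def drain(self):
        # Tunnel device: mirror each packet in place, then deliver both.
        while self._q:
            pkt = self._q.popleft()
            self.dst.append(pkt)          # original to destination
            self.mon.append(dict(pkt))    # independent mirrored copy (pkt is a dict)
```

Keeping the monitor's copy independent of the original mimics real mirroring, where the monitoring path must not perturb the forwarded traffic.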
Citations: 0
Edges: Evenly Distributing Garbage-Collections for Enterprise SSDs via Stochastic Optimization
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605402
Shuyi Pei, Jing Yang, Bin Li
Solid-state drives (SSDs) have been widely used in various computing systems owing to their significant advantages over hard disk drives (HDDs). One critical challenge that hinders their further adoption in enterprise systems is resolving the performance variability caused by the garbage collection (GC) process that frees flash memory containing invalid data. To overcome this challenge, we formulate a stochastic optimization model that characterizes the nature of the GC process and considers both the total GC count and the GC distribution over time. Based on the optimization model, we propose Edges, an innovative self-adaptive GC strategy that evenly distributes GCs for enterprise SSDs. The key insight behind Edges is that the number of invalid pages is a finer-grained metric for triggering GCs than the number of free blocks. By testing various traces from practical applications, we show that Edges is able to reduce the total GC count by as much as 70.17% and GC variance by up to 57.29%, compared to the state-of-the-art GC algorithm.
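The key insight, triggering GC on the invalid-page count rather than on free blocks, can be shown with a toy flash model. This is a deliberately simplified sketch (fixed threshold, greedy victim selection), not the paper's stochastic optimization:

```python
class ToyGC:
    """Toy SSD model: trigger GC when the total invalid-page count crosses a
    threshold (a fine-grained signal), rather than waiting until free blocks
    run out, so small GCs are spread evenly over time."""
    def __init__(self, n_blocks=8, pages_per_block=4, invalid_threshold=5):
        self.invalid = [0] * n_blocks   # invalid pages per block
        self.ppb = pages_per_block
        self.threshold = invalid_threshold
        self.gc_log = []                # time steps at which GC ran

    def overwrite(self, block, t):
        """An overwrite at time t invalidates one old page in `block`."""
        if self.invalid[block] < self.ppb:
            self.invalid[block] += 1
        if sum(self.invalid) >= self.threshold:
            self._collect(t)

    def _collect(self, t):
        # Greedy victim selection: erase the block with most invalid pages.
        victim = max(range(len(self.invalid)), key=self.invalid.__getitem__)
        self.invalid[victim] = 0        # erase reclaims its invalid pages
        self.gc_log.append(t)
```

Because the trigger fires as soon as the invalid count reaches the threshold, the total invalid-page count stays bounded below the threshold after every operation, which is exactly the "evenly distributed" behavior the fine-grained metric buys.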
Citations: 0
Locality-aware Thread Block Design in Single and Multi-GPU Graph Processing
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605484
Quan Fan, Zizhong Chen
Graphics Processing Units (GPUs) have been adopted to process graphs effectively. Recently, multi-GPU systems have also been exploited for greater performance. To process graphs on multiple GPUs in parallel, input graphs should be divided into parts using partitioning schemes. The partitioning scheme impacts the communication overhead and the locality of memory accesses, and thereby the overall performance. We found that both intra-GPU data sharing and inter-GPU communication can be summarized as inter-TB communication. Based on this key idea, we propose a new graph partitioning scheme that redefines the input graph as a TB Graph with calculated vertex and edge weights, and then partitions it to reduce intra- and inter-GPU communication overhead and improve locality at the granularity of Thread Blocks (TBs). We also propose to develop a partitioning and mapping scheme for heterogeneous architectures, including physical links with different bandwidths. The experimental results on graph partitioning show that our scheme effectively improves the overall performance of Breadth-First Search (BFS) by up to 33%.
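BFS, the evaluation workload, is the canonical case where TB-level locality matters: each frontier expansion streams through neighbor lists, so co-locating a thread block's vertices with their neighbors cuts remote accesses. For reference, a minimal level-synchronous BFS (sequential Python, standing in for the per-frontier GPU kernel):

```python
from collections import deque

def bfs_levels(adj, source):
    """Level-synchronous BFS: returns the distance of every reachable vertex
    from `source`. Each popped vertex's neighbor list is scanned once, which
    is the access pattern partitioning tries to keep local."""
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                frontier.append(u)
    return dist
```

On a 4-cycle `{0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}`, vertices 1 and 2 land in level 1 and vertex 3 in level 2.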
Citations: 0
Machine Learning-based Vulnerability Study of Interpose PUFs as Security Primitives for IoT Networks
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605405
Bipana Thapaliya, Khalid T. Mursi, Yu Zhuang
Security is of importance for communication networks, and many network nodes, like sensors and IoT devices, are resource-constrained. Physical Unclonable Functions (PUFs) leverage physical variations of integrated circuits to produce responses unique to individual circuits, and have the potential to deliver security for low-cost networks. But before a PUF can be adopted for security applications, all security vulnerabilities must be discovered. Recently, a new PUF known as the Interpose PUF (IPUF) was proposed, which was tested to be secure against reliability-based modeling attacks and machine learning attacks when the attacked IPUF is of small size. A recent study showed IPUFs succumbed to a divide-and-conquer attack; that attack method requires the position of the interpose bit to be known to the attacker, a condition that can be easily obfuscated by using a random interpose position. Thus, large IPUFs may still remain secure against all known modeling attacks if the interpose position is unknown to attackers. In this paper, we present a new modeling attack method for IPUFs using multilayer neural networks, and the attack method requires no knowledge of the interpose position. Our attack was tested on simulated IPUFs and silicon IPUFs implemented on FPGAs, and the results showed that many IPUFs which were resilient against existing attacks cannot withstand our new attack method, revealing a new vulnerability of IPUFs by re-defining the boundary between secure and insecure regions in the IPUF parameter space.
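The modeling-attack setting can be reproduced in miniature with a plain arbiter PUF (not an IPUF) under the standard linear additive delay model. Here a simple perceptron, a stand-in for the paper's multilayer networks, learns the PUF from challenge-response pairs; all sizes and names are illustrative:

```python
import random

def phi(challenge):
    """Parity feature map of the linear additive delay model of an arbiter
    PUF: feature i is the product of (1 - 2*c_j) over j >= i, plus a bias."""
    n = len(challenge)
    feats = []
    prod = 1
    for j in range(n - 1, -1, -1):
        prod *= 1 - 2 * challenge[j]   # map bit {0,1} -> {+1,-1}
        feats.append(prod)
    feats.reverse()
    return feats + [1]                 # bias feature

def response(w, challenge):
    """Simulated PUF: sign of the weighted delay difference."""
    s = sum(wi * xi for wi, xi in zip(w, phi(challenge)))
    return 1 if s > 0 else 0

def train_perceptron(crps, n_feat, epochs=50, lr=0.1):
    """Fit a linear threshold model to (features, response) pairs; enough to
    clone a single arbiter PUF, whose response is linearly separable."""
    w = [0.0] * n_feat
    for _ in range(epochs):
        for feats, r in crps:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, feats)) > 0 else 0
            if pred != r:
                step = lr if r == 1 else -lr
                w = [wi + step * xi for wi, xi in zip(w, feats)]
    return w
```

Training on a few thousand simulated challenge-response pairs typically yields a clone whose held-out prediction accuracy is far above chance, which is exactly why modeling resistance has to be demonstrated, not assumed.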
Citations: 5
CALC: A Content-Aware Learning Cache for Storage Systems
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605381
Maher Kachmar, D. Kaeli
In today’s enterprise storage systems, supported services such as data deduplication are becoming a common feature adopted in the data center, especially as new storage technologies mature. Static partitioning of storage system resources, including CPU cores and memory caches, may lead to missed Service Level Agreement (SLA) thresholds, such as the Data Reduction Rate (DRR) or IO latency. However, typical storage system applications exhibit a workload pattern that can be learned. By learning these patterns, we are better equipped to address several storage system resource partitioning challenges, issues that cannot be overcome with traditional manual tuning and primitive feedback mechanisms. We propose a Content-Aware Learning Cache (CALC) that uses online reinforcement learning models (Q-Learning, SARSA, and Actor-Critic) to actively partition the storage system cache between a data digest cache, a content cache, and an address-based data cache to improve cache hit performance while maximizing data reduction rates. Using traces from popular storage applications, we show that our machine learning approach is robust and can out-perform an iterative search method for various datasets and cache sizes. Our content-aware learning cache improves hit rates by 7.1% when compared to iterative search methods, and by 18.2% when compared to a traditional LRU-based data cache implementation.
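The idea of an RL agent steering a cache split can be shown with tabular Q-learning over a single coarse knob. This is a toy, not CALC: the state is a discrete split level, the actions nudge it down/stay/up, and `reward_fn` stands in for the hit-rate signal the real system would observe online:

```python
import random

def q_learning_partition(reward_fn, n_levels=5, episodes=2000,
                         alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Tabular Q-learning over a 1-D cache-split knob. State s is a discrete
    split level; reward_fn maps the split fraction s/(n_levels-1) to a score
    (standing in for observed hit rate). Returns the best-valued level."""
    rng = random.Random(seed)
    q = [[0.0] * 3 for _ in range(n_levels)]   # actions: 0=down, 1=stay, 2=up
    s = n_levels // 2
    for _ in range(episodes):
        if rng.random() < eps:                 # epsilon-greedy exploration
            a = rng.randrange(3)
        else:
            a = max(range(3), key=lambda i: q[s][i])
        s2 = min(n_levels - 1, max(0, s + a - 1))
        r = reward_fn(s2 / (n_levels - 1))
        # Standard Q-learning update toward r + gamma * best next value.
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
        s = s2
    return max(range(n_levels), key=lambda i: max(q[i]))
```

With a reward peaked at a split fraction of 0.75, the agent settles near the level closest to that peak; the real system replaces `reward_fn` with live hit-rate and data-reduction measurements.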
Citations: 0
World's #1 CRM Scale Challenges
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605424
In this talk, we will first describe the scale challenges of the world's #1 CRM (Customer Relationship Management) platform, which operates from the Cloud and executes billions of business transactions daily for hundreds of thousands of customer companies around the world. We will then describe how Salesforce researchers and engineers utilize computer science principles such as Amdahl's Law, temporal and spatial locality, plus big data and machine learning, to make software execute fast and efficiently on various types of compute, network, and storage architectures to meet ever-growing scale challenges.
Citations: 0
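The talk's reference to Amdahl's Law can be made concrete with a short sketch; the function name and the 90%-parallel workload are illustrative, not figures from the talk:

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Overall speedup when a parallel_fraction of the work runs n-fold faster."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n)

# With a 10% serial portion, total speedup stays capped near 10x no matter
# how much parallel hardware is added -- the scaling lesson behind the law.
for n in (10, 100, 1000):
    print(f"n={n:5d}  speedup={amdahl_speedup(0.9, n):.2f}")
```

This is why the abstract pairs Amdahl's Law with locality and other techniques: past a point, shrinking the serial fraction matters more than adding parallel capacity.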
Implementation of a High-Throughput Virtual Switch Port Monitoring System
Pub Date : 2021-10-01 DOI: 10.1109/nas51552.2021.9605360
Liang-Min Wang, Timothy Miskell, Patrick Fu, Cunming Liang, Edwin Verplanke
As SDN-based networking infrastructure continues to evolve, an increasing number of traditional network functions are deployed over virtualized networks. As with fixed-function switching networks, traffic monitoring in a Software-Defined Network is critical to ensure the security and performance of the underlying infrastructure. In the context of virtualized networks, deploying a virtualized TAP service has been reported as an effective VNF that provides the same monitoring capabilities as a physical TAP. For most virtual switch implementations, e.g., OvS, network device virtualization is based upon a para-virtualization technology, i.e., VIRTIO. One of the primary use cases for port mirroring is inter-VM communication, i.e., packet streams between virtual network devices, which remains prohibitively expensive for virtual TAP devices. Specifically, it has been observed that virtual TAPs can impose up to 70% performance degradation on the source VNF(s). In prior work, we presented a feasibility study that included a novel approach to reducing port-mirroring overhead. In this paper we present our latest contributions, in which we integrate our design into OvS and develop a VLAN-based filtering scheme to pass traffic from a source device to a monitoring device; the two devices may reside within the same or different switch domains. Furthermore, we present an improvement over RSPAN and discuss its feasibility for delivering mirrored traffic across switch domains, which, in contrast to ERSPAN, does not require an L3 overlay network.
{"title":"Implementation of a High-Throughput Virtual Switch Port Monitoring System","authors":"Liang-Min Wang, Timothy Miskell, Patrick Fu, Cunming Liang, Edwin Verplanke","doi":"10.1109/nas51552.2021.9605360","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605360","url":null,"abstract":"As SDN-based networking infrastructure continues to evolve, an increasing number of traditional network functions are deployed over virtualized networks. Similar to fixed function switching networks, traffic monitoring in a Software Defined Network is critical in order to ensure the security and performance of the underlying infrastructure. In the context of virtualized networks, deployment of a virtualized TAP service has been reported as an effective VNF that can provide the same monitoring capabilities as a physical TAP. For most virtual switch implementations, e.g., OvS, network device virtualization is based upon a para-virtualization technology, i.e., VIRTIO. One of the primary use cases for port mirroring is inter-VM communication, i.e., packet streams that exist between virtual network devices, which remains prohibitively expensive for TAP devices. Specifically, it has been observed that virtual TAPs can contribute up to 70% performance degradation to the source VNF(s). With reference to prior work, we previously presented a feasibility study that included a novel approach towards the reduction of port-mirroring overhead. In this paper we present our latest contributions, in which we integrate our design into OvS and develop a VLAN based filtering scheme to pass traffic from a source device to a monitoring device. In this case, both devices may reside either within the same or different switch domains. Furthermore, we present an improvement over RSPAN and discuss its feasibility in delivering mirrored traffic across switch domains, which, in contrast to ERSPAN, does not require an L3 overlay network.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122813353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
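The VLAN-based mirroring the abstract describes resembles what stock Open vSwitch can already express: a Mirror record whose output is a VLAN rather than a port (RSPAN-style). A minimal sketch, assuming a bridge `br0` and a hypothetical source port `vnet1`; this illustrates the general mechanism, not the paper's modified OvS:

```shell
# Mirror all traffic entering or leaving vnet1 into VLAN 100 on br0;
# a monitoring host anywhere in the switch domain then picks the
# mirrored frames up by VLAN tag, with no L3 overlay required.
ovs-vsctl -- --id=@src get Port vnet1 \
          -- --id=@m create Mirror name=rspan0 \
                select-src-port=@src select-dst-port=@src output-vlan=100 \
          -- set Bridge br0 mirrors=@m

# Tear the mirror down when monitoring is finished.
ovs-vsctl clear Bridge br0 mirrors
```

Plain `output-vlan` mirroring floods the tagged copies through the switch domain; the paper's contribution targets exactly the overhead such schemes impose on the source VNF.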