IEEE Computer Architecture Letters最新文献

英文中文

Simulating Our Way to Safer Software: A Tale of Integrating Microarchitecture Simulation and Leakage Estimation Modeling 模拟我们的安全软件之路：集成微体系结构模拟和泄漏估计建模的故事

IF 2.3 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Computer Architecture Letters

Pub Date : 2023-08-10 DOI: 10.1109/LCA.2023.3303913

Justin Feng;Fatemeh Arkannezhad;Christopher Ryu;Enoch Huang;Siddhant Gupta;Nader Sehatbakhsh

An important step to protect software against side-channel vulnerability is to rigorously evaluate it on the target hardware using standard leakage tests. Recently, leakage estimation tools have received a lot of attention to improve this time-consuming process. Despite their advancements, existing tools often neglect the impact of microarchitecture and its underlying events in their leakage model which leads to inaccuracies. This paper takes the first step in addressing these issues by integrating a physical side-channel leakage estimation tool into a microarchitectural simulator. To achieve this, we first systematically explore the impact of various architecture and microarchitecture activities and their underlying interactions on the produced physical side-channel signals and integrate that into the microarchitecture model. Second, to create a comprehensive leakage estimation report, we leverage taint tracking and symbolic execution to accurately analyze different paths and inputs. The final outcome of this work is a tool that takes a binary and generates a leakage report that covers architecture and microarchitecture-related leakages for both data-dependent and path-dependent information leakage scenarios.

保护软件免受侧通道漏洞攻击的一个重要步骤是使用标准泄漏测试在目标硬件上对其进行严格评估。最近，泄漏估计工具受到了很多关注，以改进这一耗时的过程。尽管已有工具取得了进步，但它们往往忽略了微体系结构及其泄漏模型中潜在事件的影响，这导致了不准确。本文通过将物理侧信道泄漏估计工具集成到微体系结构模拟器中，迈出了解决这些问题的第一步。为了实现这一点，我们首先系统地探索了各种体系结构和微体系结构活动及其潜在相互作用对产生的物理侧通道信号的影响，并将其集成到微体系结构模型中。其次，为了创建一个全面的泄漏估计报告，我们利用污染跟踪和符号执行来准确分析不同的路径和输入。这项工作的最终结果是一个工具，它采用二进制文件并生成一份泄漏报告，该报告涵盖了数据相关和路径相关信息泄漏场景中与体系结构和微体系结构相关的泄漏。

{"title":"Simulating Our Way to Safer Software: A Tale of Integrating Microarchitecture Simulation and Leakage Estimation Modeling","authors":"Justin Feng;Fatemeh Arkannezhad;Christopher Ryu;Enoch Huang;Siddhant Gupta;Nader Sehatbakhsh","doi":"10.1109/LCA.2023.3303913","DOIUrl":"10.1109/LCA.2023.3303913","url":null,"abstract":"An important step to protect software against side-channel vulnerability is to rigorously evaluate it on the target hardware using standard leakage tests. Recently, leakage estimation tools have received a lot of attention to improve this time-consuming process. Despite their advancements, existing tools often neglect the impact of microarchitecture and its underlying events in their leakage model which leads to inaccuracies. This paper takes the first step in addressing these issues by integrating a physical side-channel leakage estimation tool into a microarchitectural simulator. To achieve this, we first systematically explore the impact of various architecture and microarchitecture activities and their underlying interactions on the produced physical side-channel signals and integrate that into the microarchitecture model. Second, to create a comprehensive leakage estimation report, we leverage taint tracking and symbolic execution to accurately analyze different paths and inputs. The final outcome of this work is a tool that takes a binary and generates a leakage report that covers architecture and microarchitecture-related leakages for both data-dependent and path-dependent information leakage scenarios.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"22 2","pages":"109-112"},"PeriodicalIF":2.3,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41489016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

SoCurity: A Design Approach for Enhancing SoC Security SoCurity：一种增强SoC安全性的设计方法

IF 2.3 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Computer Architecture Letters

Pub Date : 2023-08-03 DOI: 10.1109/LCA.2023.3301448

Naorin Hossain;Alper Buyuktosunoglu;John-David Wellman;Pradip Bose;Margaret Martonosi

We propose SoCurity, the first NoC counter-based hardware monitoring approach for enhancing heterogeneous SoC security. With SoCurity, we develop a fast, lightweight anomalous activity detection system leveraging semi-supervised machine learning models that require no prior attack knowledge for detecting anomalies. We demonstrate our techniques with a case study on a real SoC for a connected autonomous vehicle system and find up to 96% detection accuracy.

我们提出了SoCurity，这是第一种基于NoC计数器的硬件监控方法，用于增强异构SoC的安全性。利用SoCurity，我们开发了一个快速、轻量级的异常活动检测系统，该系统利用半监督机器学习模型，不需要事先的攻击知识来检测异常。我们通过对连接的自动驾驶汽车系统的真实SoC的案例研究来展示我们的技术，并发现检测准确率高达96%。

引用次数: 0

Smart Memory: Deep Learning Acceleration in 3D-Stacked Memories 智能存储器：在三维堆叠存储器中加速深度学习

IF 2.3 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Computer Architecture Letters

Pub Date : 2023-08-01 DOI: 10.1109/LCA.2023.3287976

Seyyed Hossein SeyyedAghaei Rezaei;Parham Zilouchian Moghaddam;Mehdi Modarressi

Processing-in-memory (PIM) is the most promising paradigm to address the bandwidth bottleneck in deep neural network (DNN) accelerators. However, the algorithmic and dataflow structure of DNNs still necessitates moving a large amount of data across banks inside the memory device to bring input data and their corresponding model parameters together, negatively shifting part of the bandwidth bottleneck to the in-memory data communication infrastructure. To alleviate this bottleneck, we present Smart Memory, a highly parallel in-memory DNN accelerator for 3D memories that benefits from a scalable high-bandwidth in-memory network. Whereas the existing PIM designs implement the compute units and network-on-chip on the logic die of the underlying 3D memory, in Smart Memory the computation and data transmission tasks are distributed across the memory banks. To this end, each memory bank is equipped with (1) a very simple processing unit to run neural networks, and (2) a circuit-switched router to interconnect memory banks by a 3D network-on-memory. Our evaluation shows 44% average performance improvement over state-of-the-art in-memory DNN accelerators.

内存处理（PIM）是解决深度神经网络（DNN）加速器带宽瓶颈的最有前途的模式。然而，DNN 的算法和数据流结构仍然需要在内存设备内部的存储库之间移动大量数据，以便将输入数据及其相应的模型参数汇集到一起，这就将部分带宽瓶颈转移到了内存数据通信基础设施上。为了缓解这一瓶颈，我们提出了智能内存（Smart Memory），这是一种适用于三维内存的高度并行内存 DNN 加速器，得益于可扩展的高带宽内存网络。现有的 PIM 设计是在底层 3D 存储器的逻辑芯片上实现计算单元和片上网络，而在 Smart Memory 中，计算和数据传输任务则分布在各个存储器组中。为此，每个内存组都配备了：（1）一个非常简单的处理单元，用于运行神经网络；（2）一个电路交换路由器，用于通过三维内存网络互连内存组。我们的评估显示，与最先进的内存神经网络加速器相比，平均性能提高了 44%。

{"title":"Smart Memory: Deep Learning Acceleration in 3D-Stacked Memories","authors":"Seyyed Hossein SeyyedAghaei Rezaei;Parham Zilouchian Moghaddam;Mehdi Modarressi","doi":"10.1109/LCA.2023.3287976","DOIUrl":"10.1109/LCA.2023.3287976","url":null,"abstract":"Processing-in-memory (PIM) is the most promising paradigm to address the bandwidth bottleneck in deep neural network (DNN) accelerators. However, the algorithmic and dataflow structure of DNNs still necessitates moving a large amount of data across banks inside the memory device to bring input data and their corresponding model parameters together, negatively shifting part of the bandwidth bottleneck to the in-memory data communication infrastructure. To alleviate this bottleneck, we present \u0000<italic>Smart Memory</i>\u0000, a highly parallel in-memory DNN accelerator for 3D memories that benefits from a scalable high-bandwidth in-memory network. Whereas the existing PIM designs implement the compute units and network-on-chip on the logic die of the underlying 3D memory, in \u0000<italic>Smart Memory</i>\u0000 the computation and data transmission tasks are distributed across the memory banks. To this end, each memory bank is equipped with (1) a very simple processing unit to run neural networks, and (2) a circuit-switched router to interconnect memory banks by a 3D network-on-memory. Our evaluation shows 44% average performance improvement over state-of-the-art in-memory DNN accelerators.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"23 1","pages":"137-141"},"PeriodicalIF":2.3,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135784868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

The Mirage of Breaking MIRAGE: Analyzing the Modeling Pitfalls in Emerging “Attacks” on MIRAGE 打破海市蜃楼的海市蜃楼:分析海市蜃楼新出现的“攻击”中的建模缺陷

IF 2.3 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Computer Architecture Letters

Pub Date : 2023-07-21 DOI: 10.1109/LCA.2023.3297875

Gururaj Saileshwar;Moinuddin Qureshi

This letter studies common modeling pitfalls in security analyses of hardware defenses to highlight the importance of accurate reproduction of defenses. We provide a case study of MIRAGE (Saileshwar and Qureshi 2021), a defense against cache side channel attacks, and analyze its incorrect modeling in a recent work (Chakraborty et al., 2023) that claimed to break its security. We highlight several modeling pitfalls that can invalidate the security properties of any defense including a) incomplete modeling of components critical for security, b) usage of random number generators that are insufficiently random, and c) initialization of system to improbable states, leading to an incorrect conclusion of a vulnerability, and show how these modeling bugs incorrectly cause set conflicts to be observed in a recent work’s (Chakraborty et al., 2023) model of MIRAGE. We also provide an implementation addressing these bugs that does not incur set-conflicts, highlighting that MIRAGE is still unbroken.

本文研究了硬件防御安全分析中常见的建模缺陷，以强调准确复制防御的重要性。我们提供了MIRAGE的案例研究(Saileshwar和Qureshi 2021)，这是一种针对缓存侧通道攻击的防御，并在最近的一项工作(Chakraborty等人，2023)中分析了其错误的建模，该工作声称破坏了其安全性。我们强调了几个可以使任何防御的安全属性无效的建模缺陷，包括a)对安全至关重要的组件的不完整建模，b)使用随机数生成器的随机性不足，以及c)将系统初始化到不可能的状态，导致对漏洞的错误结论，并展示了这些建模错误如何在最近的工作(Chakraborty et al.， 2023) MIRAGE模型中观察到的集合冲突。我们还提供了一个解决这些错误的实现，它不会引起集合冲突，强调MIRAGE仍然是未被破坏的。

引用次数: 0

Exploring the Latency Sensitivity of Cache Replacement Policies 缓存替换策略的时延敏感性研究

IF 2.3 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Computer Architecture Letters

Pub Date : 2023-07-19 DOI: 10.1109/LCA.2023.3296251

Ahmed Nematallah;Chang Hyun Park;David Black-Schaffer

With DRAM latencies increasing relative to CPU speeds, the performance of caches has become more important. This has led to increasingly sophisticated replacement policies that require complex calculations to update their replacement metadata, which often require multiple cycles. To minimize the negative impact of these metadata updates, architects have focused on policies that incur as little update latency as possible through a combination of reducing the policies’ precision and using parallel hardware. In this work we investigate whether these tradeoffs to reduce cache metadata update latency are needed. Specifically, we look at the performance and energy impact of increasing the latency of cache replacement policy updates. We find that even dramatic increases in replacement policy update latency have very limited effect. This indicates that designers have far more freedom to increase policy complexity and latency than previously assumed.

随着DRAM延迟相对于CPU速度的增加，缓存的性能变得更加重要。这导致了越来越复杂的替换策略，需要复杂的计算来更新其替换元数据，这通常需要多个周期。为了最大限度地减少这些元数据更新的负面影响，架构师将重点放在通过降低策略的精度和使用并行硬件来尽可能减少更新延迟的策略上。在这项工作中，我们研究了是否需要这些折衷来减少缓存元数据更新延迟。具体来说，我们将研究增加缓存替换策略更新延迟对性能和能源的影响。我们发现，即使替换策略更新延迟急剧增加，效果也非常有限。这表明，与之前假设的相比，设计人员在增加策略复杂性和延迟方面有更大的自由度。

引用次数: 0

X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands X射线：通过发出内存命令发现DRAM内部结构和错误特征

IF 2.3 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Computer Architecture Letters

Pub Date : 2023-07-17 DOI: 10.1109/LCA.2023.3296153

Hwayong Nam;Seungmin Baek;Minbok Wi;Michael Jaemin Kim;Jaehyun Park;Chihun Song;Nam Sung Kim;Jung Ho Ahn

The demand for accurate information about the internal structure and characteristics of DRAM has been on the rise. Recent studies have explored the structure and characteristics of DRAM to improve processing in memory, enhance reliability, and mitigate a vulnerability known as rowhammer. However, DRAM manufacturers only disclose limited information through official documents, making it difficult to find specific information about actual DRAM devices. This paper presents reliable findings on the internal structure and characteristics of DRAM using activate-induced bitflips (AIBs), retention time test, and row-copy operation. While previous studies have attempted to understand the internal behaviors of DRAM devices, they have only shown results without identifying the causes or have analyzed DRAM modules rather than individual chips. We first uncover the size, structure, and operation of DRAM subarrays and verify our findings on the characteristics of DRAM. Then, we correct misunderstood information related to AIBs and demonstrate experimental results supporting the cause of rowhammer.

对有关DRAM内部结构和特性的准确信息的需求一直在增长。最近的研究探索了DRAM的结构和特性，以改善内存中的处理，提高可靠性，并减轻被称为rowhammer的漏洞。然而，DRAM制造商只通过官方文件披露有限的信息，因此很难找到有关实际DRAM设备的具体信息。本文使用激活诱导位翻转（AIB）、保留时间测试和行复制操作，对DRAM的内部结构和特性进行了可靠的研究。虽然之前的研究试图了解DRAM器件的内部行为，但他们只显示了结果，没有确定原因，也没有分析DRAM模块而不是单个芯片。我们首先揭示了DRAM子阵列的大小、结构和操作，并验证了我们对DRAM特性的发现。然后，我们纠正了与AIBs相关的误解信息，并展示了支持rowhammer原因的实验结果。

{"title":"X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands","authors":"Hwayong Nam;Seungmin Baek;Minbok Wi;Michael Jaemin Kim;Jaehyun Park;Chihun Song;Nam Sung Kim;Jung Ho Ahn","doi":"10.1109/LCA.2023.3296153","DOIUrl":"10.1109/LCA.2023.3296153","url":null,"abstract":"The demand for accurate information about the internal structure and characteristics of DRAM has been on the rise. Recent studies have explored the structure and characteristics of DRAM to improve processing in memory, enhance reliability, and mitigate a vulnerability known as rowhammer. However, DRAM manufacturers only disclose limited information through official documents, making it difficult to find specific information about actual DRAM devices. This paper presents reliable findings on the internal structure and characteristics of DRAM using activate-induced bitflips (AIBs), retention time test, and row-copy operation. While previous studies have attempted to understand the internal behaviors of DRAM devices, they have only shown results without identifying the causes or have analyzed DRAM modules rather than individual chips. We first uncover the size, structure, and operation of DRAM subarrays and verify our findings on the characteristics of DRAM. Then, we correct misunderstood information related to AIBs and demonstrate experimental results supporting the cause of rowhammer.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"22 2","pages":"89-92"},"PeriodicalIF":2.3,"publicationDate":"2023-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49364167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

LADIO: Leakage-Aware Direct I/O for I/O-Intensive Workloads 针对I/O密集型工作负载的泄漏感知直接I/O

IF 2.3 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Computer Architecture Letters

Pub Date : 2023-07-03 DOI: 10.1109/LCA.2023.3290427

Ipoom Jeong;Jiaqi Lou;Yongseok Son;Yongjoo Park;Yifan Yuan;Nam Sung Kim

The advancement in I/O technology has posed an unprecedented demand for high-performance processing on I/O data, leading to the development of Data Direct I/O (DDIO) technology. DDIO improves I/O processing efficiency by directly injecting all inbound I/O data into the last-level cache (LLC) in cooperation with any type of I/O device. Nonetheless, in certain scenarios with more than one I/O applications, DDIO may have sub-optimal performance caused by interference inside the LLC, resulting in the degradation of system performance. Especially, in this paper, we demonstrate that storage I/O on modern high-performance NVMe SSDs hardly benefits from DDIO, sometimes causing inefficient use of the shared LLC due to the “leaky DMA problem”. To address this problem, we propose LADIO, an adaptive approach that mitigates inter-application interference by dynamically controlling the DDIO functionality and reallocating LLC ways based on the leakage and locality of storage I/O data, respectively. In scenarios with heavy I/O interference, LADIO improves the throughput of network-intensive applications by 20% while maintaining that of storage-intensive applications.

随着I/O技术的进步，对高性能的I/O数据处理提出了前所未有的要求，导致了DDIO (data Direct I/O)技术的发展。DDIO通过与任何类型的I/O设备合作，直接将所有入站I/O数据注入最后一级缓存(LLC)，从而提高了I/O处理效率。尽管如此，在某些具有多个I/O应用程序的场景中，由于LLC内部的干扰，DDIO的性能可能不太理想，从而导致系统性能下降。特别是，在本文中，我们证明了现代高性能NVMe ssd上的存储I/O很难从DDIO中受益，有时由于“泄漏DMA问题”导致共享LLC的低效使用。为了解决这个问题，我们提出了LADIO，这是一种自适应方法，通过动态控制DDIO功能和根据存储I/O数据的泄漏和局域性重新分配LLC方式来减轻应用程序间的干扰。在有严重I/O干扰的场景中，radio在保持存储密集型应用的吞吐量的同时，将网络密集型应用的吞吐量提高20%。

{"title":"LADIO: Leakage-Aware Direct I/O for I/O-Intensive Workloads","authors":"Ipoom Jeong;Jiaqi Lou;Yongseok Son;Yongjoo Park;Yifan Yuan;Nam Sung Kim","doi":"10.1109/LCA.2023.3290427","DOIUrl":"10.1109/LCA.2023.3290427","url":null,"abstract":"The advancement in I/O technology has posed an unprecedented demand for high-performance processing on I/O data, leading to the development of Data Direct I/O (DDIO) technology. DDIO improves I/O processing efficiency by directly injecting all inbound I/O data into the last-level cache (LLC) in cooperation with any type of I/O device. Nonetheless, in certain scenarios with more than one I/O applications, DDIO may have sub-optimal performance caused by interference inside the LLC, resulting in the degradation of system performance. Especially, in this paper, we demonstrate that storage I/O on modern high-performance NVMe SSDs hardly benefits from DDIO, sometimes causing inefficient use of the shared LLC due to the “leaky DMA problem”. To address this problem, we propose \u0000<monospace>LADIO</monospace>\u0000, an adaptive approach that mitigates inter-application interference by dynamically controlling the DDIO functionality and reallocating LLC ways based on the leakage and locality of storage I/O data, respectively. In scenarios with heavy I/O interference, \u0000<monospace>LADIO</monospace>\u0000 improves the throughput of network-intensive applications by 20% while maintaining that of storage-intensive applications.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"22 2","pages":"77-80"},"PeriodicalIF":2.3,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48507765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

T-CAT: Dynamic Cache Allocation for Tiered Memory Systems With Memory Interleaving T-CAT：具有内存交错的分层内存系统的动态缓存分配

IF 2.3 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Computer Architecture Letters

Pub Date : 2023-06-28 DOI: 10.1109/LCA.2023.3290197

Hwanjun Lee;Seunghak Lee;Yeji Jung;Daehoon Kim

New memory interconnect technology, such as Intel's Compute Express Link (CXL), helps to expand memory bandwidth and capacity by adding CPU-less NUMA nodes to the main memory system, addressing the growing memory wall challenge. Consequently, modern computing systems embrace the heterogeneity in memory systems, composing the memory systems with a tiered memory system with near and far memory (e.g., local DRAM and CXL-DRAM). However, adopting NUMA interleaving, which can improve performance by exploiting node-level parallelism and utilizing aggregate bandwidth, to the tiered memory systems can face challenges due to differences in the access latency between the two types of memory, leading to potential performance degradation for memory-intensive workloads. By tackling the challenges, we first investigate the effects of the NUMA interleaving on the performance of the tiered memory systems. We observe that while NUMA interleaving is essential for applications demanding high memory bandwidth, it can negatively impact the performance of applications demanding low memory bandwidth. Next, we propose a dynamic cache management, called T-CAT, which partitions the last-level cache between near and far memory, aiming to mitigate performance degradation by accessing far memory. T-CAT attempts to reduce the difference in the average access latency between near and far memory by re-sizing the cache partitions. Through dynamic cache management, T-CAT can preserve the performance benefits of NUMA interleaving while mitigating performance degradation by the far memory accesses. Our experimental results show that T-CAT improves performance by up to 17% compared to cases with NUMA interleaving without the cache management.

新的内存互连技术，如Intel的Compute Express Link（CXL），通过在主内存系统中添加无CPU的NUMA节点，有助于扩展内存带宽和容量，解决不断增长的内存墙挑战。因此，现代计算系统包含了存储器系统的异构性，将存储器系统与具有近存储器和远存储器的分层存储器系统（例如，本地DRAM和CXL-DRAM）组合在一起。然而，由于两种类型的内存之间的访问延迟差异，将NUMA交错应用于分层内存系统可能会面临挑战，这可能会导致内存密集型工作负载的性能下降。NUMA交错可以通过利用节点级并行性和利用聚合带宽来提高性能。通过应对这些挑战，我们首先研究了NUMA交织对分层存储系统性能的影响。我们观察到，虽然NUMA交织对于要求高内存带宽的应用程序至关重要，但它可能会对要求低内存带宽的程序的性能产生负面影响。接下来，我们提出了一种称为T-CAT的动态缓存管理，它在近内存和远内存之间划分最后一级缓存，旨在通过访问远内存来缓解性能下降。T-CAT试图通过重新调整缓存分区的大小来减少近内存和远内存之间平均访问延迟的差异。通过动态缓存管理，T-CAT可以保留NUMA交织的性能优势，同时减轻远程内存访问造成的性能下降。我们的实验结果表明，与在没有缓存管理的情况下使用NUMA交织的情况相比，T-CAT的性能提高了17%。

{"title":"T-CAT: Dynamic Cache Allocation for Tiered Memory Systems With Memory Interleaving","authors":"Hwanjun Lee;Seunghak Lee;Yeji Jung;Daehoon Kim","doi":"10.1109/LCA.2023.3290197","DOIUrl":"10.1109/LCA.2023.3290197","url":null,"abstract":"New memory interconnect technology, such as Intel's Compute Express Link (CXL), helps to expand memory bandwidth and capacity by adding CPU-less NUMA nodes to the main memory system, addressing the growing memory wall challenge. Consequently, modern computing systems embrace the heterogeneity in memory systems, composing the memory systems with a tiered memory system with near and far memory (e.g., local DRAM and CXL-DRAM). However, adopting NUMA interleaving, which can improve performance by exploiting node-level parallelism and utilizing aggregate bandwidth, to the tiered memory systems can face challenges due to differences in the access latency between the two types of memory, leading to potential performance degradation for memory-intensive workloads. By tackling the challenges, we first investigate the effects of the NUMA interleaving on the performance of the tiered memory systems. We observe that while NUMA interleaving is essential for applications demanding high memory bandwidth, it can negatively impact the performance of applications demanding low memory bandwidth. Next, we propose a dynamic cache management, called \u0000<monospace>T-CAT</monospace>\u0000, which partitions the last-level cache between near and far memory, aiming to mitigate performance degradation by accessing far memory. \u0000<monospace>T-CAT</monospace>\u0000 attempts to reduce the difference in the average access latency between near and far memory by re-sizing the cache partitions. Through dynamic cache management, \u0000<monospace>T-CAT</monospace>\u0000 can preserve the performance benefits of NUMA interleaving while mitigating performance degradation by the far memory accesses. Our experimental results show that \u0000<monospace>T-CAT</monospace>\u0000 improves performance by up to 17% compared to cases with NUMA interleaving without the cache management.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"22 2","pages":"73-76"},"PeriodicalIF":2.3,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48978905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Hardware Accelerated Reusable Merkle Tree Generation for Bitcoin Blockchain Headers 用于比特币区块链头的硬件加速可重用Merkle树生成

IF 2.3 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Computer Architecture Letters

Pub Date : 2023-06-28 DOI: 10.1109/LCA.2023.3289515

Kiseok Jeon;Junghee Lee;Bumsoo Kim;James J. Kim

As the value of Bitcoin increases, the difficulty level of mining keeps increasing. This is generally addressed with application-specific integrated circuits (ASIC), but block candidates are still created by the software. The overhead of block candidate generation is relatively growing because the hash computation is boosted by ASIC. Additionally, it is getting harder to find the target nonce; If it is not found for a block candidate, a new block candidate must be generated. A new candidate can be generated to reduce the overhead of block candidate generation by modifying the coinbase without selecting and verifying transactions again. To this end, we propose a hardware accelerator for generating Merkle trees efficiently. The hash computation for Merkle tree generation is conducted with ASIC to reduce the overhead of block candidate generation, and the tree with only the modified coinbase is rapidly regenerated by reusing the intermediate results of the previously generated tree. Our simulation results demonstrate that the execution time can be reduced by up to 98.92% and power consumption by up to 99.73% when the number of transactions in a tree is 2048.

随着比特币价值的增加，挖矿的难度也在不断增加。这通常通过专用集成电路（ASIC）来解决，但块候选者仍然由软件创建。块候选生成的开销相对增长，因为ASIC提高了哈希计算。此外，查找目标nonce变得越来越困难；如果找不到块候选，则必须生成新的块候选。可以生成新的候选，以通过修改coinbase来减少块候选生成的开销，而无需再次选择和验证事务。为此，我们提出了一种有效生成Merkle树的硬件加速器。Merkle树生成的哈希计算是用ASIC进行的，以减少块候选生成的开销，并且通过重用先前生成的树的中间结果来快速再生仅具有修改的coinbase的树。我们的仿真结果表明，当树中的事务数为2048时，执行时间可以减少98.92%，功耗可以减少99.73%。

{"title":"Hardware Accelerated Reusable Merkle Tree Generation for Bitcoin Blockchain Headers","authors":"Kiseok Jeon;Junghee Lee;Bumsoo Kim;James J. Kim","doi":"10.1109/LCA.2023.3289515","DOIUrl":"10.1109/LCA.2023.3289515","url":null,"abstract":"As the value of Bitcoin increases, the difficulty level of mining keeps increasing. This is generally addressed with application-specific integrated circuits (ASIC), but block candidates are still created by the software. The overhead of block candidate generation is relatively growing because the hash computation is boosted by ASIC. Additionally, it is getting harder to find the target nonce; If it is not found for a block candidate, a new block candidate must be generated. A new candidate can be generated to reduce the overhead of block candidate generation by modifying the coinbase without selecting and verifying transactions again. To this end, we propose a hardware accelerator for generating Merkle trees efficiently. The hash computation for Merkle tree generation is conducted with ASIC to reduce the overhead of block candidate generation, and the tree with only the modified coinbase is rapidly regenerated by reusing the intermediate results of the previously generated tree. Our simulation results demonstrate that the execution time can be reduced by up to 98.92% and power consumption by up to 99.73% when the number of transactions in a tree is 2048.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"22 2","pages":"69-72"},"PeriodicalIF":2.3,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41591979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Containerized In-Storage Processing Model and Hardware Acceleration for Fully-Flexible Computational SSDs 用于全灵活计算固态硬盘的容器化存储内处理模型和硬件加速

IF 2.3 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Computer Architecture Letters

Pub Date : 2023-06-27 DOI: 10.1109/lca.2023.3289828

Donghyun Gouk, Miryeong Kwon, Hanyeoreum Bae, Myoungsoo Jung

In-storage processing (ISP) efficiently examines large datasets but faces performance and security challenges. We introduce DockerSSD, a flexible ISP model that runs various applications near flash without modification. It employs lightweight OS-level virtualization in modern SSDs for faster ISP and better storage intelligence with a high flexiblity. DockerSSD reuses existing Docker container images for real-time data processing without altering the storage interface or runtime. Our design includes a new communication method and virtual firmware, alongside automated container-related network and I/O handling hardware. DockerSSD achieves a 2× speed improvement and reduces system-level power by 35.7%, on average.

存储内处理（ISP）可有效检查大型数据集，但面临着性能和安全方面的挑战。我们引入了 DockerSSD，这是一种灵活的 ISP 模式，可在闪存附近运行各种应用，无需修改。它在现代固态硬盘中采用了轻量级操作系统级虚拟化技术，以实现更快的 ISP 和更高灵活性的存储智能。DockerSSD 可重复使用现有的 Docker 容器镜像进行实时数据处理，而无需更改存储接口或运行时。我们的设计包括新的通信方法和虚拟固件，以及与容器相关的自动化网络和 I/O 处理硬件。DockerSSD 的速度提高了 2 倍，系统级功耗平均降低了 35.7%。

引用次数: 0

首页上一页

类型

全部化学•材料生命科学医学物理工程技术环境•农林材料科学地球科学法学管理学化学环境科学与生态学计算机科学教育学经济学农林科学人文科学生物学数学物理与天体物理心理学综合性期刊其他工业工程理学历史学农学文学信息工程

数据库

全部 ACS Publications Elsevier ieeexplore Springer The Royal Society of Chemistry Wiley

期刊

IEEE Computer Architecture Letters

全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.

﹀