2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI): Latest Articles


Robustness and Power Efficiency in Spin-Orbit Torque-Based Probabilistic Logic Circuits

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238682

Kamal Danouchi, G. Prenat, Philippe Talatchian, Louis Hutin, Lorena Anghel

The efficiency of known algorithms for solving NP-hard problems is constrained by the limitations of conventional von Neumann architectures. Recurrent networks of stochastic neurons are an appealing alternative to conventional computing architectures, as they potentially allow exploring the binary search space of NP-hard problems with limited resources and overheads. In this study, we consider the case of Boolean Satisfiability on small logic functions, with technological implementations based on Spin-Orbit Torque Magnetic Tunnel Junctions. We propose innovative circuit-level implementations of invertible logic architectures for an AND gate and a Full Adder, emphasizing the design constraints of such invertible logic operations. Simulation results demonstrate the feasibility of SOT-based implementations, and their robustness against process variations. The realistic implementation enables identifying the main power efficiency trade-offs.
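To make the idea of an invertible AND gate concrete, the following is a minimal software sketch of a three-node network of stochastic binary neurons. It is not the circuit from the paper: the coupling matrix `J`, the bias vector `H`, the `beta` parameter, and the Gibbs-style update rule are all assumptions chosen for illustration, and the software random number generator merely stands in for a physical stochastic device. The weights are picked so that the four lowest-energy states are exactly the AND truth table, which the `ground_states` helper checks by enumeration.

```python
import itertools
import math
import random
from collections import Counter

# Illustrative (assumed) weights for nodes A, B, C with states in {-1, +1},
# where C is meant to encode A AND B.
J = [[0, -1, 2],
     [-1, 0, 2],
     [2, 2, 0]]
H = [1, 1, -2]


def energy(m):
    """Ising-style energy; lower energy means a more probable state."""
    pair = sum(J[i][j] * m[i] * m[j] for i in range(3) for j in range(i + 1, 3))
    bias = sum(H[i] * m[i] for i in range(3))
    return -(pair + bias)


def ground_states():
    """Enumerate all 8 states and return those with minimal energy."""
    states = list(itertools.product((-1, 1), repeat=3))
    lowest = min(energy(s) for s in states)
    return sorted(s for s in states if energy(s) == lowest)


def sample_states(clamped, steps=2000, beta=1.0, seed=0):
    """Gibbs-style sampling; `clamped` maps a node index to a fixed value."""
    rng = random.Random(seed)
    m = [clamped.get(i, 1) for i in range(3)]
    counts = Counter()
    for _ in range(steps):
        for i in range(3):
            if i in clamped:
                continue
            field = sum(J[i][j] * m[j] for j in range(3)) + H[i]
            # The node outputs +1 with probability (1 + tanh(beta * field)) / 2.
            m[i] = 1 if math.tanh(beta * field) > rng.uniform(-1.0, 1.0) else -1
        counts[tuple(m)] += 1
    return counts
```

Clamping the output node, for example `sample_states({2: 1})`, runs the gate "backwards": the network then spends most of its time in the state where both inputs are +1, the only input pair consistent with an output of 1.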


Citations: 0

Exploiting Routing Asymmetry for APUF Implementation in FPGA: A Proof-of-Concept

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238578

Trishna Rajkumar

Implementing an Arbiter PUF in an FPGA requires identical logic and symmetrical routing to ensure that the delay differences are due to process variations. As FPGA routing tools optimise for performance and not for symmetry, the FPGA CAD flow requires interventions like manual routing and the use of hard macros. These measures require a designer to work at a lower level of abstraction than RTL, which can be tedious and error-prone. Furthermore, they require extensive knowledge of the FPGA fabric, which may not be available owing to its proprietary nature. Considering these challenges, we investigate the possibility of an arbiter PUF implementation within the FPGA CAD flow by leveraging the routing asymmetry instead of eliminating it. Preliminary characterisation of a proof-of-concept APUF model demonstrated a uniformity of 49.4% and a reliability of 96.3%.
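The abstract reports uniformity and reliability figures without defining them. The sketch below uses the definitions commonly found in PUF evaluation: uniformity as the percentage of 1-bits in a response (50% being ideal), and reliability as 100% minus the mean fractional Hamming distance between a reference response and repeated measurements of the same challenges. Whether the paper computes its numbers exactly this way is an assumption, and the bit vectors are made-up illustration data.

```python
from typing import Sequence


def uniformity(response: Sequence[int]) -> float:
    """Percentage of 1-bits in a response; 50% is the ideal value."""
    return 100.0 * sum(response) / len(response)


def reliability(reference: Sequence[int],
                remeasurements: Sequence[Sequence[int]]) -> float:
    """100% minus the mean fractional Hamming distance to the reference."""
    n = len(reference)
    total = 0.0
    for sample in remeasurements:
        flips = sum(a != b for a, b in zip(reference, sample))
        total += flips / n
    return 100.0 * (1.0 - total / len(remeasurements))


# Hypothetical 8-bit responses, for illustration only.
reference = [1, 0, 1, 1, 0, 0, 1, 0]
repeats = [
    [1, 0, 1, 1, 0, 0, 1, 0],  # identical to the reference
    [1, 0, 1, 0, 0, 0, 1, 0],  # one bit flipped
]
print(uniformity(reference))            # 50.0
print(reliability(reference, repeats))  # 93.75
```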


Citations: 0

Design Exploration of Dynamic Multi-Level Ternary Content-Addressable Memory Using Nanoelectromechanical Relays

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238633

Taixin Li, Hongtao Zhong, Sumitha George, N. Vijaykrishnan, Liang Shi, Huazhong Yang, Xueqing Li

Multi-Level Ternary Content Addressable Memories (ML-TCAMs) are a type of TCAM that can calculate the Hamming distance between the stored data and the input vector, which can be used to accelerate several specific applications. Several current-domain and charge-domain ML-TCAMs based on SRAMs and nonvolatile memories (NVMs) already exist. However, they fail to strike a good balance between area and computational accuracy. In this paper, for the first time, we explore the design of dynamic ML-TCAMs that achieve both high cell density and high accuracy, and propose DyLAN, a current-domain dynamic ML-TCAM using 4-terminal nanoelectromechanical (NEM) relays. Specifically, exploiting the nearly zero OFF-state leakage and stable ON-state current of the 4-terminal NEM relays, this paper proposes DyLAN-W with ultra-long retention time and DyLAN-S with ultra-low single-refresh overhead and high density. Results show that DyLAN achieves up to 2.7x and 4.9x area reduction compared with the 16T SRAM ML-TCAM and the charge-domain ML-TCAMs, respectively, and increases the few-shot learning accuracy by 13.7% (from 79.9% to 93.6%) on average compared with the state-of-the-art nonvolatile ML-TCAM, i.e., the 2FeFET ML-TCAM.
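As a functional illustration of what an ML-TCAM computes, the sketch below models each stored row as a string over `0`, `1`, and `X` (don't care) and returns the number of mismatching positions rather than a binary match flag. A nearest-row search over those distances is the kind of lookup that few-shot classification relies on. The function names and the string encoding are assumptions made for this example; the sketch says nothing about the circuit-level design in the paper.

```python
from typing import List, Tuple


def ternary_hamming(stored: str, query: str) -> int:
    """Count mismatches between a ternary stored word and a binary query.

    Positions holding 'X' in the stored word never count as a mismatch.
    """
    if len(stored) != len(query):
        raise ValueError("stored word and query must have the same length")
    return sum(1 for s, q in zip(stored, query) if s != "X" and s != q)


def nearest_row(rows: List[str], query: str) -> Tuple[int, int]:
    """Return (row index, distance) of the closest stored row."""
    distances = [ternary_hamming(row, query) for row in rows]
    best = min(range(len(rows)), key=distances.__getitem__)
    return best, distances[best]


rows = ["1010", "1X00", "0111"]
print(nearest_row(rows, "1100"))  # (1, 0): the 'X' absorbs the differing bit
```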


Citations: 0

An FPGA-Based Reconfigurable CNN Training Accelerator Using Decomposable Winograd

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238574

Hui Wang, Jinming Lu, Jun Lin, Zhongfeng Wang

Convolutional neural networks (CNNs) have been widely used in computer vision in recent years. However, the huge amount of computation involved in CNN training limits its application on embedded devices. To solve the dilemma, this paper proposes an FPGA-based reconfigurable CNN training accelerator. First, we explore the possibility of using the Winograd algorithm to accelerate convolutions. An input-aligned decomposable Winograd method is proposed that broadens the scope of the application of Winograd and simplifies the implementation of Winograd on a unified processing element. Second, we propose a reconfigurable training architecture consisting of a transposable Winograd processing element array that can perform different training phases with high parallelism under limited resource costs. A series of unified data transformation units are designed to support various Winograd operations. The hierarchical barrel shift networks work for flexible and complex data access without bank conflict. Evaluated on VGG16 and ResNet18, our method reduces multiplications by up to $2.4\times$ compared to conventional convolution. Additionally, our accelerator implemented on Alveo U200 achieves up to 918.57 GOPS in terms of throughput and shows a $3.18\times$ improvement in resource efficiency over the prior art.
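For readers unfamiliar with why Winograd saves multiplications, here is the textbook one-dimensional F(2,3) case: two outputs of a 3-tap filter computed with 4 data multiplications instead of 6. This is the standard minimal-filtering identity only; it is not the input-aligned decomposable variant proposed in the paper, and the function names are chosen for this example.

```python
from typing import List, Sequence


def direct_conv(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Two outputs of a 3-tap filter over 4 inputs: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Same two outputs using 4 data multiplications (m1..m4)."""
    # Filter transform: depends only on g, so it can be precomputed once.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform (additions only) followed by element-wise products.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform (additions only).
    return [m1 + m2 + m3, m2 - m3 - m4]


print(direct_conv([1, 2, 3, 4], [2, 1, 3]))   # [13, 19]
print(winograd_f23([1, 2, 3, 4], [2, 1, 3]))  # [13.0, 19.0]
```

The 6-to-4 ratio gives a 1.5x saving in one dimension; nesting the same transform in two dimensions for a 3x3 filter and a 2x2 output tile uses 16 multiplications instead of 36.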


Citations: 0

ISVLSI 2023 Cover Page

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/isvlsi59464.2023.10238525

Citations: 0

LAT-UP: Exposing Layout-Level Analog Hardware Trojans Using Contactless Optical Probing

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238545

Sajjad Parvin, Mehran Goli, Thilo Krachenfels, Shahin Tajik, Jean-Pierre Seifert, Frank Sill, R. Drechsler

The insertion of a Hardware Trojan (HT) into a chip after the in-house layout design is outsourced to a chip manufacturer for fabrication is a major concern, especially for mission-critical applications. While several HT detection methods have been developed based on side-channel analysis and physical measurements to overcome this problem, there exist stealthy analog HTs, i.e., capacitive and dopant-level HTs, which have negligible or even zero overhead on the chip. Thus, these stealthy HTs cannot be detected using the aforementioned methods. In this work, we propose a novel analytical approach to detect these Layout-level Analog Trojans (LAT). Our proposed method uses an extension of Optical Probing (OP) for LAT detection, namely, the Laser Logic State Imaging (LLSI) technique. In principle, to detect LATs using LLSI, we only need the golden design and not a golden chip, which is not typically available. As we take advantage of LLSI to detect HTs, our approach is non-invasive, less costly, and scalable to larger designs. We report experimental results on a malicious RISC-V to demonstrate the effectiveness of our approach in detecting LATs.


Citations: 0

Federated Learning with Spiking Neural Networks in Heterogeneous Systems

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238618

Sadia Anjum Tumpa, Sonali Singh, Md Fahim Faysal Khan, M. Kandemir, N. Vijaykrishnan, Chita R. Das

With the advances in IoT and edge computing, Federated Learning is ever more popular as it offers data privacy. Low-power spiking neural networks (SNNs) are ideal candidates for local nodes in such a federated setup. Most prior works assume that the participating nodes have uniform compute resources, which may not be practical. In this work, we propose a federated SNN learning framework for a realistic heterogeneous environment consisting of nodes with diverse memory-compute capabilities. Through activation checkpointing and time skipping, the framework offers a ~$4\times$ reduction in effective memory requirement for low-memory nodes while improving the accuracy by up to 10% for non-independent and identically distributed (non-IID) data.


Citations: 1

CWAHA: Cluster-Wise Approximation for Hardware implementation of Arithmetic functions

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238526

Omkar G. Ratnaparkhi, M. Rao

The proposed work adopts a clustering method to approximate and segment normalized non-linear functions for realizing arithmetic units such as divider, square-root, squarer, and inverse-of-squarer. The novel use of the K-Means clustering algorithm to build non-linear partitions makes it possible to realize arithmetic units with different error characteristics according to the designer's demands. In this paper, the IEEE half-precision floating-point format (fp16) is used to implement and validate the novel arithmetic units. Accuracy improves for arithmetic units with more partitions, and conversely, hardware metrics improve with fewer partitions. A maximum silicon footprint saving of 60.97% and a power benefit of 66.70% were achieved for the proposed approximate divider over state-of-the-art (SOTA) dividers. The proposed square-rooter showed a maximum footprint saving of 55.12% compared with the SOTA design. In addition, the proposed arithmetic functions, especially the dividers and square-rooters, showed accelerated performance compared with the respective SOTA implementations. The proposed cluster-wise approximation for computing designs was validated on two image processing applications: color quantization and edge detection. A maximum PSNR improvement of 38.84% was realized using a Sobel edge detection algorithm built on the proposed square-rooter, compared with its counterpart built on the SOTA design.
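The trade-off between partition count and accuracy can be illustrated with a small software sketch. It samples the reciprocal on the normalized interval [1, 2), clusters the sampled function values with plain one-dimensional K-Means, and turns each cluster into one constant-valued segment of a lookup table. Everything here is an assumption made for illustration: the choice of the reciprocal, the sample count, the evenly spaced centroid initialization, and the piecewise-constant form are not taken from the paper, which targets fp16 hardware and may segment the functions differently. The sketch also relies on the function being monotonic, so that each cluster maps to one contiguous input range.

```python
import bisect
from typing import Callable, List, Tuple


def kmeans_1d(values: List[float], k: int,
              iterations: int = 50) -> Tuple[List[int], List[float]]:
    """Plain Lloyd iterations on scalars with evenly spaced initial centroids."""
    lo, hi = min(values), max(values)
    centroids = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda i: abs(v - centroids[i])) for v in values]
        for i in range(k):
            members = [v for v, lab in zip(values, labels) if lab == i]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[i] = sum(members) / len(members)
    return labels, centroids


def build_table(f: Callable[[float], float], k: int,
                samples: int = 256) -> Tuple[List[float], List[float]]:
    """Return segment start points and the constant used inside each segment."""
    xs = [1.0 + i / samples for i in range(samples)]
    labels, centroids = kmeans_1d([f(x) for x in xs], k)
    starts, consts = [], []
    previous = None
    for x, lab in zip(xs, labels):
        if lab != previous:  # the cluster changes, so a new segment begins
            starts.append(x)
            consts.append(centroids[lab])
            previous = lab
    return starts, consts


def approximate(x: float, starts: List[float], consts: List[float]) -> float:
    """Look up the constant of the segment that contains x."""
    return consts[max(bisect.bisect_right(starts, x) - 1, 0)]


def max_error(f: Callable[[float], float], k: int, samples: int = 256) -> float:
    """Largest absolute error over the sample points for k partitions."""
    starts, consts = build_table(f, k, samples)
    xs = [1.0 + i / samples for i in range(samples)]
    return max(abs(f(x) - approximate(x, starts, consts)) for x in xs)
```

Calling `max_error(lambda x: 1.0 / x, k)` for a small and a large `k` shows the direction of the trade-off described in the abstract: more partitions give a smaller worst-case error at the cost of a larger table.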


Citations: 0

Efficient Accelerator Design in High-Level Synthesis Using Approximate Logic Components

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238558

Tiago Da Silva Almeida, Lucas Wanner

FPGA-based architectures have emerged as a versatile acceleration solution for various applications, aided by High-Level Synthesis (HLS) tools. For applications with some level of error resilience, the use of approximate logic components such as imprecise multipliers and adders can improve resource usage and energy efficiency. Nevertheless, these components must be carefully composed and combined to prevent error accumulation and to ensure that the application produces valid outputs. In this work, we explore approximate multiplier and adder designs used in Multiply-accumulate (MAC) operations for accelerators implemented in HLS, aiming to find combinations of components that can save power and resources while effectively mitigating errors in application outputs. We show that the best combinations of components can improve the Power Area Product (PAP) of a Sobel filter accelerator design by 36-49% compared to a precise design while limiting errors and maintaining an acceptable quality of results.
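The concern about error accumulation in approximate MAC units can be seen in a few lines. The sketch below uses a simplified adder in the spirit of lower-part-OR designs: the low `k` bits are combined with a bitwise OR and only the upper bits are added exactly, with no carry passed from the low part. This particular adder, the function names, and the sample pixel and weight values are assumptions for illustration; the abstract does not say which approximate components the authors evaluate.

```python
from typing import Sequence


def loa_add(a: int, b: int, k: int) -> int:
    """Approximate a + b for non-negative ints: OR the low k bits, add the rest."""
    if a < 0 or b < 0 or k < 0:
        raise ValueError("operands and k must be non-negative")
    mask = (1 << k) - 1
    low = (a | b) & mask               # cheap, carry-free lower part
    high = ((a >> k) + (b >> k)) << k  # exact addition of the upper bits
    return high | low


def mac(pixels: Sequence[int], weights: Sequence[int], k: int) -> int:
    """Multiply-accumulate with exact products and an approximate accumulator."""
    acc = 0
    for p, w in zip(pixels, weights):
        acc = loa_add(acc, p * w, k)
    return acc


pixels = [10, 20, 30]
weights = [1, 2, 1]  # a smoothing row of the kind used inside a Sobel kernel
print(mac(pixels, weights, 0))  # 80, exact because no bits are approximated
print(mac(pixels, weights, 2))  # 78, the low-bit error from the last addition
```

For this adder each addition underestimates the true sum by exactly `a & b & mask`, so the per-step error is below `2**k`, but the errors of successive accumulations add up, which is why component combinations have to be checked at the application level.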


Citations: 0

An Investigation into the Security of Register Allocation with Spilling and Splitting

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238662

Priyanka Panigrahi, C. Karfa

Compiler optimization can be functionally correct but not secure. Register allocation (RA) is an essential optimization performed by a compiler. This paper analyzes the security threat of RA concerning information flow. We define the relative security between two programs with respect to information flow. According to our definition of relative security, we show that RA is secure when there is no splitting and no spilling into memory. We also show that register allocation with splitting is secure under our attack model. Then, we show that RA can lead to information leaks during spilling, as it introduces new leaks through memory. Further, our experimental results on various benchmarks show that RA in LLVM is actually leaky. To address this vulnerability, we propose a secure RA approach in LLVM that mitigates the risk of new leaks during spilling. Our experimental evaluation on various benchmarks shows the effectiveness of our proposed approach.


Citations: 0
