
2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI): Latest Papers

Robustness and Power Efficiency in Spin-Orbit Torque-Based Probabilistic Logic Circuits
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238682
Kamal Danouchi, G. Prenat, Philippe Talatchian, Louis Hutin, Lorena Anghel
The efficiency of known algorithms for solving NP-hard problems is constrained by the limitations of conventional von Neumann architectures. Recurrent networks of stochastic neurons are an appealing alternative to conventional computing architectures, as they potentially allow exploring the binary search space of NP-hard problems with limited resources and overheads. In this study, we consider the case of Boolean Satisfiability on small logic functions, with technological implementations based on Spin-Orbit Torque Magnetic Tunnel Junctions. We propose innovative circuit-level implementations of invertible logic architectures for an AND gate and a Full Adder, emphasizing the design constraints of such invertible logic operations. Simulation results demonstrate the feasibility of SOT-based implementations and their robustness against process variations. The realistic implementation makes it possible to identify the main power efficiency trade-offs.
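To make the idea of an invertible AND gate concrete, here is a minimal software sketch, not the circuit proposed in the paper. It assumes bipolar spins (-1/+1) ordered as (A, B, C) with C = A AND B, and an Ising-style coupling matrix `J` and bias vector `H` chosen so that the four rows of the AND truth table are the lowest-energy states. The names `energy`, `ground_states`, and `sample`, and the values of `beta`, `sweeps`, and `seed`, are illustrative assumptions.

```python
import math
import random
from collections import Counter
from itertools import product

# Assumed couplings and biases for spins ordered (A, B, C), C = A AND B.
J = [[0, -1, 2],
     [-1, 0, 2],
     [2, 2, 0]]
H = [1, 1, -2]


def energy(m):
    """Ising energy E = -(sum_{i<j} J_ij m_i m_j + sum_i H_i m_i)."""
    pair = sum(J[i][j] * m[i] * m[j] for i in range(3) for j in range(i + 1, 3))
    bias = sum(H[i] * m[i] for i in range(3))
    return -(pair + bias)


def ground_states():
    """Enumerate all 8 spin states and return those with minimum energy."""
    states = list(product((-1, 1), repeat=3))
    e_min = min(energy(s) for s in states)
    return sorted(s for s in states if energy(s) == e_min)


def sample(clamped, beta=2.0, sweeps=2000, seed=0):
    """Gibbs-sample the free spins; `clamped` maps spin index -> fixed value."""
    rng = random.Random(seed)
    m = [clamped.get(i, rng.choice((-1, 1))) for i in range(3)]
    counts = Counter()
    for _ in range(sweeps):
        for i in range(3):
            if i in clamped:
                continue
            # Local field seen by spin i, then a stochastic (p-bit style) update.
            field = sum(J[i][j] * m[j] for j in range(3)) + H[i]
            p_up = 0.5 * (1.0 + math.tanh(beta * field))
            m[i] = 1 if rng.random() < p_up else -1
        counts[tuple(m)] += 1
    return counts


if __name__ == "__main__":
    print(ground_states())
    # Inverse mode: fix the output C = +1 and let the inputs settle.
    print(sample({2: 1}).most_common(1))
```

Clamping the output (index 2) and sampling the inputs is what "invertible" means here: the same network that computes C from A and B can also be run backwards to find inputs consistent with a given output.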
Citations: 0
Revisiting Trojan Insertion Techniques for Post-Silicon Trojan Detection Evaluation
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238669
Vedika Saravanan, Mohammad Walid Charrwi, S. Saeed
The distributed supply chain of the semiconductor industry has given rise to several attacks at different stages of Integrated Circuit (IC) design and manufacturing. Hardware Trojans (HTs) injected into the IC by a malicious foundry can lead to catastrophic consequences. Recent research efforts have shown the power of reinforcement learning not only in detecting HTs but also in bypassing these detection mechanisms. However, they do not take into account detailed circuit structural information. In this paper, we explore new strategies for triggering HTs in order to evaluate the most recently proposed post-silicon HT detection techniques. Specifically, we develop automated and scalable rare net selection techniques to construct HT trigger conditions informed by the circuit structure. We evaluate our approaches on different benchmarks against the most recently proposed reinforcement learning and other state-of-the-art logic testing HT detection techniques.
Citations: 0
Design Exploration of Dynamic Multi-Level Ternary Content-Addressable Memory Using Nanoelectromechanical Relays
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238633
Taixin Li, Hongtao Zhong, Sumitha George, N. Vijaykrishnan, Liang Shi, Huazhong Yang, Xueqing Li
Multi-Level Ternary Content Addressable Memories (ML-TCAMs) are a type of TCAM that can calculate the Hamming distance between the stored data and the input vector, which can be used to accelerate several specific applications. Several current-domain and charge-domain ML-TCAMs based on SRAMs and nonvolatile memories (NVMs) already exist. However, they fail to strike a good balance between area and computational accuracy. In this paper, for the first time, we explore the design of dynamic ML-TCAMs that achieve both high cell density and high accuracy, and propose DyLAN, a current-domain dynamic ML-TCAM using 4-terminal nanoelectromechanical (NEM) relays. Specifically, exploiting the nearly zero OFF-state leakage and stable ON-state current of the 4-terminal NEM relays, this paper proposes DyLAN-W with ultra-long retention time and DyLAN-S with ultra-low single refresh overhead and high density. Results show that DyLAN achieves up to 2.7x and 4.9x area reduction compared with the 16T SRAM ML-TCAM and the charge-domain ML-TCAMs, respectively, and increases the few-shot learning accuracy by 13.7% (from 79.9% to 93.6%) on average compared with the state-of-the-art nonvolatile ML-TCAM, i.e., the 2FeFET ML-TCAM.
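The operation an ML-TCAM performs can be illustrated in software. The sketch below is a functional model only, under the assumption that stored words are strings over '0', '1', and 'X' (don't care) and that a don't-care position never counts as a mismatch. The names `ternary_hamming` and `nearest_rows` are hypothetical and say nothing about the DyLAN cell design.

```python
def ternary_hamming(stored: str, query: str) -> int:
    """Count positions where a stored ternary word disagrees with a binary query.

    A stored 'X' is treated as don't care and never contributes a mismatch.
    """
    if len(stored) != len(query):
        raise ValueError("stored word and query must have the same length")
    return sum(1 for s, q in zip(stored, query) if s != "X" and s != q)


def nearest_rows(table, query):
    """Return (indices of best-matching rows, per-row distances)."""
    distances = [ternary_hamming(row, query) for row in table]
    best = min(distances)
    return [i for i, d in enumerate(distances) if d == best], distances


if __name__ == "__main__":
    table = ["1010", "1X00", "0111"]
    print(nearest_rows(table, "1000"))  # ([1], [1, 0, 4])
```

A conventional TCAM would only report whether the distance is zero; the multi-level variant exposes the distance itself, which is what nearest-neighbor style workloads such as few-shot learning rely on.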
Citations: 0
ISVLSI 2023 Cover Page
Pub Date : 2023-06-20 DOI: 10.1109/isvlsi59464.2023.10238525
Citations: 0
An FPGA-Based Reconfigurable CNN Training Accelerator Using Decomposable Winograd
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238574
Hui Wang, Jinming Lu, Jun Lin, Zhongfeng Wang
Convolutional neural networks (CNNs) have been widely used in computer vision in recent years. However, the huge amount of computation involved in CNN training limits its application on embedded devices. To solve this dilemma, this paper proposes an FPGA-based reconfigurable CNN training accelerator. First, we explore the possibility of using the Winograd algorithm to accelerate convolutions. An input-aligned decomposable Winograd method is proposed that broadens the scope of application of Winograd and simplifies its implementation on a unified processing element. Second, we propose a reconfigurable training architecture consisting of a transposable Winograd processing element array that can perform different training phases with high parallelism under limited resource costs. A series of unified data transformation units are designed to support various Winograd operations. The hierarchical barrel shift networks provide flexible and complex data access without bank conflicts. Evaluated on VGG16 and ResNet18, our method reduces multiplications by up to $2.4\times$ compared to conventional convolution. Additionally, our accelerator implemented on Alveo U200 achieves a throughput of up to 918.57 GOPS and shows a $3.18\times$ improvement in resource efficiency over the prior art.
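For readers unfamiliar with why Winograd reduces multiplications, the following sketch shows the standard one-dimensional F(2,3) case: two outputs of a 3-tap filter computed with 4 multiplications instead of 6. This is the textbook minimal-filtering form, not the input-aligned decomposable method of the paper; the function names are illustrative.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap correlation computed directly (6 multiplications)."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d, g):
    """The same two outputs using the Winograd F(2,3) form (4 multiplications)."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: depends only on the weights, so it can be precomputed.
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Input transform followed by the four element-wise multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d, g = [1, 2, 3, 4], [1, 2, 3]
    print(direct_f23(d, g))    # [14, 20]
    print(winograd_f23(d, g))  # [14.0, 20.0]
```

The saving here is 6/4 = 1.5x per tile; two-dimensional tiles nest the same transforms and save more, at the cost of extra additions in the transform stages.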
Citations: 0
LAT-UP: Exposing Layout-Level Analog Hardware Trojans Using Contactless Optical Probing
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238545
Sajjad Parvin, Mehran Goli, Thilo Krachenfels, Shahin Tajik, Jean-Pierre Seifert, Frank Sill, R. Drechsler
The insertion of a Hardware Trojan (HT) into a chip after the in-house layout design is outsourced to a chip manufacturer for fabrication is a major concern, especially for mission-critical applications. While several HT detection methods have been developed based on side-channel analysis and physical measurements to overcome this problem, there exist stealthy analog HTs, i.e., capacitive and dopant-level HTs, which have negligible or even zero overhead on the chip. Thus, these stealthy HTs cannot be detected using the aforementioned methods. In this work, we propose a novel analytical approach to detect these Layout-level Analog Trojans (LAT). Our proposed method uses an extension of Optical Probing (OP) for LAT detection, namely, the Laser Logic State Imaging (LLSI) technique. In principle, to detect LATs using LLSI, we only need the golden design and not a golden chip, which is not typically available. As we take advantage of LLSI to detect HTs, our approach is non-invasive, less costly, and scalable to larger designs. We report experimental results on a malicious RISC-V to demonstrate the effectiveness of our approach in detecting LATs.
Citations: 0
Federated Learning with Spiking Neural Networks in Heterogeneous Systems
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238618
Sadia Anjum Tumpa, Sonali Singh, Md Fahim Faysal Khan, M. Kandemir, N. Vijaykrishnan, Chita R. Das
With the advances in IoT and edge computing, Federated Learning is increasingly popular because it offers data privacy. Low-power spiking neural networks (SNNs) are ideal candidates for local nodes in such a federated setup. Most prior works assume that the participating nodes have uniform compute resources, which may not be practical. In this work, we propose a federated SNN learning framework for a realistic heterogeneous environment consisting of nodes with diverse memory-compute capabilities. Through activation checkpointing and time skipping, the framework offers a ~$4\times$ reduction in effective memory requirement for low-memory nodes while improving accuracy by up to 10% for non-independent and identically distributed data.
Citations: 1
CWAHA: Cluster-Wise Approximation for Hardware implementation of Arithmetic functions
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238526
Omkar G. Ratnaparkhi, M. Rao
The proposed work adopts a clustering method to approximate and segment normalized non-linear functions in order to realize arithmetic units such as divider, square-root, squarer, and inverse-of-squarer. The novel use of the K-Means clustering algorithm to build nonlinear partitions makes it possible to realize arithmetic units with different error characteristics according to the designer's requirements. In this paper, the IEEE half-precision floating-point format (fp16) is used to implement and validate the novel arithmetic units. Accuracy improves for arithmetic units with more partitions, and conversely, hardware metrics improve with fewer partitions. A maximum silicon footprint saving of 60.97% and a power benefit of 66.70% were achieved for the proposed approximate divider over state-of-the-art (SOTA) dividers. The proposed square-rooter showed a maximum footprint saving of 55.12% compared with the SOTA design. In addition, the proposed arithmetic functions, especially the dividers and square-rooters, showed accelerated performance compared with the respective SOTA implementations. The proposed cluster-wise approximation for computing designs was validated on two image processing applications: color quantization and edge detection. A maximum PSNR improvement of 38.84% was realized using the Sobel edge detection algorithm built on the proposed square-rooter over its counterpart built on the SOTA design.
Citations: 0
Efficient Accelerator Design in High-Level Synthesis Using Approximate Logic Components
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238558
Tiago Da Silva Almeida, Lucas Wanner
FPGA-based architectures have emerged as a versatile acceleration solution for various applications, aided by High-Level Synthesis (HLS) tools. For applications with some level of error resilience, the use of approximate logic components such as imprecise multipliers and adders can improve resource usage and energy efficiency. Nevertheless, these components must be carefully composed and combined to prevent error accumulation and to ensure that the application produces valid outputs. In this work, we explore approximate multiplier and adder designs used in Multiply-accumulate (MAC) operations for accelerators implemented in HLS, aiming to find combinations of components that can save power and resources while effectively mitigating errors in application outputs. We show that the best combinations of components can improve the Power Area Product (PAP) of a Sobel filter accelerator design by 36-49% compared to a precise design while limiting errors and maintaining an acceptable quality of results.
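The error-accumulation concern in a MAC loop can be seen with a small behavioral model. The sketch below assumes one well-known style of approximate adder, a lower-part OR adder, in which the low `k` bits are formed with a bitwise OR and the upper bits are added exactly with a carry-in taken from the top bit of the lower part. It is not claimed to be one of the components evaluated in the paper; `loa_add` and `approx_mac` are hypothetical names, and operands are assumed to be non-negative integers.

```python
def loa_add(a: int, b: int, k: int) -> int:
    """Approximate a + b: OR the low k bits, add the remaining bits exactly."""
    if k == 0:
        return a + b  # no approximate part, plain addition
    mask = (1 << k) - 1
    low = (a | b) & mask                           # approximate lower part
    carry = (a >> (k - 1)) & (b >> (k - 1)) & 1    # carry-in from bit k-1
    high = (a >> k) + (b >> k) + carry             # exact upper part
    return (high << k) | low


def approx_mac(xs, ws, k: int) -> int:
    """Multiply-accumulate with exact products and approximate accumulation."""
    acc = 0
    for x, w in zip(xs, ws):
        acc = loa_add(acc, x * w, k)
    return acc


if __name__ == "__main__":
    xs, ws = [3, 7, 2, 9], [5, 1, 8, 4]
    exact = sum(x * w for x, w in zip(xs, ws))
    for k in (0, 2, 4):
        print(k, approx_mac(xs, ws, k), exact)
```

In this model each addition is off by at most 2^(k-1), so the accumulated error after n additions is bounded by n * 2^(k-1). That bound growing with n is one way to see why the choice and composition of approximate components needs care.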
Citations: 0
An Investigation into the Security of Register Allocation with Spilling and Splitting
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238662
Priyanka Panigrahi, C. Karfa
Compiler optimization can be functionally correct but not secure. Register allocation (RA) is an essential optimization performed by a compiler. This paper analyzes the security threat of RA concerning information flow. We define the relative security between two programs with respect to information flow. According to our definition of relative security, we show that RA is secure when there is no splitting and spilling into memory. We also show that register allocation with splitting is also secure based on our attack model. Then, we show that RA can lead to information leaks during spilling as it introduces new leaks through memory. Further, our experimental results on various benchmarks show that RA in LLVM is actually leaky. To address this vulnerability, we propose a secure RA approach in LLVM that mitigates the risk of new leaks during spilling. Our experimental evaluation on various benchmarks shows the effectiveness of our proposed approach.
Citations: 0