
Latest publications: 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)

Signature Driven Post-Manufacture Testing and Tuning of RRAM Spiking Neural Networks for Yield Recovery
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473874
Anurup Saha, C. Amarnath, Kwondo Ma, Abhijit Chatterjee
Resistive random access memory (RRAM)-based spiking neural networks (SNNs) are becoming increasingly attractive for pervasive energy-efficient classification tasks. However, such networks suffer from degraded performance (as measured by classification accuracy) due to the effects of process variations on fabricated RRAM devices, resulting in loss of manufacturing yield. To address this yield loss, a two-step approach is developed. First, an alternative test framework predicts the performance of fabricated RRAM-based SNNs from the SNN's response to a small subset of images from the test image dataset, called the SNN response signature (to minimize test cost). This diagnoses the SNNs that need to be performance-tuned for yield recovery. Next, SNN tuning is performed by modulating the spiking thresholds of the SNN neurons on a layer-by-layer basis, using a trained regressor that maps the SNN response signature to the optimal spiking threshold values during tuning. The optimal spiking threshold values are determined by an offline optimization algorithm. Experiments show that the proposed framework reduces the number of out-of-spec SNN devices by up to 54% and improves yield by as much as 8.6%.
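The tuning step — a trained regressor that maps a chip's response signature to per-layer spiking thresholds — can be sketched as follows. This is an illustrative sketch, not the authors' code: the ridge regressor, dimensions, and variable names are assumptions; in the paper the training targets come from an offline optimization algorithm.

```python
import numpy as np

# Hypothetical sketch of the signature-to-threshold regressor: pairs of
# (SNN response signature, offline-optimized per-layer thresholds) train a
# ridge regressor used to tune each fabricated chip. Shapes are illustrative.

rng = np.random.default_rng(0)

n_chips, sig_dim, n_layers = 200, 16, 4           # stand-in dataset sizes
signatures = rng.normal(size=(n_chips, sig_dim))  # per-chip response signatures
thresholds = rng.normal(size=(n_chips, n_layers)) # offline-optimized thresholds

# Closed-form ridge regression: W maps signature -> per-layer thresholds.
lam = 1e-2
X, Y = signatures, thresholds
W = np.linalg.solve(X.T @ X + lam * np.eye(sig_dim), X.T @ Y)

# Tuning a new chip: one cheap signature measurement, one matrix product.
new_sig = rng.normal(size=sig_dim)
predicted_thresholds = new_sig @ W  # one spiking threshold per layer
print(predicted_thresholds.shape)
```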
Citations: 0
An Optimization-aware Pre-Routing Timing Prediction Framework Based on Heterogeneous Graph Learning
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473937
Guoqing He, Wenjie Ding, Yuyang Ye, Xu Cheng, Qianqian Song, Peng Cao
Accurate and efficient pre-routing timing estimation is particularly crucial in timing-driven placement, as design iterations caused by timing divergence are time-consuming. However, existing machine learning prediction models overlook the impact of timing optimization techniques applied during the routing stage, such as adjusting gate sizes or swapping threshold voltage types to fix routing-induced timing violations. In this work, an optimization-aware pre-routing timing prediction framework based on heterogeneous graph learning is proposed to calibrate the timing changes introduced by wire parasitics and optimization techniques. The path embedding generated by the proposed framework fuses learned local information from a graph neural network with global information from a transformer network to perform accurate endpoint arrival time prediction. Experimental results demonstrate that the proposed framework achieves an average accuracy improvement of 0.10 in R2 score on testing designs and delivers an average runtime acceleration of three orders of magnitude compared with the design flow.
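The local/global fusion the abstract describes can be pictured with a toy example. This is a minimal sketch under assumptions (random weights, a chain-shaped timing path, mean-neighbor message passing as the "GNN", softmax attention pooling as the "transformer"), not the paper's architecture.

```python
import numpy as np

# Illustrative sketch: fuse a GNN-style local embedding of the endpoint node
# with an attention-pooled global embedding of the whole path, then regress
# the endpoint arrival time with a linear head. All weights are made up.

rng = np.random.default_rng(1)
n_nodes, d = 6, 8
feats = rng.normal(size=(n_nodes, d))               # per-cell timing features
adj = np.eye(n_nodes, k=1) + np.eye(n_nodes, k=-1)  # chain graph: cell i <-> i+1

# "Local" view: one round of mean-neighbor message passing.
deg = adj.sum(1, keepdims=True)
local = (adj @ feats) / np.maximum(deg, 1)

# "Global" view: softmax attention pooling over all nodes on the path.
scores = feats @ rng.normal(size=d)
attn = np.exp(scores - scores.max())
attn /= attn.sum()
global_emb = attn @ feats                           # (d,)

# Fuse the endpoint's local embedding with the global one, then predict.
fused = np.concatenate([local[-1], global_emb])     # endpoint is the last node
w_head = rng.normal(size=2 * d)
arrival_time = float(fused @ w_head)
print(arrival_time)
```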
Citations: 0
Efficient Sublogic-Cone-Based Switching Activity Estimation using Correlation Factor
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473841
Kexin Zhu, Runjie Zhang, Qing He
Switching activity is one of the key factors that determine digital circuits’ power consumption. While gate-level simulations are too slow to support average power analysis of modern design blocks (e.g., millions or even billions of gates) over long periods of time (e.g., millions of cycles), probabilistic methods provide a solution by using RTL simulation results and propagating the switching activity through the combinational logic. This work presents a sublogic-cone-based probabilistic method for switching activity propagation in combinational logic circuits. We divide the switching activity estimation problem into two parts: incremental propagation (across the entire circuit) and accurate calculation (within the sublogic cones). To construct the sublogic cones, we first introduce a new metric, the correlation factor, to quantify the impact of correlations between signal nets; we then develop an efficient algorithm that uses the calculated correlation factor to guide the construction of sublogic cones. Experimental results show that our method produces 73.2% more accurate switching activity estimates than the state-of-the-art method while achieving a 19X speedup.
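The baseline that the correlation factor is designed to correct is standard probabilistic propagation under spatial and temporal independence. A minimal sketch — not the paper's method — shows how reconvergent fanout makes the independence assumption inexact, which is exactly the correlation a sublogic cone would capture:

```python
# Baseline probabilistic switching-activity propagation under independence
# assumptions. Gate rules and the example circuit are textbook material.

def p_and(pa, pb):  # P(out=1) for AND with independent inputs
    return pa * pb

def p_or(pa, pb):
    return pa + pb - pa * pb

def p_not(pa):
    return 1.0 - pa

def toggle_rate(p_one):
    # Under temporal independence, switching activity = 2 * p * (1 - p).
    return 2.0 * p_one * (1.0 - p_one)

# y = (a AND b) OR (NOT a): signal `a` reconverges, so treating the two
# OR inputs as independent gives the wrong probability.
pa, pb = 0.5, 0.5
p_y_independent = p_or(p_and(pa, pb), p_not(pa))

# Exact value by enumerating all input combinations:
p_y_exact = sum(
    (1 if ((a & b) | (1 - a)) else 0)
    * (pa if a else 1 - pa) * (pb if b else 1 - pb)
    for a in (0, 1) for b in (0, 1)
)
print(p_y_independent, p_y_exact, toggle_rate(p_y_exact))  # 0.625 vs 0.75
```

The 0.625-vs-0.75 gap is the kind of error that computing exactly inside a sublogic cone (which contains the reconvergent paths) eliminates.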
Citations: 0
KalmanHD: Robust On-Device Time Series Forecasting with Hyperdimensional Computing
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473878
Ivannia Gomez Moreno, Xiaofan Yu, Tajana Rosing
Time series forecasting is shifting towards Edge AI, where models are trained and executed on edge devices instead of in the cloud. However, training forecasting models at the edge faces two challenges concurrently: (1) dealing with streaming data containing abundant noise, which can lead to degradation in model predictions, and (2) coping with limited on-device resources. Traditional approaches focus on simple statistical methods like ARIMA or neural networks, which are either not robust to sensor noise or not efficient for edge deployment, or both. In this paper, we propose a novel, robust, and lightweight method named KalmanHD for on-device time series forecasting using Hyperdimensional Computing (HDC). KalmanHD integrates Kalman Filter (KF) with HDC, resulting in a new regression method that combines the robustness of KF towards sensor noise and the efficiency of HDC. KalmanHD first encodes the past values into a high-dimensional vector representation, then applies the Expectation-Maximization (EM) approach as in KF to iteratively update the model based on the incoming samples. KalmanHD inherently considers the variability of each sample and thereby enhances robustness. We further accelerate KalmanHD by substituting the expensive matrix multiplication with efficient binary operations between the covariance and the encoded values. Our results show that KalmanHD achieves MAE comparable to the state-of-the-art noise-optimized NN-based methods while running 3.6-8.6× faster on typical edge platforms. The source code is available at https://github.com/DarthIV02/Ka1manHD
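The encode-then-Kalman-update loop can be sketched in a few lines. This is a simplified illustration of the idea, not the authors' implementation: the bipolar random-projection encoder, the noise variance `r`, and all dimensions are assumptions, and the binary-operation acceleration from the paper is omitted.

```python
import numpy as np

# Simplified KalmanHD-style loop: encode a window of past values into a
# hypervector, then update regression weights with a Kalman-filter gain,
# so samples with high predicted variance move the model less.

rng = np.random.default_rng(2)
D, window = 256, 8                                 # HD dimension, lookback
proj = rng.choice([-1.0, 1.0], size=(D, window))   # fixed bipolar encoder

def encode(history):
    return np.sign(proj @ history)                 # bipolar hypervector

w = np.zeros(D)       # regression weights in HD space
P = np.eye(D)         # weight covariance
r = 0.5               # assumed observation-noise variance

series = np.sin(0.2 * np.arange(200)) + 0.1 * rng.normal(size=200)
errors = []
for t in range(window, len(series)):
    h = encode(series[t - window:t])
    y_pred = h @ w
    errors.append(abs(series[t] - y_pred))
    # Kalman update: the gain shrinks when the sample looks noisy (large s).
    s = h @ P @ h + r
    k = (P @ h) / s
    w = w + k * (series[t] - y_pred)
    P = P - np.outer(k, h @ P)

print(np.mean(errors[:20]), np.mean(errors[-20:]))  # error shrinks over time
```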
Citations: 0
WER: Maximizing Parallelism of Irregular Graph Applications Through GPU Warp EqualizeR
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473955
En-Ming Huang, Bo Wun Cheng, Meng-Hsien Lin, Chun-Yi Lee, Tsung-Tai Yeh
Irregular graphs are becoming increasingly prevalent across a broad spectrum of data analysis applications. Despite their versatility, the inherent complexity and irregularity of these graphs often result in the underutilization of Single Instruction, Multiple Data (SIMD) resources when processed on Graphics Processing Units (GPUs). This underutilization originates from two primary issues: the occurrence of inactive threads and intra-warp load imbalances. These issues can produce idle threads, lead to inefficient usage of SIMD resources, consequently hamper throughput, and increase program execution time. To address these challenges, we introduce Warp EqualizeR (WER), a framework designed to optimize the utilization of SIMD resources on a GPU for processing irregular graphs. WER employs both a software API and a specifically tailored hardware microarchitecture. Such a synergistic approach enables workload redistribution in irregular graphs, which allows WER to enhance SIMD lane utilization and further harness the SIMD resources within a GPU. Our experimental results over seven different graph applications indicate that WER yields geometric mean speedups of 2.52× and 1.47× over the baseline GPU and existing state-of-the-art methodologies, respectively.
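Why intra-warp load balancing helps can be seen with a toy cycle count. This is a back-of-the-envelope model, not WER's hardware: the warp width, the made-up degree distribution, and the one-edge-per-cycle assumption are all illustrative.

```python
import math

# Toy model of intra-warp load imbalance on an irregular graph: with one
# vertex per SIMD lane, warp latency is the longest neighbor list; spreading
# the warp's edges evenly across lanes approaches the ideal.

WARP = 8
degrees = [1, 1, 2, 2, 3, 3, 4, 20]   # skewed neighbor-list lengths (made up)

# Baseline: lane i processes vertex i; warp time = max lane work.
baseline_cycles = max(degrees)

# Equalized: flatten all edges, split them evenly across the warp's lanes.
total_edges = sum(degrees)
equalized_cycles = math.ceil(total_edges / WARP)

print(baseline_cycles, equalized_cycles)  # 20 vs 5 cycles in this toy case
```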
Citations: 0
BFP-CIM: Data-Free Quantization with Dynamic Block-Floating-Point Arithmetic for Energy-Efficient Computing-In-Memory-based Accelerator
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473797
Cheng-Yang Chang, Chi-Tse Huang, Yu-Chuan Chuang, Kuang-Chao Chou, A. Wu
Convolutional neural networks (CNNs) are known for their exceptional performance in various applications; however, their energy consumption during inference can be substantial. Analog Computing-In-Memory (CIM) has shown promise in enhancing the energy efficiency of CNNs, but the use of analog-to-digital converters (ADCs) remains a challenge. ADCs convert analog partial sums from CIM crossbar arrays to digital values, with high-precision ADCs accounting for over 60% of the system’s energy. Researchers have explored quantizing CNNs to use low-precision ADCs to tackle this issue, trading off accuracy for efficiency. However, these methods necessitate data-dependent adjustments to minimize accuracy loss. Instead, we observe that the first most significant toggled bit indicates the optimal quantization range for each input value. Accordingly, we propose range-aware rounding (RAR) for runtime bit-width adjustment, eliminating the need for pre-deployment effort. RAR can be easily integrated into a CIM accelerator using dynamic block-floating-point arithmetic. Experimental results show that our methods maintain accuracy while achieving up to 1.81× and 2.08× energy efficiency improvements on CIFAR-10 and ImageNet datasets, respectively, compared with state-of-the-art techniques.
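The core observation — that the first most significant toggled bit selects the quantization window — can be sketched for unsigned integers. The bit widths and round-to-nearest policy below are assumptions for illustration, not the paper's exact RAR hardware.

```python
# Sketch of MSB-driven range-aware rounding on unsigned integers: the most
# significant set bit picks the quantization window, and the value keeps a
# few mantissa bits inside it (block-floating-point style).

def rar_quantize(x: int, mantissa_bits: int = 4, total_bits: int = 8) -> int:
    assert 0 <= x < (1 << total_bits)
    if x == 0:
        return 0
    msb = x.bit_length() - 1              # first most-significant toggled bit
    shift = max(0, msb + 1 - mantissa_bits)
    # Round-to-nearest within the window selected by the MSB.
    q = ((x + (1 << shift >> 1)) >> shift) if shift else x
    return min(q << shift, (1 << total_bits) - 1)

print(rar_quantize(0b10110111))  # 183 -> 176 (window [128, 256), step 16)
```

Small values pass through unchanged (their window already fits in the mantissa), so relative precision is roughly constant across magnitudes — the property that lets a low-precision ADC cover a wide dynamic range.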
Citations: 0
JustQ: Automated Deployment of Fair and Accurate Quantum Neural Networks
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473829
Ruhan Wang, Fahiz Baba-Yara, Fan Chen
Despite the success of Quantum Neural Networks (QNNs) in decision-making systems, their fairness remains unexplored, as the focus has primarily been on accuracy. This work conducts a design space exploration that unveils QNN unfairness and highlights the significant influence of QNN deployment and quantum noise on accuracy and fairness. To effectively navigate the vast QNN deployment design space, we propose JustQ, a framework for deploying fair and accurate QNNs on NISQ computers. It includes a complete NISQ error model, reinforcement learning-based deployment, and a flexible optimization objective incorporating both fairness and accuracy. Experimental results show that JustQ outperforms previous methods, achieving superior accuracy and fairness. This work pioneers fair QNN design on NISQ computers, paving the way for future investigations.
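An objective "incorporating both fairness and accuracy" of the kind the abstract mentions can be sketched as a weighted reward for the deployment agent. The metric definitions and the weight `alpha` below are illustrative assumptions, not JustQ's actual formulation.

```python
# Hypothetical combined accuracy/fairness reward for a deployment agent.
# Fairness here is 1 minus the worst accuracy gap between groups (assumed).

def deployment_reward(acc: float, group_accs: list[float], alpha: float = 0.5) -> float:
    fairness = 1.0 - (max(group_accs) - min(group_accs))
    return alpha * acc + (1.0 - alpha) * fairness

# A deployment that is slightly less accurate but far more equitable can win:
r_accurate = deployment_reward(0.92, [0.98, 0.70])   # large group gap
r_fair = deployment_reward(0.89, [0.90, 0.86])       # small group gap
print(r_accurate, r_fair)
```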
Citations: 1
Toward End-to-End Analog Design Automation with ML and Data-Driven Approaches (Invited Paper)
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473840
Supriyo Maji, A. Budak, Souradip Poddar, David Z. Pan
Designing analog circuits poses significant challenges due to their knowledge-intensive nature and the diverse range of requirements. There has been limited success in achieving a fully automated framework for designing analog circuits. However, the advent of advanced machine learning algorithms is invigorating design automation efforts by enabling tools to replicate the techniques employed by experienced designers. In this paper, we provide an overview of recent progress in ML-driven analog circuit sizing and layout automation tools. In advanced technology nodes, layout effects must be considered during circuit sizing to avoid costly reruns of the flow. We will discuss the latest research in layout-aware sizing. In the end-to-end analog design automation flow, topology selection plays an important role, as the final performance depends on the choice of topology.
Citations: 0
Signed Convolution in Photonics with Phase-Change Materials using Mixed-Polarity Bitstreams
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473952
Raphael Cardoso, Clément Zrounba, M.F. Abdalla, Paul Jiménez, Mauricio Gomes de Queiroz, B. Charbonnier, Fabio Pavanello, Ian O'Connor, S. L. Beux
As AI continues to grow in importance, in order to reduce its carbon footprint and utilization of computer resources, numerous alternatives are under investigation to improve its hardware building blocks. In particular, in convolutional neural networks (CNNs), the convolution function represents the most important operation and one of the best targets for optimization. A new approach to convolution has recently emerged using optics, phase-change materials (PCMs) and stochastic computing, but is thus far limited to unsigned operands. In this paper, we propose an extension in which the convolutional kernels are signed, using mixed-polarity bitstreams. We present a proof of validity for our method, while also showing that, in simulation and under similar operating conditions, our approach is less affected by noise than the common approach in the literature.
{"title":"Signed Convolution in Photonics with Phase-Change Materials using Mixed-Polarity Bitstreams","authors":"Raphael Cardoso, Clément Zrounba, M.F. Abdalla, Paul Jiménez, Mauricio Gomes de Queiroz, B. Charbonnier, Fabio Pavanello, Ian O'Connor, S. L. Beux","doi":"10.1109/ASP-DAC58780.2024.10473952","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473952","url":null,"abstract":"As AI continues to grow in importance, in order to reduce its carbon footprint and utilization of computer resources, numerous alternatives are under investigation to improve its hardware building blocks. In particular, in convolutional neural networks (CNNs), the convolution function represents the most important operation and one of the best targets for optimization. A new approach to convolution had recently emerged using optics, phase-change materials (PCMs) and stochastic computing, but is thus far limited to unsigned operands. In this paper, we propose an extension in which the convolutional kernels are signed, using mixed-polarity bitstreams. We present a proof of validity for our method, while also showing that, in simulation and under similar operating conditions, our approach is less affected by noise than the common approach in the literature.","PeriodicalId":518586,"journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"253 11","pages":"854-859"},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140531029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
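The signed-operand idea in the abstract above can be illustrated in software. As a hedged sketch only (the paper's photonic mixed-polarity encoding differs in implementation detail), classic bipolar stochastic computing carries a signed value x in [−1, 1] as a bitstream whose ones-density is (x + 1)/2; multiplying two such streams reduces to bitwise XNOR, and a signed convolution tap accumulates those products. All names below are illustrative, not from the paper:

```python
import random

random.seed(0)
N = 100_000  # bitstream length; longer streams reduce stochastic noise

def encode(x):
    """Bipolar encoding: x in [-1, 1] -> bitstream with P(bit = 1) = (x + 1) / 2."""
    p = (x + 1.0) / 2.0
    return [1 if random.random() < p else 0 for _ in range(N)]

def decode(bits):
    """Recover x from the bitstream's ones-density."""
    return 2.0 * sum(bits) / len(bits) - 1.0

def multiply(a_bits, b_bits):
    """Bipolar multiplication is bitwise XNOR of the two streams."""
    return [1 - (a ^ b) for a, b in zip(a_bits, b_bits)]

# Signed 1-D convolution tap: y = sum_i w_i * x_i, with signed weights.
w = [0.5, -0.75, 0.25]
x = [0.8, 0.4, -0.6]
# Decode each stochastic product, then accumulate in ordinary arithmetic
# (the paper performs the accumulation photonically).
y = sum(decode(multiply(encode(wi), encode(xi))) for wi, xi in zip(w, x))
print(round(y, 2))  # close to the exact value 0.5*0.8 - 0.75*0.4 + 0.25*(-0.6) = -0.05
```

Longer bitstreams trade latency for accuracy: the per-product estimation error shrinks roughly as 1/√N, which is why the paper's noise comparison matters.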
A Fast Test Compaction Method for Commercial DFT Flow Using Dedicated Pure-MaxSAT Solver
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473833
Zhiteng Chao, Xindi Zhang, Junying Huang, Jing Ye, Shaowei Cai, Huawei Li, Xiaowei Li
Minimizing the testing cost is crucial in the context of the design for test (DFT) flow. In our observation, the test patterns generated by commercial ATPG tools in test compression mode still contain redundancy. To tackle this obstacle, we propose a post-flow static test compaction method that utilizes a partial fault dictionary instead of a full fault dictionary, and leverages a dedicated Pure-MaxSAT solver to re-compact the test patterns generated by commercial ATPG tools. We also observe that commercial ATPG tools offer a more comprehensive selection of candidate patterns for compaction in the “n-detect” mode, leading to superior compaction efficacy. In experiments on ISCAS89, ITC99, and open-source RISC-V CPU benchmarks, our method achieves an average reduction of 21.58% and a maximum of 29.93% in test cycles evaluated by commercial tools while maintaining fault coverage. Furthermore, our approach demonstrates improved performance compared with existing methods.
{"title":"A Fast Test Compaction Method for Commercial DFT Flow Using Dedicated Pure-MaxSAT Solver","authors":"Zhiteng Chao, Xindi Zhang, Junying Huang, Jing Ye, Shaowei Cai, Huawei Li, Xiaowei Li","doi":"10.1109/ASP-DAC58780.2024.10473833","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473833","url":null,"abstract":"Minimizing the testing cost is crucial in the context of the design for test (DFT) flow. In our observation, the test patterns generated by commercial ATPG tools in test compression mode still contain redundancy. To tackle this obstacle, we propose a post-flow static test compaction method that utilizes a partial fault dictionary instead of a full fault dictionary, and leverages a dedicated Pure-MaxSAT solver to re-compact the test patterns generated by commercial ATPG tools. We also observe that commercial ATPG tools offer a more comprehensive selection of candidate patterns for compaction in the “n-detect” mode, leading to superior compaction efficacy. In experiments on ISCAS89, ITC99, and open-source RISC-V CPU benchmarks, our method achieves an average reduction of 21.58% and a maximum of 29.93% in test cycles evaluated by commercial tools while maintaining fault coverage. Furthermore, our approach demonstrates improved performance compared with existing methods.","PeriodicalId":518586,"journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"280 6","pages":"503-508"},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140530979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
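The re-compaction step described above is at heart a pattern-selection problem: keep the smallest subset of candidate patterns whose combined coverage, over the partial fault dictionary, equals that of the full pattern set. The paper solves this selection with a dedicated Pure-MaxSAT solver; the sketch below substitutes a greedy set-cover heuristic purely to illustrate the underlying optimization problem, and every name in it is illustrative:

```python
def compact(patterns, fault_dict):
    """Greedy static test compaction over a partial fault dictionary.

    patterns   : list of pattern ids
    fault_dict : dict pattern_id -> set of faults that pattern detects
                 (partial dictionary: only the faults of interest)
    Returns a subset of patterns covering the same faults.
    Note: this greedy set cover only approximates the selection the
    paper solves exactly with a Pure-MaxSAT solver.
    """
    uncovered = set().union(*fault_dict.values())
    selected = []
    while uncovered:
        # Pick the pattern detecting the most still-uncovered faults.
        best = max(patterns, key=lambda p: len(fault_dict[p] & uncovered))
        gain = fault_dict[best] & uncovered
        if not gain:
            break
        selected.append(best)
        uncovered -= gain
    return selected

# Toy example: patterns t1 and t3 together cover all five faults.
fd = {"t0": {"f1"}, "t1": {"f1", "f2", "f3"}, "t2": {"f3"}, "t3": {"f4", "f5"}}
kept = compact(list(fd), fd)
print(sorted(kept))  # ['t1', 't3']
```

The "n-detect" observation in the abstract fits this framing: generating each fault's detections n times enlarges the candidate pool `patterns`, giving the solver more freedom to find a smaller covering subset.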