
2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD): Latest Papers

X-Check: GPU-Accelerated Design Rule Checking via Parallel Sweepline Algorithms
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549383
Zhuolun He, Yuzhe Ma, Bei Yu
Design rule checking (DRC) is essential in physical verification to ensure high yield and reliability for VLSI circuit designs. To keep design cycle time reasonable, computationally intensive DRC tasks must be accelerated to keep pace with the ever-growing complexity of modern VLSI circuits. In this paper, we propose X-Check, a GPU-accelerated design rule checker. X-Check integrates novel parallel sweepline algorithms, which are both efficient in practice and backed by nontrivial theoretical guarantees. Experimental results demonstrate significant speedup achieved by X-Check compared with a multi-threaded CPU checker.
Citations: 1
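The sweepline idea named in the X-Check abstract can be illustrated with a small sequential toy. The sketch below is not X-Check's GPU algorithm. It is a minimal CPU sweepline under stated assumptions: shapes are axis-aligned rectangles `(x1, y1, x2, y2)`, the rule is a single hypothetical `min_space` value, and two rectangles are flagged when both their horizontal and vertical gaps are below that value. The function name `spacing_violations` is made up for this example.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x1 <= x2 and y1 <= y2


def spacing_violations(rects: List[Rect], min_space: int) -> List[Tuple[int, int]]:
    """Return index pairs whose x-gap and y-gap are both below min_space.

    Simplified rule for illustration only; real DRC decks are far richer.
    """
    # Visit rectangles in order of their left edge (the sweepline position).
    order = sorted(range(len(rects)), key=lambda i: rects[i][0])
    active: List[int] = []  # rectangles still within min_space of the sweepline
    violations: List[Tuple[int, int]] = []
    for i in order:
        x1, y1, x2, y2 = rects[i]
        # Retire rectangles whose right edge is min_space or more behind the sweepline.
        active = [j for j in active if x1 - rects[j][2] < min_space]
        for j in active:
            _, jy1, _, jy2 = rects[j]
            y_gap = max(y1 - jy2, jy1 - y2, 0)  # 0 when the y-ranges overlap
            if y_gap < min_space:
                violations.append((min(i, j), max(i, j)))
        active.append(i)
    return sorted(violations)
```

Because rectangles are visited by increasing left edge, a retired rectangle can never come back within range, so each rectangle is compared only against its near neighbours along x. A parallel version would have to split this work across threads, which is the part the paper addresses and this sketch does not.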
Design and Technology Co-optimization Utilizing Multi-bit Flip-flop Cells
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549351
Soomin Kim, Taewhan Kim
The benefit of a multi-bit flip-flop (MBFF) over single-bit flip-flops is that the in-cell clock inverters are shared among the master and slave latches of the MBFF's internal flip-flops. Theoretically, the more flip-flops an MBFF has, the more power it can save. In practice, however, physically enlarging an MBFF to accommodate many flip-flops poses two new challenges in physical design: (1) non-flexible MBFF cell flipping for multiple D-to-Q signals and (2) unbalanced or wasted use of MBFF footprint space. In this work, we solve the two problems in a way that enhances routability and timing at the placement and routing stages. Specifically, for problem 1, we make the non-flexible MBFF cell flipping fully flexible by generating MBFF layouts that support diverse D-to-Q flow directions during detailed placement, which improves routability. For problem 2, we improve the setup and clock-to-Q delay of timing-critical flip-flops in an MBFF through gate upsizing (i.e., transistor folding) using the unused space in the MBFF, which improves timing slack at the post-routing stage. Experiments with benchmark circuits show that our proposed design and technology co-optimization (DTCO) flow using MBFFs, which solves problems 1 and 2, is very promising.
Citations: 0
Quantitative Verification and Design Space Exploration Under Uncertainty with Parametric Stochastic Contracts
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549446
Chanwook Oh, M. Lora, P. Nuzzo
This paper proposes an automated framework for quantitative verification and design space exploration of cyber-physical systems in the presence of uncertainty, leveraging assume-guarantee contracts expressed in Stochastic Signal Temporal Logic (StSTL). We introduce quantitative semantics for StSTL and formulations of the quantitative verification and design space exploration problems as bi-level optimization problems. We show that these optimization problems can be effectively solved for a class of stochastic systems and a fragment of bounded-time StSTL formulas. Our algorithm searches for partitions of the upper-level design space such that the solutions of the lower-level problems satisfy the upper-level constraints. A set of optimal parameter values is then selected within these partitions. We illustrate the effectiveness of our framework on the design of a multi-sensor perception system and an automatic cruise control system.
Citations: 1
Designing Energy-Efficient Decision Tree Memristor Crossbar Circuits using Binary Classification Graphs
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549448
Pranav Sinha, Sunny Raj
We propose a method to design in-memory, energy-efficient, and compact memristor crossbar circuits for implementing decision trees using flow-based computing. We develop a new tool called binary classification graph, which is equivalent to decision trees in accuracy but uses bit values of input features to make decisions instead of thresholds. Our proposed design is resilient to manufacturing errors and can scale to large crossbar sizes due to the utilization of sneak paths in computations. Our design uses zero transistor and one memristor (0T1R) crossbars with only two resistance states of high and low, which makes it resilient to resistance drift and radiation degradation. We test the performance of our designs on multiple standard machine learning datasets and show that our method utilizes circuits of size 5.23 × 10⁻³ mm² and uses 20.5 pJ per decision, and outperforms state-of-the-art decision tree acceleration algorithms on these metrics.
Citations: 0
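The abstract above states that decisions are made on bit values of input features rather than on thresholds. As a rough illustration of why that is possible at all, the sketch below decides `x > t` for unsigned integers using only one-bit tests, scanning from the most significant bit. This is not the paper's binary classification graph construction; the function name `greater_than_bitwise` and the fixed `width` parameter are assumptions made for this example.

```python
def greater_than_bitwise(x: int, t: int, width: int) -> bool:
    """Decide x > t for unsigned `width`-bit integers using one-bit tests only."""
    for k in reversed(range(width)):  # most significant bit first
        xb = (x >> k) & 1
        tb = (t >> k) & 1
        if xb != tb:
            return xb == 1  # the first differing bit decides the comparison
    return False  # all bits equal, so x == t
```

Each loop iteration corresponds to a single binary test on one input bit, which is the kind of decision a graph over bit values can encode in place of a numeric threshold node.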
Smart Scissor: Coupling Spatial Redundancy Reduction and CNN Compression for Embedded Hardware
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549397
Hao Kong, Di Liu, Shuo Huai, Xiangzhong Luo, Weichen Liu, Ravi Subramaniam, C. Makaya, Qian Lin
Scaling down the resolution of input images can greatly reduce the computational overhead of convolutional neural networks (CNNs), which is promising for edge AI. However, as an image usually contains much spatial redundancy, e.g., background pixels, directly shrinking the whole image loses important features of the foreground object and leads to severe accuracy degradation. In this paper, we propose a dynamic image cropping framework that reduces spatial redundancy by accurately cropping the foreground object from images. To achieve instance-aware fine cropping, we introduce a lightweight foreground predictor to efficiently localize and crop the foreground of an image. The finely cropped images can be correctly recognized even at a small resolution. Meanwhile, computational redundancy also exists in CNN architectures. To pursue higher execution efficiency on resource-constrained embedded devices, we also propose a compound shrinking strategy to coordinately compress the three dimensions (depth, width, resolution) of CNNs. Finally, we seamlessly combine the proposed dynamic image cropping and compound shrinking into a unified compression framework, Smart Scissor, which is expected to significantly reduce the computational overhead of CNNs while still maintaining high accuracy. Experiments on ImageNet-1K demonstrate that our method reduces the computational cost of ResNet50 by 41.5% while improving the top-1 accuracy by 0.3%. Moreover, compared to HRank, the state-of-the-art CNN compression framework, our method achieves 4.1% higher top-1 accuracy at the same computational cost. The codes and data are available at https://github.com/ntuliuteam/smart-scissor
Citations: 3
Attacks on Image Sensors
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561097
M. Wolf, Kruttidipta Samal
This paper provides a taxonomy of security vulnerabilities of smart image sensor systems. Image sensors form an important class of sensors. Many image sensors include computation units that can provide traditional algorithms such as image or video compression along with machine learning tasks such as classification. Some attacks rely on the physics and optics of imaging. Other attacks take advantage of the complex logic and software required to perform imaging systems.
Citations: 0
Generation of Mixed-Driving Multi-Bit Flip-Flops for Power Optimization
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549473
Meng-Yun Liu, Yu-Cheng Lai, Wai-Kei Mak, Ting-Chi Wang
Multi-bit flip-flops (MBFFs) are often used to reduce the number of clock sinks, resulting in a low-power design. A traditional MBFF is composed of individual FFs of uniform driving strength. However, if some but not all of the bits of an MBFF violate timing constraints, the MBFF has to be sized up or decomposed into smaller bit-width combinations to satisfy timing, which reduces the power saving. In this paper, we present a new MBFF generation approach that considers mixed-driving MBFFs, in which certain bits have a higher driving strength than the others. To maximize the FF merging rate (and hence minimize the final number of clock sinks), our approach first performs aggressive FF merging subject to timing constraints. The merging is aggressive in the sense that we are willing to oversize some FFs and to allow empty bits in an MBFF so that as many FFs as possible are merged into MBFFs of uniform driving strength. The oversized individual FFs of an MBFF are later downsized, subject to timing constraints, which results in a mixed-driving MBFF. Our MBFF generation approach has been combined with a commercial place and route tool, and our experimental results show the superiority of our approach over a prior work that considers uniform-driving MBFFs only, in terms of the clock sink count, the FF power, the clock buffer count, and the routed clock wirelength.
Citations: 0
ISSA: Input-Skippable, Set-Associative Computing-in-Memory (SA-CIM) Architecture for Neural Network Accelerators
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549333
Yun-Chen Lo, Chih-Chen Yeh, Jun-Shen Wu, Chia-Chun Wang, Yu-Chih Tsai, Wen-Chien Ting, Ren-Shuo Liu
Among several emerging architectures, computing in memory (CIM), which features in-situ analog computation, is a potential solution to the data movement bottleneck of the Von Neumann architecture for artificial intelligence (AI). Interestingly, other strengths of CIM that differ significantly from in-situ analog computation are not yet widely known. In this work, we point out that mutually stationary vectors (MSVs), which can be maximized by introducing associativity to CIM, are another inherent power unique to CIM. With MSVs, CIM exhibits significant freedom to dynamically vectorize the stored data (e.g., weights) and to perform agile computation using the dynamically formed vectors. We have designed and realized an SA-CIM silicon prototype and corresponding architecture and acceleration schemes in the TSMC 28 nm process. More specifically, the contributions of this paper are fourfold: 1) We identify MSVs as new features that can be exploited to address the current performance and energy challenges of CIM-based hardware. 2) We propose SA-CIM to enhance MSVs for skipping zeros, small values, and sparse vectors. 3) We propose a transposed systolic dataflow to efficiently conduct conv3×3 while being capable of exploiting input-skipping schemes. 4) We propose a design flow to search for optimal aggressive skipping scheme setups while satisfying the accuracy loss constraint. The proposed ISSA architecture improves throughput by 1.91× to 2.97× and energy efficiency by 2.5× to 4.2×.
Citations: 0
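The abstract above mentions skipping zeros and small input values. A software analogy of that idea, not the ISSA hardware scheme itself, is a dot product that leaves out any multiply-accumulate whose input magnitude is at or below a cutoff. In the sketch below, the function name `skipping_dot` and the `threshold` parameter are hypothetical; a threshold of 0 skips only exact zeros and keeps the result exact, while a larger threshold trades accuracy for fewer operations.

```python
from typing import Sequence, Tuple


def skipping_dot(
    inputs: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.0,
) -> Tuple[float, int]:
    """Dot product that skips inputs with |x| <= threshold.

    Returns (result, number_of_skipped_multiply_accumulates).
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    total = 0.0
    skipped = 0
    for x, w in zip(inputs, weights):
        if abs(x) <= threshold:
            skipped += 1  # this multiply-accumulate is never issued
            continue
        total += x * w
    return total, skipped
```

For example, `skipping_dot([0, 2, 0, 3], [5, 1, 7, 2])` performs two multiply-accumulates instead of four and still returns the exact sum.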
Exploiting Uniform Spatial Distribution to Design Efficient Random Number Source for Stochastic Computing
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549396
Kuncai Zhong, Zexi Li, Haoran Jin, Weikang Qian
Stochastic computing (SC) generally suffers from long latency. One solution is to apply proper random number sources (RNSs). Nevertheless, current RNS designs have either high hardware cost or low accuracy. To address this issue, and motivated by the observation that a uniform spatial distribution generally leads to high accuracy for an SC circuit, we propose a basic architecture that generates the uniform spatial distribution, together with a detailed implementation of it. For the implementation, we further propose a method to optimize its hardware cost and a method to optimize its accuracy. The hardware cost optimization does not affect the accuracy. The experimental results show that our proposed implementation achieves both low hardware cost and high accuracy. Compared to the state-of-the-art stochastic number generator design, the proposed design reduces area by 88% with comparable accuracy.
Citations: 0
Compositional Verification Using a Formal Component and Interface Specification
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549341
Yue Xing, Huaixi Lu, Aarti Gupta, S. Malik
Property-based specification such as SystemVerilog Assertions (SVA) uses mathematical logic to specify the temporal behavior of RTL designs, which can then be formally verified using model checking algorithms. These properties are specified for a single component (which may contain other components in the design hierarchy). Composing design components that have already been verified requires additional verification since incorrect communication at their interface may invalidate the properties that have been checked for the individual components. This paper focuses on a specification for their interface which can be checked individually for each component, and which guarantees that refinement-based properties checked for each component continue to hold after their composition. We do this in the setting of the Instruction-level Abstraction (ILA) specification and verification methodology. The ILA methodology provides a uniform specification for processors, accelerators and general modules at the instruction-level, and the automatic generation of a complete set of correctness properties for checking that the RTL model is a refinement of the ILA specification. We add an interface specification to model the inter-ILA communication. Further, we use our interface specification to generate a set of interface checking properties that check that the communication between the RTL components is correct. This provides the following guarantee: if each RTL component is a refinement of its ILA specification and the interface checks pass, then the RTL composition is a refinement of the ILA composition. We have applied the proposed methodology to six case studies including parts of large-scale designs such as parts of the FlexASR and NVDLA machine learning accelerators, demonstrating the practical applicability of our method.
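The guarantee stated in this abstract can be written as an inference rule for the two-component case. This is only a notational sketch: the symbols R_i (an RTL component), A_i (its ILA specification), \preceq (refinement), \parallel (composition), and IfaceOK (all generated interface checking properties pass) are assumed here for illustration and are not taken from the paper.

```latex
\[
\frac{R_1 \preceq A_1 \qquad R_2 \preceq A_2 \qquad \mathit{IfaceOK}(R_1, R_2)}
     {R_1 \parallel R_2 \;\preceq\; A_1 \parallel A_2}
\]
```

The premises are the per-component refinement checks and the interface checks; the conclusion is the refinement of the composed RTL design with respect to the composed ILA specification.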
Citations: 2