Title: Enhanced Fast Boolean Matching based on Sensitivity Signatures Pruning
Authors: Jiaxi Zhang, Liwei Ni, Shenggen Zheng, Hao Liu, Xiangfu Zou, Feng Wang, Guojie Luo
DOI: 10.1109/ICCAD51958.2021.9643587
Published: 2021-11-01, 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
Abstract: Boolean matching is significant to digital integrated circuit design. An exhaustive method for Boolean matching is computationally expensive even for functions with only a few variables, because the time complexity of such an algorithm for an n-variable Boolean function is O(2^(n+1) · n!). Sensitivity is an important characteristic and a measure of the complexity of Boolean functions, and it has been used in the analysis of algorithm complexity in many fields. This measure can be regarded as a signature of a Boolean function and has great potential to reduce the search space of Boolean matching. In this paper, we introduce Boolean sensitivity into Boolean matching and design several sensitivity-related signatures to enhance fast Boolean matching. First, we propose new signatures that relate sensitivity to Boolean equivalence. Then, we prove that these signatures are prerequisites for Boolean matching, which we can use to prune the search space of the matching problem. In addition, we develop a fast sensitivity calculation method to compute and compare these signatures for two Boolean functions. Compared with the traditional cofactor and symmetry detection methods, sensitivity provides a family of signatures along a different dimension. We also show that sensitivity can be easily integrated into traditional methods to distinguish mismatched Boolean functions faster. To the best of our knowledge, this is the first work that introduces sensitivity to Boolean matching. The experimental results show that the sensitivity-related signatures proposed in this paper greatly reduce the search space and achieve up to a 3x speedup over state-of-the-art Boolean matching methods.
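As an illustration of how sensitivity can prune the matching search space (a generic brute-force sketch, not the paper's signatures or its fast calculation method): sensitivity is invariant under permuting and negating inputs, so two functions with different sensitivity can never be matched.

```python
from itertools import product

def sensitivity(f, n):
    """Brute-force sensitivity: max over all inputs x of the number of
    single-bit flips of x that change f's output."""
    best = 0
    for x in product((0, 1), repeat=n):
        flips = 0
        for i in range(n):
            y = list(x)
            y[i] ^= 1                     # flip the i-th input bit
            if f(x) != f(tuple(y)):
                flips += 1
        best = max(best, flips)
    return best

xor3 = lambda x: x[0] ^ x[1] ^ x[2]       # every bit flip changes the output
maj3 = lambda x: x[0] + x[1] + x[2] >= 2  # 3-input majority
print(sensitivity(xor3, 3))   # 3
print(sensitivity(maj3, 3))   # 2 -> xor3 and maj3 can never match
```

The brute force costs O(2^n · n) evaluations, which is exactly why a fast calculation method matters for larger n.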
Title: Manufacturing Cycle-Time Optimization Using Gaussian Drying Model for Inkjet-Printed Electronics
Authors: Tsun-Ming Tseng, M. Lian, Mengchu Li, P. Rinklin, Leroy Grob, B. Wolfrum, Ulf Schlichtmann
DOI: 10.1109/ICCAD51958.2021.9643438
Published: 2021-11-01, 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
Abstract: Inkjet-printed electronics have attracted considerable attention for low-cost mass production. To avoid undesired device behavior due to accidental ink merging and redistribution, high-density designs can benefit from layering and drying in batches. The overall manufacturing cycle-time, however, then becomes dominated by the cumulative drying time of the individual layers. The state-of-the-art approach decomposes the whole design, arranges the resulting objects in different layers, and minimizes the number of layers. Fewer layers imply fewer printing iterations and thus higher manufacturing efficiency. Nevertheless, printing objects with significantly different drying dynamics in the same layer reduces manufacturing efficiency, since the slowest-drying object in a given layer dominates the time required for that layer to dry. Consequently, an accurate estimation of the individual layers' drying times is indispensable for minimizing the manufacturing cycle-time. To this end, we propose the first Gaussian drying model to evaluate the local evaporation rate during the drying process. Specifically, we estimate the drying time depending on the number, area, and distribution of the objects in a given layer. Finally, we minimize the total drying time by assigning the to-be-printed objects to different layers with mixed-integer linear programming (MILP) methods. Experimental results demonstrate that our Gaussian drying model closely approximates the actual drying process. In particular, comparing the non-optimized fabrication to the optimized results shows that our method reduces the drying time by 39%.
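The layer-assignment objective can be pictured with a toy model (hypothetical drying times and conflict pairs; the paper uses a Gaussian drying model and MILP rather than this brute force): each layer dries for as long as its slowest object, and objects whose ink would merge must land in different layers.

```python
from itertools import product

def total_drying_time(assignment, dry_time, conflicts):
    """Sum over layers of the slowest object's drying time; an assignment
    that co-prints two conflicting (ink-merging) objects is invalid."""
    if any(assignment[a] == assignment[b] for a, b in conflicts):
        return float("inf")
    layers = {}
    for obj, layer in enumerate(assignment):
        layers.setdefault(layer, []).append(dry_time[obj])
    return sum(max(times) for times in layers.values())

def best_assignment(dry_time, conflicts, n_layers):
    """Exhaustively try every object-to-layer assignment (toy scale only)."""
    return min(product(range(n_layers), repeat=len(dry_time)),
               key=lambda a: total_drying_time(a, dry_time, conflicts))

dry = [10, 9, 2, 1]            # hypothetical per-object drying times
conflicts = [(2, 3)]           # objects 2 and 3 would merge if co-printed
a = best_assignment(dry, conflicts, 2)
print(total_drying_time(a, dry, conflicts))   # 11: {0,1,2} dry together, {3} alone
```

Grouping the two slow objects (10 and 9) in one layer pays only one long drying wait, which is the intuition behind optimizing per-layer drying time rather than just layer count.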
Title: Optimized Data Reuse via Reordering for Sparse Matrix-Vector Multiplication on FPGAs
Authors: Shiqing Li, Di Liu, Weichen Liu
DOI: 10.1109/ICCAD51958.2021.9643453
Published: 2021-11-01, 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
Abstract: Sparse matrix-vector multiplication (SpMV) is of paramount importance in both scientific and engineering applications. The main workload of SpMV is multiplications between randomly distributed nonzero elements of sparse matrices and their corresponding vector elements. Due to the irregular data access patterns of vector elements and limited memory bandwidth, the computational throughput of CPUs and GPUs falls short of the peak performance offered by FPGAs. An FPGA's large on-chip memory allows the input vector to be buffered on chip, so off-chip memory bandwidth is used only to transfer the nonzero elements' values, column indices, and row indices. Multiple nonzero elements are transmitted to the FPGA, and their corresponding vector elements are accessed, each cycle. However, typical on-chip block RAMs (BRAMs) in FPGAs have only two access ports. The mismatch between off-chip memory bandwidth and on-chip memory ports stalls the whole engine, resulting in inefficient utilization of off-chip memory bandwidth. In this work, we reorder the nonzero elements to optimize data reuse for SpMV on FPGAs. The key observation is that since vector elements can be reused by nonzero elements with the same column index, memory requests for these elements can be omitted by reusing the fetched data. Based on this observation, we propose a novel compressed format that optimizes data reuse by reordering the matrix's nonzero elements. Further, to support the compressed format, we design a scalable hardware accelerator and implement it on the Xilinx UltraScale ZCU106 platform. We evaluate the proposed design with a set of matrices from the University of Florida sparse matrix collection. The experimental results show that the proposed design achieves an average 1.22x speedup over the state-of-the-art work.
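The reuse argument can be quantified with a simplified fetch model (hypothetical COO data; the paper's actual compressed format and accelerator are more involved): if the vector element fetched for one column can serve the next nonzero with the same column index, grouping nonzeros by column removes the redundant fetches.

```python
def vector_fetches(nonzeros):
    """Count vector-element fetches, assuming only the most recently
    fetched element is held for reuse (a single reuse register)."""
    fetches, last_col = 0, None
    for row, col, val in nonzeros:        # compute step: y[row] += val * x[col]
        if col != last_col:
            fetches += 1
            last_col = col
    return fetches

coo = [(0, 3, 1.0), (1, 0, 2.0), (1, 3, 3.0), (2, 0, 4.0)]
print(vector_fetches(coo))                                # 4: every access misses
print(vector_fetches(sorted(coo, key=lambda e: e[1])))    # 2: columns grouped
```

Because each nonzero carries its own row index, reordering does not change the result of the multiplication; it only changes the order in which partial sums accumulate.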
Title: 2021 ICCAD CAD Contest Problem B: Routing with Cell Movement Advanced: Invited Paper
Authors: Kai-Shun Hu, Tao-Chun Yu, Ming Yang, Cindy Chin-Fang Shen
DOI: 10.1109/ICCAD51958.2021.9643568
Published: 2021-11-01, 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
Abstract: 2021 ICCAD CAD Contest Problem B extends 2020 ICCAD CAD Contest Problem B [1]–[2] to address more complex constraints. In physical implementation, the common approach divides the problem into placement and routing stages. This divide-and-conquer approach, however, can cause conservative margin reservation and miscorrelation. To achieve multiple advanced objectives in terms of Power, timing Performance, and Area (PPA), a certain amount of cell movement at the routing stage becomes a desired functionality in an EDA tool. 2021 ICCAD CAD Contest Problem B encourages research into routing with cell movement to achieve multiple objectives in advanced process nodes (less than 7 nm). We provide (i) a set of benchmarks and (ii) an evaluation metric covering multiple objectives, including the power factor, the criticality of timing-critical nets, the maximum number of moved cells, and total routing length, to help contestants develop and test new algorithms.
Title: Analytical Modeling of Transient Electromigration Stress based on Boundary Reflections
Authors: Mohammad Abdullah Al Shohel, Vidya A. Chhabria, N. Evmorfopoulos, S. Sapatnekar
DOI: 10.1109/ICCAD51958.2021.9643570
Published: 2021-11-01, 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
Abstract: Traditional methods that test for electromigration (EM) failure in multisegment interconnects over the lifespan of an IC are based on the Blech criterion, followed by Black's equation. Such methods analyze each segment independently, but are well known to be inaccurate due to stress buildup across multiple segments. This paper introduces the new concept of boundary reflections of stress flow, which ascribes a physical (wave-like) interpretation to the transient stress behavior in a finite multisegment line. This provides a framework for deriving analytical expressions for transient EM stress in lines with any number of segments, and the expressions can be tailored to include the appropriate number of terms for any desired level of accuracy. The proposed method shows excellent accuracy in evaluations against the FEM solver COMSOL, as well as scalability in its application to large power grid benchmarks.
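For reference, the two classical ingredients the abstract mentions take the following standard textbook forms (background statements, not results of this paper): Black's equation for the mean time to failure under current density j (with fitted constant A, current exponent n, activation energy E_a), and the Blech criterion, under which a segment of length L is immortal when the current-density-length product stays below a critical value set by the critical stress σ_crit, atomic volume Ω, effective charge number Z*, electron charge e, and resistivity ρ:

```latex
\mathrm{MTTF} = A\, j^{-n} \exp\!\left(\frac{E_a}{kT}\right),
\qquad
j \cdot L \;<\; (jL)_{\mathrm{crit}} = \frac{\Omega\,\sigma_{\mathrm{crit}}}{e\,Z^{*}\rho}
```

Both treat a segment in isolation, which is precisely the per-segment independence assumption the paper's boundary-reflection model removes.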
Title: Generalizable Cross-Graph Embedding for GNN-based Congestion Prediction
Authors: Amur Ghose, Vincent Zhang, Yingxue Zhang, Dong Li, Wulong Liu, M. Coates
DOI: 10.1109/ICCAD51958.2021.9643446
Published: 2021-11-01, 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
Abstract: With technology node scaling, an accurate prediction model at early design stages can significantly shorten the design cycle. In particular, during logic synthesis, predicting cell congestion caused by improper logic combinations can reduce the burden of subsequent physical implementation. There have been attempts to use Graph Neural Network (GNN) techniques to tackle congestion prediction at the logic synthesis stage. However, they require informative cell features to achieve reasonable performance, since the core idea of GNNs is built on the message-passing framework, and such features are impractical to obtain at the early logic synthesis stage. To address this limitation, we propose a framework that directly learns embeddings for a given netlist to enhance the quality of our node features. Popular random-walk-based embedding methods such as Node2vec, LINE, and DeepWalk suffer from cross-graph alignment issues and poor generalization to unseen netlist graphs, yielding inferior performance at significant runtime cost. In our framework, we introduce a superior alternative that obtains node embeddings generalizing across netlist graphs using matrix factorization methods. We propose an efficient mini-batch training method at the sub-graph level that guarantees parallel training and satisfies the memory restrictions of large-scale netlists. We present results using open-source EDA tools such as the DREAMPLACE and OPENROAD frameworks on a variety of openly available circuits. By combining the learned embeddings of the netlist with GNNs, our method improves prediction performance, generalizes to new circuit lines, and is efficient in training, potentially saving over 90% of runtime.
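A minimal sketch of the matrix-factorization idea (a plain truncated SVD on a toy graph; the paper's factorization objective, alignment strategy, and netlist features are its own): factor the adjacency matrix and keep the leading singular directions as node embeddings, so structurally similar nodes land near each other.

```python
import numpy as np

def embed_nodes(adj, dim):
    """Spectral node embedding: SVD-factor the adjacency matrix (with
    self-loops added) and scale the top singular vectors by sqrt(s)."""
    a = adj + np.eye(adj.shape[0])    # self-loops avoid sign degeneracies
    u, s, _ = np.linalg.svd(a)
    return u[:, :dim] * np.sqrt(s[:dim])

# Toy "netlist": two 3-cell cliques joined by a single bridging net.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
emb = embed_nodes(adj, 2)
# Cells in the same clique land closer together than cells across the bridge.
```

In contrast to random-walk methods, the factorization is deterministic for a given matrix, which is one reason factorization-style embeddings are easier to keep consistent across graphs.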
Title: TopoPart: a Multi-level Topology-Driven Partitioning Framework for Multi-FPGA Systems
Authors: Dan Zheng, Xinshi Zang, Martin D. F. Wong
DOI: 10.1109/ICCAD51958.2021.9643481
Published: 2021-11-01, 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
Abstract: As the complexity of circuit designs continues to grow, multi-FPGA systems are becoming increasingly popular for logic emulation and rapid prototyping. In a multi-FPGA system, the FPGAs are connected by limited physical wires; in other words, one FPGA usually has direct connections to only a few others. During the circuit partitioning stage, assigning two directly connected nodes to two FPGAs without a physical link would significantly increase delay and degrade overall performance. However, well-known partitioners such as hMETIS and PaToH focus mainly on cut-size minimization without considering such topology constraints of FPGAs, which limits their practical usage. In this paper, we propose a multi-level topology-driven partitioning framework, named TopoPart, to deal with topology constraints in multi-FPGA systems. In particular, we first devise a candidate FPGA propagation algorithm in the coarsening phase to guarantee that the later stages are free of topology violations. In the final refinement phase, cut size is iteratively optimized while maintaining both topology and resource constraints. Compared with the proposed baseline, our partitioning algorithm achieves zero topology violations while yielding a smaller cut size.
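The topology constraint itself is easy to state as a checker (a toy illustration with hypothetical link and net lists, not TopoPart's propagation algorithm): a netlist edge violates the topology when its two endpoints are assigned to FPGAs that have no direct physical connection.

```python
def topology_violations(fpga_links, netlist_edges, assignment):
    """Count netlist edges whose endpoints are mapped to two different
    FPGAs that lack a direct physical link."""
    linked = {frozenset(l) for l in fpga_links}
    bad = 0
    for u, v in netlist_edges:
        fu, fv = assignment[u], assignment[v]
        if fu != fv and frozenset((fu, fv)) not in linked:
            bad += 1
    return bad

# 4 FPGAs wired in a ring 0-1-2-3-0; FPGAs 0 and 2 are NOT directly wired.
links = [(0, 1), (1, 2), (2, 3), (3, 0)]
edges = [("a", "b"), ("b", "c")]
print(topology_violations(links, edges, {"a": 0, "b": 2, "c": 2}))  # 1
print(topology_violations(links, edges, {"a": 1, "b": 2, "c": 2}))  # 0
```

A cut-size-only partitioner minimizes the number of edges crossing FPGAs, but this counter can still be nonzero for its output, which is exactly the gap the topology-driven framework closes.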
Title: Evolving Complementary Sparsity Patterns for Hardware-Friendly Inference of Sparse DNNs
Authors: Elbruz Ozen, A. Orailoglu
DOI: 10.1109/ICCAD51958.2021.9643452
Published: 2021-11-01, 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
Abstract: Sparse deep learning models are known to be more accurate than their dense counterparts for equal parameter and computational budgets. Unstructured model pruning can deliver dramatic compression rates, yet the resulting irregular sparsity patterns pose severe computational challenges for modern hardware. Our work introduces a set of complementary sparsity patterns to construct sparse neural network layers that are both highly expressive and inherently regular. We propose a novel training approach to evolve inherently regular sparsity configurations and transform the expressive power of the proposed layers into competitive classification accuracy even under extreme sparsity constraints. The structure of the introduced sparsity patterns enables optimal compression of the layer parameters into a dense representation. Moreover, the constructed layers can be processed in the compressed format with full hardware utilization on minimally modified non-sparse computational hardware. The experimental results demonstrate superior compression rates and remarkable performance improvements for sparse neural network inference on systolic arrays.
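One way to picture how complementary patterns yield a dense representation (an illustrative sketch with two hypothetical checkerboard masks; the paper's patterns and training procedure are more elaborate): masks that are disjoint and jointly cover every position let several sparse layers share a single dense array, and each layer is recovered losslessly by re-applying its own mask.

```python
import numpy as np

def pack(weights, masks):
    """Pack sparse weight matrices with complementary 0/1 masks (disjoint,
    jointly covering every position) into one dense matrix."""
    assert np.array_equal(sum(masks), np.ones_like(masks[0]))
    return sum(w * m for w, m in zip(weights, masks))

rng = np.random.default_rng(0)
m0 = (np.indices((4, 4)).sum(axis=0) % 2 == 0).astype(float)  # checkerboard
m1 = 1.0 - m0                                                  # its complement
w0, w1 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
dense = pack([w0, w1], [m0, m1])
# Each 50%-sparse layer unpacks exactly from the shared dense matrix:
assert np.array_equal(dense * m0, w0 * m0)
assert np.array_equal(dense * m1, w1 * m1)
```

Because the packed matrix has no zeros to skip, a plain dense engine such as a systolic array processes it at full utilization, which is the hardware-friendliness the abstract describes.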
Pub Date : 2021-11-01DOI: 10.1109/ICCAD51958.2021.9643459
Hsiao-Yin Tseng, I. Chiu, Mu-Ting Wu, C. Li
The demand for neuromorphic chips has skyrocketed in recent years. Thus, efficient manufacturing testing becomes an issue. Conventional testing cannot be applied because some neuromorphic chips do not have scan chains. However, traditional functional testing for neuromorphic chips suffers from long test length and low fault coverage. In this work, we propose a machine learning-based test pattern generation technique with behavior fault models. We use the concept of adversarial attack to generate test patterns to improve the fault coverage of existing functional test patterns. The effectiveness of the proposed technique is demonstrated on two Spiking Neural Network models trained on MNIST. Compared to traditional functional testing, our proposed technique reduces test length by 566x to 8,824x and improves fault coverage by 8.1% to 86.3% on five fault models. Finally, we propose a methodology to solve the scalability issue for the synapse fault models, resulting in 25.7x run time reduction on test pattern generation for synapse faults.
{"title":"Machine Learning-Based Test Pattern Generation for Neuromorphic Chips","authors":"Hsiao-Yin Tseng, I. Chiu, Mu-Ting Wu, C. Li","doi":"10.1109/ICCAD51958.2021.9643459","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643459","url":null,"abstract":"The demand for neuromorphic chips has skyrocketed in recent years. Thus, efficient manufacturing testing becomes an issue. Conventional testing cannot be applied because some neuromorphic chips do not have scan chains. However, traditional functional testing for neuromorphic chips suffers from long test length and low fault coverage. In this work, we propose a machine learning-based test pattern generation technique with behavior fault models. We use the concept of adversarial attack to generate test patterns to improve the fault coverage of existing functional test patterns. The effectiveness of the proposed technique is demonstrated on two Spiking Neural Network models trained on MNIST. Compared to traditional functional testing, our proposed technique reduces test length by 566x to 8,824x and improves fault coverage by 8.1% to 86.3% on five fault models. Finally, we propose a methodology to solve the scalability issue for the synapse fault models, resulting in 25.7x run time reduction on test pattern generation for synapse faults.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125741484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
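The core move in the record above is borrowing adversarial-attack machinery for test generation: perturb an existing functional input in the direction that maximizes the behavioral gap between the fault-free circuit and a faulty one. A toy sketch of that idea, with a single neuron, a stuck-at-0 synapse fault, and a numerical gradient standing in for the paper's actual models (all of these are our simplifications):

```python
import numpy as np

# Toy sketch: FGSM-style step that makes a test pattern more sensitive to
# an injected behavior-level fault. Not the paper's exact fault models.
def neuron(x, w, faulty=False):
    w = w.copy()
    if faulty:
        w[0] = 0.0  # synapse fault: one weight stuck at 0
    return np.tanh(w @ x)

def fault_gap(x, w):
    # Observable difference between fault-free and faulty behavior.
    return abs(neuron(x, w) - neuron(x, w, faulty=True))

rng = np.random.default_rng(1)
w = rng.standard_normal(4)
x = rng.standard_normal(4)  # an existing functional test pattern

# Numerical gradient of the fault-sensitivity objective w.r.t. the input.
eps, step = 1e-5, 0.5
grad = np.array([(fault_gap(x + eps * e, w) - fault_gap(x, w)) / eps
                 for e in np.eye(4)])

# Sign-gradient step; keep the original pattern if the step overshoots.
x_adv = x + step * np.sign(grad)
if fault_gap(x_adv, w) < fault_gap(x, w):
    x_adv = x
assert fault_gap(x_adv, w) >= fault_gap(x, w)
```

A pattern with a larger `fault_gap` is more likely to propagate the fault to an observable output, which is why gradient-guided perturbation can raise coverage without lengthening the test set.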
Pub Date: 2021-11-01, DOI: 10.1109/ICCAD51958.2021.9643454
Yawen Wu, Dewen Zeng, Zhepeng Wang, Yi Sheng, Lei Yang, Alaina J. James, Yiyu Shi, Jingtong Hu
Deep learning models have been deployed in an increasing number of edge and mobile devices to provide healthcare. These models rely on training with a tremendous amount of labeled data to achieve high accuracy. However, for medical applications such as dermatological disease diagnosis, the private data collected by mobile dermatology assistants exist on distributed mobile devices of patients, and each device only has a limited amount of data. Directly learning from limited data greatly deteriorates the performance of learned models. Federated learning (FL) can train models by using data distributed on devices while keeping the data local for privacy. Existing works on FL assume all the data have ground-truth labels. However, medical data often comes without any accompanying labels since labeling requires expertise and results in prohibitively high labor costs. The recently developed self-supervised learning approach, contrastive learning (CL), can leverage the unlabeled data to pre-train a model for learning data representations, after which the learned model can be fine-tuned on limited labeled data to perform dermatological disease diagnosis. However, simply combining CL with FL as federated contrastive learning (FCL) will result in ineffective learning since CL requires diverse data for accurate learning but each device in FL only has limited data diversity. In this work, we propose an on-device FCL framework for dermatological disease diagnosis with limited labels. Features are shared among devices in the FCL pre-training process to provide diverse and accurate contrastive information without sharing raw data for privacy. After that, the pre-trained model is fine-tuned with local labeled data independently on each device or collaboratively with supervised federated learning on all devices. Experiments on dermatological disease datasets show that the proposed framework effectively improves the recall and precision of dermatological disease diagnosis compared with state-of-the-art methods.
{"title":"Federated Contrastive Learning for Dermatological Disease Diagnosis via On-device Learning (Invited Paper)","authors":"Yawen Wu, Dewen Zeng, Zhepeng Wang, Yi Sheng, Lei Yang, Alaina J. James, Yiyu Shi, Jingtong Hu","doi":"10.1109/ICCAD51958.2021.9643454","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643454","url":null,"abstract":"Deep learning models have been deployed in an increasing number of edge and mobile devices to provide healthcare. These models rely on training with a tremendous amount of labeled data to achieve high accuracy. However, for medical applications such as dermatological disease diagnosis, the private data collected by mobile dermatology assistants exist on distributed mobile devices of patients, and each device only has a limited amount of data. Directly learning from limited data greatly deteriorates the performance of learned models. Federated learning (FL) can train models by using data distributed on devices while keeping the data local for privacy. Existing works on FL assume all the data have ground-truth labels. However, medical data often comes without any accompanying labels since labeling requires expertise and results in prohibitively high labor costs. The recently developed self-supervised learning approach, contrastive learning (CL), can leverage the unlabeled data to pre-train a model for learning data representations, after which the learned model can be fine-tuned on limited labeled data to perform dermatological disease diagnosis. However, simply combining CL with FL as federated contrastive learning (FCL) will result in ineffective learning since CL requires diverse data for accurate learning but each device in FL only has limited data diversity. In this work, we propose an on-device FCL framework for dermatological disease diagnosis with limited labels. Features are shared among devices in the FCL pre-training process to provide diverse and accurate contrastive information without sharing raw data for privacy. After that, the pre-trained model is fine-tuned with local labeled data independently on each device or collaboratively with supervised federated learning on all devices. Experiments on dermatological disease datasets show that the proposed framework effectively improves the recall and precision of dermatological disease diagnosis compared with state-of-the-art methods.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121208993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
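The abstract's key mechanism is sharing learned features (never raw images) so that each client's contrastive loss sees diverse negatives despite limited local data. A minimal NumPy sketch of why shared features enrich the objective, using an InfoNCE-style loss; the dimensions, temperature, and variable names are illustrative, not the paper's API:

```python
import numpy as np

# Sketch: a client contrasts a local anchor/positive pair against negative
# features shared by peer devices, keeping the peers' raw data private.
def l2norm(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def contrastive_loss(anchor, positive, negatives, tau=0.5):
    # InfoNCE: pull the positive close, push the shared negatives away.
    sims = np.concatenate(([anchor @ positive], negatives @ anchor)) / tau
    sims -= sims.max()  # numerical stability before the softmax
    p = np.exp(sims) / np.exp(sims).sum()
    return -np.log(p[0])

rng = np.random.default_rng(0)
anchor = l2norm(rng.standard_normal(8))
positive = l2norm(anchor + 0.1 * rng.standard_normal(8))  # augmented view
remote = l2norm(rng.standard_normal((16, 8)))  # features shared by peers

local_only = contrastive_loss(anchor, positive, remote[:2])   # few local negatives
with_shared = contrastive_loss(anchor, positive, remote)      # peers' features added
# Extra shared negatives enlarge the softmax denominator, so the
# objective is provably at least as hard as the local-only one.
assert with_shared >= local_only
```

After this pre-training phase, the abstract's pipeline fine-tunes the encoder on each device's small labeled set, either independently or via supervised federated learning.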