
Latest Publications from the 2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Superfast Full-Scale GPU-Accelerated Global Routing
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549474
Shiju Lin, Martin D. F. Wong
Global routing is an essential step in physical design. Recently, there have been works on accelerating global routers using GPUs. However, they focus only on certain stages of global routing and achieve limited overall speedup. In this paper, we present a superfast full-scale GPU-accelerated global router and introduce useful parallelization techniques for routing. Experiments show that our 3D router achieves both good quality and short runtime compared with other state-of-the-art academic global routers.
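The abstract does not detail the parallelization techniques, but the core kernel such routers accelerate is classic wavefront (Lee-style) maze routing over a grid graph. A minimal sequential sketch of that pattern follows; it is an illustration of the general technique, not the paper's GPU implementation:

```python
from collections import deque

def maze_route(grid, src, dst):
    """BFS wavefront expansion on a routing grid (0 = free, 1 = blocked).

    Sequential sketch of Lee-style maze routing; GPU routers parallelize
    the frontier expansion across many cells/nets at once.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}          # visited set + backtrack pointers
    frontier = deque([src])
    while frontier:
        cell = frontier.popleft()
        if cell == dst:
            # Backtrack from the target to recover the routed path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None  # no route found
```

Because BFS explores in wavefronts, the returned path is shortest in grid steps, which is why this kernel is a natural fit for data-parallel hardware.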
Citations: 0
Qilin: Enabling Performance Analysis and Optimization of Shared-Virtual Memory Systems with FPGA Accelerators
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549431
Edward Richter, Deming Chen
While the tight integration of components in heterogeneous systems has increased the popularity of the Shared-Virtual Memory (SVM) system programming model, the overhead of SVM can significantly impact end-to-end application performance. However, studying SVM implementations is difficult: there is no open and flexible system for exploring trade-offs between different SVM implementations, and the SVM design space is not clearly defined. To this end, we present Qilin, the first open-source system that enables thorough study of SVM in heterogeneous computing environments for discrete accelerators. Qilin is a transparent and flexible system built on top of an open-source FPGA shell, which allows researchers to alter components of the underlying SVM implementation to understand how SVM design decisions impact performance. Using Qilin, we perform an extensive quantitative analysis of the overheads of three SVM architectures, and generate several insights that highlight the costs and benefits of each architecture. From these insights, we propose a flowchart for choosing the best SVM implementation given the application characteristics and the SVM capabilities of the system. Qilin also provides application developers a flexible SVM shell for high-performance virtualized applications. Optimizations enabled by Qilin can reduce the latency of translations by 6.86x compared to an open-source FPGA shell.
Citations: 0
False Data Injection Attacks on Sensor Systems
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561098
D. Serpanos
False data injection attacks on sensor systems are an emerging threat to cyberphysical systems, creating significant risks for all application domains and, importantly, for critical infrastructures. Cyberphysical systems are process-dependent, leading to differing false data injection attacks that target disruption of specific processes (plants). We present a taxonomy of false data injection attacks, using a general model for cyberphysical systems, and show that global and continuous attacks are extremely powerful. To detect false data injection attacks, we describe three methods that can be employed for effective monitoring and detection during plant operation. Since sensor failures have effects equivalent to the corresponding false data injection attacks, the methods are effective for sensor fault detection as well.
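As one concrete, hypothetical illustration of monitoring during plant operation (not one of the paper's three methods), a residual check against a moving-average prediction flags readings that deviate sharply from recent history:

```python
def detect_injection(readings, window=3, threshold=5.0):
    """Flag indices where a sensor reading deviates from the moving
    average of the previous `window` samples by more than `threshold`.

    Illustrative residual-based anomaly check only; real detectors use
    a process model rather than a naive moving average.
    """
    alarms = []
    for i in range(window, len(readings)):
        predicted = sum(readings[i - window:i]) / window
        if abs(readings[i] - predicted) > threshold:
            alarms.append(i)
    return alarms
```

Note that a single injected spike also perturbs the predictions for the following samples, so a run of alarms typically follows the first detection.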
Citations: 0
2022 CAD Contest Problem A: Learning Arithmetic Operations from Gate-Level Circuit
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561107
Chung-Han Chou, Chih-Jen Hsu, Chi-An Wu, Kuan-Hua Tu
Extracting circuit functionality from a gate-level netlist is critical in CAD tools. For security, it helps designers detect hardware Trojans or malicious design changes in netlists built with third-party resources such as fabrication services and soft/hard IP cores. For verification, it can reduce the complexity and effort of preserving design information under the aggressive optimization strategies adopted by synthesis tools. For Engineering Change Order (ECO), it can spare the designer from locating the ECO gate in a sea of bit-level gates. In this contest, we formulated a datapath learning and extraction problem. With a set of benchmarks and an evaluation metric, we expect contestants to develop a tool that learns the arithmetic equations from a synthesized gate-level netlist.
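A minimal sketch of the underlying task: simulate a gate-level netlist (a toy dict format is assumed here) on random inputs and test whether it matches a hypothesized arithmetic function, e.g. a full adder. The contest's actual benchmark format and evaluation metric are not reproduced:

```python
import random

def eval_netlist(gates, inputs):
    """Evaluate a topologically ordered netlist.
    `gates` maps output net -> (op, [operand nets]); toy format assumed."""
    vals = dict(inputs)
    for net, (op, args) in gates.items():
        a = [vals[x] for x in args]
        if op == "AND":
            vals[net] = a[0] & a[1]
        elif op == "OR":
            vals[net] = a[0] | a[1]
        elif op == "XOR":
            vals[net] = a[0] ^ a[1]
    return vals

def matches_adder(gates, out_nets, n_trials=100):
    """Random simulation: does the netlist compute a + b + cin, with
    out_nets = [sum, carry]? A sketch of checking a learned arithmetic
    hypothesis, not the contest's scoring method."""
    for _ in range(n_trials):
        a, b, cin = (random.randint(0, 1) for _ in range(3))
        vals = eval_netlist(gates, {"a": a, "b": b, "cin": cin})
        if vals[out_nets[0]] + 2 * vals[out_nets[1]] != a + b + cin:
            return False
    return True

# Full adder: s = a ^ b ^ cin, cout = (a & b) | ((a ^ b) & cin)
full_adder = {
    "t1": ("XOR", ["a", "b"]),
    "s": ("XOR", ["t1", "cin"]),
    "t2": ("AND", ["a", "b"]),
    "t3": ("AND", ["t1", "cin"]),
    "cout": ("OR", ["t2", "t3"]),
}
```

Random simulation can only refute a hypothesis probabilistically; a real tool would confirm candidates with formal equivalence checking.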
Citations: 2
Speculative Load Forwarding Attack on Modern Processors
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549417
Hasini Witharana, P. Mishra
Modern processors deliver high performance by utilizing advanced features such as out-of-order execution, branch prediction, speculative execution, and sophisticated buffer management. Unfortunately, these techniques have introduced diverse vulnerabilities including Spectre, Meltdown, and microarchitectural data sampling (MDS). Although Spectre and Meltdown can leak data via memory side channels, MDS has been shown to leak data from the CPU-internal buffers in Intel architectures. AMD has reported that its processors are not vulnerable to MDS/Meltdown-type attacks. In this paper, we present a Meltdown/MDS type of attack to leak data from the load queue in AMD Zen family architectures. To the best of our knowledge, our approach is the first attempt at developing an attack on AMD architectures that uses speculative load forwarding to leak data through the load queue. Experimental evaluation demonstrates that our proposed attack is successful on multiple machines with AMD processors. We also explore a lightweight mitigation to defend against speculative load forwarding attacks on modern processors.
Citations: 3
How Good Is Your Verilog RTL Code? A Quick Answer from Machine Learning
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549375
Prianka Sengupta, Aakash Tyagi, Yiran Chen, Jiangkun Hu
Hardware Description Language (HDL) is a common entry point for designing digital circuits. Differences in HDL coding styles and design choices may lead to considerably different design quality and performance-power tradeoffs. In general, the impact of HDL coding is not clear until logic synthesis or even layout is completed. However, running synthesis merely as feedback for HDL code is computationally uneconomical, especially in early design phases when the code needs to be frequently modified. Furthermore, in late stages of design convergence, burdened with high-impact engineering change orders (ECOs), design iterations become prohibitively expensive. To this end, we propose a machine learning approach to Verilog-based Register-Transfer Level (RTL) design assessment without going through the synthesis process. It allows designers to quickly evaluate the performance-power tradeoff among different options of RTL designs. Experimental results show that our proposed technique achieves an average of 95% prediction accuracy in terms of post-placement analysis, and is 6 orders of magnitude faster than evaluation by running logic synthesis and placement.
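A sketch of the kind of input such a model might consume: simple counting features extracted from Verilog source text. The feature set below is hypothetical, chosen for illustration; the paper's actual model inputs are not reproduced here:

```python
import re

def rtl_features(verilog_src):
    """Extract toy counting features from Verilog RTL text.

    Hypothetical feature set for illustration (a learned model would
    map such features to predicted post-placement power/performance).
    """
    return {
        "always_blocks": len(re.findall(r"\balways\b", verilog_src)),
        "nonblocking_assigns": verilog_src.count("<="),
        "multipliers": verilog_src.count("*"),
        "adders": verilog_src.count("+"),
        "case_statements": len(re.findall(r"\bcase\b", verilog_src)),
    }
```

Counting operator tokens like `*` is a crude proxy for datapath cost (it conflates comparison and arithmetic contexts), which is exactly the kind of ambiguity a trained model must learn to resolve.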
Citations: 6
Hidden-ROM: A Compute-in-ROM Architecture to Deploy Large-Scale Neural Networks on Chip with Flexible and Scalable Post-Fabrication Task Transfer Capability
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549335
Yiming Chen, Guodong Yin, Ming-En Lee, Wenjun Tang, Zekun Yang, Yongpan Liu, Huazhong Yang, Xueqing Li
Motivated by reducing the data transfer activities in data-intensive neural network computing, SRAM-based compute-in-memory (CiM) has made significant progress. Unfortunately, SRAM has low density and limited on-chip capacity. This makes the deployment of large models inefficient due to the frequent DRAM accesses needed to update the weights in SRAM. Recently, a ROM-based CiM design, YOLoC, revealed the unique opportunity of deploying a large-scale neural network in CMOS by exploiting the intriguingly high density of ROM. However, even though assisting SRAM has been adopted in YOLoC for task transfer within the same domain, it remains a big challenge to overcome the read-only limitation of ROM and enable more flexibility. Therefore, it is of paramount significance to develop new ROM-based CiM architectures that provide broader task space and model expansion capability for more complex tasks.This paper presents Hidden-ROM for high flexibility of ROM-based CiM. Hidden-ROM provides several novel ideas beyond YOLoC. First, it adopts a one-SRAM-many-ROM method that "hides" ROM cells to support various datasets of different domains, including CIFAR10/100, FER2013, and ImageNet. Second, Hidden-ROM provides the model expansion capability after chip fabrication to update the model for more complex tasks when needed. Experiments show that Hidden-ROM designed for ResNet-18 pretrained on CIFAR100 (item classification) can achieve <0.5% accuracy loss on FER2013 (facial expression recognition), while YOLoC degrades by >40%. After expanding to ResNet-50/101, Hidden-ROM even achieves 68.6%/72.3% accuracy on ImageNet, close to the 74.9%/76.4% achieved by software. Such expansion costs only 7.6%/12.7% energy efficiency overhead while providing a 12%/16% accuracy improvement after expansion.
Citations: 0
A Robust Global Routing Engine with High-accuracy Cell Movement under Advanced Constraints
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549421
Ziran Zhu, Fuheng Shen, Yangjie Mei, Zhipeng Huang, Jianli Chen, Jun-Zhi Yang
Placement and routing are typically defined as two separate problems to reduce design complexity. However, such a divide-and-conquer approach inevitably degrades solution quality because the correlations/objectives of placement and routing are not entirely consistent. Besides, with various constraints (e.g., timing, R/C characteristics, voltage area, etc.) imposed by advanced circuit designs, bridging the gap between placement and routing while satisfying the advanced constraints has become more challenging. In this paper, we develop a robust global routing engine with high-accuracy cell movement under advanced constraints to narrow the gap and improve the routing solution. We first present a routing refinement technique to obtain a convergent routing result based on fixed placement, which provides more accurate information for subsequent cell movement. To achieve fast and high-accuracy position prediction for cell movement, we construct a lookup table (LUT) considering complex constraints/objectives (e.g., routing direction and layer-based power consumption), and generate a timing-driven gain map for each cell based on the LUT. Finally, based on the prediction, we propose an alternating cell movement and cluster movement scheme followed by partial rip-up and reroute to optimize the routing solution. Experimental results on the ICCAD 2020 contest benchmarks show that our algorithm achieves the best total scores among all published works. Compared with the champion of the ICCAD 2021 contest, experimental results on the ICCAD 2021 contest benchmarks show that our algorithm achieves better solution quality in shorter runtime.
Citations: 0
Pin Accessibility and Routing Congestion Aware DRC Hotspot Prediction using Graph Neural Network and U-Net
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549346
Kyeonghyeon Baek, Hyunbum Park, Suwan Kim, Kyumyung Choi, Taewhan Kim
An accurate DRC (design rule check) hotspot prediction at the placement stage is essential to reduce the substantial design time required for iterations of placement and routing. It is known that for implementing chips with advanced technology nodes, (1) pin accessibility and (2) routing congestion are two major causes of DRVs (design rule violations). Though many ML (machine learning) techniques have been proposed to address this prediction problem, it has not been easy to assemble the aggregate data on items 1 and 2 in a unified fashion for training ML models, resulting in considerable accuracy loss in DRC hotspot prediction. This work overcomes this limitation by proposing a novel ML-based DRC hotspot prediction technique that accurately captures the combined impact of items 1 and 2 on DRC hotspots. Precisely, we devise a graph, called a pin proximity graph, that effectively models the spatial information on cell I/O pins and the information on pin-to-pin disturbance relations. Then, we propose a new ML model, called PGNN, which tightly combines a GNN (graph neural network) and U-Net: the GNN embeds pin accessibility information abstracted from our pin proximity graph, while the U-Net extracts routing congestion information from grid-based features. Through experiments on a set of benchmark designs using the Nangate 15nm library, our PGNN outperforms existing ML models on all benchmark designs, achieving on average 7.8~12.5% improvement in F1-score with 5.5× faster inference than the state-of-the-art techniques.
Citations: 7
On Minimizing the Read Latency of Flash Memory to Preserve Inter-tree Locality in Random Forest
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549365
Yu-Cheng Lin, Yu-Pei Liang, Tseng-Yi Chen, Yuan-Hao Chang, Shuo-Han Chen, W. Shih
Many prior research works have widely discussed how to bring machine learning algorithms to embedded systems. Because of resource constraints, embedded platforms for machine learning applications play the role of a predictor: an inference model is constructed on a personal computer or a server platform, and then integrated into embedded systems for just-in-time inference. Given the limited main-memory space in embedded systems, an important problem for embedded machine learning systems is how to efficiently move the inference model between main memory and secondary storage (e.g., flash memory). To tackle this problem, we need to consider how to preserve locality inside the inference model during model construction. Therefore, we propose a solution, namely locality-aware random forest (LaRF), to preserve the inter-tree locality of all decision trees within a random forest model during the model construction process. Owing to this locality preservation, LaRF improves the read latency by at least 81.5% compared to the original random forest library.
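The inter-tree locality idea can be made concrete with a small sketch. This is not the LaRF algorithm itself; it only shows one possible locality-preserving storage layout, interleaving tree levels so that nodes visited at the same depth across all trees land in adjacent storage. The forest structure and node names are illustrative assumptions:

```python
# Hypothetical layout sketch: interleave forest nodes level by level
# across trees, so an inference pass (which advances every tree's
# traversal roughly in lockstep) reads contiguous storage regions
# instead of jumping between far-apart per-tree blocks.
def interleaved_layout(forest):
    """forest: list of trees; each tree is a list of levels; each level
    is a list of node ids. Returns a flat storage order."""
    max_depth = max(len(tree) for tree in forest)
    order = []
    for depth in range(max_depth):
        for tree in forest:
            if depth < len(tree):
                order.extend(tree[depth])
    return order

# Two tiny trees, each with a root level and one children level.
forest = [[["t0n0"], ["t0n1", "t0n2"]],
          [["t1n0"], ["t1n1", "t1n2"]]]
layout = interleaved_layout(forest)
# layout == ["t0n0", "t1n0", "t0n1", "t0n2", "t1n1", "t1n2"]
```

Compared with storing each tree whole (all of tree 0, then all of tree 1), this order keeps the roots of every tree, and then each successive level, in neighboring flash pages, which is the flavor of locality the paper targets.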
Citations: 0
2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)