2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD): Latest Publications

VST: A virtual stress testing framework for discovering bugs in SSD flash-translation layers
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203790
Ren-Shuo Liu, Yun-Sheng Chang, Chih-Wen Hung
Flash translation layers (FTLs) are the core embedded software (also known as firmware) of NAND flash-based solid-state drives (SSDs). The relentless pursuit of high-performance SSDs renders FTLs increasingly complex and intricate. Therefore, testing and validating FTLs are crucial and challenging tasks. Although directly testing and validating FTLs on SSD hardware is common practice, it is time-consuming and cumbersome because 1) the testing speed is limited by the hardware speed of SSDs and 2) just reproducing bugs can be challenging, let alone locating and root-causing them. This work presents virtual stress testing (VST), a simulation framework that executes SSD FTLs on PCs or servers against virtual SRAM, DRAM, and flash emulated by host-side main memory. FTL function calls, such as moving data from flash to DRAM, are served by the VST framework. Therefore, VST can test FTLs without SSD hardware requirements or SSD speed limitations, and root-causing bugs becomes a manageable task. We apply VST to a representative SSD design, OpenSSD, which is actively utilized and maintained by the SSD and FTL communities. Experimental results show that VST can test FTLs at a speed of up to 375 GB/s, which is several hundred times faster than directly testing FTLs on SSD hardware. Moreover, we successfully discover seven new FTL bugs in the OpenSSD design using VST, which is solid evidence of VST's bug-discovering effectiveness.
Citations: 3
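The VST abstract above describes flash that is emulated by host-side main memory. As a rough illustration of that idea only, the sketch below models NAND flash as a Python `bytearray` and enforces the erase-before-program rule. The class `VirtualFlash` and all of its method names are hypothetical and are not taken from VST or OpenSSD.

```python
class VirtualFlash:
    """Hypothetical host-memory model of NAND flash (not the VST API)."""

    def __init__(self, blocks: int, pages_per_block: int, page_size: int) -> None:
        self.pages_per_block = pages_per_block
        self.page_size = page_size
        total_pages = blocks * pages_per_block
        # Erased NAND cells read back as all ones, i.e. 0xFF bytes.
        self.data = bytearray(b"\xff" * (total_pages * page_size))
        self.programmed = [False] * total_pages

    def _page_index(self, block: int, page: int) -> int:
        return block * self.pages_per_block + page

    def read_page(self, block: int, page: int) -> bytes:
        start = self._page_index(block, page) * self.page_size
        return bytes(self.data[start:start + self.page_size])

    def program_page(self, block: int, page: int, payload: bytes) -> None:
        index = self._page_index(block, page)
        if len(payload) != self.page_size:
            raise ValueError("payload must be exactly one page")
        if self.programmed[index]:
            # A firmware bug that real hardware would turn into silent corruption
            # becomes an immediate, reproducible failure on the host.
            raise RuntimeError("page programmed twice without an erase")
        start = index * self.page_size
        self.data[start:start + self.page_size] = payload
        self.programmed[index] = True

    def erase_block(self, block: int) -> None:
        # NAND is erased at block granularity, so every page in the block is reset.
        first = block * self.pages_per_block
        for index in range(first, first + self.pages_per_block):
            start = index * self.page_size
            self.data[start:start + self.page_size] = b"\xff" * self.page_size
            self.programmed[index] = False
```

Raising an exception at the point of the illegal operation is the design choice that matters here: on a host, the faulty call can be caught in a debugger with a full stack trace, which is the kind of root-causing convenience the abstract attributes to simulation.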
Data path optimisation and delay matching for asynchronous bundled-data Balsa circuits
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203806
Norman Kluge, Ralf Wollowski
Balsa provides an open-source design flow where asynchronous circuits are created from high-level specifications, but the syntax-driven translation often results in performance overhead. To improve this, we exploit the fact that bundled-data circuits can be divided into a data path and a control path. Hence, tailored optimisation techniques can be applied to both paths separately. For control path optimisation, STG-based resynthesis has been used (applying logic minimisation). To continue the investigation, we additionally apply synchronous standard tools to optimise the data path. However, this removes the matched delays needed for a properly working bundled-data circuit. Therefore, we also present two algorithms to automatically insert proper matched delays. Our experiments show a performance improvement of up to 44% and an energy consumption improvement of up to 60% compared to the original Balsa implementation.
Citations: 0
Machine learning on FPGAs to face the IoT revolution
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203862
Xiaofan Zhang, Anand Ramachandran, Chuanhao Zhuge, Di He, Wei Zuo, Zuofu Cheng, K. Rupnow, Deming Chen
FPGAs have been rapidly adopted for acceleration of Deep Neural Networks (DNNs) with improved latency and energy efficiency compared to CPU- and GPU-based implementations. High-level synthesis (HLS) is an effective design flow for DNNs due to improved productivity, debugging, and design space exploration ability. However, optimizing large neural networks under resource constraints for FPGAs is still a key challenge. In this paper, we present a series of effective design techniques for implementing DNNs on FPGAs with high performance and energy efficiency. These include the use of configurable DNN IPs, performance and resource modeling, resource allocation across DNN layers, and DNN reduction and re-training. We showcase several design solutions including a Long-term Recurrent Convolution Network (LRCN) for video captioning, an Inception module for FaceNet face recognition, and Long Short-Term Memory (LSTM) for sound recognition. These and other similar DNN solutions are ideal implementations to be deployed in vision- or sound-based IoT applications.
Citations: 57
An integrated-spreading-based macro-refining algorithm for large-scale mixed-size circuit designs
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203818
Szu-To Chen, Yao-Wen Chang, Tung-Chieh Chen
With the increasing use of pre-designed macros in a modern chip and its induced high design complexity, macro placement has become a challenging problem in today's design houses. Most popular macro placement algorithms adopt a three-stage approach: placement prototyping, macro placement, and standard-cell placement, where cell positions after macro placement are assumed the same as those at the prototyping stage, possibly misguiding succeeding standard-cell placement. To close the gap between macro and standard-cell placement, we propose a macro-refining algorithm that adopts an integrated spreading technique considering the spreading of both macros and cells and the dynamic information of cell positions to improve macro placement. We further propose a new force-modulation technique to refine macro placement and a congestion-aware macro shifter to preserve more space for better routability. Extensive experiments based on various macro placements show that our proposed techniques are effective and our macro-refining algorithm can find significantly better placement solutions for large-scale mixed-size circuit designs.
Citations: 7
Impact of circuit-level non-idealities on vision-based autonomous driving systems
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203887
Handi Yu, Changhao Yan, Xuan Zeng, Xin Li
We describe a novel methodology to validate vision-based autonomous driving systems over different circuit corners with consideration of temperature variation and circuit aging. The proposed work is motivated by the fact that low-level circuit implementation may have a significant impact on system performance, even though such effects have not been appropriately taken into account today. Our approach seamlessly integrates the image data recorded under nominal conditions with comprehensive statistical circuit models to synthetically generate the critical corner cases for which an autonomous driving system is likely to fail. As such, a given automotive system can be robustly validated for these worst-case scenarios that cannot be easily captured by physical experiments.
Citations: 3
A streaming clustering approach using a heterogeneous system for big data analysis
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203845
Dajung Lee, Alric Althoff, D. Richmond, R. Kastner
Data clustering is a fundamental challenge in data analytics. It is the main task in exploratory data mining and a core technique in machine learning. As the volume, variety, velocity, and variability of data grow, we need more efficient data analysis methods that can scale towards increasingly large and high-dimensional data sets. We develop a streaming clustering algorithm that is highly amenable to hardware acceleration. Our algorithm eliminates the need to store the data objects, which removes limits on the size of the data that we can analyze. Our algorithm is highly parameterizable, which allows it to fit the characteristics of the data set and scale towards the available hardware resources. Our streaming hardware core can handle more than 40 Msamples/s when processing 3-dimensional streaming data and up to 1.78 Msamples/s for 70-dimensional data. To validate the accuracy and performance of our algorithm, we compare it with several common clustering techniques on several different applications. The experimental results show that it outperforms prior hardware-accelerated clustering systems.
Citations: 7
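The streaming clustering abstract above stresses that the data objects never need to be stored. The abstract does not give the authors' algorithm, so the sketch below uses a different, well-known technique as a stand-in: online (sequential) k-means, which keeps only one centroid and one counter per cluster. The class name `OnlineKMeans` and the choice of initial centroids are assumptions made for illustration.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class OnlineKMeans:
    """Stand-in streaming clusterer; not the algorithm from the paper."""

    def __init__(self, initial_centroids: Sequence[Sequence[float]]) -> None:
        self.centroids: List[List[float]] = [list(c) for c in initial_centroids]
        self.counts = [0] * len(self.centroids)

    def update(self, sample: Sequence[float]) -> int:
        """Consume one sample, then discard it. Returns the assigned cluster."""
        nearest = min(
            range(len(self.centroids)),
            key=lambda i: squared_distance(self.centroids[i], sample),
        )
        self.counts[nearest] += 1
        # A step size of 1/count makes the centroid the running mean of its samples.
        rate = 1.0 / self.counts[nearest]
        centroid = self.centroids[nearest]
        for d, value in enumerate(sample):
            centroid[d] += rate * (value - centroid[d])
        return nearest
```

Memory use depends only on the number of clusters and the dimensionality, not on how many samples have been seen, which is the property that makes this family of methods a natural fit for a fixed-size hardware pipeline.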
SALT: Provably good routing topology by a novel Steiner shallow-light tree algorithm
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203828
Gengjie Chen, Peishan Tu, Evangeline F. Y. Young
In a weighted undirected graph, a spanning/Steiner shallow-light tree (SLT) simultaneously approximates (i) shortest distances from a root to the other vertices, and (ii) the minimum tree weight. The Steiner SLT has been proved to be exponentially lighter than the spanning one [1], [2]. In this paper, we propose a novel Steiner SLT construction method called SALT (Steiner shAllow-Light Tree), which is efficient and has the tightest bound among all state-of-the-art SLT algorithms. Applying SALT to Manhattan space offers a smooth trade-off between rectilinear Steiner minimum tree (RSMT) and rectilinear Steiner minimum arborescence (RSMA) for VLSI routing. In addition, the adaptation further reduces the time complexity from O(n^2) to O(n log n). The experimental results show that SALT can achieve not only short path lengths and wirelength but also small delay, compared to both classical and recent routing tree construction methods.
Citations: 23
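The two quantities that a shallow-light tree balances in the SALT abstract above can be measured on any candidate routing tree. The sketch below does not implement SALT. It only evaluates a given tree in Manhattan space, reporting total wirelength (the "light" side) and the worst ratio of tree path length to direct Manhattan distance from the root (the "shallow" side). The function name `tree_metrics` and the parent-array encoding are assumptions for illustration, and every pin is assumed to lie at a different location from the root.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def tree_metrics(
    points: Sequence[Point], parent: List[Optional[int]], root: int = 0
) -> Tuple[int, float]:
    """Return (wirelength, shallowness) of a tree given as parent indices.

    parent[root] must be None; every other entry names that node's parent.
    """

    def path_length(node: int) -> int:
        # Walk up to the root, summing edge lengths along the way.
        length = 0
        while parent[node] is not None:
            length += manhattan(points[node], points[parent[node]])
            node = parent[node]
        return length

    wirelength = sum(
        manhattan(points[i], points[p]) for i, p in enumerate(parent) if p is not None
    )
    shallowness = max(
        path_length(i) / manhattan(points[i], points[root])
        for i in range(len(points))
        if i != root
    )
    return wirelength, shallowness
```

For the four pins (0,0), (4,0), (4,3), (0,3) with the root at (0,0), a chain through the pins in that order has wirelength 11 but a shallowness of 11/3, while a star from the root has shallowness 1.0 at wirelength 14. A shallow-light construction aims to sit between these two extremes.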
Making split fabrication synergistically secure and manufacturable
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203794
Lang Feng, Yujie Wang, Jiang Hu, Wai-Kei Mak, J. Rajendran
Split fabrication is a promising approach to security against attacks by untrusted foundries. While existing split fabrication methods consider the overhead of conventional objectives such as wirelength and timing, they mostly neglect manufacturability — an unavoidable challenge in nanometer technologies. Observing that security and manufacturability can be addressed in a synergistic manner, this work introduces routing techniques that can simultaneously improve both security and manufacturability in terms of either Chemical Mechanical Planarization (CMP) uniformity or Self-Aligned Double Patterning (SADP) compliance. The effectiveness of these techniques is confirmed by experiments on benchmark circuits.
Citations: 5
Switch cell optimization of power-gated modern system-on-chips
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203826
Dongyoun Yi, Taewhan Kim
This work addresses a practical problem of allocating and placing a minimal number of active switch cells in power-gated modern System-on-Chips (SoCs) to save unnecessary standby leakage under a noise (i.e., IR-drop) constraint. Since power gating switch cells are physically directly connected to power rails, their overall allocation structure is synthesized in a stage before the logic cell placement. Consequently, the allocation of switch cells at the pre-placement stage could lead to unnecessarily high standby leakage for modern designs. This work proposes a practical remedy for this problem at the post-placement stage. Specifically, for an initial design with a grid-based switch cell allocation, which is a commonly used design methodology in industry, we propose a comprehensive solution for determining, for each switch cell, (1) whether the cell can be permanently turned off or (2) the type of switch cell for replacement, so that the resulting total standby leakage of switch cells is minimized under the noise constraint. We formulate the problem as a variant of the weighted set cover problem and solve it efficiently by employing an approximate set cover algorithm. Through experiments with benchmark circuits in ISCAS89, OPENMSP430, and FPU, it is shown that our method is able to reduce the standby leakage by 35.0% and 13.9% over the initial designs and the designs produced by the previous switch cell optimization method in [5], respectively.
Citations: 1
COMBA: A comprehensive model-based analysis framework for high level synthesis of real applications
Pub Date : 2017-11-13 DOI: 10.5555/3199700.3199757
Jieru Zhao, Liang Feng, Sharad Sinha, Wei Zhang, Yun Liang, Bingsheng He
High Level Synthesis (HLS) relies on the use of synthesis pragmas to generate digital designs meeting a set of specifications. However, the selection of a set of pragmas depends largely on designer experience and knowledge of the target architecture and digital design. Existing automated methods of pragma selection are very limited in scope and capability to analyze complex design descriptions in high-level languages to be synthesized using HLS. In this paper, we propose COMBA, a comprehensive model-based analysis framework capable of analyzing the effects of a multitude of pragmas related to functions, loops and arrays in the design description using pluggable analytical models, a recursive data collector (RDC) and a metric-guided design space exploration algorithm (MGDSE). When compared with HLS tools like Vivado HLS, COMBA reports an average error of around 1% in estimating performance, while taking only a few seconds for analysis of Polybench benchmark applications and a few minutes for real-life applications like JPEG, Seidel and Rician. The synthesis pragmas recommended by COMBA result in an average 100x speed-up in performance for the analyzed applications, which establishes COMBA as a superior alternative to current state-of-the-art approaches.
Citations: 85
Journal
2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)