
2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD): Latest Papers

An open benchmark implementation for multi-CPU multi-GPU pedestrian detection in automotive systems
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203793
Matina Maria Trompouki, Leonidas Kosmidis, N. Navarro
Modern and future automotive systems incorporate several Advanced Driving Assistance Systems (ADAS). Those systems require significant performance that cannot be provided by traditional automotive processors and programming models. Multicore CPUs and Nvidia GPUs using CUDA are currently considered by both the automotive industry and the research community to provide the necessary computational power. However, despite several recently published works in this domain, there is an absolute lack of open implementations of GPU-based ADAS software that can be used for benchmarking candidate platforms. In this work, we present a multi-CPU and GPU implementation of an open pedestrian detection benchmark based on the Viola-Jones image recognition algorithm. We present our optimization strategies and evaluate our implementation on a multiprocessor system featuring multiple GPUs, showing an overall 88.5x speedup over the sequential version.
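As background for the Viola-Jones reference above: the algorithm evaluates rectangular Haar-like features using an integral image (summed-area table), so that any rectangle sum costs four lookups. The following is a minimal sequential sketch of that idea only; the function names are illustrative and are not taken from the benchmark described in the abstract.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left corner (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Haar-like edge feature: left half minus right half (w is assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Because every detection window evaluates many such features independently, this step is a natural candidate for parallel execution, which is the kind of workload the abstract targets.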
Citations: 12
Novel heterogeneous computing platforms and 5G communications for IoT applications
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203872
Yuichi Nakamura, H. Shimonishi, Yuki Kobayashi, K. Satoda, Yashuhiro Matsunaga, Dai Kanetomo
The Internet of Things (IoT), which collects various data from the real world and extracts value from the collected data, is a good way to help solve serious problems and to construct efficient social systems. However, because the collected data is complicated and very large, collecting and analyzing this “complicated big data” takes a long time. Efficient computer systems and efficient network systems are therefore necessary. The integration of heterogeneous computing and 5G networks is one of the best platforms for providing complex IoT systems and services. This paper first presents the reasons why complex IoT systems require high-performance heterogeneous computing and high-speed communication systems such as 5G. It then introduces some use cases of IoT system infrastructures empowered by heterogeneous computing.
Citations: 7
Blockage-aware terminal propagation for placement wirelength minimization
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203762
Sheng-Wei Yang, Yao-Wen Chang, Tung-Chieh Chen
Wirelength is the most fundamental objective in placement because it also affects various placement metrics (routability, timing, etc.). Half-perimeter wirelength (HPWL) is a pervasive metric for circuit placement. However, preplaced blocks (i.e., blockages) might misguide an HPWL-based placer to generate a placement solution that incurs significant routing detours. Consequently, it is desirable to develop an effective method to resolve the HPWL-rooted routing detour problem for placement optimization. This paper presents an efficient, generic, yet effective terminal propagation algorithm as a pre-placement process which can readily be integrated into a traditional placement flow to improve wirelength (and routability). Our algorithm identifies a region for each preplaced terminal according to its connectivity, and applies a minimum-cost maximum flow algorithm to propagate all preplaced terminals to their feasible propagation locations with the minimum total propagation length. Experimental results show that our flow with terminal propagation can reduce both global routed wirelength and routing congestion by 4% on average, compared with one without terminal propagation. In particular, our work also provides a long unnoticed insight into placement optimization with blockages, which can be addressed with an efficient, generic, yet effective scheme.
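For readers unfamiliar with the metric named above, HPWL of a net is the half-perimeter of the bounding box of its pins. The sketch below computes it under the assumption that pins are given as (x, y) tuples; the data layout and function names are illustrative, not taken from the paper.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net: bounding-box width plus height."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets):
    """Sum of HPWL over all nets; `nets` maps a net name to its list of pins."""
    return sum(hpwl(pins) for pins in nets.values())
```

Note that the bounding box ignores anything lying between the pins, which is why a preplaced block inside it can force a routed wire to be longer than the HPWL estimate suggests.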
Citations: 0
Transportation security in the era of autonomous vehicles: Challenges and practice
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203895
S. Ray
The Transportation Sector is one of the Critical Infrastructure Sectors identified by the United States Department of Homeland Security. Developing robust, secure, and resilient designs for Transportation Sector components is particularly challenging since it requires significant, real-time coordination with automotive, marine, and aviation systems that are themselves undergoing transformative changes in electronic complexity. In this paper, we provide a general overview of security challenges in the Transportation Sector, focusing in particular on the Highways and Roadways sub-sector. We discuss current and emergent challenges in this area arising as a result of increased autonomy (and hence complexity) of automotive systems, and point out key research needs.
Citations: 2
A novel cache bank timing attack
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203771
Z. Jiang, Yunsi Fei
To avoid information leakage through execution, modern software implementations of cryptographic algorithms target constant timing complexity, i.e., the number of instructions does not vary with different inputs. However, the underlying microarchitecture often behaves differently under different data inputs, which covertly leaks confidential information through the timing channel. The cache timing channel due to cache miss penalties has been explored in recent years to break system security. In this paper, we exploit a finer-grained L1 cache bank timing channel, the stalling delay due to cache bank conflicts, and develop a new timing attack against table-lookup-based cryptographic algorithms. We implement the timing attack with three different methods on the Sandy Bridge microarchitecture and successfully recover the complete 128-bit AES encryption key. The most effective attack achieves a 50% success rate using 75,000 samples and a 100% success rate using 200,000 samples. The whole attack process, from collecting samples to recovering all key bytes, takes less than 3 minutes. We anticipate that the new timing attack will be a threat to various platforms, including ARM-based smartphones and performance-critical accelerators such as GPUs.
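To make the notion of a bank conflict concrete, here is a toy address model. It assumes a cache line is split into equal-width banks and that two simultaneous accesses conflict when they hit the same bank in different lines. The line size, bank count, and conflict rule are illustrative assumptions, not parameters reported in the abstract.

```python
LINE_BYTES = 64                       # assumed cache-line size
NUM_BANKS = 16                        # assumed number of banks per line (illustrative)
BANK_BYTES = LINE_BYTES // NUM_BANKS  # bytes served by one bank within a line


def bank_index(addr):
    """Bank that serves this byte address under the assumed interleaving."""
    return (addr % LINE_BYTES) // BANK_BYTES


def line_index(addr):
    """Cache-line number of this byte address."""
    return addr // LINE_BYTES


def bank_conflict(addr_a, addr_b):
    """True if two simultaneous accesses hit the same bank in different lines."""
    return bank_index(addr_a) == bank_index(addr_b) and line_index(addr_a) != line_index(addr_b)
```

In a table-lookup cipher the lookup index, and therefore the bank, depends on secret data, so under this model the presence or absence of a conflict stall carries information at a finer granularity than a whole cache line.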
Citations: 16
ICCAD-2017 CAD contest in multi-deck standard cell legalization and benchmarks
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203870
Nima Karimpour Darav, Ismail Bustany, A. Kennings, Ravi Mamidi
An increasing number of multi-deck cells occupying multiple rows (e.g., multi-bit registers) are used in advanced node technologies to achieve low power and high performance. Multi-deck standard cell legalization should not only remove all overlaps between cells but also satisfy delicate and complicated design rules while preserving the quality of the given placement by applying minimal perturbation. In addition, the process must be fast and robust enough to handle the sheer number of cells in state-of-the-art designs. For this purpose, we have defined an evaluation metric based on maximum and average cell movement and half-perimeter wirelength (HPWL), as well as the runtime of the legalization algorithm. In addition, we have introduced a set of benchmarks that include multi-deck cells with a range of heights (1–4 row heights).
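The abstract names the ingredients of the evaluation metric but not how they are combined, so the sketch below computes only two of the raw components (maximum and average cell movement) plus a basic overlap check. It assumes Manhattan displacement and axis-aligned rectangles given as (x, y, width, height); these choices and all names are illustrative assumptions rather than the contest's definition.

```python
def displacement(before, after):
    """Manhattan distance a cell moved between the input and legalized placements."""
    return abs(after[0] - before[0]) + abs(after[1] - before[1])


def movement_stats(before, after):
    """Return (maximum, average) displacement over all cells in `before`."""
    moves = [displacement(before[cell], after[cell]) for cell in before]
    return max(moves), sum(moves) / len(moves)


def overlaps(a, b):
    """True if rectangles a and b, each (x, y, width, height), share interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah
```

A multi-deck cell would simply carry a height of two or more rows in the rectangle tuple; abutting cells that only share an edge are not reported as overlapping.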
Citations: 31
Reusability is FIRRTL ground: Hardware construction languages, compiler frameworks, and transformations
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203780
Adam M. Izraelevitz, Jack Koenig, Patrick Li, Richard Lin, Angie Wang, Albert Magyar, Donggyu Kim, Colin Schmidt, Chick Markley, Jim Lawson, J. Bachrach
Enabled by modern languages and retargetable compilers, software development is in a virtual “Cambrian explosion” driven by a critical mass of powerfully parameterized libraries; but hardware development practices lag far behind. We hypothesize that existing hardware construction languages (HCLs) and novel hardware compiler frameworks (HCFs) can put hardware development on a similar evolutionary path by enabling new hardware libraries to be independent of underlying process technologies, including FPGA mappings. We support this claim by (1) evaluating the degree to which Chisel, an existing HCL, can support powerfully parameterized libraries, and (2) introducing the concept and implementation of an HCF that uses an open-source hardware intermediate representation, FIRRTL (Flexible Intermediate Representation for RTL), to transform target-independent RTL into technology-specific RTL. Finally, we evaluate many hardware compiler transformations, including simplifying transformations, analyses, optimizations, instrumentations, and specializations, which demonstrate the power of a combined HCL and HCF approach.
Citations: 130
Towards reliability-aware circuit design in nanoscale FinFET technology: New-generation aging model and circuit reliability simulator
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203856
Shaofeng Guo, Runsheng Wang, Zhuoqing Yu, P. Hao, P. Ren, Yangyuan Wang, S. Liao, Chunyi Huang, Tianlei Guo, A. Chen, Jushan Xie, Ru Huang
This paper presents, for the first time, an industry-level, new-generation EDA solution for reliability-aware design in nanoscale FinFET technology, with new compact transistor aging models and an upgraded circuit reliability simulator. Our work solves various issues found in FinFET silicon data on NBTI aging. In particular, whereas traditional simulators ignore the NBTI recovery effect or model it less accurately, we propose accurate NBTI degradation and recovery models and validate them against silicon data over the full stress/recovery range in FinFET technology. The history effect, an important feature of NBTI that is missing from existing industrial tools, is included through a new simulation methodology. Because FinFET reliability data suggest that the conventional linear extrapolation method is no longer valid, we propose an accurate, fast long-term prediction method based on smart iteration flows of equivalence. The frequency dependence of NBTI, which has drawn much attention, is captured automatically by the new simulator. This work has been integrated into the Cadence reliability simulator, giving designers an opportunity for accurate reliability-aware circuit design.
Citations: 25
An assessment of vulnerability of hardware neural networks to dynamic voltage and temperature variations
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203882
Xun Jiao, Mulong Luo, Jeng-Hau Lin, Rajesh K. Gupta
As a problem-solving method, neural networks have shown broad applicability, from medical applications to speech recognition and natural language processing. This success has even led to the implementation of neural network algorithms in hardware. In this paper, we explore two questions: (a) to what extent microelectronic variations affect the quality of results produced by neural networks; and (b) whether the answer to the first question represents an opportunity to optimize the implementation of neural network algorithms. Regarding the first question, variations are now increasingly common in aggressive process nodes and typically manifest as an increased frequency of timing errors. Combating variations (due to process and/or operating conditions) usually results in increased guardbands in circuit and architectural design, thus reducing the gains from process technology advances. Given the inherent resilience of neural networks due to the adaptation of their learning parameters, one would expect the quality of results produced by neural networks to be relatively insensitive to the rising timing error rates caused by increased variations. On the contrary, using two frequently used neural networks (MLP and CNN), our results show that variations can significantly affect inference accuracy. This paper outlines our assessment methodology and our use of a cross-layer evaluation approach that extracts hardware-level errors from twenty different operating conditions and then injects such errors back into the software layer in an attempt to answer the second question posed above.
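The idea of injecting hardware-level errors back into the software layer can be sketched generically as random single-bit flips applied to fixed-width integer values. This is a simplified stand-in under stated assumptions: the paper extracts its error behavior from hardware operating conditions, whereas the error rate, bit width, and uniform bit choice here are purely illustrative.

```python
import random


def flip_bit(value, bit, width=8):
    """Flip one bit of an unsigned `width`-bit integer."""
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def inject_errors(values, error_rate, rng, width=8):
    """Return a copy of `values` where each entry suffers one random bit flip
    with probability `error_rate` (assumed uniform over bit positions)."""
    out = []
    for v in values:
        if rng.random() < error_rate:
            v = flip_bit(v, rng.randrange(width), width)
        out.append(v)
    return out
```

Passing a seeded `random.Random` instance as `rng` keeps an experiment repeatable; comparing inference accuracy with and without `inject_errors` applied to quantized activations or weights would be one way to use such a helper.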
Citations: 34
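The cross-layer idea in the abstract above (take errors observed at the hardware level and inject them into the software model) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 16-bit word width, the single-bit-flip error model, and the names `flip_bit`, `inject_errors`, and `mismatch_rate` are hypothetical and are not taken from the paper, which derives its errors from hardware under twenty operating conditions.

```python
import random

WIDTH = 16  # assumed datapath word width (hypothetical)


def flip_bit(value: int, bit: int, width: int = WIDTH) -> int:
    """Flip one bit of an unsigned `width`-bit word."""
    mask = (1 << width) - 1
    return (value ^ (1 << bit)) & mask


def inject_errors(words, error_rate, rng, width=WIDTH):
    """Return a copy of `words` in which each word is corrupted with
    probability `error_rate` by flipping one uniformly chosen bit.
    The single-bit-flip model is an assumption, not the paper's model."""
    out = []
    for w in words:
        if rng.random() < error_rate:
            w = flip_bit(w, rng.randrange(width), width)
        out.append(w)
    return out


def argmax(xs):
    """Index of the largest element (first one wins on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def mismatch_rate(score_rows, error_rate, seed=0):
    """Fraction of rows whose predicted class changes after injection.
    Each row holds the unsigned output scores of one inference."""
    rng = random.Random(seed)
    changed = 0
    for row in score_rows:
        noisy = inject_errors(row, error_rate, rng)
        changed += argmax(noisy) != argmax(row)
    return changed / len(score_rows)
```

Sweeping `error_rate` over a range and plotting `mismatch_rate` would give a rough picture of how sensitive a classifier's decisions are to output corruption under this simplified model.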
Cost-effective write disturbance mitigation techniques for advancing PCM density
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203786
Mohammad Khavari Tavana, D. Kaeli
Rapid technology scaling has enabled the integration of many cores into a single chip. Given this level of core integration, the requirements for a large and scalable main memory system will only grow. Current DRAM-based main memory systems face power and scalability issues when working at sub-micron scales. Phase Change Memory (PCM) has been proposed as one of the most promising technology candidates to replace or complement DRAM. However, scaling down cell sizes introduces significant thermal-based write disturbance challenges in PCM. Due to the heat generated when programming cells, neighboring cells may be disturbed and experience changes in their values. A naive solution is to increase inter-cell space in an attempt to isolate cell programming and eliminate write disturbance, but this approach significantly reduces PCM density. In this paper, we propose two cost-effective solutions to reduce the probability of write disturbance. Our solutions come with few side effects on other memory system metrics. The first technique is based on data encoding and tries to reduce the number of vulnerable data patterns when writing data to main memory. The second technique detects vulnerable cells and overwrites them if their occurrence is below a set threshold. The proposed techniques are general and can avoid much of the performance overhead introduced by write disturbance. Our proposed solutions can reduce the average number of writes by 49% over traditional schemes, while incurring minimal impact on PCM lifetime and energy consumption.
Citations: 7
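The abstract above mentions a data-encoding technique that reduces the number of vulnerable data patterns written to memory, without describing the code itself. As a rough illustration of the general idea only, the sketch below uses a generic inversion-style code, which is a swapped-in technique and not the authors' encoding. The definition of a "vulnerable pattern" (two adjacent bits both 0), the 8-bit word width, and the names `vulnerable_pairs`, `encode`, and `decode` are all assumptions; the real definition depends on cell physics and on which cells are being programmed.

```python
WIDTH = 8  # hypothetical word width


def vulnerable_pairs(word: int, width: int = WIDTH) -> int:
    """Count adjacent bit pairs that are both 0.
    Treating '00' as the vulnerable pattern is a placeholder assumption."""
    count = 0
    for i in range(width - 1):
        if (word >> i) & 0b11 == 0:
            count += 1
    return count


def encode(word: int, width: int = WIDTH):
    """Return (stored_word, flag). The word is stored inverted (flag=1)
    only when inversion strictly lowers the vulnerable-pair count."""
    mask = (1 << width) - 1
    inverted = ~word & mask
    if vulnerable_pairs(inverted, width) < vulnerable_pairs(word, width):
        return inverted, 1
    return word, 0


def decode(stored: int, flag: int, width: int = WIDTH) -> int:
    """Recover the original word from the stored word and its flag bit."""
    mask = (1 << width) - 1
    return (~stored & mask) if flag else stored
```

The design trade-off in a scheme like this is one extra flag bit per word in exchange for never storing the more vulnerable of the two candidate representations.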