2014 Design, Automation & Test in Europe Conference & Exhibition (DATE): Latest Papers

Embedded reconfigurable logic for ASIC design obfuscation against supply chain attacks
Pub Date : 2014-03-24 DOI: 10.7873/DATE2014.256
Bao Liu, Brandon Wang
Hardware is the foundation and the root of trust of any security system. However, in today's global IC industry, an IP provider, an IC design house, a CAD company, or a foundry may subvert a VLSI system with back doors or logic bombs. Such a supply chain adversary's capability is rooted in his knowledge of the hardware design. Successful hardware design obfuscation would severely limit a supply chain adversary's capability, if not prevent all supply chain attacks. However, not all designs can be obfuscated with traditional technologies. We propose to achieve ASIC design obfuscation based on embedded reconfigurable logic, which is determined by the end user and unknown to any party in the supply chain. Combined with other security techniques, embedded reconfigurable logic can provide the root of ASIC design obfuscation, data confidentiality, and tamper-proofness. As a case study, we evaluate hardware-based code injection attacks and reconfiguration-based instruction set obfuscation on the open-source SPARC processor LEON2. We prevent program monitor Trojan attacks and increase the area of a minimum code injection Trojan with a 1KB ROM by 2.38% for every 1% area increase of the LEON2 processor.
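To make reconfiguration-based instruction set obfuscation concrete, here is a toy Python sketch, assuming an 8-bit opcode field and an end-user key that selects a secret permutation of opcode values. The function names, the key, and the permutation scheme are hypothetical; this is not the LEON2 decoder or the authors' design, only an illustration of why injected code written without the key is misinterpreted.

```python
import random
from typing import List


def make_opcode_map(key: int, width: int = 8) -> List[int]:
    """Hypothetical: derive a secret permutation of all 2**width opcode values
    from an end-user key (stand-in for the reconfigurable logic's configuration)."""
    table = list(range(1 << width))
    random.Random(key).shuffle(table)
    return table


def encode(program: List[int], table: List[int]) -> List[int]:
    """Trusted side (end user): rewrite plain opcodes into obfuscated ones."""
    return [table[op] for op in program]


def decode(stream: List[int], table: List[int]) -> List[int]:
    """On-chip side: the configured decoder inverts the secret mapping."""
    inverse = [0] * len(table)
    for plain, obf in enumerate(table):
        inverse[obf] = plain
    return [inverse[op] for op in stream]


if __name__ == "__main__":
    table = make_opcode_map(key=0xC0FFEE)
    program = [0x01, 0x2A, 0x7F]  # hypothetical plain opcodes
    assert decode(encode(program, table), table) == program
    # A Trojan injecting plain opcodes without the key has them
    # reinterpreted through the secret inverse mapping instead.
    print(decode(program, table))
```

A permutation is used so that decoding is lossless for legitimate code; a real design would also have to keep the mapping out of reach of the supply chain, which is the role the abstract assigns to end-user-configured reconfigurable logic.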
Pages: 1-6
Citations: 81
Hybrid wire-surface wave architecture for one-to-many communication in networks-on-chip
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.287
Ammar Karkar, Nizar Dahir, Ra'ed Al-Dujaily, K. Tong, T. Mak, A. Yakovlev
Network-on-chip (NoC) is a communication paradigm that has emerged to tackle various on-chip challenges and to meet demands for high performance and economical interconnect implementation. However, purely metal-based NoCs offer limited scalability as technology scaling continues relentlessly, especially for one-to-many (1-to-M) communication. To meet this scalability demand, this paper proposes a new hybrid architecture that combines metal interconnects with Zenneck surface wave interconnects (SWI). Together with newly proposed routing and global arbitration schemes, this architecture avoids overloading the NoC and alleviates traffic hotspots, compared with the common practice of handling 1-to-M traffic as unicast. This work addresses the system-level challenges of intra-chip multicasting. Evaluation results, based on cycle-accurate simulation and a hardware description, demonstrate the effectiveness of the proposed architecture: compared with a regular NoC, it reduces power by 4 to 12X and average delay by 25X or more. These results are achieved with negligible hardware overhead.
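To see why handling 1-to-M traffic as unicast overloads a wire-based NoC, the sketch below counts link traversals on a 2D mesh with dimension-ordered (XY) routing, comparing M separate unicasts with an idealized multicast tree in which each link is used once. The 4x4 mesh, XY routing, and function names are assumptions for illustration; neither the surface-wave medium nor the paper's routing and global arbitration schemes are modeled.

```python
from typing import Iterable, List, Set, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]


def xy_route(src: Node, dst: Node) -> List[Link]:
    """Dimension-ordered (XY) route on a 2D mesh: move along x, then along y."""
    links: List[Link] = []
    x, y = src
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        links.append(((x, y), (x + step, y)))
        x += step
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        links.append(((x, y), (x, y + step)))
        y += step
    return links


def unicast_link_traversals(src: Node, dests: Iterable[Node]) -> int:
    """1-to-M sent as M separate unicasts: every copy pays for its own path."""
    return sum(len(xy_route(src, d)) for d in dests)


def tree_link_traversals(src: Node, dests: Iterable[Node]) -> int:
    """Idealized multicast tree: each directed link carries the message once."""
    used: Set[Link] = set()
    for d in dests:
        used.update(xy_route(src, d))
    return len(used)


if __name__ == "__main__":
    mesh = [(x, y) for x in range(4) for y in range(4)]
    dests = [n for n in mesh if n != (0, 0)]
    print(unicast_link_traversals((0, 0), dests))  # 48
    print(tree_link_traversals((0, 0), dests))     # 15
```

For a corner node broadcasting to the other 15 nodes, unicast replication costs 48 link traversals versus 15 for a shared tree, which illustrates the kind of load the hybrid architecture aims to take off the wires.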
Pages: 1-4
Citations: 20
Optimization of design complexity in time-multiplexed constant multiplications
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.313
L. Aksoy, P. Flores, J. Monteiro
The multiplication of constants by a data input is an essential operation in digital signal processing (DSP) systems. Applications requiring a large number of constant multiplications under stringent hardware constraints generally realize them in a folded architecture, called time-multiplexed constant multiplication (TMCM), where at each time step a single constant selected from a set of constants is multiplied by the data input. This paper addresses the problem of optimizing the complexity of a TMCM design and introduces an algorithm that finds the least complex TMCM design by sharing logic operators, i.e., adders, subtractors, adder/subtractors, and multiplexers (MUXes). It includes efficient search methods and yields better results than existing TMCM algorithms.
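A minimal TMCM sketch, assuming every constant in the set can be written as 2^a ± 2^b, so that one shared adder/subtractor plus multiplexed shift amounts serves the whole set. This brute-force decomposition is a hypothetical illustration of operator sharing, not the paper's algorithm, which searches for the least complex network of shared adders, subtractors, adder/subtractors, and MUXes.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# (a, b, sign) means c == (1 << a) + sign * (1 << b)
Term = Tuple[int, int, int]


def two_term_decomposition(c: int, max_shift: int = 16) -> Optional[Term]:
    """Find shifts a, b and a sign so that c = 2**a +/- 2**b, if possible."""
    for a, b in product(range(max_shift), repeat=2):
        for sign in (1, -1):
            if (1 << a) + sign * (1 << b) == c:
                return a, b, sign
    return None


def build_mux_table(constants: List[int]) -> Dict[int, Term]:
    """Per select value, the MUX settings (two shifts, add or sub) for one constant."""
    table: Dict[int, Term] = {}
    for sel, c in enumerate(constants):
        term = two_term_decomposition(c)
        if term is None:
            raise ValueError(f"{c} needs more than one shared adder/subtractor")
        table[sel] = term
    return table


def tmcm(x: int, sel: int, table: Dict[int, Term]) -> int:
    """One shared adder/subtractor; MUXes pick the shifts and the operation."""
    a, b, sign = table[sel]
    left, right = x << a, x << b
    return left + right if sign > 0 else left - right


if __name__ == "__main__":
    consts = [3, 5, 7, 12]
    table = build_mux_table(consts)
    for sel, c in enumerate(consts):
        assert tmcm(11, sel, table) == 11 * c
```

A constant such as 11 (binary 1011) has no two-term form, so `build_mux_table` rejects it; handling such sets is exactly where a real TMCM optimizer must add and share further operators.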
Pages: 1-4
Citations: 4
EDT: A specification notation for reactive systems
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.228
R. Venkatesh, U. Shrotri, G. M. Krishna, Supriya Agrawal
Requirements of reactive systems express the relationship between sensors and actuators and are usually described in natural language, using a mix of state-based and stream-based paradigms. Translating these into a formal language is an important prerequisite for automating the verification of requirements. The analysis effort required for this translation is a prime hurdle to formalization gaining acceptance among software engineers and testers. We present Expressive Decision Tables (EDT), a novel formal notation designed to reduce the effort of translating both state-based and stream-based informal requirements. We have also built a tool, EDTTool, that generates test data and expected outputs from EDT specifications. In a case study of more than 200 informal requirements from a real-life automotive application, translating the informal requirements into EDT took 43% less time than translating them into Statecharts. Further, we tested the Statecharts using test data generated by EDTTool from the corresponding EDT specifications. This testing detected one bug in a mature feature and exposed several missing requirements in another. The paper presents the EDT notation, a comparison with other similar notations, and the details of the case study.
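As a rough illustration of generating test data and expected outputs from a tabular specification, the sketch below evaluates a plain, stateless decision table and enumerates every input combination. The table rows, signal names, and first-match semantics are hypothetical; this is neither EDT syntax nor EDTTool, and it ignores the state-based and stream-based (timing) aspects that EDT is designed to capture.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Tuple

State = Dict[str, bool]
Row = Tuple[Callable[[State], bool], Dict[str, bool]]

# Hypothetical requirement table; the first matching row wins.
CHIME_TABLE: List[Row] = [
    (lambda s: s["ignition"] and s["door_open"], {"chime": True}),
    (lambda s: s["ignition"] and not s["belt_fastened"], {"chime": True}),
    (lambda s: True, {"chime": False}),  # default row
]


def evaluate(table: List[Row], state: State) -> Dict[str, bool]:
    """Return the outputs of the first row whose condition holds."""
    for condition, outputs in table:
        if condition(state):
            return outputs
    raise ValueError(f"no row covers {state}")


def generate_tests(table: List[Row], inputs: List[str]) -> Iterator[Tuple[State, Dict[str, bool]]]:
    """Enumerate every Boolean input combination with its expected output."""
    for values in product([False, True], repeat=len(inputs)):
        state = dict(zip(inputs, values))
        yield state, evaluate(table, state)


if __name__ == "__main__":
    for state, expected in generate_tests(CHIME_TABLE, ["ignition", "door_open", "belt_fastened"]):
        print(state, "->", expected)
```

The generated (input, expected output) pairs play the role of test vectors that could be replayed against an independently written model, in the spirit of the Statecharts testing described above.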
Pages: 1-6
Citations: 5
Video analytics using beyond CMOS devices
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.357
N. Vijaykrishnan, S. Datta, G. Cauwenberghs, D. Chiarulli, S. Levitan, H. P. Wong
The human vision system understands and interprets complex scenes for a variety of visual tasks in real time while consuming less than 20 watts of power. The holistic design of artificial vision systems that approach and eventually exceed the capabilities of human vision systems is a grand challenge, and the design of such a system requires advances in multiple disciplines. This paper focuses on the advances needed in the computational fabric and provides an overview of a new genre of architectures inspired both by advances in the understanding of the visual cortex and by the emergence of devices with new mechanisms for state computations.
Pages: 1-5
Citations: 12
Tightly-coupled hardware support to dynamic parallelism acceleration in embedded shared memory clusters
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.169
P. Burgio, Giuseppe Tagliavini, Francesco Conti, A. Marongiu, L. Benini
Modern designs for embedded systems increasingly embrace cluster-based architectures, where small sets of cores communicate through tightly-coupled shared memory banks and high-performance interconnections. At the same time, the complexity of modern applications requires new programming abstractions to exploit dynamic and/or irregular parallelism on such platforms. Supporting dynamic parallelism in systems that (i) are resource-constrained and (ii) run applications with small units of work calls for a runtime environment with minimal overhead for scheduling parallel tasks. In this work, we study the major sources of overhead in the implementation of OpenMP dynamic loops, sections, and tasks, and propose a hardware implementation of a generic Scheduling Engine (HWSE) that fits the semantics of all three constructs. The HWSE is designed as a block tightly coupled to the processing elements (PEs) within a multi-core cluster, communicating through a shared-memory interface. This allows very fast programming of, and synchronization with, the controlling PEs, which is fundamental to achieving fast dynamic scheduling and ultimately to enabling fine-grained parallelism. We demonstrate the effectiveness of our solution with real applications and synthetic benchmarks on a cycle-accurate virtual platform.
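The per-chunk bookkeeping that makes software dynamic scheduling expensive can be sketched as follows: a software model of OpenMP `schedule(dynamic, chunk)` in which workers repeatedly take the next chunk from a shared, lock-protected counter. The class and function names are hypothetical, and Python threads only illustrate the semantics, not the timing; the HWSE proposed in the paper moves scheduling of this kind into a hardware block tightly coupled to the PEs.

```python
import threading
from typing import List, Optional, Tuple


class DynamicLoop:
    """Software model of schedule(dynamic, chunk): workers grab the next chunk
    of iterations from a shared, lock-protected counter."""

    def __init__(self, start: int, end: int, chunk: int) -> None:
        self._next = start
        self._end = end
        self._chunk = chunk
        self._lock = threading.Lock()  # this per-chunk critical section is runtime overhead

    def next_chunk(self) -> Optional[Tuple[int, int]]:
        with self._lock:
            if self._next >= self._end:
                return None
            lo = self._next
            self._next = min(lo + self._chunk, self._end)
            return lo, self._next


def run(n_iters: int, n_workers: int, chunk: int) -> List[List[int]]:
    """Run a dynamically scheduled loop; return the iterations each worker executed."""
    loop = DynamicLoop(0, n_iters, chunk)
    done: List[List[int]] = [[] for _ in range(n_workers)]

    def worker(wid: int) -> None:
        while (span := loop.next_chunk()) is not None:
            done[wid].extend(range(*span))  # "execute" the iterations

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


if __name__ == "__main__":
    per_worker = run(n_iters=100, n_workers=4, chunk=7)
    print([len(w) for w in per_worker])
```

With small units of work, each `next_chunk` call (lock acquire, counter update, release) is paid once per chunk, which is why its cost dominates fine-grained parallel regions.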
Pages: 1-6
Citations: 5
Signature indexing of design layouts for hotspot detection
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.371
Cristian Andrades, Michael A. Rodriguez, C. Chiang
This work presents a new signature for 2D spatial configurations that is useful for optimizing the hotspot detection process. The signature is a string of numbers representing changes along the horizontal and vertical slices of a configuration; it serves as the key of an inverted index that groups layout windows with the same signature. The method extracts signatures from a compact specification of similar exact patterns of fixed size. These signatures are then used as search keys into the inverted index to retrieve candidate windows that can match the patterns. Experimental results show that this simple type of signature achieves 100% recall and, on average, over 85% precision in terms of the area effectively covered by the pattern relative to the retrieved area of the layout. In addition, the signature discriminates well: around 99% of the extracted signatures each match a single pattern.
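A minimal sketch of signature indexing, assuming layout windows are rasterized into binary grids and that a "change" is a material/empty transition along a row or column slice. The paper's exact signature encoding and its compact pattern specification are not reproduced; all names here are hypothetical. Identical windows always share a signature (hence full recall), while different windows may collide, so candidates are confirmed cell by cell.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Grid = Sequence[Sequence[int]]  # 1 = layout material, 0 = empty
Signature = Tuple[int, ...]


def changes(line: Sequence[int]) -> int:
    """Number of 0->1 or 1->0 transitions along one slice."""
    return sum(1 for a, b in zip(line, line[1:]) if a != b)


def signature(window: Grid) -> Signature:
    """Transition counts of every horizontal slice, then every vertical slice."""
    rows = [changes(r) for r in window]
    cols = [changes(c) for c in zip(*window)]
    return tuple(rows + cols)


def build_index(windows: Dict[str, Grid]) -> Dict[Signature, List[str]]:
    """Inverted index: signature -> ids of the windows that share it."""
    index: Dict[Signature, List[str]] = defaultdict(list)
    for wid, w in windows.items():
        index[signature(w)].append(wid)
    return index


def find_matches(pattern: Grid, windows: Dict[str, Grid],
                 index: Dict[Signature, List[str]]) -> List[str]:
    """Retrieve candidates by signature, then confirm them cell by cell."""
    target = [list(r) for r in pattern]
    return [wid for wid in index.get(signature(pattern), [])
            if [list(r) for r in windows[wid]] == target]


if __name__ == "__main__":
    windows = {
        "w0": [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
        "w1": [[1, 0, 0, 1], [1, 0, 0, 1], [0, 0, 0, 0]],
        "w2": [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]],
        "w3": [[1, 0, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]],  # same signature as w0
    }
    index = build_index(windows)
    pattern = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    print(index[signature(pattern)])                # ['w0', 'w3']
    print(find_matches(pattern, windows, index))    # ['w0']
```

The `w3` collision shows why precision can fall below 100% even though recall stays complete.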
Pages: 1-6
Citations: 3
Optimization of standard cell based detailed placement for 16 nm FinFET process
Pub Date : 2014-03-24 DOI: 10.7873/DATE2014.370
Yuelin Du, Martin D. F. Wong
FinFET transistors have great advantages over traditional planar MOSFET transistors in high-performance and low-power applications. Major foundries are adopting FinFET technology for CMOS semiconductor device fabrication at the 16 nm technology node and beyond. Edge device degradation is among the major challenges of the FinFET process. To avoid such degradation, dummy gates are needed on device edges, and the dummy gates must be tied to power rails so as not to introduce unconnected parasitic transistors. This requires that each dummy gate abut at least one source node after standard cell placement. If the drain nodes at two adjacent cell boundaries abut each other, an additional source node must be inserted between them for dummy gate power tying, which costs extra placement area. Usually there is some flexibility during detailed placement to flip cells horizontally or swap the positions of adjacent cells, with little impact on global placement objectives such as timing and net congestion. This paper proposes a detailed placement optimization strategy for standard cell based designs. By flipping a subset of cells in a standard cell row and swapping pairs of adjacent cells, the number of drain-to-drain abutments between adjacent cell boundaries can be optimally minimized, which avoids additional source node insertion and reduces the length of the standard cell row. In addition, the proposed graph model can easily be modified to account for more complicated design rules. Experimental results show that optimizing 100k cells completes within 0.1 seconds, verifying the efficiency of the proposed algorithm.
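The cell-flipping half of this problem can be sketched as a dynamic program. Assume (hypothetically) that each cell is reduced to the node types at its left and right boundaries, "S" (source) or "D" (drain), and that a horizontal flip swaps them. Since a flip only affects the two abutments touching that cell, a left-to-right pass over orientations (equivalently, a shortest path in a layered graph with two nodes per cell) yields the minimum number of drain-to-drain abutments. Adjacent-cell swapping and the paper's own graph model are not reproduced here.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, str]  # hypothetical: (left boundary node, right boundary node), "S" or "D"


def oriented(cell: Cell, flipped: bool) -> Cell:
    """A horizontal flip swaps which boundary faces left."""
    left, right = cell
    return (right, left) if flipped else (left, right)


def count_dd(row: List[Cell], flips: List[bool]) -> int:
    """Number of drain-to-drain abutments for a given orientation choice."""
    cells = [oriented(c, f) for c, f in zip(row, flips)]
    return sum(1 for a, b in zip(cells, cells[1:]) if a[1] == "D" and b[0] == "D")


def min_dd_flips(row: List[Cell]) -> Tuple[int, List[bool]]:
    """Dynamic program over the row; state = orientation of the current cell."""
    if not row:
        return 0, []
    cost: Dict[bool, int] = {False: 0, True: 0}
    parents: List[Dict[bool, bool]] = []
    for i in range(1, len(row)):
        new_cost: Dict[bool, int] = {}
        parent: Dict[bool, bool] = {}
        for f in (False, True):
            left_here = oriented(row[i], f)[0]
            for p in (False, True):
                right_prev = oriented(row[i - 1], p)[1]
                c = cost[p] + (right_prev == "D" and left_here == "D")
                if f not in new_cost or c < new_cost[f]:
                    new_cost[f], parent[f] = c, p
        cost = new_cost
        parents.append(parent)
    # Backtrack from the cheaper final orientation.
    last = min(cost, key=cost.get)
    flips = [last]
    for parent in reversed(parents):
        flips.append(parent[flips[-1]])
    flips.reverse()
    return cost[last], flips


if __name__ == "__main__":
    row = [("S", "D"), ("S", "D"), ("D", "S")]
    best, flips = min_dd_flips(row)
    print(best, flips, count_dd(row, flips))
```

The pass is linear in the number of cells, which is consistent in spirit with the fast runtimes the abstract reports, though the paper's algorithm also handles swaps and richer design rules.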
Pages: 1-6
Citations: 26
EVX: Vector execution on low power EDGE cores
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.035
M. Duric, Oscar Palomar, Aaron Smith, O. Unsal, A. Cristal, M. Valero, D. Burger
In this paper, we present a vector execution model that brings the advantages of vector processors to low-power, general-purpose cores with limited additional hardware. While accelerating data-level parallel (DLP) workloads, the vector model increases efficiency and hardware resource utilization. We implement our approach, called EVX, on a modest dual-issue core based on an Explicit Data Graph Execution (EDGE) architecture. Unlike most DLP accelerators, which add hardware and increase the complexity of low-power processors, EVX leverages the available resources of EDGE cores and allows those resources to be specialized at minimal cost. EVX adds control logic that increases the core area by 2.1%. We show that EVX yields an average speedup of 3x over a scalar baseline and outperforms multimedia SIMD extensions.
Pages: 1-4
Citations: 7
DeSpErate: Speeding-up design space exploration by using predictive simulation scheduling
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.231
Giovanni Mariani, G. Palermo, V. Zaccaria, C. Silvano
The design space exploration (DSE) phase is used to tune configurable system parameters and is generally formulated as a multi-objective optimization (MOO) problem. It is usually performed in the pre-design phase and involves evaluating large design spaces in which each configuration requires a long simulation. Several heuristic techniques have been proposed, and the recent trend is to reduce exploration time by using analytic prediction models to approximate the system metrics, effectively pruning sub-optimal configurations from the exploration scope. However, how to make effective use of the computing resources that run the DSE process remains largely unaddressed. In this work, we show that an alternative and almost orthogonal approach, focused on exploiting the parallelism of the available computing resources, can be used to schedule the simulations better and to obtain a high speedup with respect to state-of-the-art approaches without compromising the accuracy of the exploration results. Experimental results are presented for the DSE problem of a shared-memory multi-core system, considering a variable number of parallel resources available to support the DSE phase.
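The abstract does not spell out how DeSpErate schedules simulations. The Python sketch below is a purely hypothetical illustration of why prediction-guided dispatch onto parallel resources can cut exploration time. It applies classic longest-processing-time-first (LPT) list scheduling to predicted simulation times. LPT is a swapped-in textbook heuristic, not the authors' algorithm. The helper names (`list_schedule`, `lpt_order`), the configuration names, the run times and the predictions are all invented for the example.

```python
import heapq
from typing import Dict, List


def list_schedule(order: List[str], sim_time: Dict[str, float], workers: int) -> float:
    """Greedy list scheduling: dispatch jobs in `order`, each to the worker
    that becomes free first. Returns the makespan, i.e. the time at which
    the last simulation finishes."""
    # Min-heap of (time at which the worker becomes free, worker id).
    free_at = [(0.0, w) for w in range(workers)]
    heapq.heapify(free_at)
    for job in order:
        t, w = heapq.heappop(free_at)
        heapq.heappush(free_at, (t + sim_time[job], w))
    return max(t for t, _ in free_at)


def lpt_order(predicted: Dict[str, float]) -> List[str]:
    """Longest-predicted-time-first order; ties keep their original order."""
    return sorted(predicted, key=lambda job: predicted[job], reverse=True)


# Hypothetical data: actual simulation times (hours) of five design
# configurations, and rough estimates from an assumed prediction model.
actual = {"cfg0": 1.0, "cfg1": 1.1, "cfg2": 0.9, "cfg3": 1.0, "cfg4": 4.2}
predicted = {"cfg0": 1.0, "cfg1": 1.0, "cfg2": 1.0, "cfg3": 1.0, "cfg4": 4.0}

fifo = list_schedule(list(actual), actual, workers=2)         # submission order
lpt = list_schedule(lpt_order(predicted), actual, workers=2)  # prediction-guided
print(f"FIFO makespan: {fifo:.1f} h")  # FIFO makespan: 6.1 h
print(f"LPT makespan:  {lpt:.1f} h")   # LPT makespan:  4.2 h
```

With two workers, dispatching in submission order leaves the long `cfg4` simulation for last (makespan 6.1 h). Starting it first, as the rough predictions suggest, finishes everything in 4.2 h. The imperfect predictions are enough here because only the relative order of the jobs matters.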
Citations: 1
Journal: 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)