Our work addresses protection of hardware IP at the mask level with the goal of preventing unauthorized manufacturing. The proposed protocol based on chip locking and activation is applicable to a broad category of electronic systems with a primary bus. Such designs include (1) numerous IP offerings for USB, PCI, PCI-E, AMBA and other bus standards typically used in system-on-a-chip designs and computer peripherals, (2) SRAM-based FPGAs that are programmed through an input bus, (3) general-purpose and embedded microprocessors, including soft cores, (4) DSPs, (5) network processors, and (6) game consoles. Our key insight is that such designs can be locked by scrambling the central bus with controlled, reversible bit permutations and substitutions. To securely establish a unique code per chip to control bus scrambling, we employ true random number generators and Diffie-Hellman cryptography during activation.
{"title":"Protecting bus-based hardware IP by secret sharing","authors":"Jarrod A. Roy, F. Koushanfar, I. Markov","doi":"10.1145/1391469.1391684","DOIUrl":"https://doi.org/10.1145/1391469.1391684","url":null,"abstract":"Our work addresses protection of hardware IP at the mask level with the goal of preventing unauthorized manufacturing. The proposed protocol based on chip locking and activation is applicable to a broad category of electronic systems with a primary bus. Such designs include (1) numerous IP offerings for USB, PCI, PCI-E, AMBA and other bus standards typically used in system-on-a-chip designs and computer peripherals, (2) SRAM-based FPGAs that are programmed through an input bus, (3) general-purpose and embedded microprocessors, including soft cores, (4) DSPs, (5) network processors, and (6) game consoles. Our key insight is that such designs can be locked by scrambling the central bus by controlled reversible bit-permutations and substitutions. To securely establish a unique code per chip to control bus scrambling, we employ true random number generators and Diffie-Hellman cryptography during activation.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117171429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
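As a rough illustration of the bus-scrambling idea in the abstract above, the following Python sketch locks an 8-bit word with a bit permutation followed by an XOR substitution, and shows that the same key reverses it. The bus width, the function names, and the choice of XOR as the substitution are assumptions made for this example, not details from the paper. The key-agreement step (Diffie-Hellman during activation) is left out.

```python
import secrets

WIDTH = 8  # hypothetical bus width in bits


def make_key(width=WIDTH):
    """Draw a random bit permutation and XOR mask (stand-ins for the per-chip code)."""
    perm = list(range(width))
    secrets.SystemRandom().shuffle(perm)
    mask = secrets.randbits(width)
    return perm, mask


def scramble(word, perm, mask):
    """Move bit i of `word` to position perm[i], then XOR with the mask."""
    out = 0
    for i, p in enumerate(perm):
        out |= ((word >> i) & 1) << p
    return out ^ mask


def unscramble(word, perm, mask):
    """Undo the XOR, then move bit perm[i] back to position i."""
    word ^= mask
    out = 0
    for i, p in enumerate(perm):
        out |= ((word >> p) & 1) << i
    return out
```

Without the correct `perm` and `mask`, a word read off the bus does not match what the sender wrote, which is the sense in which the design is "locked" in this toy model.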
Technological advances enable modern processors to utilize increasingly larger DRAMs with rising access frequencies. This is leading to high power consumption and operating temperature in DRAM chips. As a result, temperature management has become a real and pressing issue in high-performance DRAM systems. Traditional low-power techniques are not suitable for high-performance DRAM systems with high bandwidth. In this paper, we propose and evaluate a customized DRAM low-power technique based on a page-hit-aware write buffer (PHA-WB). Our proposed approach reduces DRAM system power consumption and temperature without any performance penalty. Our experiments show that a system with a 64-entry PHA-WB could reduce the total DRAM power consumption by up to 22.0% (9.6% on average). The peak and average temperature reductions are 6.1°C and 2.1°C, respectively.
{"title":"A power and temperature aware DRAM architecture","authors":"Song Liu, S. Memik, Yu Zhang, G. Memik","doi":"10.1145/1391469.1391691","DOIUrl":"https://doi.org/10.1145/1391469.1391691","url":null,"abstract":"Technological advances enable modern processors to utilize increasingly larger DRAMs with rising access frequencies. This is leading to high power consumption and operating temperature in DRAM chips. As a result, temperature management has become a real and pressing issue in high performance DRAM systems. Traditional low power techniques are not suitable for high performance DRAM systems with high bandwidth. In this paper, we propose and evaluate a customized DRAM low power technique based on page hit aware write buffer (PHA-WB). Our proposed approach reduces DRAM system power consumption and temperature without any performance penalty. Our experiments show that a system with a 64-entry PHA-WB could reduce the total DRAM power consumption by up to 22.0% (9.6% on average). The peak and average temperature reductions are 6.1°C and 2.1°C, respectively.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127278711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intrinsic and parasitic capacitances play an important role in determining the high-frequency RF performance of devices. Recently, a new type of carbon nanotube field effect transistor (CNFET) based on the tunneling principle has been proposed, which shows impressive device properties and overcomes some of the limitations of previously proposed CNFET devices. Although carbon nanotube based devices have been optimized for DC performance so far, little has been done to optimize them for high-frequency operation. In this paper, we present detailed modeling and analysis of device-geometry-based intrinsic and parasitic capacitances of tunneling carbon nanotube field effect transistors (T-CNFETs) with both single-nanotube and nanotube-array channels. Based on the model, we analyze scaling of parasitic capacitances with device geometry for two different scaling scenarios of T-CNFETs. We show that in order to reduce the impact of parasitic capacitance, nanotube density has to be optimized. Furthermore, for the first time, we analyze various factors affecting the high-frequency/RF performance of back-gated T-CNFETs and study the impact of parasitic and screening effects on the high-frequency/RF performance of these devices.
{"title":"Analysis and implications of parasitic and screening effects on the high-frequency/RF performance of tunneling-carbon nanotube FETs","authors":"C. Kshirsagar, Mohamed N. El-Zeftawi, K. Banerjee","doi":"10.1145/1391469.1391533","DOIUrl":"https://doi.org/10.1145/1391469.1391533","url":null,"abstract":"Intrinsic and parasitic capacitances play an important role in determining the high-frequency RF performance of devices. Recently, a new type of carbon nanotube field effect transistor (CNFET) based on tunneling principle has been proposed, which shows impressive device properties and overcomes some of the limitations of previously proposed CNFET devices. Although carbon nanotube based devices have been optimized for DC performance so far, little has been done to optimize them for high-frequency operation. In this paper, we present, detailed modeling and analysis of device geometry based intrinsic and parasitic capacitances of tunneling carbon nanotube field effect transistors (T-CNFETs) with both single nanotube as well as nanotube-array based channel. Based on the model, we analyze scaling of parasitic capacitances with device geometry for two different scaling scenarios of T-CNFETs. We show that in order to reduce the impact of parasitic capacitance, nanotube density has to be optimized. Furthermore, for the first time, we analyze various factors affecting the high-frequency/RF performance of back gated T-CNFETs and study the impact of parasitic and screening effects on the high-frequency/RF performance of these devices.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115041523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Haubelt, T. Schlichter, J. Keinert, M. Meredith
SystemCoDesigner is an ESL tool developed at the University of Erlangen-Nuremberg, Germany. SystemCoDesigner offers fast design space exploration and rapid prototyping of behavioral SystemC models. Together with Forte Design Systems, a fully automated approach was developed by integrating behavioral synthesis into the design flow. Starting from a behavioral SystemC model, hardware accelerators can be generated automatically using Forte Cynthesizer and can be added to the design space. The resulting design space is explored automatically by optimizing several objectives simultaneously using state-of-the-art multi-objective optimization algorithms. As a result, SystemCoDesigner presents optimized hardware/software solutions to the designer, who can select any of them for rapid prototyping on an FPGA. Thus, SystemCoDesigner bridges the gap from ESL to RTL and increases the confidence in early design decisions.
{"title":"SystemCoDesigner: Automatic design space exploration and rapid prototyping from behavioral models","authors":"C. Haubelt, T. Schlichter, J. Keinert, M. Meredith","doi":"10.1145/1391469.1391616","DOIUrl":"https://doi.org/10.1145/1391469.1391616","url":null,"abstract":"SystemCoDesigner is an ESL tool developed at the University of Erlangen-Nuremberg, Germany. SystemCoDesigner offers a fast design space exploration and rapid prototyping of behavioral SystemC models. Together with Forte Design Systems, a fully automated approach was developed by integrating behavioral synthesis into the design flow. Starting from a behavioral SystemC model, hardware accelerators can be generated automatically using Forte Cynthesizer and can be added to the design space. The resulting design space is explored automatically by optimizing several objectives simultaneously using state of the art multi-objective optimization algorithms. As a result, SystemCoDesigner presents optimized hardware/software solutions to the designer who can select any of them for rapid prototyping on an FPGA basis. Thus, SystemCoDesigner bridges the gap from ESL to RTL and increases the confidence in early design decisions.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115511760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
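Optimizing several objectives at once, as the abstract above describes, usually yields a set of trade-off solutions and not a single winner. The sketch below shows the standard Pareto-dominance filter that produces such a set. It assumes every objective is minimized and that each design point is a tuple of objective values (for example latency and area). It is a generic illustration, not SystemCoDesigner's algorithm.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For the hypothetical (latency, area) points `[(1, 5), (2, 2), (3, 3), (5, 1)]`, the point `(3, 3)` is dropped because `(2, 2)` is better in both objectives, and the remaining three are the alternatives a designer would choose among.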
Multimedia-dominated consumer electronics devices (such as cellular phones and digital cameras) operate under soft real-time constraints. Overly pessimistic worst-case execution time analysis techniques borrowed from the hard real-time systems domain are not particularly suitable in this context. Instead, the execution time distribution of a task provides a more valuable input to system-level performance analysis frameworks. Both program inputs and the underlying architecture contribute to the execution time variation of a task, but existing probabilistic execution time analysis approaches mostly ignore architectural modeling. In this paper, we take the first step towards remedying this situation through instruction cache modeling. We introduce the notion of probabilistic cache states to model the evolution of cache content during program execution over multiple inputs. In particular, we estimate the mean and variance of the execution time of a program across inputs in the presence of an instruction cache. The experimental evaluation confirms the scalability and accuracy of our probabilistic cache modeling approach.
{"title":"Cache modeling in probabilistic execution time analysis","authors":"Yun Liang, T. Mitra","doi":"10.1145/1391469.1391551","DOIUrl":"https://doi.org/10.1145/1391469.1391551","url":null,"abstract":"Multimedia-dominated consumer electronics devices (such as cellular phone, digital camera, etc.) operate under soft real-time constraints. Overly pessimistic worst-case execution time analysis techniques borrowed from hard real-time systems domain are not particularly suitable in this context. Instead, the execution time distribution of a task provides a more valuable input to the system-level performance analysis frameworks. Both program inputs and underlying architecture contribute to the execution time variation of a task. But existing probabilistic execution time analysis approaches mostly ignore architectural modeling. In this paper, we take the first step towards remedying this situation through instruction cache modeling. We introduce the notion of probabilistic cache states to model the evolution of cache content during program execution over multiple inputs. In particular, we estimate the mean and variance of execution time of a program across inputs in the presence of instruction cache. The experimental evaluation confirms the scalability and accuracy of our probabilistic cache modeling approach.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115563523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
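To make "mean and variance of execution time in the presence of an instruction cache" concrete, here is a deliberately simplified Python sketch. It assumes each instruction fetch hits the cache with a known probability, that fetches are statistically independent, and that hits and misses have fixed latencies. These assumptions and all names are illustrative. The paper's probabilistic cache states track how cache content evolves, which this toy model does not attempt.

```python
def exec_time_stats(hit_probs, hit_cycles=1, miss_cycles=10):
    """Mean and variance of total fetch time under independent hit/miss outcomes.

    hit_probs: per-fetch cache hit probabilities (hypothetical inputs).
    Each fetch costs hit_cycles with probability p, else miss_cycles.
    """
    delta = miss_cycles - hit_cycles
    # Expected cost of one fetch: hit cost plus the miss penalty weighted by (1 - p).
    mean = sum(hit_cycles + (1 - p) * delta for p in hit_probs)
    # Variance of a two-valued cost is p * (1 - p) * delta^2; variances add
    # across fetches only because the fetches are assumed independent.
    var = sum(p * (1 - p) * delta ** 2 for p in hit_probs)
    return mean, var
```

A fetch that always hits or always misses adds no variance, so all the spread in this model comes from fetches whose outcome depends on the input.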
M. Lukasiewycz, M. Glaß, C. Haubelt, J. Teich, Richard Regler, Bardo Lang
In this paper, a novel automatic approach for concurrent topology and routing optimization that achieves a high-quality network layout is proposed. This optimization is based on a specialized binary Integer Linear Program (ILP) in combination with a Multi-Objective Evolutionary Algorithm (MOEA). The ILP is formulated such that each solution represents a topology and routing that fulfills all requirements and demands of the network. Thus, in an iterative process, this ILP is solved to obtain feasible networks, whereas the MOEA is used to optimize multiple, even non-linear, objectives and ensures fast convergence towards the optimal solutions. Additionally, a domain-specific preprocessing algorithm for the ILP is presented that decreases the problem complexity and thus allows large and complex networks to be optimized efficiently. The experimental results validate the performance of this methodology on two state-of-the-art prototype automotive networks.
{"title":"Concurrent topology and routing optimization in automotive network integration","authors":"M. Lukasiewycz, M. Glaß, C. Haubelt, J. Teich, Richard Regler, Bardo Lang","doi":"10.1145/1391469.1391629","DOIUrl":"https://doi.org/10.1145/1391469.1391629","url":null,"abstract":"In this paper, a novel automatic approach for the concurrent topology and routing optimization that achieves a high quality network layout is proposed. This optimization is based on a specialized binary Integer Linear Program (ILP) in combination with a Multi-Objective Evolutionary Algorithm (MOEA). The ILP is formulated such that each solution represents a topology and routing that fulfills all requirements and demands of the network. Thus, in an iterative process, this ILP is solved to obtain feasible networks whereas the MOEA is used for the optimization of multiple even non-linear objectives and ensures a fast convergence towards the optimal solutions. Additionally, a domain specific preprocessing algorithm for the ILP is presented that decreases the problem complexity and, thus, allows to optimize large and complex networks efficiently. The experimental results validate the performance of this methodology on two state-of-the-art prototype automotive networks.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114262760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Abu-Rahma, K. Chowdhury, Joseph Wang, Zhiqin Chen, S. Yoon, M. Anis
The increase of process variations in advanced CMOS technologies is considered one of the biggest challenges for SRAM designers. This is aggravated by the strong demand for lower cost and power consumption and for higher performance and density, which complicates the SRAM design process. In this paper, we present a methodology for statistical simulation of SRAM read access yield, which is tightly related to SRAM performance and power consumption. The proposed flow enables early SRAM yield prediction and performance/power optimization at design time, which is important for SRAM in nanometer technologies. The methodology is verified using measured silicon yield data from a 1 Mb memory fabricated in an industrial 45 nm technology.
{"title":"A methodology for statistical estimation of read access yield in SRAMs","authors":"M. Abu-Rahma, K. Chowdhury, Joseph Wang, Zhiqin Chen, S. Yoon, M. Anis","doi":"10.1145/1391469.1391522","DOIUrl":"https://doi.org/10.1145/1391469.1391522","url":null,"abstract":"The increase of process variations in advanced CMOS technologies is considered one of the biggest challenges for SRAM designers. This is aggravated by the strong demand for lower cost and power consumption, higher performance and density which complicates SRAM design process. In this paper, we present a methodology for statistical simulation of SRAM read access yield, which is tightly related to SRAM performance and power consumption. The proposed flow enables early SRAM yield predication and performance/power optimization in the design time, which is important for SRAM in nanometer technologies. The methodology is verified using measured silicon yield data from a 1 Mb memory fabricated in an industrial 45 nm technology.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"189 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114587853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
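The idea of statistically simulating read access yield can be sketched with a plain Monte Carlo loop. In the toy model below, a read fails when the bitline swing developed by a cell is no larger than the sense amplifier's input offset, and both quantities are drawn from Gaussian distributions. The failure criterion, the distributions, and every numeric value are assumptions chosen for illustration. They are not taken from the paper or from silicon data.

```python
import random


def read_fail_probability(n_samples=100_000, seed=0, mean_dv_mv=40.0,
                          sigma_dv_mv=15.0, sigma_offset_mv=10.0):
    """Estimate the chance that one read develops too little bitline swing."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(n_samples):
        bitline_swing = rng.gauss(mean_dv_mv, sigma_dv_mv)  # cell-to-cell variation
        sense_offset = rng.gauss(0.0, sigma_offset_mv)      # sense-amplifier mismatch
        if bitline_swing <= sense_offset:
            fails += 1
    return fails / n_samples


def array_yield(p_fail, n_bits):
    """Probability that all n_bits cells read correctly, assuming independent cells."""
    return (1.0 - p_fail) ** n_bits
```

The parameters here are set so that failures are frequent enough to count directly. For a megabit-scale array the per-cell failure rate must be far smaller, and naive sampling like this would need an impractical number of samples to observe any failures at all.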
FPGA application developers often use pipelining, C-slowing and retiming to improve the performance of their designs. Unfortunately, registered netlists present a fundamentally different problem to CAD tools, potentially limiting the benefit of these techniques. In this paper we discuss some of the inherent issues pipelined netlists pose to existing timing-driven placement approaches. We then present two algorithmic modifications that reduce post-routing critical path delay by an average of 40%.
{"title":"Enhancing timing-driven FPGA placement for pipelined netlists","authors":"Ken Eguro, S. Hauck","doi":"10.1145/1391469.1391480","DOIUrl":"https://doi.org/10.1145/1391469.1391480","url":null,"abstract":"FPGA application developers often use pipelining, C-slowing and retiming to improve the performance of their designs. Unfortunately, registered netlists present a fundamentally different problem to CAD tools, potentially limiting the benefit of these techniques. In this paper we discuss some of the inherent issues pipelined netlists pose to existing timing-driven placement approaches. We then present two algorithmic modifications that reduce post-routing critical path delay by an average of 40%.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122118288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Traditional software-based diagnosis of failing chips typically identifies several lines where the failure is believed to reside. However, these lines can span multiple layers and can be very long. This makes physical failure analysis difficult. In contrast, there are emerging diagnosis techniques that identify both the faulty lines and the neighboring conditions for which an affected line becomes faulty. In this paper, an approach is presented to improve failure localization by automatically analyzing the information associated with the outcome of diagnosis. Experimental results show a significant improvement in failure localization when this method is applied to 106 real IC failures.
{"title":"Precise failure localization using automated layout analysis of diagnosis candidates","authors":"W. Tam, O. Poku, R. D. Blanton","doi":"10.1145/1391469.1391568","DOIUrl":"https://doi.org/10.1145/1391469.1391568","url":null,"abstract":"Traditional software-based diagnosis of failing chips typically identifies several lines where the failure is believed to reside. However, these lines can span across multiple layers and can be very long in length. This makes physical failure analysis difficult. In contrast, there are emerging diagnosis techniques that identify both the faulty lines as well as the neighboring conditions for which an affected line becomes faulty. In this paper, an approach is presented to improve failure localization by automatically analyzing the information associated with the outcome of diagnosis. Experimental results show a significant improvement in failure localization when this method is applied to 106 real IC failures.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117281127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider synthesis of arithmetic DSP circuits with finite-precision fixed-point operations. The aim is to choose the lowest-cost implementation that matches a real-valued specification within the allowed imprecision. Starting from Taylor series or real-valued polynomials, we first demonstrate a method to obtain satisfying implementations that uses intermediate arithmetic transform polynomials as an analytical apparatus suitable for precision analysis of both the quantization (bit-width) and approximation sources of imprecision. We then derive the precision optimization algorithm, which explores multiple precision parameters in a branch-and-bound search.
{"title":"Optimizing imprecise fixed-point arithmetic circuits specified by Taylor Series through Arithmetic Transform","authors":"Yu Pang, K. Radecka","doi":"10.1145/1391469.1391574","DOIUrl":"https://doi.org/10.1145/1391469.1391574","url":null,"abstract":"We consider synthesis of arithmetic DSP circuits with finite precision fixed-point operations. The aim is to choose the lowest cost implementation that matches a real-valued specification within the allowed imprecision. Starting from Taylor series or real-valued polynomials, we demonstrate first a method to obtain satisfying implementations that uses intermediate arithmetic transform polynomials as an analytical apparatus suitable to precision analysis for both the quantization (bit-width) and approximation sources of imprecision. We then derive the precision optimization algorithm that explores multiple precision parameters in a branch-and-bound search.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"28 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129950403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
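The two sources of imprecision named in the abstract above, approximation (how many Taylor terms) and quantization (how many fractional bits), can be explored with a small Python sketch. It evaluates a truncated series for exp(x) on [0, 1) with every coefficient and intermediate result rounded to a fixed-point grid, then searches for the cheapest setting that meets an error tolerance. The target function, the cost model `terms * frac_bits`, the sampled error measure, and the exhaustive search are all assumptions for illustration. The paper instead analyzes error through arithmetic transform polynomials and uses branch-and-bound.

```python
import math


def quantize(value, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def taylor_exp_fixed(x, terms, frac_bits):
    """Horner evaluation of the truncated exp series with quantized values."""
    coeffs = [quantize(1.0 / math.factorial(k), frac_bits) for k in range(terms)]
    xq = quantize(x, frac_bits)
    acc = 0.0
    for c in reversed(coeffs):
        acc = quantize(acc * xq + c, frac_bits)
    return acc


def max_error(terms, frac_bits, samples=64):
    """Largest absolute error against math.exp over sampled points in [0, 1)."""
    return max(abs(taylor_exp_fixed(i / samples, terms, frac_bits) - math.exp(i / samples))
               for i in range(samples))


def cheapest_config(tolerance, max_terms=10, max_bits=24):
    """Search for the lowest hypothetical cost terms * frac_bits meeting the tolerance."""
    best = None
    for terms in range(1, max_terms + 1):
        for bits in range(1, max_bits + 1):
            if max_error(terms, bits) <= tolerance:
                cost = terms * bits
                if best is None or cost < best[0]:
                    best = (cost, terms, bits)
                break  # more bits only raise the cost for this term count
    return best
```

Calling `cheapest_config(1e-2)` returns a `(cost, terms, frac_bits)` tuple, or `None` if no setting within the search limits meets the tolerance. Because the error is only sampled, the result is an estimate and not a guaranteed bound.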