Y. Akgul, D. Puschini, S. Lesecq, E. Beigné, I. Panades, P. Benoit, L. Torres
The emerging SOI technologies provide an increased body bias range compared to traditional bulk technologies, opening new opportunities. From the power management perspective, a new degree of freedom is added to the supply voltage and clock frequency variation, increasing the complexity of the power optimization problem. In this paper, a method is proposed to manage the power consumed in an FD-SOI circuit through supply and body bias voltages, and clock frequency variation. Results for a Digital Signal Processor in STMicroelectronics 28nm FD-SOI technology show that the power reduction ratio can reach 17%.
{"title":"Power management through DVFS and dynamic body biasing in FD-SOI circuits","authors":"Y. Akgul, D. Puschini, S. Lesecq, E. Beigné, I. Panades, P. Benoit, L. Torres","doi":"10.1145/2593069.2593185","DOIUrl":"https://doi.org/10.1145/2593069.2593185","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"27 1","pages":"0"},"publicationDate":"2014-06-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"123811457"}
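The abstract above describes choosing supply voltage, body bias voltage, and clock frequency together. A minimal sketch of that idea, assuming a pre-characterized table of operating points, is given below. The `OperatingPoint` type, the `pick_point` helper, and every number in `TABLE` are hypothetical and are not taken from the paper; the sketch also simplifies by charging each point its power at maximum frequency.

```python
from typing import NamedTuple, Optional, Sequence


class OperatingPoint(NamedTuple):
    """One (hypothetical) characterized operating point of the circuit."""
    vdd: float    # supply voltage, in volts
    vbb: float    # body bias voltage, in volts
    f_max: float  # highest clock frequency sustainable at this point, in MHz
    power: float  # total power drawn at f_max, in mW


def pick_point(points: Sequence[OperatingPoint],
               f_target: float) -> Optional[OperatingPoint]:
    """Return the lowest-power point whose f_max meets f_target, or None."""
    feasible = [p for p in points if p.f_max >= f_target]
    return min(feasible, key=lambda p: p.power) if feasible else None


# Made-up characterization data, for illustration only.
TABLE = [
    OperatingPoint(vdd=0.8, vbb=0.0, f_max=400.0, power=50.0),
    OperatingPoint(vdd=0.8, vbb=0.5, f_max=500.0, power=60.0),
    OperatingPoint(vdd=1.0, vbb=0.0, f_max=600.0, power=90.0),
    OperatingPoint(vdd=1.0, vbb=0.5, f_max=700.0, power=110.0),
]

if __name__ == "__main__":
    print(pick_point(TABLE, 450.0))
```

With this made-up table, a 450 MHz target is met more cheaply by adding body bias at the lower supply voltage than by raising the supply, which is the kind of trade-off the extra degree of freedom introduces.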
FPGA devices offer a range of features that can deliver powerful security capabilities. This paper describes many security features included in present-day FPGAs, including bitstream authenticated encryption, configuration scrubbing, voltage and temperature sensors, and JTAG-intercept. The paper explains the role of these features in providing security capabilities such as privacy, anti-tamper, and protection of data handled by the FPGA. The paper concludes with an example of a single-chip cryptographic system, a trusted system built with these components.
{"title":"FPGA security: From features to capabilities to trusted systems","authors":"S. Trimberger, J. Moore","doi":"10.1145/2593069.2602555","DOIUrl":"https://doi.org/10.1145/2593069.2602555","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"42 1","pages":"0"},"publicationDate":"2014-06-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"124083965"}
Floating Point Units (FPUs) pose a singular challenge for traditional verification methods, such as coverage-driven simulation, because their large and complex data paths and intricate control structures render those methods incomplete and error-prone. Formal verification (FV) has been successfully leveraged to achieve the high level of quality desired of this critical logic. Typically, FV-based approaches to verifying FPUs rely on introducing higher-level abstractions to allow reasoning. This, however, has to be done manually, and quickly becomes tedious for the optimized bit-level implementations found in high-performance microprocessors. Automated formal methods that work directly on the bit level and provide a full end-to-end check exist, but they are limited to single instructions (issued in an empty pipeline) and hence do not check control aspects related to inter-instruction interactions or pipeline control. In this paper we present an approach based on equivalence checking to overcome the single-instruction limitation for automated bit-level proofs in the formal verification of FPUs. The sequential execution of instructions is modeled by two instances of the design-under-test. One of the instances acts as a reference model for the other. This allows large numbers of internal equivalences to be leveraged by equivalence checking techniques. We show that this method is capable of proving instruction sequences for industrial FPU designs. Together with a proof of correctness of individual instructions, it guarantees correctness of the FPU design as a whole. In our experience this is a one-of-a-kind approach to performing automated end-to-end verification of FPUs.
{"title":"Automatic verification of Floating Point Units","authors":"Udo Krautz, Viresh Paruthi, Anand Arunagiri, Sujeet Kumar, Shweta Pujar, Tina Babinsky","doi":"10.1145/2593069.2593096","DOIUrl":"https://doi.org/10.1145/2593069.2593096","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"46 1","pages":"0"},"publicationDate":"2014-06-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"128563182"}
Arunprasath Shankar, B. Singh, F. Wolff, C. Papachristou
The integration of reusable IP blocks/cores is a common process in system-on-chip design and involves manually comparing/mapping IP specifications against system requirements. The informal nature of specification limits its automatic analysis. Existing techniques fail to utilize the underlying conceptual information embedded in specifications. In this paper, we present a methodology for specification analysis, which involves concept mining of specifications to generate domain ontologies. We employ a semi-supervised expert system with semantic analysis capability to create a collaborative framework for cumulative knowledge acquisition. Our system then uses the generated ontologies to perform component retrieval, drop-in-replacement analysis and design vs. test-plan comparisons. We demonstrate our approach by evaluating several IP specifications.
{"title":"Ontology-guided conceptual analysis of design specifications","authors":"Arunprasath Shankar, B. Singh, F. Wolff, C. Papachristou","doi":"10.1145/2593069.2593175","DOIUrl":"https://doi.org/10.1145/2593069.2593175","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"1 1","pages":"0"},"publicationDate":"2014-06-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"129839704"}
Massive Open Online Courses (MOOCs) can deliver advanced course material at planetary scale, combining internet-based video content delivery and cloud-based assignments. From March to May 2013, I taught the world's first EDA MOOC, entitled VLSI CAD: Logic to Layout, based on roughly 20 years of experience teaching electronic design automation in a conventional face-to-face classroom setting. Over 17,000 participants registered for this MOOC. This paper summarizes my experience with teaching EDA at planetary scale: how we covered ASIC synthesis, verification, layout, and timing; how we built cloud resources to enable students to experiment with open-source tools; and how we designed software projects and deployed cloud-based auto-graders to support realistic EDA tool projects. The paper also discusses what MOOCs could mean for the dynamism of the EDA community.
{"title":"The first EDA MOOC: Teaching design automation to planet earth","authors":"Rob A. Rutenbar","doi":"10.1145/2593069.2593230","DOIUrl":"https://doi.org/10.1145/2593069.2593230","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"1 1","pages":"0"},"publicationDate":"2014-06-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"130699243"}
We have implemented an FPGA routing algorithm on a shared-memory multi-processor using the Galois API, which offers speculative parallelism in software. The router is a parallel implementation of PathFinder, which is the basis for most commercial FPGA routers. We parallelize the maze expansion step for each net, while routing nets sequentially to limit the amount of rollback that would likely occur due to misspeculation. Our implementation relies on non-blocking priority queues, which use software transactional memory (STM), to identify the best route for each net. Our experimental results demonstrate scalability for large benchmarks and show that the amount of available parallelism depends primarily on the circuit size, not the inter-dependence of signals. We achieve an average speedup of approximately 3x compared to the most recently published work on parallel multi-threaded FPGA routing, and up to 6x in comparison to the single-threaded router implemented in the publicly available Versatile Place and Route (VPR) framework.
{"title":"Parallel FPGA routing based on the operator formulation","authors":"Yehdhih Ould Mohammed Moctar, P. Brisk","doi":"10.1145/2593069.2593177","DOIUrl":"https://doi.org/10.1145/2593069.2593177","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"46 1","pages":"0"},"publicationDate":"2014-06-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"129723776"}
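For readers unfamiliar with PathFinder, the sketch below shows the maze expansion step for a single net as a plain, single-threaded Dijkstra search over a routing-resource graph, using the negotiated-congestion node cost (base + history) × present-congestion penalty from the original PathFinder formulation. It does not use Galois, speculation, or transactional memory, so it is not the parallel router described in the abstract; the names `node_cost` and `maze_route` and the tiny graphs used with them are illustrative assumptions.

```python
import heapq
import math
from typing import Dict, Hashable, List, Optional

Graph = Dict[Hashable, List[Hashable]]


def node_cost(base: float, history: float, present: float) -> float:
    """Negotiated-congestion cost of using one routing resource."""
    return (base + history) * present


def maze_route(graph: Graph, cost: Dict[Hashable, float],
               source: Hashable, sink: Hashable) -> Optional[List[Hashable]]:
    """Cheapest source-to-sink path; cost[n] is paid when entering node n."""
    dist = {source: 0.0}
    prev: Dict[Hashable, Hashable] = {}
    counter = 0  # tie-breaker so the heap never has to compare nodes
    heap = [(0.0, counter, source)]
    while heap:
        d, _, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale queue entry, a cheaper one was already handled
        if u == sink:
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return path[::-1]
        for v in graph.get(u, []):
            nd = d + cost[v]
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                counter += 1
                heapq.heappush(heap, (nd, counter, v))
    return None  # sink is unreachable from source
```

In a full router this search would be repeated for every net over several iterations, with the history and present-congestion terms updated between iterations until no resource is shared.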
Loop pipelining is a widely-accepted technique in high-level synthesis to enable pipelined execution of successive loop iterations to achieve high performance. Existing loop pipelining methods provide inadequate support for pipeline flushing. In this paper, we study the problem of enabling flushing in pipeline synthesis and examine its implications in scheduling and binding. We propose novel techniques for synthesizing a conflict-aware flushing-enabled pipeline that is robust against potential resource collisions. Experiments with real-life benchmarks show that our methods significantly reduce the possibility of resource collisions compared to conventional approaches while conserving hardware resources and achieving near-optimal performance.
{"title":"Flushing-enabled loop pipelining for high-level synthesis","authors":"Steve Dai, Mingxing Tan, K. Hao, Zhiru Zhang","doi":"10.1145/2593069.2593143","DOIUrl":"https://doi.org/10.1145/2593069.2593143","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"28 1","pages":"0"},"publicationDate":"2014-06-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"129971467"}
Oliver Keszöcze, R. Wille, Tsung-Yi Ho, R. Drechsler
With advances in microfluidic technology, the design of digital microfluidic biochips has recently received significant attention. Thus far, however, the corresponding design tasks, such as binding, scheduling, placement, and routing, have usually been considered separately. Furthermore, often only heuristic results have been obtained. In this work, we present a one-pass synthesis scheme which directly realizes the desired functionality on the chip and, at the same time, guarantees minimality with respect to area and/or timing. For this purpose, the deductive power of solvers for Boolean satisfiability is exploited. Experiments show how the approach leverages the design of the respective devices.
{"title":"Exact one-pass synthesis of digital microfluidic biochips","authors":"Oliver Keszöcze, R. Wille, Tsung-Yi Ho, R. Drechsler","doi":"10.1145/2593069.2593135","DOIUrl":"https://doi.org/10.1145/2593069.2593135","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"61 1","pages":"0"},"publicationDate":"2014-06-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"129860394"}
Ujjwal Guin, Xuehui Zhang, Domenic Forte, M. Tehranipoor
The recycling of electronic components has become a major concern for industry and government, as it potentially impacts the security and reliability of a wide variety of electronic systems. The sheer number of component types (analog, digital, mixed-signal) and sizes (large or small) makes it extremely challenging to find a one-size-fits-all solution to detect and prevent recycled ICs. In this paper, we propose a suite of solutions for combating die and IC recycling (CDIR). These solutions include light-weight, on-chip structures based on ring oscillators (RO-CDIR), anti-fuses (AF-CDIR), and fuses (F-CDIR). Each structure meets the unique needs and limitations of different part types and sizes, providing excellent coverage of recycled parts. HSPICE simulation results using 90nm technology demonstrate the effectiveness of our proposed negative-bias temperature instability (NBTI)-aware RO-CDIR for detecting ICs used for a very short period of time. Recycling of large digital ICs can be detected effectively by using AF-CDIR. Small analog and digital recycled components can be identified by testing our F-CDIR with very low-cost measurement devices, e.g., a multimeter.
{"title":"Low-cost on-chip structures for combating die and IC recycling","authors":"Ujjwal Guin, Xuehui Zhang, Domenic Forte, M. Tehranipoor","doi":"10.1145/2593069.2593157","DOIUrl":"https://doi.org/10.1145/2593069.2593157","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"94 1","pages":"0"},"publicationDate":"2014-06-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"130922125"}
Scaling to sub-20nm technology nodes changes the nature of reliability effects from abrupt functional problems to progressive degradation of the performance characteristics of devices and system components. Further, application workloads can significantly affect the overall system reliability. In this work, we have analyzed aging effects on various design hierarchies of an embedded commercial processor in 28nm running real-world applications. We have also quantified the dependence of aging effects on the switching activity and power state of workloads. Implementation results show that the processor timing degradation can vary from 2% to 11%, depending on the workload. Due to the dependence of aging on the application workloads, margin-based design will be highly pessimistic. We propose an efficient and flexible in situ monitoring methodology, SlackProbe, which inserts timing monitors at both path endpoints and path intermediate nets. We show that SlackProbe reduces the number of monitors required by over 15X with ~5% additional delay margin in several commercial processor benchmarks. The real-time data from these monitors can be used for hardware and software adaptation to mitigate failures due to aging.
{"title":"Monitoring reliability in embedded processors - A multi-layer view","authors":"V. Chandra","doi":"10.1145/2593069.2596682","DOIUrl":"https://doi.org/10.1145/2593069.2596682","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"63 1","pages":"0"},"publicationDate":"2014-06-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"126634755"}