Y. Akgul, D. Puschini, S. Lesecq, E. Beigné, I. Panades, P. Benoit, L. Torres
The emerging SOI technologies provide an increased body bias range compared to traditional bulk technologies, opening new opportunities. From the power management perspective, a new degree of freedom is added to the supply voltage and clock frequency variation, increasing the complexity of the power optimization problem. In this paper, a method is proposed to manage the power consumed in an FD-SOI circuit through supply and body bias voltages, and clock frequency variation. Results for a Digital Signal Processor in STMicroelectronics 28nm FD-SOI technology show that the power reduction ratio can reach 17%.
{"title":"Power management through DVFS and dynamic body biasing in FD-SOI circuits","authors":"Y. Akgul, D. Puschini, S. Lesecq, E. Beigné, I. Panades, P. Benoit, L. Torres","doi":"10.1145/2593069.2593185","DOIUrl":"https://doi.org/10.1145/2593069.2593185","journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)"},"publicationDate":"2014-06-01"}
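The abstract above treats supply voltage, body bias voltage, and clock frequency as three coupled knobs. The following is a minimal sketch of that selection problem, not the method proposed in the paper: it assumes a hypothetical table of characterized operating points and the textbook dynamic-power model P = C_eff * Vdd^2 * f plus a per-point leakage term. The names `OperatingPoint`, `total_power`, `pick_point`, and every numeric value are illustrative assumptions.

```python
from typing import NamedTuple, Optional, Sequence


class OperatingPoint(NamedTuple):
    """One characterized (Vdd, Vbb) pair. All values used below are made up."""
    vdd: float       # supply voltage in volts
    vbb: float       # body bias voltage in volts
    f_max_hz: float  # highest clock frequency that meets timing at this point
    leak_w: float    # leakage power in watts at this point


def total_power(p: OperatingPoint, f_hz: float, c_eff: float) -> float:
    """Textbook model: dynamic power C_eff * Vdd^2 * f plus leakage."""
    return c_eff * p.vdd ** 2 * f_hz + p.leak_w


def pick_point(points: Sequence[OperatingPoint], f_target_hz: float,
               c_eff: float) -> Optional[OperatingPoint]:
    """Return the lowest-power point that can sustain f_target_hz, or None."""
    feasible = [p for p in points if p.f_max_hz >= f_target_hz]
    if not feasible:
        return None
    return min(feasible, key=lambda p: total_power(p, f_target_hz, c_eff))


if __name__ == "__main__":
    # Placeholder table: forward body bias (vbb > 0) buys speed at lower Vdd
    # but costs leakage.
    table = [
        OperatingPoint(vdd=1.0, vbb=0.0, f_max_hz=500e6, leak_w=0.010),
        OperatingPoint(vdd=0.8, vbb=0.5, f_max_hz=450e6, leak_w=0.030),
        OperatingPoint(vdd=0.8, vbb=0.0, f_max_hz=300e6, leak_w=0.008),
    ]
    print(pick_point(table, f_target_hz=400e6, c_eff=1e-9))
```

With only two knobs (Vdd and f) the cheapest feasible point is usually the lowest voltage that meets timing; adding body bias means a lower-Vdd point with higher leakage can beat a higher-Vdd point, which is why the search has to compare total power rather than voltage alone.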
Li Yu, S. Saxena, C. Hess, I. Elfadel, D. Antoniadis, D. Boning
In this paper, we propose a novel MOSFET parameter extraction method to enable early technology evaluation. The distinguishing feature of the proposed method is that it enables the extraction of an entire set of MOSFET model parameters using limited and incomplete IV measurements from on-chip monitor circuits. An important step in this method is the use of maximum-a-posteriori estimation where past measurements of transistors from various technologies are used to learn a prior distribution and its uncertainty matrix for the parameters of the target technology. The framework then utilizes Bayesian inference to facilitate extraction using a very small set of additional measurements. The proposed method is validated using various past technologies and post-silicon measurements for a commercial 28-nm process. The proposed extraction could also be used to characterize the statistical variations of MOSFETs with the significant benefit that some constraints required by the backward propagation of variance (BPV) method are relaxed.
{"title":"Remembrance of transistors past: Compact model parameter extraction using Bayesian inference and incomplete new measurements","authors":"Li Yu, S. Saxena, C. Hess, I. Elfadel, D. Antoniadis, D. Boning","doi":"10.1145/2593069.2593201","DOIUrl":"https://doi.org/10.1145/2593069.2593201","journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)"},"publicationDate":"2014-06-01"}
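The abstract above describes learning a prior from past technologies and then refining it with a few new measurements. The sketch below shows that idea in its simplest form and is not the paper's extraction flow: it assumes a single scalar parameter, a Gaussian prior, and direct noisy measurements with known noise variance, in which case the MAP estimate has a closed form. The function names and the sample values are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def learn_prior(past_values: Sequence[float]) -> Tuple[float, float]:
    """Fit a Gaussian prior (mean, variance) to one parameter's values
    observed in past technologies. Needs at least two values."""
    return mean(past_values), variance(past_values)


def map_update(prior_mean: float, prior_var: float,
               measurements: Sequence[float],
               noise_var: float) -> Tuple[float, float]:
    """Gaussian prior + Gaussian noise: the MAP estimate equals the
    posterior mean, a precision-weighted average of prior and data."""
    n = len(measurements)
    if n == 0:
        return prior_mean, prior_var  # no new data: fall back on the prior
    post_precision = 1.0 / prior_var + n / noise_var
    post_mean = (prior_mean / prior_var
                 + sum(measurements) / noise_var) / post_precision
    return post_mean, 1.0 / post_precision


if __name__ == "__main__":
    # Illustrative threshold-voltage-like values, not real data.
    mu0, var0 = learn_prior([0.40, 0.45, 0.50])
    print(map_update(mu0, var0, measurements=[0.55], noise_var=0.0025))
```

When the prior variance and the measurement noise variance are equal, one measurement pulls the estimate halfway from the prior mean to the measured value; a tighter prior moves it less, which is how past data can compensate for sparse new measurements.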
Arunprasath Shankar, B. Singh, F. Wolff, C. Papachristou
The integration of reusable IP blocks/cores is a common process in system-on-chip design and involves manually comparing/mapping IP specifications against system requirements. The informal nature of specification limits its automatic analysis. Existing techniques fail to utilize the underlying conceptual information embedded in specifications. In this paper, we present a methodology for specification analysis, which involves concept mining of specifications to generate domain ontologies. We employ a semi-supervised expert system with semantic analysis capability to create a collaborative framework for cumulative knowledge acquisition. Our system then uses the generated ontologies to perform component retrieval, drop-in-replacement analysis and design vs. test-plan comparisons. We demonstrate our approach by evaluating several IP specifications.
{"title":"Ontology-guided conceptual analysis of design specifications","authors":"Arunprasath Shankar, B. Singh, F. Wolff, C. Papachristou","doi":"10.1145/2593069.2593175","DOIUrl":"https://doi.org/10.1145/2593069.2593175","journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)"},"publicationDate":"2014-06-01"}
Oliver Keszöcze, R. Wille, Tsung-Yi Ho, R. Drechsler
With advances in microfluidic technology, the design of digital microfluidic biochips has recently received significant attention. Thus far, however, the corresponding design tasks such as binding, scheduling, placement, and routing have usually been considered separately, and often only heuristic results have been obtained. In this work, we present a one-pass synthesis scheme which directly realizes the desired functionality onto the chip and, at the same time, guarantees minimality with respect to area and/or timing. For this purpose, the deductive power of solvers for Boolean satisfiability is exploited. Experiments show how the approach benefits the design of the respective devices.
{"title":"Exact one-pass synthesis of digital microfluidic biochips","authors":"Oliver Keszöcze, R. Wille, Tsung-Yi Ho, R. Drechsler","doi":"10.1145/2593069.2593135","DOIUrl":"https://doi.org/10.1145/2593069.2593135","journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)"},"publicationDate":"2014-06-01"}
Hang Zhang, M. Putic, J. Lach
Massively parallel computation in GPUs significantly boosts performance of compute-intensive applications but creates power and thermal issues that limit further performance scaling. This paper demonstrates significant GPGPU power savings by relaxing application accuracy requirements and enabling the use of low-power imprecise hardware (IHW). A synthesized set of novel imprecise floating-point arithmetic units is presented. GPGPU-Sim and GPUWattch are used to estimate impacts of IHW units on output quality and system-level power consumption, providing a quality-power tradeoff model for application-specific optimization. Experimental results for a 45 nm process show up to 32% power savings with negligible impacts on output quality.
{"title":"Low power GPGPU computation with imprecise hardware","authors":"Hang Zhang, M. Putic, J. Lach","doi":"10.1145/2593069.2593156","DOIUrl":"https://doi.org/10.1145/2593069.2593156","journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)"},"publicationDate":"2014-06-01"}
Steve Dai, Mingxing Tan, K. Hao, Zhiru Zhang
Loop pipelining is a widely accepted technique in high-level synthesis to enable pipelined execution of successive loop iterations to achieve high performance. Existing loop pipelining methods provide inadequate support for pipeline flushing. In this paper, we study the problem of enabling flushing in pipeline synthesis and examine its implications in scheduling and binding. We propose novel techniques for synthesizing a conflict-aware flushing-enabled pipeline that is robust against potential resource collisions. Experiments with real-life benchmarks show that our methods significantly reduce the possibility of resource collisions compared to conventional approaches while conserving hardware resources and achieving near-optimal performance.
{"title":"Flushing-enabled loop pipelining for high-level synthesis","authors":"Steve Dai, Mingxing Tan, K. Hao, Zhiru Zhang","doi":"10.1145/2593069.2593143","DOIUrl":"https://doi.org/10.1145/2593069.2593143","journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)"},"publicationDate":"2014-06-01"}
Yehdhih Ould Mohammed Moctar, P. Brisk
We have implemented an FPGA routing algorithm on a shared-memory multiprocessor using the Galois API, which offers speculative parallelism in software. The router is a parallel implementation of PathFinder, which is the basis for most commercial FPGA routers. We parallelize the maze expansion step for each net, while routing nets sequentially to limit the amount of rollback that would likely occur due to misspeculation. Our implementation relies on non-blocking priority queues, which use software transactional memory (STM), to identify the best route for each net. Our experimental results demonstrate scalability for large benchmarks and show that the amount of available parallelism depends primarily on the circuit size, not the inter-dependence of signals. We achieve an average speedup of approximately 3x compared to the most recently published work on parallel multi-threaded FPGA routing, and up to 6x in comparison to the single-threaded router implemented in the publicly available Versatile Place and Route (VPR) framework.
{"title":"Parallel FPGA routing based on the operator formulation","authors":"Yehdhih Ould Mohammed Moctar, P. Brisk","doi":"10.1145/2593069.2593177","DOIUrl":"https://doi.org/10.1145/2593069.2593177","journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)"},"publicationDate":"2014-06-01"}
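The abstract above refers to PathFinder's per-net maze expansion driven by a priority queue. The following is a sequential, single-threaded sketch of that step, assuming a routing-resource graph given as an adjacency dictionary and a negotiated-congestion node cost of the form (base + history) * present. It does not use Galois or transactional memory and is not the authors' parallel implementation; `node_cost`, `maze_expand`, and the `pres_fac` parameter are illustrative names.

```python
import heapq
from typing import Dict, Hashable, List, Optional, Sequence


def node_cost(base: float, history: float, occupancy: int,
              capacity: int, pres_fac: float = 1.0) -> float:
    """Negotiated-congestion cost of using a node once more:
    (base + history) * present, where present grows with overuse."""
    overuse = max(0, occupancy + 1 - capacity)
    present = 1.0 + pres_fac * overuse
    return (base + history) * present


def maze_expand(graph: Dict[Hashable, Sequence[Hashable]],
                cost: Dict[Hashable, float],
                source: Hashable, sink: Hashable) -> Optional[List[Hashable]]:
    """Dijkstra-style wavefront expansion; returns the cheapest
    source-to-sink path, or None if the sink is unreachable."""
    best = {source: 0.0}
    prev: Dict[Hashable, Hashable] = {}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == sink:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt in graph.get(node, ()):
            cand = dist + cost[nxt]
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(heap, (cand, nxt))
    if sink not in best:
        return None
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    graph = {"s": ["a", "b"], "a": ["t"], "b": ["t"], "t": []}
    cost = {
        "a": node_cost(1.0, 0.0, occupancy=1, capacity=1),  # already in use
        "b": node_cost(1.0, 0.0, occupancy=0, capacity=1),
        "t": node_cost(1.0, 0.0, occupancy=0, capacity=1),
    }
    print(maze_expand(graph, cost, "s", "t"))
```

In a full router this expansion runs once per net inside an outer loop that rips up and reroutes nets while raising the history and present-congestion terms until no node is overused.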
Ujjwal Guin, Xuehui Zhang, Domenic Forte, M. Tehranipoor
The recycling of electronic components has become a major concern for industry and government as it potentially impacts the security and reliability of a wide variety of electronic systems. The sheer number of component types (analog, digital, mixed-signal) and sizes (large or small) makes it extremely challenging to find a one-size-fits-all solution to detect and prevent recycled ICs. In this paper, we propose a suite of solutions for combating die and IC recycling (CDIR). These solutions include lightweight, on-chip structures based on ring oscillators (RO-CDIR), anti-fuses (AF-CDIR) and fuses (F-CDIR). Each structure meets the unique needs and limitations of different part types and sizes, providing excellent coverage of recycled parts. HSPICE simulation results using 90nm technology demonstrate the effectiveness of our proposed negative-bias temperature instability (NBTI)-aware RO-CDIR for detecting ICs used for a very short period of time. Recycling of large digital ICs can effectively be detected by using AF-CDIR. Small analog and digital recycled components can be identified by testing our F-CDIR with very low-cost measurement devices, e.g., a multimeter.
{"title":"Low-cost on-chip structures for combating die and IC recycling","authors":"Ujjwal Guin, Xuehui Zhang, Domenic Forte, M. Tehranipoor","doi":"10.1145/2593069.2593157","DOIUrl":"https://doi.org/10.1145/2593069.2593157","journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)"},"publicationDate":"2014-06-01"}
M. Hamzeh, Aviral Shrivastava, S. Vrudhula
One of the challenges that all accelerators face is executing loops that contain if-then-else constructs. There are three ways to accelerate loops with an if-then-else construct on a coarse-grained reconfigurable architecture (CGRA): full predication, partial predication, and the dual-issue scheme. Compared with the other two, the dual-issue scheme may achieve the best performance, but it requires compiler support, which does not yet exist. In this paper, we develop compiler techniques to map loops with conditionals onto a CGRA for the dual-issue scheme. Our experiments show that: i) 40% of loops that can be accelerated on a CGRA have conditionals, ii) the proposed dual-issue scheme enables our compiler to accelerate loops 40% faster than the full predication scheme proposed in [12], and iii) our compiler-assisted dual-issue scheme can exploit richer interconnects, if present.
{"title":"Branch-aware loop mapping on CGRAs","authors":"M. Hamzeh, Aviral Shrivastava, S. Vrudhula","doi":"10.1145/2593069.2593100","DOIUrl":"https://doi.org/10.1145/2593069.2593100","journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)"},"publicationDate":"2014-06-01"}
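The abstract above contrasts predication with a dual-issue scheme for loops containing conditionals. The toy model below illustrates the general difference under simplifying assumptions and is not the paper's compiler or hardware: partial predication is modeled as executing both branch operations and then selecting a result, while dual issue is modeled as one slot that holds both alternatives and runs only the one chosen by the predicate. The loop body and the operation counts are hypothetical and only meant to show where the saving comes from.

```python
from typing import Tuple


def reference(x: int) -> int:
    """Original loop body with a branch."""
    if x > 0:
        return x * 2
    return x + 1


def partial_predication(x: int) -> Tuple[int, int]:
    """Both paths execute every iteration; a select picks the result."""
    p = x > 0            # compare
    t = x * 2            # then-path op, always executed
    e = x + 1            # else-path op, always executed
    y = t if p else e    # select
    return y, 4          # ops executed: compare, mul, add, select


def dual_issue(x: int) -> Tuple[int, int]:
    """One slot carries both alternatives; only the selected one runs."""
    p = x > 0                                   # compare
    slot = (lambda v: v * 2, lambda v: v + 1)   # (then-op, else-op)
    y = slot[0](x) if p else slot[1](x)
    return y, 2                                 # ops executed: compare, one op


if __name__ == "__main__":
    for x in range(-2, 3):
        print(x, reference(x), partial_predication(x), dual_issue(x))
```

Both variants compute the same values as the branching reference; the difference in this model is that the dual-issue variant spends one execution slot on the conditional work instead of three.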
Frank Liu, P. Feldmann
Sensitivities of dynamic system responses with respect to the system parameters are highly valuable, with broad applications such as system tuning and uncertainty quantification. Compared to direct methods, adjoint methods are much more efficient when the number of parameters is large. In this paper, we present a time-unrolling method to compute adjoint sensitivities. Instead of explicitly constructing the adjoint system, which is often nontrivial, our time-unrolling method implicitly retraces the response trajectory by utilizing the fitting polynomial of the integration method. This paper provides the theoretical foundation of the method as well as experimental demonstrations of its effectiveness.
{"title":"A time-unrolling method to compute sensitivity of dynamic systems","authors":"Frank Liu, P. Feldmann","doi":"10.1145/2593069.2593080","DOIUrl":"https://doi.org/10.1145/2593069.2593080","journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)"},"publicationDate":"2014-06-01"}