We describe a practical method of generating production-ready, timing-violation-resilient asynchronous circuits with conditional communication from a high-level hardware description language. Designs written in SystemVerilogCSP were taped out on a 3.3-million-transistor chip. We present two slackless, scan-enabled asynchronous controllers based on the Click template that saved an average of 14% area in our application.
"Adding Conditionality to Resilient Bundled-Data Designs." D. Hand, A. Katrin, W. Koven. 2016 22nd IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2016-05-08. DOI: 10.1109/ASYNC.2016.22
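For readers unfamiliar with the Click template, a minimal behavioural sketch of a generic two-phase Click stage may help. It follows the commonly described Click handshake, where one phase bit drives both the input acknowledge and the output request. It does not model the slackless or scan-enabled controllers of the paper, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClickStage:
    """Generic 2-phase Click pipeline stage (hypothetical behavioural model).

    A single phase bit drives both the input acknowledge and the output
    request. The stage fires (pulses its local clock) when a new input
    request is pending and the previous output has been acknowledged.
    """
    phase: int = 0
    data: Optional[int] = None

    @property
    def in_ack(self) -> int:
        return self.phase

    @property
    def out_req(self) -> int:
        return self.phase

    def try_click(self, in_req: int, out_ack: int, in_data: int) -> bool:
        new_token = in_req != self.phase      # input handshake is open
        output_free = out_ack == self.phase   # output handshake is closed
        if new_token and output_free:
            self.data = in_data   # local clock pulse captures the bundled data
            self.phase ^= 1       # acknowledges input and requests output at once
            return True
        return False


# Usage: one token passes, the next stalls until downstream acknowledges.
stage = ClickStage()
stage.try_click(in_req=1, out_ack=0, in_data=42)  # fires
stage.try_click(in_req=0, out_ack=0, in_data=7)   # stalls: output not yet acked
stage.try_click(in_req=0, out_ack=1, in_data=7)   # fires
```

Because the same phase bit closes the input handshake and opens the output handshake, the data register is the only storage per stage in this sketch.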
Balsa provides a design flow in which asynchronous circuits are created from high-level specifications, but the syntax-driven translation often results in performance overhead. To improve this, we exploit the fact that bundled-data circuits can be divided into data and control paths, so tailored optimisation techniques can be applied to each path separately. For control path optimisation, STG-based resynthesis (applying logic minimisation) has been introduced. However, solid results have so far been lacking due to problems with state explosion and the reliable insertion of reset logic. To tackle this, we use an adjusted STG decomposition algorithm and have started to develop a new logic synthesizer (based on ideas from petrify) with proper reset insertion. Adding the adapted data path, we now obtain first promising post-synthesis simulation results using an industrial technology library, with performance improvements of up to 23%. First experiments show additional potential for performance improvements of up to 56% when standard synchronous design tools are applied to the data path.
"Optimising Bundled-Data Balsa Circuits." Norman Kluge, Ralf Wollowski. 2016 22nd IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2016-05-08. DOI: 10.1109/ASYNC.2016.11
It is well-established that unsynchronized communication across clock domains can result in metastable upsets and that this cannot be avoided deterministically. This, however, does not preclude the possibility that metastability can be contained deterministically, in the sense that meaningful and precise computations can be performed despite metastability of some bits. In this work, we provide evidence that this is not only possible, but can also be done efficiently. We propose a circuit of size O(B^2) and depth O(B) that computes the minimum and maximum of two B-bit Gray code inputs, where each input may contain one metastable bit (introducing uncertainty regarding whether it encodes some value x or rather x + 1). This is achieved by combining the results of a recursive call on the (B - 1)-bit suffixes in a metastability-containing way. This overcomes the problem posed by possible metastability of the logic controlling the recursion, which must occur in some executions.
"Efficient Metastability-Containing Gray Code 2-Sort." C. Lenzen, Moti Medina. 2016 22nd IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2016-05-08. DOI: 10.1109/ASYNC.2016.18
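The input model and the required output can be made concrete with a brute-force reference specification. This is a sketch under stated assumptions, not the paper's O(B^2) circuit. Codes are binary-reflected Gray codes written MSB first, a metastable bit is the character `M`, and all helper names are hypothetical. The output is the metastable closure of an ordinary 2-sort: an output bit is stable only if every resolution of the inputs agrees on it.

```python
from __future__ import annotations

from itertools import product


def gray_encode(x: int, bits: int) -> str:
    """B-bit binary-reflected Gray code of x, MSB first."""
    return format(x ^ (x >> 1), f"0{bits}b")


def gray_decode(code: str) -> int:
    """Binary bit i is the XOR of Gray bits 0..i (MSB first)."""
    value, bit = 0, 0
    for c in code:
        bit ^= int(c)
        value = (value << 1) | bit
    return value


def uncertain(x: int, bits: int) -> str:
    """Valid input 'x or x + 1': the one bit that flips between them becomes 'M'."""
    a, b = gray_encode(x, bits), gray_encode(x + 1, bits)
    return "".join(ca if ca == cb else "M" for ca, cb in zip(a, b))


def resolutions(code: str) -> list[str]:
    """All stable strings obtained by resolving each 'M' to 0 or 1."""
    choices = [("0", "1") if c == "M" else (c,) for c in code]
    return ["".join(p) for p in product(*choices)]


def superpose(codes: list[str]) -> str:
    """Bitwise: keep a bit where all candidates agree, otherwise mark it 'M'."""
    return "".join(col[0] if len(set(col)) == 1 else "M" for col in zip(*codes))


def sort2_spec(g: str, h: str) -> tuple[str, str]:
    """Metastability-containing (max, min) of two Gray codes, by exhaustive resolution."""
    pairs = [(r, s) for r in resolutions(g) for s in resolutions(h)]
    maxes = [max(r, s, key=gray_decode) for r, s in pairs]
    mins = [min(r, s, key=gray_decode) for r, s in pairs]
    return superpose(maxes), superpose(mins)
```

For example, `sort2_spec("01M", "M10")` compares "2 or 3" with "3 or 4" and yields `("M10", "01M")`: the maximum is 3 or 4 and the minimum is 2 or 3, each again with a single metastable bit. Note that in Gray code not every single-`M` string is a valid input (e.g. `0M0` would mean 0 or 3), which is why `uncertain` only marks the bit that flips between x and x + 1.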
We present an asynchronous quasi-delay-insensitive (QDI) FFT design for low-power M2M communication. The design achieves low power through efficient memory control and twiddle multiplication, and by allowing every subsystem in its nested butterfly architecture to run only as fast as it needs to. For a 10 MHz input data rate, our 128-point, 16-bit, radix-2^3 FFT design consumes only 5.9 nJ of energy at VDD = 1 V in a 65 nm technology.
"Low Power QDI Asynchronous FFT." Benjamin Z. Tang, F. Lane. 2016 22nd IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2016-05-08. DOI: 10.1109/ASYNC.2016.17
Javier de San Pedro, Thomas Bourgeat, J. Cortadella
The paper presents a first effort at exploring a novel area in the domain of asynchronous controllers: specification mining. Rather than synthesizing circuits from specifications, we aim at doing reverse engineering, i.e., discovering safe specifications from the circuits that preserve a set of pre-defined behavioral properties (e.g., hazard freeness). The specifications are discovered without any previous knowledge of the behavior of the circuit environment. This area may open new opportunities for re-synthesis and verification of asynchronous controllers. The effectiveness of the proposed approach is demonstrated by mining concurrent specifications (Signal Transition Graphs) from multiple implementations of 4-phase handshake controllers and some controllers with choice.
"Specification Mining for Asynchronous Controllers." Javier de San Pedro, Thomas Bourgeat, J. Cortadella. 2016 22nd IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2016-05-08. DOI: 10.1109/ASYNC.2016.10
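The idea of discovering a safe environment from a circuit, rather than the other way round, can be illustrated at toy scale. The sketch below is not the paper's algorithm (which mines STGs). It explores a Muller C-element from its reset state and permits an input transition only if it does not disable an excited output, one common formalization of hazard freeness (output persistence). All helper names are hypothetical.

```python
from __future__ import annotations

from collections import deque

State = tuple[int, int, int]  # (a, b, c): two inputs and the output


def c_next(a: int, b: int, c: int) -> int:
    """Muller C-element: output follows the inputs when they agree, else holds."""
    return (a & b) | (c & (a | b))


def excited(s: State) -> bool:
    a, b, c = s
    return c_next(a, b, c) != c


def mine(initial: State = (0, 0, 0)):
    """Breadth-first exploration that keeps only transitions preserving output persistence."""
    allowed, forbidden = set(), set()
    seen, queue = {initial}, deque([initial])
    while queue:
        s = queue.popleft()
        a, b, c = s
        moves = []
        for name, t in (("a", (1 - a, b, c)), ("b", (a, 1 - b, c))):
            if excited(s) and not excited(t):
                forbidden.add((s, name))  # input change would disable a pending output
            else:
                moves.append((name, t))
        if excited(s):
            moves.append(("c", (a, b, 1 - c)))  # the circuit's own output firing
        for name, t in moves:
            allowed.add((s, name, t))
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen, allowed, forbidden
```

In this toy, `allowed` is a state graph of the most permissive environment that keeps the C-element hazard-free, and `forbidden` contains exactly the transitions that withdraw an input while the output is still excited, i.e. in states `(1, 1, 0)` and `(0, 0, 1)`.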
Relative Timing uses path-based timing constraints to guarantee that a circuit conforms to its behavioral specification. Timing constraints order signal transitions or events in a circuit through corresponding minimum and maximum delay constraints. A circuit may have multiple sets of constraints, each of which, when satisfied, can individually ensure functional correctness. This paper presents a framework to evaluate and rank relative timing constraint sets for a given circuit. The constraint sets are evaluated on the basis of the robustness of the constraints and the conflicts between constraints in the same set. The analysis is automated in a tool. The paper applies the methodology and tool to optimize the extraction of relative timing constraints for delay-insensitive timing models of asynchronous circuits, as demonstrated on a burst-mode controller. The optimization reduces tool runtime by 94% on average.
"Qualifying Relative Timing Constraints for Asynchronous Circuits." Jotham Vaddaboina Manoranjan, K. Stevens. 2016 22nd IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2016-05-08. DOI: 10.1109/ASYNC.2016.23
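To make the two ranking criteria concrete, here is a hypothetical scoring scheme. It is not the paper's metric. A constraint `pod -> fast < slow` requires the path from the point of divergence `pod` to event `fast` to finish before the path to `slow`, and its margin is the difference of estimated path delays. A set's robustness is taken as its smallest margin. A conflict is a path that one constraint in the set needs to be fast while another needs it to be slow. Event names and delay values are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RTConstraint:
    pod: str   # point of divergence
    fast: str  # event whose path from pod must finish first
    slow: str  # event whose path from pod must finish later


Delays = dict[tuple[str, str], float]  # estimated delay of path (pod, event)


def margin(c: RTConstraint, delay: Delays) -> float:
    """Slack by which the constraint is met (negative means violated)."""
    return delay[(c.pod, c.slow)] - delay[(c.pod, c.fast)]


def conflicts(cs: list[RTConstraint]) -> int:
    """Paths required to be fast by one constraint and slow by another."""
    fast_paths = {(c.pod, c.fast) for c in cs}
    slow_paths = {(c.pod, c.slow) for c in cs}
    return len(fast_paths & slow_paths)


def rank(sets: dict[str, list[RTConstraint]], delay: Delays) -> list[str]:
    """Fewer conflicts first; ties broken by larger worst-case margin."""
    def score(name: str) -> tuple[int, float]:
        cs = sets[name]
        return conflicts(cs), -min(margin(c, delay) for c in cs)
    return sorted(sets, key=score)


# Hypothetical example: three alternative constraint sets for one circuit.
delay = {("x+", "a+"): 1.0, ("x+", "b+"): 3.0, ("x+", "c+"): 2.0}
sets = {
    "S1": [RTConstraint("x+", "a+", "b+")],
    "S2": [RTConstraint("x+", "a+", "c+")],
    "S3": [RTConstraint("x+", "a+", "c+"), RTConstraint("x+", "c+", "b+")],
}
print(rank(sets, delay))
```

Here S3 ranks last because the path to `c+` is sandwiched: speeding it up to help one constraint erodes the other.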
J. Cortadella, Marc Lupon, A. Moreno-Conde, Antoni Roca, S. Sapatnekar
How much margin do we have to add to the delay lines of a bundled-data circuit? This paper is an attempt to give a methodical answer to this question, taking into account all sources of variability and the existing EDA machinery for timing analysis and sign-off. The paper is based on a study of the margins of a ring oscillator that replaces a PLL as the clock generator. A timing model is proposed showing that a 12% margin for delay lines can be sufficient to cover variability in a 65 nm technology. In a typical scenario, performance and energy improvements of 15% to 35% can be obtained by using a ring oscillator instead of a PLL. The paper concludes that a synchronous circuit clocked by a ring oscillator shows performance and energy benefits similar to those of bundled-data asynchronous circuits.
"Ring Oscillator Clocks and Margins." J. Cortadella, Marc Lupon, A. Moreno-Conde, Antoni Roca, S. Sapatnekar. 2016 22nd IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2016-05-08. DOI: 10.1109/ASYNC.2016.14
Resilient architectures have emerged as a promising way to remove the worst-case timing margins added to cover process, voltage, and temperature variation, improving system performance while reducing energy consumption. Asynchronous circuits can also improve energy efficiency and performance due to the absence of a global clock. A recently proposed circuit template, called Blade, combines the advantages of asynchronous and resilient techniques. However, Blade still presents testing challenges that hinder its practical application. This paper evaluates the fault behavior of Blade's Error Detection Logic (EDL) block under single stuck-at and propagation delay fault models. We propose a fault classification based on the effect each fault has on overall circuit operation. This classification gives the fault coverage obtained under three different testability scenarios, and it shows that a single fault can entirely disable an EDL, removing its resilience. The proposed classification can be used in the future to improve the design for testability of resilient architectures.
"Fault Classification of the Error Detection Logic in the Blade Resilient Template." F. Kuentzer, Alexandre M. Amory. 2016 22nd IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2016-05-08. DOI: 10.1109/ASYNC.2016.9
Schmitt-Trigger circuits are the method of choice for converting general signal shapes into clean, well-behaved digital ones. In this context, these circuits are also often used for metastability handling. However, like any other positive feedback circuit, a Schmitt-Trigger can itself become metastable. Therefore, its own metastable behavior must be well understood, in particular the conditions that may cause its metastability. In this paper, we build on existing results by Marino to show that (a) a monotonic input signal can cause late transitions but never leads to a non-digital voltage at the Schmitt-Trigger output, and (b) a non-monotonic input can pin the Schmitt-Trigger output to a constant voltage at any desired (including non-digital) level for an arbitrary duration. In fact, the output can even be driven to any waveform within the dynamic limits of the system. We base our analysis on a mathematical model of a Schmitt-Trigger's dynamic behavior and perform SPICE simulations to support our theory and confirm its validity for modern CMOS implementations. Furthermore, we discuss several use cases of a Schmitt-Trigger in the light of our results.
"The Metastable Behavior of a Schmitt-Trigger." A. Steininger, Jürgen Maier, Robert Najvirt. 2016 22nd IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2016-05-08. DOI: 10.1109/ASYNC.2016.19
Divya Akella, Matthew R. Fojtik, Brucek Khailany, Sudhir S. Kudva, Yaping Zhou, B. Calhoun
Power supply noise can significantly degrade circuit performance in modern high-performance SoCs. Recently proposed adaptive clocking schemes tolerate power supply noise by adjusting the clock frequency in response to fast-changing voltage variations. In this paper, we model and quantify power supply noise tolerance with a fine-grained globally asynchronous locally synchronous (GALS) design style combined with an adaptive clocking scheme. An experimental setup that includes SPICE and Verilog-A models is used to quantify the effect of clock-tree insertion delay and spatial workload variations on power supply noise tolerance in both traditional synchronous adaptive clocking and fine-grained GALS adaptive clocking. Compared to the traditional scheme, fine-grained GALS adaptive clocking significantly reduces these effects and the margins required to tolerate power supply noise. The gain is quantified using the uncompensated voltage noise metric, defined as the additional voltage margin required for failure-free operation of circuits at the frequency dictated by the adaptive clocking scheme. In our experimental setup for a typical high-performance SoC, fine-grained GALS adaptive clocking achieves a 78 mV saving in uncompensated voltage noise, equivalent to a 15% power saving.
"Modeling and Analysis of Power Supply Noise Tolerance with Fine-Grained GALS Adaptive Clocks." Divya Akella, Matthew R. Fojtik, Brucek Khailany, Sudhir S. Kudva, Yaping Zhou, B. Calhoun. 2016 22nd IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2016-05-08. DOI: 10.1109/ASYNC.2016.13
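As a rough consistency check of the 78 mV to 15% conversion, the sketch below makes two assumptions. First, dynamic power scales as C·VDD²·f at a fixed frequency, with leakage ignored. Second, the nominal supply is a hypothetical 1.0 V, since the abstract does not state VDD. Under these assumptions, removing 78 mV of margin gives about a 15% power saving. The paper's own power model may differ, and the function name is hypothetical.

```python
def power_saving(v_nom: float, dv: float) -> float:
    """Fractional dynamic-power saving when VDD drops from v_nom to v_nom - dv.

    Assumes P_dyn is proportional to C * VDD**2 * f at a fixed frequency and
    ignores leakage; both are simplifying assumptions, not the paper's model.
    """
    return 1.0 - ((v_nom - dv) / v_nom) ** 2


# Hypothetical 1.0 V nominal supply; 78 mV is the reduction quoted above.
print(f"{power_saving(1.0, 0.078):.1%}")  # 15.0%
```

With a higher nominal supply, the same 78 mV would translate into a smaller relative saving under this quadratic model.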