Pub Date : 1996-03-18DOI: 10.1109/ASYNC.1996.494451
L. S. Nielsen, J. Sparsø
This paper describes a number of design issues relating to the implementation of low-power asynchronous signal processing circuits. Specifically, the paper addresses the design of a dedicated processor structure that implements an audio FIR filter bank which is part of an industrial application. The algorithm requires a fixed number of steps and the moderate speed requirement allows a sequential implementation. The latter, in combination with a huge predominance of numerically small data values in the input data stream, is the key to a low-power asynchronous implementation. Power is minimized in two ways: by reducing the switching activity in the circuit, and by applying adaptive scaling of the supply voltage, in order to exploit the fact that the average case latency as 2-3 times better than the worst case. The paper reports on a study of properties of real life data, and discusses the implications it has on the choice of architecture, handshake-protocol, data-encoding, and circuit design. This includes a tagging scheme that divides the data-path into slices, and an asynchronous ripple carry adder that avoids a completion tree.
{"title":"A low-power asynchronous data-path for a FIR filter bank","authors":"L. S. Nielsen, J. Sparsø","doi":"10.1109/ASYNC.1996.494451","DOIUrl":"https://doi.org/10.1109/ASYNC.1996.494451","url":null,"abstract":"This paper describes a number of design issues relating to the implementation of low-power asynchronous signal processing circuits. Specifically, the paper addresses the design of a dedicated processor structure that implements an audio FIR filter bank which is part of an industrial application. The algorithm requires a fixed number of steps and the moderate speed requirement allows a sequential implementation. The latter, in combination with a huge predominance of numerically small data values in the input data stream, is the key to a low-power asynchronous implementation. Power is minimized in two ways: by reducing the switching activity in the circuit, and by applying adaptive scaling of the supply voltage, in order to exploit the fact that the average case latency as 2-3 times better than the worst case. The paper reports on a study of properties of real life data, and discusses the implications it has on the choice of architecture, handshake-protocol, data-encoding, and circuit design. This includes a tagging scheme that divides the data-path into slices, and an asynchronous ripple carry adder that avoids a completion tree.","PeriodicalId":365358,"journal":{"name":"Proceedings Second International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134135543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1996-03-18DOI: 10.1109/ASYNC.1996.494432
P. T. Røine
A system for high-speed asynchronous interconnections between VLSI chips is proposed. Communication is performed on three-wire links that have about the same properties as differential interconnections. A bit transmission consists of switching the constant driver current from one wire to one of the two others. There is no need for clocking or synchronisation, as bits are separated by a transition. The chosen data representation makes decoding to a two-phase protocol especially simple. Energy consumption may be reduced by dynamically adjusting bias currents, and thus circuit speed, to match the demand for communication bandwidth. In a 0.7 /spl mu/m CMOS process, communication bandwidth per link is expected to reach 1 Gb/s.
{"title":"A system for asynchronous high-speed chip to chip communication","authors":"P. T. Røine","doi":"10.1109/ASYNC.1996.494432","DOIUrl":"https://doi.org/10.1109/ASYNC.1996.494432","url":null,"abstract":"A system for high-speed asynchronous interconnections between VLSI chips is proposed. Communication is performed on three-wire links that have about the same properties as differential interconnections. A bit transmission consists of switching the constant driver current from one wire to one of the two others. There is no need for clocking or synchronisation, as bits are separated by a transition. The chosen data representation makes decoding to a two-phase protocol especially simple. Energy consumption may be reduced by dynamically adjusting bias currents, and thus circuit speed, to match the demand for communication bandwidth. In a 0.7 /spl mu/m CMOS process, communication bandwidth per link is expected to reach 1 Gb/s.","PeriodicalId":365358,"journal":{"name":"Proceedings Second International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126501414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1996-03-18DOI: 10.1109/ASYNC.1996.494452
J. Garside, S. Temple, R. Mehra
AMULET2e is an asynchronous microprocessor system based on the AMULET2 processor core. In addition to the processor it incorporates a number of distinct subsystems, the most notable of which is an asynchronous on-chip cache. This includes several novel features which exploit the asynchronous design style to increase throughput and reduce power consumption. These features are evident at a number of levels in the design. For example, the cache is micropipelined to increase its throughput, at the architectural level there is an arbitration free non-blocking line fetch mechanism while at the circuit design level there is a power-saving RAM sense amplifier control circuit. A significant property of the cache system is its ability to cycle in a data dependent way which allows the system to approach average case performance.
{"title":"The AMULET2e cache system","authors":"J. Garside, S. Temple, R. Mehra","doi":"10.1109/ASYNC.1996.494452","DOIUrl":"https://doi.org/10.1109/ASYNC.1996.494452","url":null,"abstract":"AMULET2e is an asynchronous microprocessor system based on the AMULET2 processor core. In addition to the processor it incorporates a number of distinct subsystems, the most notable of which is an asynchronous on-chip cache. This includes several novel features which exploit the asynchronous design style to increase throughput and reduce power consumption. These features are evident at a number of levels in the design. For example, the cache is micropipelined to increase its throughput, at the architectural level there is an arbitration free non-blocking line fetch mechanism while at the circuit design level there is a power-saving RAM sense amplifier control circuit. A significant property of the cache system is its ability to cycle in a data dependent way which allows the system to approach average case performance.","PeriodicalId":365358,"journal":{"name":"Proceedings Second International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"280 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115228999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1996-03-18DOI: 10.1109/ASYNC.1996.494446
E. Grass, R. Morling, I. Kale
A new method for designing single rail asynchronous circuits is studied. It utilises additional circuitry to monitor the activity of nodes within combinational logic blocks. When all transitions have halted a completion signal is generated. Details of the circuit and design methodology are given and the influence of glitches on the proposed circuit is discussed. Three different levels of granularity are investigated. Experimental physical layout of the circuit with extracted and back-annotated simulation results is provided. The proposed approach results in faster operation than synchronous circuits with minimum circuit overhead incurred.
{"title":"Activity-Monitoring Completion-Detection (AMCD): a new single rail approach to achieve self-timing","authors":"E. Grass, R. Morling, I. Kale","doi":"10.1109/ASYNC.1996.494446","DOIUrl":"https://doi.org/10.1109/ASYNC.1996.494446","url":null,"abstract":"A new method for designing single rail asynchronous circuits is studied. It utilises additional circuitry to monitor the activity of nodes within combinational logic blocks. When all transitions have halted a completion signal is generated. Details of the circuit and design methodology are given and the influence of glitches on the proposed circuit is discussed. Three different levels of granularity are investigated. Experimental physical layout of the circuit with extracted and back-annotated simulation results is provided. The proposed approach results in faster operation than synchronous circuits with minimum circuit overhead incurred.","PeriodicalId":365358,"journal":{"name":"Proceedings Second International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129759997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1996-03-18DOI: 10.1109/ASYNC.1996.494444
K. V. Berkel, A. Bink
Single-track handshake signaling is using the same wire for request and acknowledge signaling. After each 2-phase handshake the wire is back in its initial state. A sequence of three protocol definitions suggests both a design method for single-track circuits and a trade-off between their robustness and their cost/performance. Single-track handshake signaling is applied to micropipelines and to handshake circuits, including a successful redesign of an existing rate-converter IC. The practical merits are not yet clear, but may include improved performance.
{"title":"Single-track handshake signaling with application to micropipelines and handshake circuits","authors":"K. V. Berkel, A. Bink","doi":"10.1109/ASYNC.1996.494444","DOIUrl":"https://doi.org/10.1109/ASYNC.1996.494444","url":null,"abstract":"Single-track handshake signaling is using the same wire for request and acknowledge signaling. After each 2-phase handshake the wire is back in its initial state. A sequence of three protocol definitions suggests both a design method for single-track circuits and a trade-off between their robustness and their cost/performance. Single-track handshake signaling is applied to micropipelines and to handshake circuits, including a successful redesign of an existing rate-converter IC. The practical merits are not yet clear, but may include improved performance.","PeriodicalId":365358,"journal":{"name":"Proceedings Second International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121975207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1996-03-18DOI: 10.1109/ASYNC.1996.494445
M. Maezawa, I. Kurosawa, Y. Kameda, T. Nanya
We present a pulse-driven asynchronous logic gate family based on a high-speed, low-power Josephson-junction device of rapid single-flux-quantum (RSFQ) circuits. Dual-rail logic is used and clock-free operation is realized. The proper operation of the circuits is confirmed by the numerical simulation which shows logic delays are about 60 ps for an AND and about 80 ps for an XOR. Power consumption is estimated to be 10 /spl mu/W/gate.
{"title":"Pulse-driven dual-rail logic gate family based on rapid single-flux-quantum (RSFQ) devices for asynchronous circuits","authors":"M. Maezawa, I. Kurosawa, Y. Kameda, T. Nanya","doi":"10.1109/ASYNC.1996.494445","DOIUrl":"https://doi.org/10.1109/ASYNC.1996.494445","url":null,"abstract":"We present a pulse-driven asynchronous logic gate family based on a high-speed, low-power Josephson-junction device of rapid single-flux-quantum (RSFQ) circuits. Dual-rail logic is used and clock-free operation is realized. The proper operation of the circuits is confirmed by the numerical simulation which shows logic delays are about 60 ps for an AND and about 80 ps for an XOR. Power consumption is estimated to be 10 /spl mu/W/gate.","PeriodicalId":365358,"journal":{"name":"Proceedings Second International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115681772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1996-03-18DOI: 10.1109/ASYNC.1996.494450
J. Tierno, R. Manohar, Alain J. Martin
We introduce the concept of energy index, a measure which can be used to estimate the pourer dissipation of a standard implementation of the high-level specification for an asynchronous circuit. This energy index is related to information-theoretic entropy measures. It is shown how these measures can be used to design low-power circuits.
{"title":"The energy and entropy of VLSI computations","authors":"J. Tierno, R. Manohar, Alain J. Martin","doi":"10.1109/ASYNC.1996.494450","DOIUrl":"https://doi.org/10.1109/ASYNC.1996.494450","url":null,"abstract":"We introduce the concept of energy index, a measure which can be used to estimate the pourer dissipation of a standard implementation of the high-level specification for an asynchronous circuit. This energy index is related to information-theoretic entropy measures. It is shown how these measures can be used to design low-power circuits.","PeriodicalId":365358,"journal":{"name":"Proceedings Second International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130398802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1996-03-18DOI: 10.1109/ASYNC.1996.494441
N. Tabrizi, M. Liebelt, K. Eshraghian
Different types of hazards have been studied extensively under the bounded gate and wire delay model. It is well known that under this delay model not all multiple input dynamic logic hazards can be removed from all two stage combinational logic circuits. In this paper we restrict the delay model to the well-known inertial gate delay or speed independent model and show that under this model half of the dynamic logic hazards can no longer occur in two level logic circuits. We then weaken the zero wire delay restriction and find an upper bound for the delay along critical interconnection wires and hence propose a virtual isochronic fork model for interconnection networks.
{"title":"Dynamic hazards and speed independent delay model","authors":"N. Tabrizi, M. Liebelt, K. Eshraghian","doi":"10.1109/ASYNC.1996.494441","DOIUrl":"https://doi.org/10.1109/ASYNC.1996.494441","url":null,"abstract":"Different types of hazards have been studied extensively under the bounded gate and wire delay model. It is well known that under this delay model not all multiple input dynamic logic hazards can be removed from all two stage combinational logic circuits. In this paper we restrict the delay model to the well-known inertial gate delay or speed independent model and show that under this model half of the dynamic logic hazards can no longer occur in two level logic circuits. We then weaken the zero wire delay restriction and find an upper bound for the delay along critical interconnection wires and hence propose a virtual isochronic fork model for interconnection networks.","PeriodicalId":365358,"journal":{"name":"Proceedings Second International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"255 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131898240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1996-03-18DOI: 10.1109/ASYNC.1996.494439
Tony Werner, V. Akella
This paper proposes a new dynamic instruction scheduler called the Asynchronous Fast Dispatch Stack (AFDS). This approach utilizes asynchronous design techniques to implement a dispatch stack-based dynamic instruction issue mechanism. To maintain throughput and simplify dependency computations, the AFDS architecture includes a counterflow pipeline, which is modeled after the Counterflow Pipeline Processor (CFPP) proposed by Sproull and Sutherland (1994). The AFDS counterflow pipeline, however, propagates instruction dependency and completion information, rather than results and source operands. Preliminary results indicate that the AFDS is a promising application of the CFPP architecture.
{"title":"Counterflow pipeline based dynamic instruction scheduling","authors":"Tony Werner, V. Akella","doi":"10.1109/ASYNC.1996.494439","DOIUrl":"https://doi.org/10.1109/ASYNC.1996.494439","url":null,"abstract":"This paper proposes a new dynamic instruction scheduler called the Asynchronous Fast Dispatch Stack (AFDS). This approach utilizes asynchronous design techniques to implement a dispatch stack-based dynamic instruction issue mechanism. To maintain throughput and simplify dependency computations, the AFDS architecture includes a counterflow pipeline, which is modeled after the Counterflow Pipeline Processor (CFPP) proposed by Sproull and Sutherland (1994). The AFDS counterflow pipeline, however, propagates instruction dependency and completion information, rather than results and source operands. Preliminary results indicate that the AFDS is a promising application of the CFPP architecture.","PeriodicalId":365358,"journal":{"name":"Proceedings Second International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115752725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1996-03-18DOI: 10.1109/ASYNC.1996.494438
W. Richardson, E. Brunvand
Decoupled computer architectures provide an effective means of exploiting instruction level parallelism. Self-timed micropipeline systems are inherently decoupled due to the elastic nature of the basic FIFO structure, and may be ideally suited for constructing decoupled computer architectures. Fred is a self-timed decoupled, pipelined computer architecture based on micropipelines. We present the architecture of Fred, with specific details on a micropipelined implementation that includes support for multiple functional units and out-of-order instruction completion due to the self-timed decoupling.
{"title":"Fred: an architecture for a self-timed decoupled computer","authors":"W. Richardson, E. Brunvand","doi":"10.1109/ASYNC.1996.494438","DOIUrl":"https://doi.org/10.1109/ASYNC.1996.494438","url":null,"abstract":"Decoupled computer architectures provide an effective means of exploiting instruction level parallelism. Self-timed micropipeline systems are inherently decoupled due to the elastic nature of the basic FIFO structure, and may be ideally suited for constructing decoupled computer architectures. Fred is a self-timed decoupled, pipelined computer architecture based on micropipelines. We present the architecture of Fred, with specific details on a micropipelined implementation that includes support for multiple functional units and out-of-order instruction completion due to the self-timed decoupling.","PeriodicalId":365358,"journal":{"name":"Proceedings Second International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"1126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122925564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}