We propose a pipelined division architecture for low-power ECC applications, based on group-wise partial division and a lookahead technique that exploits the linearity of finite field arithmetic. The throughput is one division per clock regardless of the degree of the dividend polynomial. The salient feature of this architecture is its very low power-delay product. To verify the performance of the proposed division architecture relative to the conventional LFSR-based one, three RS and BCH code applications were fabricated in a 0.8 μm double-metal CMOS technology. Experimental results show improvements in power consumption of about 32, 65, and 67 times over the conventional LFSR-based architecture.
{"title":"A one division per clock pipelined division architecture based on LAPR (lookahead of partial-remainder) for low-power ECC applications","authors":"Hyungjoon Kwon, Kwyro Lee","doi":"10.1109/LPE.1997.621286","DOIUrl":"https://doi.org/10.1109/LPE.1997.621286","url":null,"abstract":"We propose a pipelined division architecture for low-power ECC applications, based on group-wise partial division and a lookahead technique that exploits the linearity of finite field arithmetic. The throughput is one division per clock regardless of the degree of the dividend polynomial. The salient feature of this architecture is its very low power-delay product. To verify the performance of the proposed division architecture relative to the conventional LFSR-based one, three RS and BCH code applications were fabricated in a 0.8 μm double-metal CMOS technology. Experimental results show improvements in power consumption of about 32, 65, and 67 times over the conventional LFSR-based architecture.","PeriodicalId":334688,"journal":{"name":"Proceedings of 1997 International Symposium on Low Power Electronics and Design","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122276929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
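For readers unfamiliar with the baseline named in the abstract above, the following is a minimal sketch of bit-serial polynomial division over GF(2), the operation a conventional LFSR divider performs one coefficient per clock. It is not the authors' LAPR architecture. The function name `gf2_poly_mod` and the packing of polynomials into integers (bit i holds the coefficient of x^i) are assumptions made for illustration.

```python
def gf2_poly_mod(dividend: int, divisor: int) -> int:
    """Remainder of dividend(x) / divisor(x) over GF(2), computed bit-serially.

    Polynomials are packed into ints: bit i is the coefficient of x**i
    (an illustrative convention, not taken from the paper).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor polynomial must be non-zero")
    deg = divisor.bit_length() - 1  # degree of the divisor polynomial
    rem = 0
    # Shift in one dividend coefficient per step, highest degree first,
    # mirroring what a hardware LFSR does once per clock cycle.
    for i in range(dividend.bit_length() - 1, -1, -1):
        rem = (rem << 1) | ((dividend >> i) & 1)
        if (rem >> deg) & 1:
            # Leading term present: subtract the divisor (XOR in GF(2)).
            rem ^= divisor
    return rem


# x^6 mod (x^3 + x + 1) equals x^2 + 1.
print(bin(gf2_poly_mod(0b1000000, 0b1011)))
```

The loop runs once per dividend coefficient, so the serial cost grows with the dividend degree. The remainder is also linear, `gf2_poly_mod(a ^ b, g) == gf2_poly_mod(a, g) ^ gf2_poly_mod(b, g)`, which is the kind of property a lookahead scheme can exploit to process groups of coefficients at once.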
In this paper we present a robust Differential Current Switch Logic gate suitable for low V_DD, low power operation. Differential Current Switch Logic gates achieve high performance and low power by restricting internal node voltage swings. Traditional DCSL is, however, highly sensitive to load imbalance because of the presence of a cross-coupled inverter pair at the output. In this paper we describe LVDCSL, a low voltage DCSL family which preserves the essential features of DCSL, namely high speed, low power, restricted internal voltage swings, and a latching input stage. However, it is much more robust to mismatched output loads, and is capable of working at far lower voltages. In addition, spikes in output transitions are greatly reduced, simplifying the interface to conventional CMOS circuits. Our results show that LVDCSL is capable of working at under 2 volts in a 0.35 μm CMOS process while being faster than comparable Domino gates. At the same time, total power consumption is reduced. LVDCSL achieves a 40% delay improvement and a 22% power reduction in comparison with Domino gates.
{"title":"LVDCSL: low voltage differential current switch logic, a robust low power DCSL family","authors":"D. Somasekhar, K. Roy","doi":"10.1145/263272.263276","DOIUrl":"https://doi.org/10.1145/263272.263276","url":null,"abstract":"In this paper we present a robust Differential Current Switch Logic gate suitable for low V_DD, low power operation. Differential Current Switch Logic gates achieve high performance and low power by restricting internal node voltage swings. Traditional DCSL is, however, highly sensitive to load imbalance because of the presence of a cross-coupled inverter pair at the output. In this paper we describe LVDCSL, a low voltage DCSL family which preserves the essential features of DCSL, namely high speed, low power, restricted internal voltage swings, and a latching input stage. However, it is much more robust to mismatched output loads, and is capable of working at far lower voltages. In addition, spikes in output transitions are greatly reduced, simplifying the interface to conventional CMOS circuits. Our results show that LVDCSL is capable of working at under 2 volts in a 0.35 μm CMOS process while being faster than comparable Domino gates. At the same time, total power consumption is reduced. LVDCSL achieves a 40% delay improvement and a 22% power reduction in comparison with Domino gates.","PeriodicalId":334688,"journal":{"name":"Proceedings of 1997 International Symposium on Low Power Electronics and Design","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129904637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Significand pre-alignment is a prerequisite for floating point addition. This paper addresses the architectural design and energy delay evaluation of a Low Power Barrel Switch for pre-alignment of floating point significands. Architectural energy delay analysis of Barrel Switch schemes suggests the suitability of transition activity scaled architectures for Low Power CMOS designs. Our energy delay estimates of operand pre-alignment Barrel Switchers for the addition of IEEE single precision floating point numbers, taking into account architectural as well as circuit implementation issues, suggest an energy delay reduction of better than 50% for transition activity scaled architectures for coefficients of parasitic lending exceeding 10. The corresponding reduction in power consumption is more than 55%.
{"title":"Energy delay measures of barrel switch architectures for pre-alignment of floating point operands for addition","authors":"R. Pillai, D. Al-Khalili, A. Al-Khalili","doi":"10.1145/263272.263341","DOIUrl":"https://doi.org/10.1145/263272.263341","url":null,"abstract":"Significand pre-alignment is a prerequisite for floating point addition. This paper addresses the architectural design and energy delay evaluation of a Low Power Barrel Switch for pre-alignment of floating point significands. Architectural energy delay analysis of Barrel Switch schemes suggests the suitability of transition activity scaled architectures for Low Power CMOS designs. Our energy delay estimates of operand pre-alignment Barrel Switchers for the addition of IEEE single precision floating point numbers, taking into account architectural as well as circuit implementation issues, suggest an energy delay reduction of better than 50% for transition activity scaled architectures for coefficients of parasitic lending exceeding 10. The corresponding reduction in power consumption is more than 55%.","PeriodicalId":334688,"journal":{"name":"Proceedings of 1997 International Symposium on Low Power Electronics and Design","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115122004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
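The pre-alignment step described in the abstract above can be sketched in software as follows. This is a behavioral illustration of a generic logarithmic barrel shifter, not the transition activity scaled architecture the paper evaluates. The names `barrel_shift_right` and `prealign`, the 24-bit default width (the IEEE single precision significand including the hidden bit), and the omission of guard, round, and sticky bits are all simplifying assumptions.

```python
def barrel_shift_right(value: int, shift: int, width: int = 24) -> int:
    """Logarithmic barrel shifter: one conditional stage per shift-amount bit."""
    value &= (1 << width) - 1
    stages = width.bit_length()  # 5 stages cover shifts 0..31 for width 24
    if shift >= (1 << stages):
        return 0  # shift exceeds what the stages can express: all bits lost
    for k in range(stages):
        # Stage k shifts by 2**k positions when bit k of the amount is set.
        if (shift >> k) & 1:
            value >>= 1 << k
    return value


def prealign(sig_a: int, exp_a: int, sig_b: int, exp_b: int, width: int = 24):
    """Shift the significand with the smaller exponent right so both operands
    share the larger exponent. Returns (sig_a, sig_b, common_exponent)."""
    if exp_a >= exp_b:
        return sig_a, barrel_shift_right(sig_b, exp_a - exp_b, width), exp_a
    return barrel_shift_right(sig_a, exp_b - exp_a, width), sig_b, exp_b


# Exponents differ by 3, so the second significand moves right by 3 bits.
print([hex(v) for v in prealign(0xC00000, 130, 0x800000, 127)])
```

In hardware each stage is a row of multiplexers, so the number of stages grows with the logarithm of the maximum shift distance rather than with the distance itself.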
Several high-level low-power design techniques have been incorporated in the design of a decimation filter for software radio. These include operation minimization, multiplier elimination, and block deactivation. Analysis and simulation results indicate that these techniques can achieve a 4 times reduction in power dissipation. An interleaved multiplier-accumulator array is used in the lowpass filter. The decimation filter has a programmable resolution that varies from 12 to 20 bits. The entire decimation filter has been designed in a 3.3 V, 0.5 μm CMOS technology.
{"title":"A programmable power-efficient decimation filter for software radios","authors":"E. Farag, R. Yan, M. Elmasry","doi":"10.1145/263272.263285","DOIUrl":"https://doi.org/10.1145/263272.263285","url":null,"abstract":"Several high-level low-power design techniques have been incorporated in the design of a decimation filter for software radio. These include operation minimization, multiplier elimination, and block deactivation. Analysis and simulation results indicate that these techniques can achieve a 4 times reduction in power dissipation. An interleaved multiplier-accumulator array is used in the lowpass filter. The decimation filter has a programmable resolution that varies from 12 to 20 bits. The entire decimation filter has been designed in a 3.3 V, 0.5 μm CMOS technology.","PeriodicalId":334688,"journal":{"name":"Proceedings of 1997 International Symposium on Low Power Electronics and Design","volume":"165 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127535108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Presented in this paper are algorithm transformation techniques for adaptive signal processing, which allow dynamic alteration of algorithm properties in response to signal non-stationarities. These transformations, referred to as dynamic algorithm transformations (DAT), jointly optimize algorithm and circuit performance measures such as signal-to-noise ratio (SNR) and power dissipation (P_D), respectively. A DAT-based signal processing system is composed of a signal monitoring algorithm (SMA) block and a signal processing algorithm (SPA) block. First, computation of the theoretical power-optimum SPA configuration incorporating signal transition activity is presented. Next, practical SMA schemes are developed, which achieve power reduction by a combination of powering down the filter taps and modifying the coefficients. The DAT-based adaptive filter is then employed as a near-end cross-talk (NEXT) canceller in 155.52 Mb/s ATM-LAN over category 3 wiring. Simulation results indicate that the power savings for the NEXT canceller range from 21% to 62% as the cable length varies from 100 meters to 70 meters.
{"title":"Dynamic algorithm transformations (DAT) for low-power adaptive signal processing","authors":"M. Goel, Naresh R Shanbhag","doi":"10.1145/263272.263316","DOIUrl":"https://doi.org/10.1145/263272.263316","url":null,"abstract":"Presented in this paper are algorithm transformation techniques for adaptive signal processing, which allow dynamic alteration of algorithm properties in response to signal non-stationarities. These transformations, referred to as dynamic algorithm transformations (DAT), jointly optimize algorithm and circuit performance measures such as signal-to-noise ratio (SNR) and power dissipation (P_D), respectively. A DAT-based signal processing system is composed of a signal monitoring algorithm (SMA) block and a signal processing algorithm (SPA) block. First, computation of the theoretical power-optimum SPA configuration incorporating signal transition activity is presented. Next, practical SMA schemes are developed, which achieve power reduction by a combination of powering down the filter taps and modifying the coefficients. The DAT-based adaptive filter is then employed as a near-end cross-talk (NEXT) canceller in 155.52 Mb/s ATM-LAN over category 3 wiring. Simulation results indicate that the power savings for the NEXT canceller range from 21% to 62% as the cable length varies from 100 meters to 70 meters.","PeriodicalId":334688,"journal":{"name":"Proceedings of 1997 International Symposium on Low Power Electronics and Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130275572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
While power consumption has become an important design constraint, very few reports of power analysis of processors are available in the literature. The processor considered is an experimental integration of a 16-bit DSP and a 32-bit RISC microcontroller, ERDI. Simulation-based power analysis on a back-annotated design is used to obtain data for a set of DSP application kernels and synthetic benchmarks.
{"title":"Power analysis of a 32-bit RISC microcontroller integrated with a 16-bit DSP","authors":"R. Bajwa, N. Schumann, H. Kojima","doi":"10.1145/263272.263309","DOIUrl":"https://doi.org/10.1145/263272.263309","url":null,"abstract":"While power consumption has become an important design constraint, very few reports of power analysis of processors are available in the literature. The processor considered is an experimental integration of a 16-bit DSP and a 32-bit RISC microcontroller, ERDI. Simulation-based power analysis on a back-annotated design is used to obtain data for a set of DSP application kernels and synthetic benchmarks.","PeriodicalId":334688,"journal":{"name":"Proceedings of 1997 International Symposium on Low Power Electronics and Design","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127183711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient use of an optimized memory hierarchy to exploit temporal locality in the memory accesses on array signals can have a very large impact on the power consumption in data-dominated applications. In the past, this task has been identified as crucial in a complete low-power memory management methodology. However, effective formalized techniques for this specific task have not yet been presented. In this paper the design freedom available for the basic problem is explored in depth and the outline of a systematic solution methodology is proposed. The efficiency of the methodology is illustrated on a real-life motion estimation application.
{"title":"Formalized methodology for data reuse exploration in hierarchical memory mappings","authors":"J. Diguet, S. Wuytack, F. Catthoor, H. Man","doi":"10.1145/263272.263278","DOIUrl":"https://doi.org/10.1145/263272.263278","url":null,"abstract":"Efficient use of an optimized memory hierarchy to exploit temporal locality in the memory accesses on array signals can have a very large impact on the power consumption in data-dominated applications. In the past, this task has been identified as crucial in a complete low-power memory management methodology. However, effective formalized techniques for this specific task have not yet been presented. In this paper the design freedom available for the basic problem is explored in depth and the outline of a systematic solution methodology is proposed. The efficiency of the methodology is illustrated on a real-life motion estimation application.","PeriodicalId":334688,"journal":{"name":"Proceedings of 1997 International Symposium on Low Power Electronics and Design","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133498945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L. Benini, G. Micheli, E. Macii, M. Poncino, S. Quer
This paper describes a new approach to low-power bus encoding, called "The Beach Solution", which is intended for power optimization of digital systems containing an embedded processor or a microcontroller executing a special-purpose software routine. The main difference between the proposed method and existing bus encoding techniques is that it is strongly application-dependent, in the sense that it is based on the analysis of the execution stream of a given program. This allows an accurate computation of the correlations that may exist between blocks of bits in consecutive patterns, which can be exploited to determine an encoding that minimizes the bus transition activity. Experimental results, obtained on a set of special-purpose applications, are very promising; reductions in bus activity of up to 64.8% (41.9% on average) have been achieved over the original address streams.
{"title":"System-level power optimization of special purpose applications: the Beach Solution","authors":"L. Benini, G. Micheli, E. Macii, M. Poncino, S. Quer","doi":"10.1145/263272.263277","DOIUrl":"https://doi.org/10.1145/263272.263277","url":null,"abstract":"This paper describes a new approach to low-power bus encoding, called \"The Beach Solution\", which is intended for power optimization of digital systems containing an embedded processor or a microcontroller executing a special-purpose software routine. The main difference between the proposed method and existing bus encoding techniques is that it is strongly application-dependent, in the sense that it is based on the analysis of the execution stream of a given program. This allows an accurate computation of the correlations that may exist between blocks of bits in consecutive patterns, which can be exploited to determine an encoding that minimizes the bus transition activity. Experimental results, obtained on a set of special-purpose applications, are very promising; reductions in bus activity of up to 64.8% (41.9% on average) have been achieved over the original address streams.","PeriodicalId":334688,"journal":{"name":"Proceedings of 1997 International Symposium on Low Power Electronics and Design","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134279535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
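The quantity that the abstract above sets out to minimize, bus transition activity, can be measured as the number of bit flips between consecutive words on the bus. The sketch below counts those flips for a sequential address stream under plain binary and under Gray coding. Gray code is used here only as a familiar stand-in encoding; it is not the Beach encoding, which is derived from the execution trace of a specific program. The helper names `count_transitions` and `to_gray` and the 16-address trace are hypothetical.

```python
def count_transitions(words):
    """Total number of bit flips between consecutive bus words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def to_gray(n: int) -> int:
    """Reflected binary Gray code of n: adjacent integers differ in one bit."""
    return n ^ (n >> 1)


# A purely sequential address stream, as an instruction fetch loop might issue.
addresses = list(range(16))
binary_flips = count_transitions(addresses)
gray_flips = count_transitions([to_gray(a) for a in addresses])
print(binary_flips, gray_flips)
```

For this trace the binary stream toggles 26 bus lines in total and the Gray-coded stream toggles 15. An application-dependent scheme would instead choose the encoding from correlations measured on the actual stream.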
A Quasi-Static Energy Recovery Logic family (QSERL) using two complementary sinusoidal supply clocks is proposed in this paper. High-efficiency clock generation circuitry that generates the two complementary sinusoidal clocks required by QSERL is also presented. The clock circuitry locks both the frequency and the phase of the clock signals, which makes it possible to integrate adiabatic modules into a VLSI system. We have designed an 8×8 carry-save multiplier using QSERL logic and two-phase sinusoidal clocks. SPICE simulation shows that the QSERL multiplier can save 37% of the energy of a static CMOS multiplier at 100 MHz.
{"title":"Quasi-static energy recovery logic and supply-clock generation circuits","authors":"Y. Ye, K. Roy, G. Stamoulis","doi":"10.1145/263272.263293","DOIUrl":"https://doi.org/10.1145/263272.263293","url":null,"abstract":"A Quasi-Static Energy Recovery Logic family (QSERL) using two complementary sinusoidal supply clocks is proposed in this paper. High-efficiency clock generation circuitry that generates the two complementary sinusoidal clocks required by QSERL is also presented. The clock circuitry locks both the frequency and the phase of the clock signals, which makes it possible to integrate adiabatic modules into a VLSI system. We have designed an 8×8 carry-save multiplier using QSERL logic and two-phase sinusoidal clocks. SPICE simulation shows that the QSERL multiplier can save 37% of the energy of a static CMOS multiplier at 100 MHz.","PeriodicalId":334688,"journal":{"name":"Proceedings of 1997 International Symposium on Low Power Electronics and Design","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133405479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The integration level of VLSI systems increases as the technology improves. The power dissipation of the data processing unit in digital signal processing systems must be kept as low as possible. We therefore designed a new 4-2 adder and a new Booth selector using transmission gate circuits to achieve low power consumption without sacrificing performance. The proposed 4-2 adder consumes 16% less power than the conventional 4-2 adder, and the proposed Booth selector consumes 60% less power than the conventional Booth selector. We designed a 32-bit MAC unit with the proposed 4-2 adder and Booth selector. The power dissipation of the 32-bit MAC unit is 124 mW at 100 MHz with a 2 V power supply, and its area is 1.3 mm × 2.4 mm.
{"title":"A new 4-2 adder and booth selector for low power MAC unit","authors":"Bum-Sik Kim, Daewoong Chung, L. Kim","doi":"10.1109/LPE.1997.621250","DOIUrl":"https://doi.org/10.1109/LPE.1997.621250","url":null,"abstract":"The integration level of VLSI systems increases as the technology improves. The power dissipation of the data processing unit in digital signal processing systems must be kept as low as possible. We therefore designed a new 4-2 adder and a new Booth selector using transmission gate circuits to achieve low power consumption without sacrificing performance. The proposed 4-2 adder consumes 16% less power than the conventional 4-2 adder, and the proposed Booth selector consumes 60% less power than the conventional Booth selector. We designed a 32-bit MAC unit with the proposed 4-2 adder and Booth selector. The power dissipation of the 32-bit MAC unit is 124 mW at 100 MHz with a 2 V power supply, and its area is 1.3 mm × 2.4 mm.","PeriodicalId":334688,"journal":{"name":"Proceedings of 1997 International Symposium on Low Power Electronics and Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130604624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
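As background for the Booth selector mentioned in the abstract above, here is a minimal behavioral sketch of radix-4 Booth recoding, assuming that is the recoding the MAC unit uses (the abstract does not say). Each recoded digit lies in {-2, -1, 0, 1, 2}, and a selector then picks 0, ±a, or ±2a as the partial product. The function names `booth_radix4_digits` and `booth_multiply` are hypothetical, and the code models arithmetic only, not the transmission gate circuit.

```python
def booth_radix4_digits(multiplier: int, bits: int = 32):
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Digit i equals y[2i-1] + y[2i] - 2*y[2i+1], with y[-1] taken as 0,
    so every digit falls in {-2, -1, 0, 1, 2}.
    """
    if bits % 2:
        raise ValueError("bits must be even for radix-4 recoding")
    y = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # the implicit bit to the right of the least significant bit
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(prev + b0 - 2 * b1)
        prev = b1
    return digits


def booth_multiply(a: int, b: int, bits: int = 32) -> int:
    """Sum the selected partial products d * a, each weighted by 4**i."""
    return sum((d * a) << (2 * i)
               for i, d in enumerate(booth_radix4_digits(b, bits)))


print(booth_radix4_digits(7, bits=4))  # 7 = -1 + 2 * 4
print(booth_multiply(13, -7))
```

Recoding halves the number of partial products (16 instead of 32 for a 32-bit multiplier), which is why the selector and the 4-2 adder tree that sums its outputs account for much of a multiplier's power.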