Pub Date: 2012-11-01; DOI: 10.1049/iet-cds.2012.0012
B. P. Das, H. Onodera
Today's multi-million digital integrated circuit designs depend heavily on the quality of the standard cell library. In this study, an all-digital reconfigurable-array-based test structure is presented that tests the quality (i.e. functionality and performance) of all types of logic gates in the standard cell library using a reconfigurable array of gate delay measurement cells. The gate delay is estimated by applying the least squares method to the measured period/frequency of the reconfigurable ring oscillator (RO). Because the least squares method averages out the random noise in the measured RO period, the gate delay is estimated accurately. The reconfigurable-array structure can easily isolate a faulty standard cell from non-faulty ones. The test structure is area efficient, with area savings of 1.6× and 2× over conventional RO-based delay measurement in the 180 nm and 65 nm technology nodes, respectively. A subset of standard cells is tested using this reconfigurable-array structure. A test chip has been fabricated in an industrial 180 nm technology node to study the feasibility of the approach. Measured results from 20 chips are reported to show the amount of within-die and die-to-die variation.
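The least squares step in the abstract above can be illustrated with a minimal Python sketch. It assumes (the abstract does not say this) that each ring oscillator configuration is described by how many stages of each gate type it contains, and that the oscillation period is twice the sum of the stage delays. The configuration matrix, the delay values and the noise samples are all hypothetical.

```python
def solve_least_squares(A, b):
    """Minimise ||A x - b||^2 by solving the normal equations (A^T A) x = A^T b."""
    n = len(A[0])
    ata = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(n)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [ata[i] + [atb[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Hypothetical RO configurations: number of stages of gate types 0, 1, 2 in the ring.
configs = [[3, 1, 1], [1, 3, 1], [1, 1, 3], [3, 3, 1], [1, 3, 3], [3, 1, 3]]
true_delays_ps = [50.0, 80.0, 120.0]           # hypothetical per-stage delays
noise_ps = [1.5, -2.0, 0.5, -1.0, 2.0, -1.0]   # hypothetical measurement noise

# A transition travels around the ring twice per period, so T = 2 * sum(stage delays).
periods_ps = [
    2.0 * sum(c * d for c, d in zip(row, true_delays_ps)) + n
    for row, n in zip(configs, noise_ps)
]
half_periods_ps = [p / 2.0 for p in periods_ps]

estimated_delays_ps = solve_least_squares(configs, half_periods_ps)
print([round(d, 2) for d in estimated_delays_ps])
```

With more configurations than unknown delays, the system is overdetermined, which is what lets the fit average out the noise in the individual period measurements.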
{"title":"Area-efficient reconfigurable-array-based oscillator for standard cell characterisation","authors":"B. P. Das, H. Onodera","doi":"10.1049/iet-cds.2012.0012","DOIUrl":"https://doi.org/10.1049/iet-cds.2012.0012","PeriodicalId":120076,"journal":{"name":"IET Circuits Devices Syst."},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"114642923","RegionNum":0,"Total":0}
Pub Date: 2012-11-01; DOI: 10.1049/iet-cds.2012.0029
Kang-Yeob Park, W. Oh, Y. Lee, W. Choi
We report a fully integrated serial-link receiver with an optical interface, fabricated in a 0.18 µm complementary metal oxide semiconductor technology, for long-haul display interconnects. The receiver includes a trans-impedance amplifier, a limiting amplifier, a clock and data recovery circuit, a 1:64 de-multiplexer and a built-in error checker. The receiver produces 64-bit wide electrical signals from photodetector output signals produced by 5.28, 5.6 or 6.25 Gb/s optical signals delivered through up to 700 m of multi-mode fibre. It can support serialised data for Ultra eXtended Graphics Array (UXGA), 1080p and Wide Ultra eXtended Graphics Array (WUXGA). The receiver core occupies 0.59 mm² and dissipates 42.4 mW at a 6.25 Gb/s bit rate from a 1.8 V supply.
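The figures quoted above imply a parallel word rate and an energy per bit that follow from simple arithmetic. The Python sketch below works them out; the function names are illustrative only, and the energy figure assumes the quoted 42.4 mW covers the whole receiver core at 6.25 Gb/s.

```python
def parallel_word_rate_hz(line_rate_bps, demux_ratio=64):
    """Rate of the parallel output words after a 1:demux_ratio de-multiplexer."""
    return line_rate_bps / demux_ratio


def energy_per_bit_joules(power_w, line_rate_bps):
    """Average energy spent per received bit."""
    return power_w / line_rate_bps


for line_rate in (5.28e9, 5.6e9, 6.25e9):
    word_rate_mhz = parallel_word_rate_hz(line_rate) / 1e6
    print(f"{line_rate / 1e9:.2f} Gb/s -> {word_rate_mhz:.3f} MHz word rate")

# Energy per bit at the highest rate, in picojoules.
print(f"{energy_per_bit_joules(42.4e-3, 6.25e9) * 1e12:.3f} pJ/bit")
```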
{"title":"Fully integrated serial-link receiver with optical interface for long-haul display interconnects","authors":"Kang-Yeob Park, W. Oh, Y. Lee, W. Choi","doi":"10.1049/iet-cds.2012.0029","DOIUrl":"https://doi.org/10.1049/iet-cds.2012.0029","PeriodicalId":120076,"journal":{"name":"IET Circuits Devices Syst."},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"126266699","RegionNum":0,"Total":0}
Pub Date: 2012-11-01; DOI: 10.1049/iet-cds.2011.0367
Najoua Chalbi, Mohamed Boubaker, M. Hedi
In this study, the authors present field-programmable gate array dynamic power models for basic operators at the architectural level. Further models are developed for operator groups arranged in parallel or in series in the architecture. The operator characterisation models depend on the frequency variation, the activity rate and the precision in the presence of autocorrelation, and take into account the interconnections between operators. The authors have validated their approach on Euclidean distance and finite-impulse response filter applications, using the operator models in a first step and the IP models in a second step. The estimates are closer to the real values when the IP mathematical models are used; the experimental results show a higher average accuracy, with a maximum average error of 3.7%. The power models are verified by on-board measurement in a real Virtex2Pro field-programmable gate array environment and are ready for integration with high-level power optimisation techniques.
{"title":"Power estimation model based on grouping components in field-programmable gate array circuit","authors":"Najoua Chalbi, Mohamed Boubaker, M. Hedi","doi":"10.1049/iet-cds.2011.0367","DOIUrl":"https://doi.org/10.1049/iet-cds.2011.0367","PeriodicalId":120076,"journal":{"name":"IET Circuits Devices Syst."},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"131772484","RegionNum":0,"Total":0}
Pub Date: 2012-10-02; DOI: 10.1049/iet-cds.2011.0253
S. Rathod, A. Saxena, S. Dasgupta
In this study, the authors evaluate different address decoder schemes based on bulk, single gate (SG) silicon-on-insulator (SOI) and double gate (DG) FinFET technology. The schemes differ in their back gate connections and in the swing on the enable and address lines. Delay, power dissipation and critical charge have been analysed. Radiation-induced single event transients and multiple bit upsets in the address decoder have been studied. For radiation-hardened applications, the tied gate configuration is found to be a good choice over the bulk, SG-SOI and independent gate configurations. The effect of process parameter variations on the different schemes has also been studied. HSPICE simulations have been performed with 45 nm bulk, SG-SOI and DG-FinFET predictive technology models.
{"title":"Analysis of double-gate FinFET-based address decoder for radiation-induced single-event-transients","authors":"S. Rathod, A. Saxena, S. Dasgupta","doi":"10.1049/iet-cds.2011.0253","DOIUrl":"https://doi.org/10.1049/iet-cds.2011.0253","PeriodicalId":120076,"journal":{"name":"IET Circuits Devices Syst."},"PeriodicalIF":0.0,"publicationDate":"2012-10-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"114243214","RegionNum":0,"Total":0}
Pub Date: 2012-10-02; DOI: 10.1049/iet-cds.2012.0090
H. Makino, S. Nakata, Hirotsugu Suzuki, S. Mutoh, M. Miyama, T. Yoshimura, S. Iwade, Y. Matsuda
This study describes a method to easily predict the write yield of a static random access memory (SRAM) cell. The differential coefficient of the combined word line margin (CWLM) with respect to the threshold voltage (V_th) is analysed using the simple Shockley transistor model. The analysis shows that the good linearity of the CWLM in V_th comes from keeping the access transistor operating in the saturation mode over a wide range of V_th values. Monte Carlo simulation demonstrates that the CWLM obeys the normal distribution. The mean and the variance of the CWLM are almost constant for sample numbers ranging from 100 to 100 000. The estimated write failure probabilities are almost uniform, within a factor of 1.7, for more than 300 samples, which allows SRAM to be evaluated with a small number of measurements. The distribution predicted using the differential coefficients calculated by SPICE simulation also matches the Monte Carlo results. The estimated write failure probability agrees with the Monte Carlo results within a factor of 2.0, which is acceptable for SRAM redundancy circuit design. Finally, the write yield is related to the error rate. Thus, the write yield is easily predicted from a small number of measured samples or from the differential coefficients of the CWLM with respect to the V_th values calculated by SPICE simulation.
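A minimal Python sketch of how a normally distributed margin translates into a failure probability and a yield is given below. It rests on assumptions that are not in the abstract: a write is taken to fail when the CWLM falls below zero, the V_th variations of the cell transistors are treated as independent, and the array has no redundancy. The sensitivities, sigmas and array size are hypothetical.

```python
import math
from statistics import NormalDist


def cwlm_sigma(sensitivities, vth_sigmas):
    """Linear propagation: sigma of the CWLM from dCWLM/dVth_i and sigma(Vth_i).

    Assumes independent V_th variations, so the variances add.
    """
    return math.sqrt(sum((s * sig) ** 2 for s, sig in zip(sensitivities, vth_sigmas)))


def write_failure_probability(mean_cwlm, sigma_cwlm):
    """P(CWLM < 0) for a normally distributed margin (assumed failure criterion)."""
    return NormalDist(mean_cwlm, sigma_cwlm).cdf(0.0)


def array_write_yield(p_fail, n_cells):
    """Probability that all n_cells write correctly, with no redundancy."""
    if p_fail >= 1.0:
        return 0.0
    # exp(n * log1p(-p)) stays accurate for very small failure probabilities.
    return math.exp(n_cells * math.log1p(-p_fail))


# Hypothetical numbers: sensitivities (V/V), V_th sigmas (V), mean margin (V).
sigma = cwlm_sigma([0.8, -0.5, 0.3], [0.03, 0.03, 0.03])
p_fail = write_failure_probability(0.30, sigma)
print(sigma, p_fail, array_write_yield(p_fail, 1_000_000))
```

Because only the mean and the standard deviation of the margin enter the calculation, they can come either from a modest number of measured samples or from simulated differential coefficients, which is the practical point the abstract makes.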
{"title":"Utilising the normal distribution of the write noise margin to easily predict the SRAM write yield","authors":"H. Makino, S. Nakata, Hirotsugu Suzuki, S. Mutoh, M. Miyama, T. Yoshimura, S. Iwade, Y. Matsuda","doi":"10.1049/iet-cds.2012.0090","DOIUrl":"https://doi.org/10.1049/iet-cds.2012.0090","PeriodicalId":120076,"journal":{"name":"IET Circuits Devices Syst."},"PeriodicalIF":0.0,"publicationDate":"2012-10-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"129857665","RegionNum":0,"Total":0}
Pub Date: 2012-10-02; DOI: 10.1049/iet-cds.2011.0283
M. Gholipour, N. Masoumi
Multi-walled carbon nanotubes (MWCNTs) have attracted much attention as very large scale integration (VLSI) chip interconnects because of their high current densities and excellent thermal and mechanical properties. This study investigates different aspects of the use of MWCNTs as chip routing wires in the search for modern technologies for high-performance interconnects. Mathematical analyses and simulations were carried out for MWCNT and Cu at the global, intermediate and local interconnect levels. The authors propose a semi-analytical delay estimation model along with an equivalent RC model for MWCNT global interconnects. The results obtained from these models agree well with the simulation results. The proposed compact semi-analytical model can be used for fast analysis of MWCNT global interconnects, including delay, buffer insertion and crosstalk. The authors used their model to investigate the impact of buffer insertion on MWCNT interconnect delay. The optimal number of buffers, that is, the number that minimises the MWCNT propagation delay, is estimated. Analytical and simulation results show that MWCNT interconnects require fewer buffers than Cu wires.
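The link between wire resistance and buffer count can be illustrated with a generic sketch. The Python code below is not the authors' semi-analytical MWCNT model; it uses a textbook Elmore-style estimate for a distributed RC wire split into equal buffered segments, with the usual 0.7 (lumped) and 0.4 (distributed) coefficients, and searches for the segment count that minimises the total delay. All resistance and capacitance values are hypothetical.

```python
def buffered_wire_delay(n_segments, r_wire, c_wire, r_drv, c_in):
    """Elmore-style delay of a wire split into n equal segments, each driven by a buffer.

    r_wire, c_wire: total wire resistance (ohm) and capacitance (F)
    r_drv, c_in:    buffer output resistance (ohm) and input capacitance (F)
    """
    r_seg = r_wire / n_segments
    c_seg = c_wire / n_segments
    per_segment = 0.7 * r_drv * (c_seg + c_in) + r_seg * (0.4 * c_seg + 0.7 * c_in)
    return n_segments * per_segment


def optimal_segment_count(r_wire, c_wire, r_drv, c_in, max_segments=200):
    """Brute-force search for the segment count with the smallest total delay."""
    return min(
        range(1, max_segments + 1),
        key=lambda n: buffered_wire_delay(n, r_wire, c_wire, r_drv, c_in),
    )


# Hypothetical values: same capacitance, but one wire has a quarter of the resistance.
R_DRV, C_IN, C_WIRE = 1e3, 1e-15, 200e-15
n_high_r = optimal_segment_count(2000.0, C_WIRE, R_DRV, C_IN)
n_low_r = optimal_segment_count(500.0, C_WIRE, R_DRV, C_IN)
print(n_high_r, n_low_r)
```

In this simple model the optimum grows with the square root of the wire's RC product, so a lower-resistance wire of the same capacitance needs fewer buffers, which is consistent in direction with the conclusion reported above.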
{"title":"Efficient inclusive analytical model for delay estimation of multi-walled carbon nanotube interconnects","authors":"M. Gholipour, N. Masoumi","doi":"10.1049/iet-cds.2011.0283","DOIUrl":"https://doi.org/10.1049/iet-cds.2011.0283","PeriodicalId":120076,"journal":{"name":"IET Circuits Devices Syst."},"PeriodicalIF":0.0,"publicationDate":"2012-10-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"129391693","RegionNum":0,"Total":0}
Pub Date: 2012-10-02; DOI: 10.1049/iet-cds.2011.0177
Y. Chou, Chun-Chen Lin, Hsin-Liang Chen, Jen-Shiun Chiang
This study addresses a new digital calibration filter design for cascaded ΣΔ modulators with finite amplifier gain. A recent approach to this problem, based on the H-infinity loop shaping method, has the merit of obviating the use of an estimation or adaptive digital correction scheme, which reduces the complexity of circuit implementation. For that approach to be successful, it is critical to find an appropriate weighting function that gives the gain responses of the uncertain noise transfer function (NTF) a proper shape for improving the signal-to-noise ratio (SNR). However, the search for such a weighting function is difficult in general. Moreover, the introduced weighting function increases the filter order and hence the circuit complexity. To circumvent this difficulty and the inherent drawbacks, this study presents a new noise shaping method for the problem. Considering that it is hard to decide the optimal shape of the uncertain NTF a priori, the authors propose a dual-band design to achieve the shape adjustment task. In particular, the range of the lower frequency band is determined by SNR performance evaluation rather than being given arbitrarily a priori. This step is crucial and increases the chance of finding a better filter.
{"title":"Heuristic finite-impulse-response filter design for cascaded ΣΔ modulators with finite amplifier gain","authors":"Y. Chou, Chun-Chen Lin, Hsin-Liang Chen, Jen-Shiun Chiang","doi":"10.1049/iet-cds.2011.0177","DOIUrl":"https://doi.org/10.1049/iet-cds.2011.0177","PeriodicalId":120076,"journal":{"name":"IET Circuits Devices Syst."},"PeriodicalIF":0.0,"publicationDate":"2012-10-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"121768413","RegionNum":0,"Total":0}
Pub Date: 2012-10-02; DOI: 10.1049/iet-cds.2011.0254
Ling-feng Shi, Y. J. Chang, Hui-sen He, H. Nie, Y. Zhao
A rectifier diode temperature compensation circuit is presented for primary-side controlled flyback converters. By compensating for the variation of the secondary-side rectifier diode forward voltage with temperature, the output voltage error rate of the flyback converter is effectively improved at high temperature. The design of the circuit is based on the negative temperature characteristic of the base-emitter voltage V_BE of bipolar transistors. The circuit also provides over-temperature protection. Simulation results based on a 0.5 µm bipolar complementary metal oxide semiconductor process show that the compensation voltage is 0.1 V at 125°C and 0 V at 25°C. With compensation, the maximum output voltage error rate of the flyback converter is reduced from 3.8% to 0.6% over the temperature range from 25 to 125°C. The thermal shutdown threshold is 140°C, and the over-temperature protection hysteresis threshold is 110°C.
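The behaviour described above can be modelled in a few lines of Python. The sketch makes two assumptions that the abstract does not state: the compensation voltage is taken to vary linearly between the two quoted points (0 V at 25°C, 0.1 V at 125°C), and the 110°C hysteresis threshold is interpreted as the temperature at which the converter restarts after a shutdown. Class and function names are illustrative only.

```python
def compensation_voltage(temp_c):
    """Linear interpolation through (25 °C, 0 V) and (125 °C, 0.1 V); assumed linear."""
    return (temp_c - 25.0) * 0.1 / (125.0 - 25.0)


class ThermalShutdown:
    """Over-temperature protection with hysteresis (behavioural model only)."""

    def __init__(self, trip_c=140.0, release_c=110.0):
        self.trip_c = trip_c
        self.release_c = release_c
        self.shutdown = False

    def update(self, temp_c):
        """Update the state for a new temperature sample and return it."""
        if self.shutdown:
            # Stay off until the die has cooled to the release threshold.
            if temp_c <= self.release_c:
                self.shutdown = False
        elif temp_c >= self.trip_c:
            self.shutdown = True
        return self.shutdown


protection = ThermalShutdown()
trace = [protection.update(t) for t in (100, 139, 141, 120, 111, 109, 130)]
print(trace)
```

The gap between the trip and release thresholds keeps the converter from toggling on and off when the temperature hovers near a single limit.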
{"title":"Design of rectifier diode temperature compensation circuit in flyback converter","authors":"Ling-feng Shi, Y. J. Chang, Hui-sen He, H. Nie, Y. Zhao","doi":"10.1049/iet-cds.2011.0254","DOIUrl":"https://doi.org/10.1049/iet-cds.2011.0254","PeriodicalId":120076,"journal":{"name":"IET Circuits Devices Syst."},"PeriodicalIF":0.0,"publicationDate":"2012-10-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"128194059","RegionNum":0,"Total":0}
Pub Date: 2012-09-01; DOI: 10.1049/iet-cds.2011.0354
O. Khan, S. Kundu
Power consumption has become a major concern, from data centres to handheld devices. Traditionally, improvements in the power-performance efficiency of modern superscalar processors came from technology scaling. However, that is no longer the case. Many current systems deploy coarse-grain voltage and/or frequency scaling for power management. These techniques are attractive, but limited by their granularity of control and their effectiveness in nano-complementary metal-oxide-semiconductor (CMOS) technologies. This study proposes a novel architecture-level mechanism to exploit intra-thread variations for power-performance efficiency in modern superscalar processors. This class of processors implements several buffer/queue structures to support speculative out-of-order execution for performance enhancement. Applications may not need the full capabilities of such structures at all times. A mechanism that collaboratively adapts a finite set of key hardware structures to the changing program behaviour can allow the processor to operate with heterogeneous power-performance capabilities. This study presents a novel offline regression-based empirical model to estimate structure resizing for a selected set of structures. It is shown that, using a few processor runtime events, the system can dynamically estimate structure resizing to exploit power-performance efficiency. Results show that, using the proposed empirical model, a selected set of key structures can be resized at runtime to deliver on average 40% better power-performance efficiency than a baseline design, with only a 5% loss of performance.
{"title":"Empirical model for cooperative resizing of processor structures to exploit power-performance efficiency at runtime","authors":"O. Khan, S. Kundu","doi":"10.1049/iet-cds.2011.0354","DOIUrl":"https://doi.org/10.1049/iet-cds.2011.0354","PeriodicalId":120076,"journal":{"name":"IET Circuits Devices Syst."},"PeriodicalIF":0.0,"publicationDate":"2012-09-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"129641063","RegionNum":0,"Total":0}
In this study, a power-efficient very large-scale integration (VLSI) implementation of a convolutional code decoder is presented. Based on the state transparent convolutional code definition, the received codewords are classified into non-erroneous and erroneous segments. Unlike the conventional Viterbi decoder (VD), the authors use a low-complexity decoder, denoted the bit reverse decoder, to recover the non-erroneous segments by a reverse operation with little power consumption, and present a segment-based VD to decode the erroneous codeword segments. The clock-gating technique is then employed to switch between the segment-based VD and the bit reverse decoder for power saving. To further reduce the power consumption, the authors group the registers in the survivor memory unit of the segment-based VD into several segments and apply clock gating to each segment individually. According to the number of consecutive erroneous codeword segments, the corresponding number of register segments in the survivor memory unit is enabled, and the other register segments are clock-gated to reduce the switching activity. In addition, the design determines the start and terminal states of the survivor path so that the erroneous segments are decoded correctly without bit-error rate degradation. Compared with other decoders, the design requires less power without degrading the decoding performance.
{"title":"Power-efficient decoder implementation based on state transparent convolutional codes","authors":"Yeu-Horng Shiau, Hung-Yu Yang, Pei-Yin Chen, Shi-Gi Huang","doi":"10.1049/iet-cds.2011.0055","DOIUrl":"https://doi.org/10.1049/iet-cds.2011.0055","PeriodicalId":120076,"journal":{"name":"IET Circuits Devices Syst."},"PeriodicalIF":0.0,"publicationDate":"2012-07-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"123440484","RegionNum":0,"Total":0}