Pub Date: 2004-11-30, DOI: 10.1109/SOCC.2004.1362420
B. Sethuraman, J. Khan, R. Vemuri
We present a battery-efficient task execution methodology for portable reconfigurable computing (RC) platforms. We implement a given algorithm at several power-performance levels; we call these implementations cores, and each core is characterized by its power and performance levels. Changing the core and/or the frequency are the two mechanisms used to vary battery consumption. We consider two cases for single-task execution: increasing performance under a battery-life constraint, and prolonging battery life under a performance constraint. The execution time of each task is divided into equal time intervals, called slots. A simulated-annealing-based algorithm finds the best constraint-satisfying sequence of cores offline. Our results show a 50% increase in total work done (case 1) and a 61% increase in battery life (case 2) compared with a system that does not use this methodology. The combined effect of both cases is applied to multiple-task execution, and the results are reported.
Title: Battery-efficient task execution on portable reconfigurable computing. Published in: IEEE International SOC Conference, 2004. Proceedings.
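The slot-based search described in this abstract can be illustrated with a minimal sketch of case 1 (more work under a battery constraint). Everything concrete here is an assumption made for illustration: the three cores and their work/charge figures, the slot count, the linear charge budget standing in for a real battery model, and the cooling schedule. It is not the authors' algorithm, only a generic simulated-annealing loop over a sequence of per-slot core choices.

```python
import math
import random

# Hypothetical cores: (work done per slot, battery charge drawn per slot).
CORES = [(1.0, 1.0), (2.0, 2.5), (3.0, 4.5)]
NUM_SLOTS = 20
BATTERY_BUDGET = 40.0  # assumed total charge available (linear model)


def total_work(seq):
    """Sum of work over all slots for a sequence of core indices."""
    return sum(CORES[c][0] for c in seq)


def total_charge(seq):
    """Sum of charge drawn over all slots for a sequence of core indices."""
    return sum(CORES[c][1] for c in seq)


def anneal(seed=0, steps=5000, t_start=2.0, t_end=0.01):
    """Search offline for a high-work core sequence within the charge budget."""
    rng = random.Random(seed)
    current = [0] * NUM_SLOTS  # lowest-power core everywhere: feasible start
    best = list(current)
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        # Neighbour move: reassign the core used in one random slot.
        candidate = list(current)
        candidate[rng.randrange(NUM_SLOTS)] = rng.randrange(len(CORES))
        if total_charge(candidate) > BATTERY_BUDGET:
            continue  # reject moves that violate the battery constraint
        delta = total_work(candidate) - total_work(current)
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if total_work(current) > total_work(best):
                best = list(current)
    return best
```

Calling `anneal()` returns one core index per slot. Case 2 (longer battery life under a performance constraint) would swap the roles of the objective and the constraint in the same loop.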
Pub Date: 2004-11-30, DOI: 10.1109/SOCC.2004.1362476
Y. Tsai, Ananth Hegde, N. Vijaykrishnan, M. J. Irwin, T. Theocharides
Leakage power is projected to be one of the major challenges in future technology generations. The temperature profile, process variation, and transistor count all have a strong impact on the leakage power distribution of a processor. We have built a simulator that estimates the dynamic and leakage power of a VLIW architecture, taking dynamic temperature feedback and process variation into account. The framework is based on an architecture similar to the Intel Itanium IA64 and is extended to simulate its power when implemented in 65nm technology. Our experimental results show that leakage power will exceed 50% of the power budget in 65nm technology. Moreover, if process variation is not included, the total leakage power will be underestimated by as much as 30%.
Title: ChipPower: an architecture-level leakage simulator. Published in: IEEE International SOC Conference, 2004. Proceedings.
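The "dynamic temperature feedback" mentioned in this abstract refers to a loop: leakage raises total power, power raises temperature, and temperature raises leakage again. The sketch below shows that loop as a fixed-point iteration under assumed models that are not taken from the paper: leakage grows exponentially with temperature, and die temperature is ambient plus a single lumped thermal resistance times total power. All parameter names and default values are hypothetical.

```python
import math


def settle_power(p_dynamic_w, leak_ref_w, t_ambient_c=45.0, t_ref_c=25.0,
                 r_thermal_c_per_w=0.5, leak_temp_coeff=0.02,
                 tol=1e-9, max_iter=1000):
    """Iterate the leakage/temperature loop until the temperature settles.

    Assumed models (illustrative only):
      leakage(T) = leak_ref_w * exp(leak_temp_coeff * (T - t_ref_c))
      T          = t_ambient_c + r_thermal_c_per_w * (p_dynamic_w + leakage)
    Returns (steady-state temperature in C, leakage power in W).
    """
    temp = t_ambient_c
    for _ in range(max_iter):
        leak = leak_ref_w * math.exp(leak_temp_coeff * (temp - t_ref_c))
        new_temp = t_ambient_c + r_thermal_c_per_w * (p_dynamic_w + leak)
        if abs(new_temp - temp) < tol:
            return new_temp, leak
        temp = new_temp
    raise RuntimeError("no convergence: possible thermal runaway")
```

For example, `settle_power(20.0, 10.0)` converges to a leakage value higher than the one computed at ambient temperature, which is why an estimate made without the feedback loop tends to be too low.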
Pub Date: 2004-11-30, DOI: 10.1109/SOCC.2004.1362394
M. Hansson, A. Alvandpour
We describe a low-clock-load conditional transmission-gate flip-flop aimed at reducing on-chip clock power consumption. It uses a simple, scalable leakage compensation technique that injects additional leakage current in the opposite direction, thus compensating for the worst-case leakage. During low-frequency operation, the flip-flop is configured as a static flip-flop with increased functional robustness. Post-layout simulations show a 30% clock power reduction compared to a conventional static flip-flop.
Title: A low clock load conditional flip-flop. Published in: IEEE International SOC Conference, 2004. Proceedings.
Pub Date: 2004-11-30, DOI: 10.1109/SOCC.2004.1362354
R. Secareanu, Qiang Li, S. Bharatan, C. Kyono, R. Thoma, Mel Miller, O. Hartin
Signal integrity is one of the major challenges in system-on-a-chip (SOC) integration. The complexity of the associated problems increases when RF circuits are integrated with other circuits, such as digital or large-signal circuits. This paper analyzes possible interactions between an on-chip inductor and various types of circuit blocks, and outlines conclusions on the feasibility of an inductor-circuit system.
Title: Signal integrity implications of inductor-to-circuit proximity. Published in: IEEE International SOC Conference, 2004. Proceedings.
Pub Date: 2004-11-30, DOI: 10.1109/SOCC.2004.1362466
Chao Wang, Y. Tong, Yu Shao
Critical-band analysis plays a very important role in front-end feature extraction for speech recognition. In this paper, a generic low-power, low-voltage VLSI design of a critical-band transform (CBT) processor is proposed. Design and analysis of a 21-band Munich Bark scale CBT processor showed that it can achieve significant power efficiency through reduced computational complexity, pipelining and parallel processing, and supply voltage scaling. Simulation results showed that it can complete a CBT analysis (including I/O processing) of one 160-point speech segment in 4.99 ms at a very low system clock frequency of 234 kHz. This would support CBT analysis of 10 ms speech segments with 50% overlap at a sampling frequency of 16 kHz. The power dissipation is 1413.6 µW/MHz and 66.7 µW/MHz for supply voltages of 3.3 V and 1.1 V, respectively. The processor contains 206273 transistors and occupies 2.69 mm² in a 0.35 µm CMOS technology.
Title: VLSI design and analysis of a critical-band processor for speech recognition. Published in: IEEE International SOC Conference, 2004. Proceedings.
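The real-time claim in this abstract follows from a short piece of arithmetic, worked through below using only the figures quoted in the abstract. The derived quantities (hop time, cycle count, absolute power at 234 kHz) are not reported in the abstract. They are computed here on the assumption that the per-MHz power figures scale linearly with clock frequency, and the function names are illustrative.

```python
def segment_timing(sample_rate_hz, segment_points, overlap):
    """Return (segment length, hop between segment starts), both in ms."""
    segment_ms = segment_points / sample_rate_hz * 1e3
    hop_ms = segment_ms * (1.0 - overlap)
    return segment_ms, hop_ms


def clock_cycles(analysis_time_s, clock_hz):
    """Clock cycles available during one analysis."""
    return analysis_time_s * clock_hz


def power_uw(uw_per_mhz, clock_hz):
    """Absolute power in microwatts, assuming linear scaling with frequency."""
    return uw_per_mhz * clock_hz / 1e6


# Figures quoted in the abstract.
segment_ms, hop_ms = segment_timing(16_000, 160, 0.5)  # 10 ms and 5 ms
cycles = clock_cycles(4.99e-3, 234_000)                # about 1168 cycles
meets_real_time = 4.99 <= hop_ms                       # 4.99 ms fits in 5 ms
p_3v3 = power_uw(1413.6, 234_000)                      # about 330.8 uW
p_1v1 = power_uw(66.7, 234_000)                        # about 15.6 uW
```

With 50% overlap a new 10 ms segment starts every 5 ms, so an analysis time of 4.99 ms just fits within the hop interval.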
Pub Date: 2004-11-30, DOI: 10.1109/SOCC.2004.1362385
R. Oberhuber, C. Hechtl, K. Schimpf, B. Staufer
New design and layout methods were developed to overcome yield loss from dislocation defects, which are omnipresent in SiGe technologies as a penalty for their higher speed compared to pure Si. This paper presents the failure analysis of bandgap malfunctions in an RF-SiGe transceiver device that is currently being ramped to production. The resulting yield loss was significant. In-circuit fault analysis identified collector-emitter leakage of the SiGe HBTs as the electrical root cause. All observed failure patterns were explained with SPICE circuit simulation. Isolation and characterization of the bipolar transistor, together with high-resolution TEM, showed dislocations originating from the high strain at the STI-moat edge. Design and layout improvements were applied to reduce the sensitivity of the bandgap reference circuit to the defects: a 5 V HBT with superior yield performance was introduced, and variations in the sizing of the matched npn were investigated. With these improvements, bandgap failures were drastically reduced.
Title: Bandgap yield loss due to dislocations on RFSiGe transceiver ICs: failure analysis, design. Published in: IEEE International SOC Conference, 2004. Proceedings.
Pub Date: 2004-11-30, DOI: 10.1109/SOCC.2004.1362378
Håkan Bengtson, C. Svensson
This paper describes a scalable and robust differential rail-to-rail delay cell. The delay cell is fabricated in a 3.3 V, 0.35 µm CMOS process. It shows wide-range operation and low power supply sensitivity. The delay range is 0.31 ps to 21.8 ns. For a 0.5 ns delay at a clock frequency of 500 MHz, the power supply sensitivity is 0.033 ps/mV. The delay cell is used in a DLL for clock generation in a four-times interleaved 2 Gb/s decision feedback equalizer.
Title: A scalable and robust rail-to-rail delay cell for DLLs. Published in: IEEE International SOC Conference, 2004. Proceedings.
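The numbers in this abstract are mutually consistent, as the short calculation below shows. Two points are assumptions rather than statements from the abstract: that the DLL produces clock phases spaced one bit period apart, and the 50 mV supply ripple used to illustrate the sensitivity figure. The function names are illustrative.

```python
def interleaved_clock(bit_rate_bps, ways):
    """Per-branch clock frequency and assumed phase spacing for interleaving.

    Each of the `ways` branches handles every `ways`-th bit, so it runs at
    bit_rate / ways. The phase spacing is assumed to be one bit period.
    """
    clock_hz = bit_rate_bps / ways
    phase_step_s = 1.0 / bit_rate_bps
    return clock_hz, phase_step_s


def delay_shift_ps(sensitivity_ps_per_mv, ripple_mv):
    """Delay change caused by a given supply-voltage deviation."""
    return sensitivity_ps_per_mv * ripple_mv


# 2 Gb/s, four-times interleaved: 500 MHz clock, 0.5 ns between phases.
clock_hz, phase_step_s = interleaved_clock(2e9, 4)

# Hypothetical 50 mV ripple at the quoted 0.033 ps/mV: 1.65 ps of shift,
# a small fraction of the 500 ps phase step.
shift_ps = delay_shift_ps(0.033, 50.0)
```

This matches the quoted operating point: a 0.5 ns delay per cell at a 500 MHz clock.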
Pub Date: 2004-11-30, DOI: 10.1109/SOCC.2004.1362382
Nam-Hoon Kim, P. Beerel, Ralph Peng
In many data-intensive applications, an area- and power-efficient memory architecture has a significant effect on the area and power consumption of the overall system. Memory partitioning is an effective memory optimization strategy that seeks to customize the memory architecture for a given application. We present a new memory allocation and assignment method that uses memory partitioning to customize the memory architecture. The method consists of two main steps. First, the useful exploration region for trading off area against energy consumption is extracted. Second, based on this exploration space, an iterative multi-way partitioning is performed to optimize area and power. Our experimental results on several examples demonstrate that our method can find a promising memory architecture.
Title: A memory allocation and assignment method using multiway partitioning. Published in: IEEE International SOC Conference, 2004. Proceedings.
Pub Date: 2004-11-30, DOI: 10.1109/SOCC.2004.1362391
A. Mukherjee
Narrowing time-to-market windows are driving the design community toward FPGAs. Although FPGAs make quick prototype implementations possible, circuit delays have always been a major concern. Moreover, achieving high performance in FPGAs with densely packed routing resources is difficult because of crosstalk noise. In this paper we describe a very high performance FPGA and show a simple, practical technique for reducing crosstalk noise using a two-phase nonoverlapping complementary clocking scheme. We propose an efficient integer linear programming formulation to find an optimum solution to a constrained problem, and we study the effects and costs of applying our idea to different architectures. Experiments with MCNC benchmark circuits on different architectures of our FPGA show that, on average, we could reduce crosstalk-induced delay increases to less than 4% of the clock period. With a minimal area increase of 3% due to this optimization, our results seem very promising.
Title: Reducing crosstalk noise in high speed FPGAs. Published in: IEEE International SOC Conference, 2004. Proceedings.
Pub Date: 2004-11-30, DOI: 10.1109/SOCC.2004.1362399
D. Velenis, M. Papaefthymiou, E. Friedman
The design of clock distribution networks in synchronous digital systems presents enormous challenges. Controlling the clock signal delay in the presence of various noise sources, process parameter variations, and environmental effects represents a fundamental problem in the design of high speed synchronous circuits. Two different approaches for enhancing the layout of the clock tree in order to reduce the uncertainty of the clock signal are presented in this paper. The application of these techniques on a set of benchmark circuits demonstrates interesting tradeoffs among the aggregate clock buffer size, the total wire length of the clock tree, and the power dissipation.
Title: Clock tree layout design for reduced delay uncertainty. Published in: IEEE International SOC Conference, 2004. Proceedings.