This paper presents a procedure for generating energy-efficient code for the Motorola DSP56K processor by increasing packing efficiency and minimizing the number of address instructions. Its key features are a novel scheduling algorithm that reduces dependencies between instructions, a register allocation algorithm that selects variables to spill based on their packability, and an address code generation algorithm that minimizes the number of additional instructions. On average, the generated code is 45% smaller than that produced by Motorola's g56k compiler and 25% smaller than that produced by SPAM.
{"title":"Energy-efficient code generation for DSP56000 family","authors":"S. Udayanarayanan, C. Chakrabarti","doi":"10.1109/LPE.2000.155292","DOIUrl":"https://doi.org/10.1109/LPE.2000.155292","journal":{"name":"ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514)"},"publicationDate":"2000-07-26"}
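The DSP56K abstract does not spell out its address code generation algorithm. As a rough illustration of the cost such an algorithm tries to minimize, the sketch below counts explicit address-register updates for a sequence of variable accesses under a given memory layout. It assumes a hypothetical model with a single address register whose post-increment or post-decrement by one is free, so any larger jump costs one extra instruction; it ignores the DSP56K's offset and modifier registers. The exhaustive `best_layout` search is only practical for a handful of variables and is not the authors' algorithm; all names are illustrative.

```python
from itertools import permutations


def address_instruction_cost(access_seq, layout):
    """Count extra address instructions for an access sequence.

    Hypothetical model: one address register that post-increments or
    post-decrements by 1 for free; any other jump between consecutive accesses
    costs one explicit address instruction. The initial load is ignored.
    """
    offset = {var: i for i, var in enumerate(layout)}
    return sum(
        1
        for prev, cur in zip(access_seq, access_seq[1:])
        if abs(offset[cur] - offset[prev]) > 1
    )


def best_layout(access_seq):
    """Exhaustive search over memory layouts (only feasible for a few variables)."""
    variables = sorted(set(access_seq))
    return min(permutations(variables),
               key=lambda p: address_instruction_cost(access_seq, p))


seq = ["a", "c", "a", "c", "b", "d", "b", "d"]
print(address_instruction_cost(seq, ["a", "b", "c", "d"]))  # 6
layout = best_layout(seq)
print(layout, address_instruction_cost(seq, layout))  # ('a', 'c', 'b', 'd') 0
```

Placing variables that are accessed back to back in adjacent memory words turns every jump into a free ±1 post-modify, which is why layout and address code generation are usually treated together.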
A low-voltage analog multiplier operating at 1.2 V is presented. The multiplier core consists of four MOS transistors operating in saturation and exploits the quadratic current-voltage relation of the MOS transistor in that region. The circuit was designed in a standard 0.6 µm CMOS technology. Simulation results indicate an IP3 of 4.9 dBm and a spurious-free dynamic range of 45 dB.
{"title":"A low-voltage CMOS multiplier for RF applications","authors":"Carl J. Debono, Franco Maloberti, J. Micallef","doi":"10.1145/344166.344598","DOIUrl":"https://doi.org/10.1145/344166.344598","journal":{"name":"ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514)"},"publicationDate":"2000-07-26"}
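How four square-law transistors can produce a product is captured by the textbook quarter-square identity, sketched below. The derivation assumes an ideal long-channel square law with transconductance parameter K, a common bias overdrive V_0, and small inputs x and y applied so that all four devices stay in saturation (V_0 > |x| + |y| and sufficient drain-source voltage). The exact topology and signal injection of the published circuit may differ.

```latex
\begin{aligned}
I_D &= \tfrac{K}{2}\,(V_{GS} - V_T)^2 \\
V_{GS,1,2} - V_T &= V_0 \pm (x + y), \qquad V_{GS,3,4} - V_T = V_0 \pm (x - y) \\
I_1 + I_2 &= K\left[V_0^2 + (x + y)^2\right], \qquad I_3 + I_4 = K\left[V_0^2 + (x - y)^2\right] \\
I_{out} &= (I_1 + I_2) - (I_3 + I_4) = K\left[(x + y)^2 - (x - y)^2\right] = 4Kxy
\end{aligned}
```

Summing each complementary pair cancels the terms linear in the inputs, and the differential output cancels the bias term V_0^2, leaving only the product xy.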
Adaptive encoding has been shown to be an effective approach to bus power minimization when the statistics of the input data are not known in advance. In this paper, we propose a novel adaptive bus encoding technique that, unlike existing solutions, exploits spatial correlations in the transmitted data to select the encoding function dynamically with greater accuracy. We discuss the encoding algorithm and describe an architecture that implements it as a bus interface. We present experimental data collected in a realistic simulation framework on a number of representative benchmarks and compare them with the results obtained by applying existing encoding schemes.
{"title":"A spatially-adaptive bus interface for low-switching communication","authors":"A. Acquaviva, R. Scarsi","doi":"10.1109/LPE.2000.155289","DOIUrl":"https://doi.org/10.1109/LPE.2000.155289","journal":{"name":"ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514)"},"publicationDate":"2000-07-26"}
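The spatially-adaptive encoder itself is not detailed in the abstract, so as a point of reference the sketch below implements bus-invert coding, a well-known non-adaptive low-switching scheme: each word is sent either as-is or complemented, with one extra INV line, whichever toggles fewer wires. It is not the authors' method; the bus width, the all-zeros initial bus state, and the function names are assumptions for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Bus-invert coding: for each word, transmit either the word or its
    complement (plus one INV line), whichever toggles fewer wires."""
    mask = (1 << width) - 1
    bus, inv = 0, 0  # assumption: bus and INV line start at all zeros
    encoded = []
    for w in words:
        cost_plain = hamming(bus, w) + (inv != 0)
        cost_inverted = hamming(bus, ~w & mask) + (inv != 1)
        if cost_inverted < cost_plain:
            bus, inv = ~w & mask, 1
        else:
            bus, inv = w, 0
        encoded.append((bus, inv))
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words by undoing the complement where INV is set."""
    mask = (1 << width) - 1
    return [b ^ mask if inv else b for b, inv in encoded]


def total_transitions(stream):
    """Count toggles on the data lines plus the INV line, starting from zeros."""
    bus, inv, total = 0, 0, 0
    for b, i in stream:
        total += hamming(bus, b) + (inv ^ i)
        bus, inv = b, i
    return total


words = [0x00, 0xFF, 0x00, 0xFF]
raw = [(w, 0) for w in words]
enc = bus_invert_encode(words, 8)
print(total_transitions(raw), total_transitions(enc))  # 24 3
```

Because bus-invert decides word by word with a fixed rule, it cannot exploit correlations across bit positions; adaptive schemes instead change the encoding function as the observed data changes.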
This paper presents Pyramid code, an optimal code for transmitting sequential addresses over a DRAM bus. The code is constructed by finding an Eulerian cycle on a complete graph, and it is optimal for conventional DRAM in the sense that it minimizes switching activity on the time-multiplexed address bus from the CPU to the DRAM. Experimental results on a large number of testbenches with different characteristics (e.g., sequential vs. random memory access behavior) show a reduction in bus activity of up to 50%.
{"title":"Power-optimal encoding for DRAM address bus","authors":"W. Cheng, Massoud Pedram","doi":"10.1109/LPE.2000.155293","DOIUrl":"https://doi.org/10.1109/LPE.2000.155293","journal":{"name":"ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514)"},"publicationDate":"2000-07-26"}
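The metric that Pyramid code minimizes, switching activity on a time-multiplexed address bus, can be computed as in the sketch below. It assumes each access drives the row address (high bits) and then the column address (low `col_bits` bits) on the same wires, with no page-mode reuse, and that the bus holds its last value between phases. The Pyramid code construction itself is not reproduced here; the function name is hypothetical.

```python
def multiplexed_bus_transitions(addresses, col_bits):
    """Bit toggles on a time-multiplexed DRAM address bus.

    Assumes each access drives the row address (high bits) and then the column
    address (low col_bits bits) on the same wires, with no page-mode reuse,
    and that the bus holds its last value between phases.
    """
    col_mask = (1 << col_bits) - 1
    bus = []
    for a in addresses:
        bus.append(a >> col_bits)  # row (RAS) phase
        bus.append(a & col_mask)   # column (CAS) phase
    return sum(bin(p ^ q).count("1") for p, q in zip(bus, bus[1:]))


print(multiplexed_bus_transitions(range(4), col_bits=2))  # 6
```

In this model even a purely sequential stream toggles the bus at every row-to-column and column-to-row switch, which is the kind of activity a code tailored to sequential addresses can reduce.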
K. Masselos, S. Theoharis, P. Merakos, T. Stouraitis, C. Goutis
Novel techniques for the power-efficient synthesis of sum-of-products computations are presented, together with simple and efficient heuristics for scheduling and assignment. Several partly static cost functions are proposed to drive the synthesis tasks; they target power consumption either in the buses connecting the functional units to the storage elements or inside the functional units. Because the cost functions are partly static, the synthesis procedure runs faster. Experimental results on several relevant digital signal processing kernels show that the proposed synthesis techniques lead to significant power savings.
{"title":"Low power synthesis of sum-of-products computation","authors":"K. Masselos, S. Theoharis, P. Merakos, T. Stouraitis, C. Goutis","doi":"10.1109/LPE.2000.155288","DOIUrl":"https://doi.org/10.1109/LPE.2000.155288","journal":{"name":"ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514)"},"publicationDate":"2000-07-26"}
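The abstract does not give the cost functions, but the idea of a bus-oriented cost can be illustrated with a minimal sketch: since the terms of a sum of products can be accumulated in any order, the coefficients can be sent to the multiplier in an order that reduces the Hamming distance between consecutive bus values. The nearest-neighbour heuristic below is a hypothetical stand-in, not the authors' scheduling or assignment algorithm, and it ignores fixed-point overflow effects of reordering.

```python
def bus_toggles(values):
    """Total Hamming distance between consecutive values sent over a bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))


def greedy_term_order(coeffs):
    """Nearest-neighbour ordering of product terms: starting from the first
    term, repeatedly pick the unused coefficient closest in Hamming distance
    to the one just sent. A heuristic; it does not guarantee the optimum."""
    if not coeffs:
        return []
    remaining = list(range(1, len(coeffs)))
    order = [0]
    while remaining:
        last = coeffs[order[-1]]
        nxt = min(remaining, key=lambda i: bin(coeffs[i] ^ last).count("1"))
        remaining.remove(nxt)
        order.append(nxt)
    return order


coeffs = [0b0000, 0b1111, 0b0001, 0b1110]
xs = [3, 5, 7, 9]
order = greedy_term_order(coeffs)
print(order)  # [0, 2, 1, 3]
print(bus_toggles(coeffs), bus_toggles([coeffs[i] for i in order]))  # 11 5
# The sum of products is unchanged by the reordering:
print(sum(c * x for c, x in zip(coeffs, xs)) == sum(coeffs[i] * xs[i] for i in order))  # True
```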
Recent studies [MGK 98, Tiw 98] have confirmed that a significant amount of energy is dissipated during instruction dispatch and issue in modern superscalar microprocessors. We propose a model for the energy dissipated by the dispatch and issue logic of such processors and validate it through register-level simulations and SPICE-measured dissipation coefficients obtained from 0.5 µm CMOS layouts of the relevant circuits. We also study alternative organizations of the instruction window buffers that yield energy savings of about 47% over traditional designs.
{"title":"Reducing energy requirements for instruction issue and dispatch in superscalar microprocessors","authors":"K. Ghose","doi":"10.1109/LPE.2000.155287","DOIUrl":"https://doi.org/10.1109/LPE.2000.155287","journal":{"name":"ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514)"},"publicationDate":"2000-07-26"}
This paper proposes a novel technique for trading off power and performance based on profile-driven code execution. Specifically, we show that there is an optimal level of parallelism with respect to energy consumption, and we propose a compiler-assisted code annotation technique that can be used at run time to adaptively trade off power and performance. Experimental results show that our approach is up to 23% better than clock throttling and as efficient as voltage scaling (up to 10% better in some cases). The proposed technique can be used by an ACPI-compliant power manager to prolong battery life or as a passive cooling feature for thermal management.
{"title":"Profile-driven code execution for low power dissipation","authors":"Diana Marculescu","doi":"10.1109/LPE.2000.155294","DOIUrl":"https://doi.org/10.1109/LPE.2000.155294","journal":{"name":"ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514)"},"publicationDate":"2000-07-26"}
A self-timed radix-2 division scheme for low power consumption is proposed. Replacing dual-rail dynamic circuits in non-critical data paths with single-rail static circuits reduces power dissipation, while speculative remainder computation maintains performance. SPICE simulations show that the proposed design achieves a latency of 33.8 ns for 56-bit mantissa division and a 47% energy reduction compared with a fully dual-rail version.
{"title":"Low power self-timed radix-2 division","authors":"Jae-Hee Won, Kiyoung Choi","doi":"10.1109/LPE.2000.155280","DOIUrl":"https://doi.org/10.1109/LPE.2000.155280","journal":{"name":"ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514)"},"publicationDate":"2000-07-26"}
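For readers unfamiliar with radix-2 division, the sketch below shows a generic radix-2 SRT digit recurrence with quotient digits in {-1, 0, 1}, using exact rational arithmetic for clarity. It assumes normalized operands (1/2 ≤ d < 1 and 0 ≤ x < d) and does not model the paper's self-timed dual-rail/single-rail circuits or its speculative remainder computation; the function name is hypothetical.

```python
from fractions import Fraction


def srt_radix2_divide(x, d, n_digits):
    """Radix-2 SRT division x / d with quotient digits in {-1, 0, 1}.

    Assumes normalised operands 1/2 <= d < 1 and 0 <= x < d, so the quotient
    lies in [0, 1). Returns (quotient, final partial remainder r_n), which
    satisfy x == quotient * d + r_n / 2**n_digits.
    """
    x, d = Fraction(x), Fraction(d)
    if not (Fraction(1, 2) <= d < 1 and 0 <= x < d):
        raise ValueError("expected 1/2 <= d < 1 and 0 <= x < d")
    half = Fraction(1, 2)
    r, q = x, Fraction(0)
    for j in range(1, n_digits + 1):
        shifted = 2 * r
        # Digit selection against +-1/2. The overlapping digit regions mean
        # hardware can select from an estimate using a few leading bits.
        if shifted >= half:
            digit = 1
        elif shifted < -half:
            digit = -1
        else:
            digit = 0
        r = shifted - digit * d  # invariant: |r| <= d
        q += Fraction(digit, 2 ** j)
    return q, r


x, d = Fraction(5, 8), Fraction(7, 8)
q, r = srt_radix2_divide(x, d, 56)  # 56 iterations of the recurrence
print(float(q), float(x / d))
```

Because |r_n| ≤ d, the quotient error |x/d − q| is at most 2^(−n), so one iteration per quotient bit is needed; each iteration is the step whose remainder path a low-power design has to make fast.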
G. Esakkimuthu, N. Vijaykrishnan, M. Kandemir, M. J. Irwin
The memory system usually consumes a significant amount of energy in battery-operated devices. In this paper, we quantitatively compare and evaluate the interaction of two hardware cache optimizations (block buffering and sub-banking) with three widely used compiler optimizations (linear loop transformation, loop tiling, and loop unrolling). Our results show that the pure hardware optimizations (eight block buffers and four sub-banks in a 4 KB, 2-way set-associative cache) provide up to 4% energy savings, with an average of 2% across all benchmarks. In contrast, the pure software approach, which applies all three compiler optimizations, provides at least 23% energy savings, with an average of 62%. However, closer examination reveals that hardware optimizations become more critical for reducing on-chip cache energy when the executed code has already been optimized.
{"title":"Memory system energy: Influence of hardware-software optimizations","authors":"G. Esakkimuthu, N. Vijaykrishnan, M. Kandemir, M. J. Irwin","doi":"10.1109/LPE.2000.155291","DOIUrl":"https://doi.org/10.1109/LPE.2000.155291","journal":{"name":"ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514)"},"publicationDate":"2000-07-26"}
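Two of the compiler transformations named in the memory-system abstract, a linear loop transformation (here a loop interchange) and loop tiling, are illustrated below on matrix multiplication. Python does not expose cache behavior, so the sketch only shows the structure of the transformed loop nest and checks that it computes the same result; the tile size is a hypothetical parameter that in practice would be chosen to fit the cache.

```python
def matmul_naive(a, b, n):
    """Reference n x n matrix product C = A * B (i-j-k loop order)."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile):
    """Same product after loop interchange (i-k-j) and tiling of all three loops.

    Each (ii, kk, jj) step touches only tile x tile sub-blocks of A, B and C,
    which is what lets the working set stay in a small cache.
    """
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a_ik = a[i][k]  # reused across the innermost loop
                        for j in range(jj, min(jj + tile, n)):
                            c[i][j] += a_ik * b[k][j]
    return c


n = 5
a = [[i * n + j for j in range(n)] for i in range(n)]
b = [[(i + 2 * j) % 7 for j in range(n)] for i in range(n)]
print(matmul_tiled(a, b, n, tile=2) == matmul_naive(a, b, n))  # True
```

The `min(...)` bounds handle matrix sizes that are not a multiple of the tile size; loop unrolling, the third transformation, would further replicate the innermost body.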
This paper presents the design of a third-order lowpass ΣΔ analog-to-digital (A/D) converter using a continuous-time (CT) loop filter. The loop filter is implemented with active-RC integrators. The influence of the low supply voltage on building blocks such as the amplifier and the common-mode feedback, as well as on the overall ΣΔ modulator, is discussed. Simulation results of the 1.5 V CT ΣΔ A/D converter show a 75 dB dynamic range over a 25 kHz bandwidth. The expected power consumption is less than 300 µW.
{"title":"A 1.5 V low-power third order continuous-time lowpass ΣΔ A/D converter","authors":"F. Gerfers, Y. Manoli","doi":"10.1109/LPE.2000.155283","DOIUrl":"https://doi.org/10.1109/LPE.2000.155283","journal":{"name":"ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514)"},"publicationDate":"2000-07-26"}
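For context on the dynamic-range figure of the third-order ΣΔ converter, the standard textbook approximation for the ideal peak SQNR of an L-th order modulator with an N-bit quantizer and oversampling ratio OSR is given below. It is written for the discrete-time equivalent of the loop with a pure differentiator noise transfer function. The abstract states neither the OSR nor the quantizer resolution, so no numbers are plugged in, and the reported 75 dB is a circuit-level simulation result that also reflects non-idealities this formula ignores.

```latex
\begin{aligned}
\mathrm{NTF}(z) &= \left(1 - z^{-1}\right)^{L}, \qquad L = 3 \\
\mathrm{SQNR}_{\mathrm{dB}} &\approx 6.02\,N + 1.76 - 10\log_{10}\frac{\pi^{2L}}{2L+1} + (2L+1)\,10\log_{10}(\mathrm{OSR}) \\
\left.\mathrm{SQNR}_{\mathrm{dB}}\right|_{L=3} &\approx 6.02\,N + 1.76 - 21.4 + 70\log_{10}(\mathrm{OSR})
\end{aligned}
```

For L = 3, each doubling of the OSR adds roughly 21 dB to the ideal SQNR, which is why a third-order loop can reach a useful dynamic range at a moderate oversampling ratio.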