In this paper, we describe a high-level data path allocation algorithm that facilitates built-in self-test (BIST). It generates a self-testable data path design while maximizing the sharing of modules and test registers. Because of this sharing, only a small number of registers need to be modified for BIST, which reduces hardware area, one of the major overheads of BIST techniques. In our approach, both module allocation and register allocation are performed incrementally. In each iteration, module allocation is guided by a testability balance technique, while register allocation aims to increase the degree of register sharing. Using a variety of benchmarks, we demonstrate the advantage of our approach over conventional approaches.
{"title":"Built-in self-testable data path synthesis","authors":"Laurence Tianruo Yang, Jon Muzio","doi":"10.1109/IWV.2001.923143","DOIUrl":"https://doi.org/10.1109/IWV.2001.923143","journal":{"name":"Proceedings IEEE Computer Society Workshop on VLSI 2001. Emerging Technologies for VLSI Systems"},"publicationDate":"2001-04-19"}
Summary form only given, as follows. As computers continue to shrink, research and commercial interest in mobile/wearable computing is growing rapidly. Unlike traditional desktop computing, in which the user is required to come to the computer, mobile computing brings the computer to the user. Mobile/wearable computers represent the next evolutionary step in the trend toward more people-centric computing. One of the key problems with mobile/wearable computing is energy consumption. Battery weight for mobile/wearable computers often exceeds the weight of all other components combined. To make mobile/wearable computing widely applicable, major advances in reducing power consumption and battery weight are needed. While a "Moore's Law" exists for the power consumption of microprocessors, with mW/MIPS decreasing by a factor of ten every five years, there is no similar trend in wireless communications. This suggests that future wearable computers will be communications bound. In fact, we estimate that nearly 80% of the power consumed by wearable computers can be due to communications. Trading off energy-expensive communication for energy-cheap computation through effective partitioning of control and data can result in significant energy savings. Examples and measurements will illustrate how the use of proxies can reduce power consumption due to communications by several orders of magnitude. In addition, the interface design must be carefully matched with user tasks and balanced against energy consumption. Many complex and interrelated issues determine the balance between ease-of-use and power consumption. Simply trading off ease-of-use for lower per-operation power consumption may result in higher task energy consumption, because a less intuitive interface takes more operations to traverse. 
The effect of user interface on energy consumption can be evaluated by developing several different interfaces and measuring and comparing the ease-of-use and energy consumption. In conclusion, an architecture that supports these studies will be introduced. The Spot wearable computer includes a dozen power monitors that can be read under software control to determine which subsystems are active and their power consumption during an application.
{"title":"Energy locality: processing/communication/interface tradeoffs to optimize energy in mobile systems","authors":"D. Siewiorek","doi":"10.1109/IWV.2001.923131","DOIUrl":"https://doi.org/10.1109/IWV.2001.923131","journal":{"name":"Proceedings IEEE Computer Society Workshop on VLSI 2001. Emerging Technologies for VLSI Systems"},"publicationDate":"2001-04-19"}
This paper addresses the modeling of layout structure in high-level C++ models. Researchers agree that the level of abstraction for integrated circuit design needs to be raised. New languages and methodologies are being proposed, most of them from the software engineering domain. However, one of the fundamental hardware design challenges is often overlooked as push-button synthesis solutions are sought: physical design predictability. In this paper, we describe how C++ constructs should be used to capture structural and physical implementation concerns. Our explanation relies on the importance of floorplan and component placement estimations at high levels of abstraction. We highlight how object-oriented mechanisms ease the structural modeling of circuit components, and we present a C++ class library design to specify these structural concerns.
{"title":"Structural design composition for C++ hardware models","authors":"F. Doucet, V. Sinha, Raj Kumar Gupta","doi":"10.1109/IWV.2001.923137","DOIUrl":"https://doi.org/10.1109/IWV.2001.923137","journal":{"name":"Proceedings IEEE Computer Society Workshop on VLSI 2001. Emerging Technologies for VLSI Systems"},"publicationDate":"2001-04-19"}
A new ALU design is proposed that is more economical than a conventional Logarithmic Number System (LNS) ALU for pipelined multiply-accumulate applications (such as FIR filters). A novel interpolator that accepts both positive and negative arguments allows rearrangement of the fixed-point adders that implement the LNS addition algorithm. The area of the resulting circuit is essentially the same as that of the traditional LNS approach, but the critical path of the proposed circuit is shorter, allowing a faster cycle time and/or a shorter latency. To make the advantages of the improved LNS ALU available to end users, new primitive operations (increment-multiply and multiply-increment-multiply) should be supported instead of the more traditional add and multiply-accumulate operations. Verilog code for such an increment-multiply module is given.
{"title":"A pipelined LNS ALU","authors":"M. Arnold","doi":"10.1109/IWV.2001.923155","DOIUrl":"https://doi.org/10.1109/IWV.2001.923155","journal":{"name":"Proceedings IEEE Computer Society Workshop on VLSI 2001. Emerging Technologies for VLSI Systems"},"publicationDate":"2001-04-19"}
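The LNS arithmetic that the abstract above relies on can be sketched in a few lines. This is a floating-point illustration only, not the paper's Verilog or its interpolator. The base of 2, all function names, and the reading of "increment-multiply" as (1 + X) * Y are assumptions made for the example; the abstract does not define the operation.

```python
import math

# Assumed base-2 LNS: a positive value V is stored as x = log2(V).

def to_lns(v):
    """Encode a positive real number as its base-2 logarithm."""
    return math.log2(v)

def from_lns(x):
    """Decode an LNS value back to a real number."""
    return 2.0 ** x

def lns_mul(x, y):
    # Multiplying values is a fixed-point addition of their logarithms.
    return x + y

def lns_div(x, y):
    # Dividing values is a fixed-point subtraction of their logarithms.
    return x - y

def s_b(z):
    """Addition function s_b(z) = log2(1 + 2**z).

    Hardware approximates this with a table or interpolator; here it is
    evaluated directly, for both positive and negative z.
    """
    return math.log2(1.0 + 2.0 ** z)

def lns_add(x, y):
    # log2(X + Y) = y + s_b(x - y), since X + Y = Y * (1 + X / Y).
    return y + s_b(x - y)

def increment_multiply(x, y):
    """Assumed meaning of 'increment-multiply': log2((1 + X) * Y)."""
    return s_b(x) + y

if __name__ == "__main__":
    x, y = to_lns(3.0), to_lns(5.0)
    print(from_lns(lns_mul(x, y)))             # product of 3 and 5
    print(from_lns(lns_add(x, y)))             # sum of 3 and 5
    print(from_lns(increment_multiply(x, y)))  # (1 + 3) * 5
```

Under this reading, an LNS addition is a division followed by an increment-multiply: `lns_add(x, y)` equals `increment_multiply(lns_div(x, y), y)`.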
Sensing current instead of voltage provides an alternative for signaling on the long wires that increasingly limit the performance of CMOS as it scales into the VDSM regime (<0.25 μm). Current-mode techniques have been proposed for sensing bit-lines. We present a comparative study of current sensing and the optimal repeater insertion technique for wires from 0.35 cm to 1.75 cm in length. SPICE simulation results for a 0.18 μm technology showed that current sensing was faster and consumed less power than the optimal repeater insertion technique. While the power dissipated by the optimal repeater circuit increased linearly with line length, the power dissipated by the current-sensing circuit was almost constant for longer lines. Inductance had little effect on the differential current-sensing technique.
{"title":"Current sensing techniques for global interconnects in very deep submicron (VDSM) CMOS","authors":"A. Maheshwari, Wayne Burleson","doi":"10.1109/IWV.2001.923141","DOIUrl":"https://doi.org/10.1109/IWV.2001.923141","journal":{"name":"Proceedings IEEE Computer Society Workshop on VLSI 2001. Emerging Technologies for VLSI Systems"},"publicationDate":"2001-04-19"}
In this paper, a new architecture for low-power design of parallel multipliers is proposed. Power consumption is reduced by lowering circuit activity at the architecture level: the multiplication circuit is divided into clusters of smaller multipliers. By applying clock gating techniques and preprocessing the input pattern with simple logic functions, clusters that would produce a zero result can be disabled, saving the switching power that these clusters would otherwise consume. The amount of power saved depends on the nature of the input pattern, which varies according to the application. An analysis of the input pattern is performed. For testing purposes, an 8-bit multiplier prototype is constructed in 0.35 micron double-metal CMOS technology using Cadence development tools. For the average case, in which all input combinations have an equal probability of occurrence, HSPICE simulation results at 3.3 V and 500 MHz show that the proposed architecture yields 13.4% power savings.
{"title":"A novel architecture for low-power design of parallel multipliers","authors":"A. Fayed, M. Bayoumi","doi":"10.1109/IWV.2001.923154","DOIUrl":"https://doi.org/10.1109/IWV.2001.923154","journal":{"name":"Proceedings IEEE Computer Society Workshop on VLSI 2001. Emerging Technologies for VLSI Systems"},"publicationDate":"2001-04-19"}
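The clustering idea in the multiplier abstract above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the authors' circuit: the 4-bit cluster size, the function name `clustered_multiply`, and the simple "either operand slice is zero" test are all hypothetical choices, and the model counts idle clusters rather than estimating power.

```python
def clustered_multiply(a, b, cluster_bits=4, width=8):
    """Multiply two unsigned `width`-bit values with small sub-multipliers.

    Each operand is cut into `cluster_bits`-wide slices, and one cluster
    multiplies each pair of slices. A cluster whose input slice is zero
    would produce a zero partial product, so it is skipped, standing in
    for a clock-gated (disabled) cluster.

    Returns (product, active_clusters, total_clusters).
    """
    mask = (1 << cluster_bits) - 1
    n = width // cluster_bits
    a_parts = [(a >> (i * cluster_bits)) & mask for i in range(n)]
    b_parts = [(b >> (j * cluster_bits)) & mask for j in range(n)]

    product, active = 0, 0
    for i, ap in enumerate(a_parts):
        for j, bp in enumerate(b_parts):
            if ap == 0 or bp == 0:
                continue  # zero detected by preprocessing: cluster stays off
            active += 1
            # Weight the partial product by the position of its slices.
            product += (ap * bp) << ((i + j) * cluster_bits)
    return product, active, n * n

if __name__ == "__main__":
    # Count how often clusters are idle when all 8-bit inputs are equally likely.
    active = total = 0
    for a in range(256):
        for b in range(256):
            _, act, tot = clustered_multiply(a, b)
            active += act
            total += tot
    print(f"idle cluster fraction: {1 - active / total:.4f}")
```

In this toy model the idle fraction depends only on how often an operand slice is zero, which is why the abstract stresses that savings vary with the input pattern of the application.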
We apply the output prediction logic (OPL) technique to the differential CMOS logic family. Including the effects of process, voltage, and temperature (PVT) variations, we show that OPL-differential CMOS is more than 40% faster than the single-rail OPL-dynamic logic family, and nearly 5 times faster than optimized static CMOS. We also demonstrate an OPL-differential 64:2 compressor that is 37% faster than the OPL-dynamic version. Finally, we show that OPL-differential is nearly twice as fast as differential domino.
{"title":"Application of output prediction logic to differential CMOS","authors":"Su Go, L. McMurchie, Carl Sechen","doi":"10.1109/IWV.2001.923140","DOIUrl":"https://doi.org/10.1109/IWV.2001.923140","journal":{"name":"Proceedings IEEE Computer Society Workshop on VLSI 2001. Emerging Technologies for VLSI Systems"},"publicationDate":"2001-04-19"}
This paper presents a novel low-power SIMD architecture for texture mapping using affine transformation. Low power has been achieved by exploiting the properties of the affine transformation to reduce the computational cost. The architecture has been prototyped using 0.35 μm CMOS technology with three layers of metal. The proposed architecture can be used in video object motion tracking and texture warping processors.
{"title":"A low power SIMD architecture for affine-based texture mapping","authors":"Wael Badawy","doi":"10.1109/IWV.2001.923151","DOIUrl":"https://doi.org/10.1109/IWV.2001.923151","journal":{"name":"Proceedings IEEE Computer Society Workshop on VLSI 2001. Emerging Technologies for VLSI Systems"},"publicationDate":"2001-04-19"}
A. Parikh, M. Kandemir, N. Vijaykrishnan, M. J. Irwin
We present and evaluate several instruction scheduling algorithms that reorder a given sequence of instructions while taking energy considerations into account. We first compare a performance-oriented scheduling technique with three energy-oriented instruction scheduling algorithms in terms of both performance (execution cycles of the resulting schedules) and energy consumption. Then, we propose scheduling algorithms that consider energy and performance at the same time. The results obtained using randomly generated directed acyclic graphs show that these techniques are quite successful in reducing energy consumption and that their performance (in terms of execution cycles) is comparable to that of purely performance-oriented scheduling.
{"title":"VLIW scheduling for energy and performance","authors":"A. Parikh, M. Kandemir, N. Vijaykrishnan, M. J. Irwin","doi":"10.1109/IWV.2001.923148","DOIUrl":"https://doi.org/10.1109/IWV.2001.923148","journal":{"name":"Proceedings IEEE Computer Society Workshop on VLSI 2001. Emerging Technologies for VLSI Systems"},"publicationDate":"2001-04-19"}
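The contrast between performance-oriented and energy-oriented scheduling in the abstract above can be sketched with a greedy list scheduler over a dependence graph. The abstract does not describe the authors' algorithms, so everything here is an assumption made for illustration: the `SWITCH_COST` table, the unit-latency model, the per-slot switching-energy model, and the two priority rules.

```python
# Hypothetical energy cost of issuing one operation type after another
# in the same issue slot (same-type sequences switch fewer circuits).
SWITCH_COST = {
    ("add", "add"): 1, ("mul", "mul"): 1,
    ("add", "mul"): 4, ("mul", "add"): 4,
}

def switch_cost(prev_type, cur_type):
    """Energy charged when cur_type follows prev_type in a slot."""
    return 0 if prev_type is None else SWITCH_COST[(prev_type, cur_type)]

def schedule(ops, deps, width, energy_aware):
    """Greedy list scheduling with unit-latency operations.

    ops:   {name: op_type}
    deps:  {name: set of predecessor names}
    width: number of issue slots per cycle
    Returns (cycles, energy); cycles is a list of per-cycle tuples of names.
    """
    succs = {n: [m for m in ops if n in deps.get(m, ())] for n in ops}
    height = {}

    def h(n):
        # Longest path from n to a sink: the performance-oriented priority.
        if n not in height:
            height[n] = 1 + max((h(s) for s in succs[n]), default=0)
        return height[n]

    done, cycles, energy = set(), [], 0
    last = [None] * width  # last operation type issued in each slot
    while len(done) < len(ops):
        # Only operations whose predecessors finished in earlier cycles.
        ready = [n for n in ops if n not in done and deps.get(n, set()) <= done]
        issued = []
        for slot in range(width):
            if not ready:
                break
            if energy_aware:
                # Cheapest transition first; critical path breaks ties.
                key = lambda n: (switch_cost(last[slot], ops[n]), -h(n), n)
            else:
                key = lambda n: (-h(n), n)
            pick = min(ready, key=key)
            energy += switch_cost(last[slot], ops[pick])
            last[slot] = ops[pick]
            ready.remove(pick)
            issued.append(pick)
        done.update(issued)
        cycles.append(tuple(issued))
    return cycles, energy

# Two independent chains: a -> c -> e (adds) and b -> d -> f (multiplies).
ops = {"a": "add", "b": "mul", "c": "add", "d": "mul", "e": "add", "f": "mul"}
deps = {"c": {"a"}, "d": {"b"}, "e": {"c"}, "f": {"d"}}

if __name__ == "__main__":
    for aware in (False, True):
        cycles, energy = schedule(ops, deps, width=1, energy_aware=aware)
        print("energy-aware" if aware else "performance", cycles, energy)
```

On this small example both rules finish in the same number of cycles, while the energy-aware rule groups operations of the same type and so accumulates less switching cost under the assumed table. A scheduler that weighs both criteria could combine the two keys, for instance by bounding how far the energy-aware choice may fall below the best critical-path priority.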
J. Glossner, D. Routenberg, E. Hokenek, M. Moudgill, M. Schulte, P. Balzola, S. Vassiliadis
We discuss the hardware and software challenges in building a 2 Mbit/s wireless, battery-powered communications device. Of primary importance is power dissipation. To achieve aggressive power targets, a host of new techniques are required at all levels of the design hierarchy. Techniques for parallelizing saturating arithmetic will become important because of the software optimizations they enable. Highly configurable programmable structures will enable multiprotocol SOC solutions. To program complex SOCs, new compiler techniques will be required. Hardware implementations will need to be intimately aware of these software techniques. In particular, both signal processing code written in C and control code written in Java will drive new compilation techniques to enable broadband 3G wireless systems.
{"title":"Towards a very high bandwidth wireless battery powered device","authors":"J. Glossner, D. Routenberg, E. Hokenek, M. Moudgill, M. Schulte, P. Balzola, S. Vassiliadis","doi":"10.1109/IWV.2001.923132","DOIUrl":"https://doi.org/10.1109/IWV.2001.923132","journal":{"name":"Proceedings IEEE Computer Society Workshop on VLSI 2001. Emerging Technologies for VLSI Systems"},"publicationDate":"2001-04-19"}