This paper presents a hardware-efficient architecture for generating sine and cosine waves based on the CORDIC (Coordinate Rotation Digital Computer) algorithm. In its original form, CORDIC suffers from major drawbacks such as scale-factor calculation, latency, and the optimal selection of micro-rotations. The proposed algorithm overcomes all of these drawbacks. We use a leading-one bit detection technique to identify the micro-rotations. The scale-free design of the proposed algorithm is based on the Taylor series expansion of the sine and cosine waves. The 16-bit iterative architecture achieves approximately 4.5% and 6.7% lower slice-delay product compared with other existing designs. The algorithm design and its VLSI implementation are detailed.
{"title":"Hardware Efficient Architecture for Generating Sine/Cosine Waves","authors":"Supriya Aggarwal, K. Khare","doi":"10.1109/VLSID.2012.46","DOIUrl":"https://doi.org/10.1109/VLSID.2012.46","url":null,"abstract":"This paper presents a hardware efficient architecture for generating sine and cosine waves based on the CORDIC (Coordinate Rotation Digital Computer) algorithm. In its original form the CORDIC suffers from major drawbacks like scale-factor calculation, latency and optimal selection of micro-rotations. The proposed algorithm overcomes all these drawbacks. We use leading-one bit detection technique to identify the micro-rotations. The scale-free design of the proposed algorithm is based on Taylor series expansion of the sine and cosine waves. The 16-bit iterative architecture achieves approximately 4.5% and 6.7% lower slice-delay product as compared to the other existing designs. The algorithm design and its VLSI implementation are detailed.","PeriodicalId":405021,"journal":{"name":"2012 25th International Conference on VLSI Design","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122282664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
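For context on the drawbacks named in the CORDIC abstract above, the following is a minimal sketch of the conventional rotation-mode CORDIC, not the scale-free design the paper proposes. The function name `cordic_sin_cos`, the floating-point arithmetic, and the 16-iteration default are illustrative assumptions; a hardware version would use fixed-point shifts and adds. The sketch shows where the scale factor and the fixed sequence of micro-rotations come from.

```python
import math


def cordic_sin_cos(theta, iterations=16):
    """Return (sin, cos) of theta via conventional rotation-mode CORDIC.

    Converges for |theta| up to roughly 1.74 rad (the sum of all
    elementary angles); larger angles need quadrant folding first.
    """
    # Elementary micro-rotation angles atan(2^-i), one per iteration.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i).
    # The accumulated gain must be compensated: this is the scale factor.
    scale = 1.0
    for i in range(iterations):
        scale /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Pre-scale the start vector so the result lands on the unit circle.
    x, y, z = scale, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0  # rotate toward zero residual angle
        shift = 2.0 ** -i              # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    return y, x


if __name__ == "__main__":
    s, c = cordic_sin_cos(math.pi / 6)
    print("sin:", s, "cos:", c)
```

Every iteration is executed regardless of the input angle, which is the latency cost the abstract refers to; a scale-free variant would remove the `scale` computation altogether.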
This paper proposes a novel memory architecture, Random Access Analog Memory (RA2M), which stores unquantized samples of a video signal of up to 5 MHz bandwidth for durations on the order of milliseconds by means of a periodic memory refreshing mechanism. At a 16.5 MHz sampling frequency and a frame rate of 25 frames/s, the implemented design can store voltage signal samples of up to 200 mV for 40 ms with 8-bit resolution. The proposed architecture uses a unit RA2M cell of 250 fF capacitance occupying an area of 21 μm × 21 μm, with 4.1 mW average power dissipation per cell in a 0.18 μm standard CMOS fabrication process. This is the first implementation of a voltage-mode periodic refreshing mechanism to extend the signal storage time of an analog memory. The circuit implementation is based on the switched-capacitor technique and is compatible with conventional fabrication processes. The architecture supports random access to stored data and rejects common-mode noise through its differential signal implementation.
{"title":"Random Access Analog Memory (RA2M) for Video Signal Application","authors":"Nilanjan Chattaraj, A. Dhar","doi":"10.1109/VLSID.2012.43","DOIUrl":"https://doi.org/10.1109/VLSID.2012.43","url":null,"abstract":"This paper proposes a novel memory architecture, introducing Random Access Analog Memory (RA2M), to store unquantized samples of video signal of maximum 5 MHz bandwidth for storing time duration in order of millisecond by implementing periodic memory refreshing mechanism in it. At 16.5 MHz sampling frequency with 25 frames/s frame rate, this implemented design can store voltage signal sample of up to 200 mV for 40 ms with 8 bit resolution. The proposed architecture contains unit RA2M cell of 250 fF capacitance occupying 21 μm × 21 μm area with 4.1 mW average power dissipation per cell in 0.18 μm standard CMOS fabrication process. The improvement in signal storage time duration into analog memory by introducing periodic memory refreshing mechanism in voltage mode is implemented for the first time. The circuit implementation is based on switched capacitor technique and is compatible with conventional fabrication process. This architecture facilitates random location data accessibility and includes common mode noise rejection by its differential signal implementation.","PeriodicalId":405021,"journal":{"name":"2012 25th International Conference on VLSI Design","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127882925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Environmental energy harvesting is a promising approach to achieving extremely long operational lifetimes in a variety of micro-scale electronic systems. Maximum power point tracking (MPPT) is a technique used in energy harvesting systems to maximize the amount of harvested power. Existing MPPT methods, originally intended for large-scale systems, incur high power overheads when used in micro-scale energy harvesting, where the output voltage of the transducers is very low (less than 500 mV) and the harvested power is minuscule (only hundreds of μW). This paper presents a low-overhead MPPT algorithm for micro-scale solar energy harvesting systems. The proposed algorithm is based on a negative feedback control loop and is particularly amenable to hardware-efficient implementation. We have used the proposed algorithm to design a micro-scale solar energy harvesting system, which has been implemented in IBM 45nm technology. Post-layout simulation results demonstrate that the proposed MPPT scheme successfully tracks the optimal operating point with a tracking error of less than 1% and incurs minimal power overheads.
{"title":"Low-Overhead Maximum Power Point Tracking for Micro-Scale Solar Energy Harvesting Systems","authors":"Chao Lu, S. P. Park, V. Raghunathan, K. Roy","doi":"10.1109/VLSID.2012.73","DOIUrl":"https://doi.org/10.1109/VLSID.2012.73","url":null,"abstract":"Environmental energy harvesting is a promising approach to achieving extremely long operational lifetimes in a variety of micro-scale electronic systems. Maximum power point tracking (MPPT) is a technique used in energy harvesting systems to maximize the amount of harvested power. Existing MPPT methods, originally intended for large-scale systems, incur high power overheads when used in micro-scale energy harvesting, where the output voltage of the transducers is very low (less than 500mV) and the harvested power is miniscule (only hundreds of μW). This paper presents a low-overhead MPPT algorithm for micro-scale solar energy harvesting systems. The proposed algorithm is based on the use of a negative feedback control loop and is particularly amenable to hardware-efficient implementation. We have used the proposed algorithm to design a micro-scale solar energy harvesting system, which has been implemented using IBM 45nm technology. Post-layout simulation results demonstrate that the proposed MPPT scheme successfully tracks the optimal operating point with a tracking error of less than 1% and incurs minimal power overheads.","PeriodicalId":405021,"journal":{"name":"2012 25th International Conference on VLSI Design","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114226777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Damodaran, T. Anderson, S. Agarwala, R. Venkatasubramanian, M. Gill, Dhileep Gopalakrishnan, A. Hill, A. Chachad, D. Balasubramanian, Naveen Bhoria, Jonathan Tran, Duc Bui, Mujibur Rahman, S. Moharil, Matthew D. Pierson, Steven Mullinnix, Hung Ong, D. Thompson, Krishna Gurram, O. Olorode, Nuruddin Mahmood, Jose Flores, A. Rajagopal, Soujanya Narnur, Daniel Wu, Alan Hales, Kyle Peavy, Robert Sussman
This paper presents the next-generation C66x DSP, an integrated fixed- and floating-point DSP implemented in a TSMC 40nm process. The DSP core runs at 1.25GHz at 0.9V and has a standby power consumption of 800mW. The core transistor count is 21.5 million. The DSP core features an 8-way VLIW floating-point data path and a two-level memory system, and delivers 40 GMACS or 10 GFLOPS floating-point MAC performance at 1.25GHz.
{"title":"A 1.25GHz 0.8W C66x DSP Core in 40nm CMOS","authors":"R. Damodaran, T. Anderson, S. Agarwala, R. Venkatasubramanian, M. Gill, Dhileep Gopalakrishnan, A. Hill, A. Chachad, D. Balasubramanian, Naveen Bhoria, Jonathan Tran, Duc Bui, Mujibur Rahman, S. Moharil, Matthew D. Pierson, Steven Mullinnix, Hung Ong, D. Thompson, Krishna Gurram, O. Olorode, Nuruddin Mahmood, Jose Flores, A. Rajagopal, Soujanya Narnur, Daniel Wu, Alan Hales, Kyle Peavy, Robert Sussman","doi":"10.1109/VLSID.2012.85","DOIUrl":"https://doi.org/10.1109/VLSID.2012.85","url":null,"abstract":"The next-generation C66x DSP integrated fixed and floating-point DSP implemented in TSMC 40nm process is presented in this paper. The DSP core runs at 1.25GHz at 0.9V and has a standby power consumption of 800mW. The core transistor count is 21.5 million. The DSP core features 8-way VLIW floating point Data path and a two level memory system and delivers 40 GMACS or 10 GFLOPS floating point MAC performance at 1.25GHz.","PeriodicalId":405021,"journal":{"name":"2012 25th International Conference on VLSI Design","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126743767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
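As a quick sanity check on the throughput figures quoted in the C66x abstract above, the per-cycle rates follow from dividing each figure by the 1.25 GHz clock. The constant and function names below are illustrative, and treating both figures as sustained peak rates at that clock is an assumption about the abstract's wording rather than something the abstract states.

```python
CLOCK_HZ = 1.25e9   # core clock from the abstract
GMACS = 40e9        # quoted MAC rate (operations per second)
GFLOPS = 10e9       # quoted floating-point rate (operations per second)


def ops_per_cycle(ops_per_second, clock_hz):
    """Operations that must complete each clock cycle to reach the rate."""
    return ops_per_second / clock_hz


if __name__ == "__main__":
    print(ops_per_cycle(GMACS, CLOCK_HZ))   # 32.0
    print(ops_per_cycle(GFLOPS, CLOCK_HZ))  # 8.0
```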
This tutorial begins with a broad overview of challenges in emerging mixed-signal systems. After describing the system-level requirements along with the architecture and circuit needs, specific circuit and system solutions will be discussed to highlight promising approaches. Design techniques for advanced analog and mixed-signal circuit blocks such as phase-locked loops and analog-to-digital converters will be covered in detail. Finally, the modeling and analysis of substrate noise coupling in mixed-signal integrated circuits is addressed. This day-long tutorial addresses both the system- and circuit-level aspects of emerging mixed-signal systems. Analysis and design techniques to implement analog-to-digital converters and phase-locked loops, and the impact of substrate noise on these circuits in large systems-on-chip, will be discussed. The tutorial is organized into the following four categories.
{"title":"Tutorial T5: Advanced Analog-Mixed Signal System and Circuit Techniques","authors":"P. Hanumolu, U. Moon, T. Fiez","doi":"10.1109/VLSID.2012.32","DOIUrl":"https://doi.org/10.1109/VLSID.2012.32","url":null,"abstract":"This tutorial begins with a broad overview of challenges in emerging mixed signal systems. After describing the system-level requirements along with the architecture and circuit needs, specific circuit and system solutions will be discussed to highlight promising approaches. Design techniques for advanced analog- and mixed signal circuit blocks such as phase-locked loops and analog-to-digital converters will be covered in detail. Finally, the modeling and analysis of substrate noise coupling in mixed-signal integrated circuits is addressed. This day long tutorial addresses both the system- and circuit-level aspects of emerging mixed-signal systems. Analysis and design techniques to implement analog to digital converters, phase-locked loops, and the impact of substrate noise on these circuits in large system-on-chips will be discussed. The tutorial is categorized into the following four categories.","PeriodicalId":405021,"journal":{"name":"2012 25th International Conference on VLSI Design","volume":"205 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133448957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bodhisatwa Mazumdar, Debdeep Mukhopadhyay, I. Sengupta
This paper proposes an S-box construction for the AES-128 block cipher that is more robust to differential power analysis (DPA) attacks than AES-128 implemented with the Rijndael S-box, while having similar cryptographic properties. The proposed S-box avoids the use of countermeasures for thwarting DPA attacks, thus consuming less area and power in embedded hardware while still being more DPA-resistant than the Rijndael S-box. The design has been prototyped on a Xilinx Spartan FPGA device (XC3S400-4PQ208), and the power traces of AES-128 running with the proposed S-box and with the Rijndael S-box have been analyzed separately. The experimental results of the FPGA implementations show a lower gate count and increased throughput for AES-128 with the proposed S-box compared with the Rijndael S-box on the same FPGA device. The higher number of power traces required to perform DPA on AES-128 with the RAIN S-box, compared with the Rijndael S-box, experimentally validates the theoretical claim that the lower transparency order computed for the RAIN S-box makes it more DPA-resistant than the Rijndael S-box.
{"title":"Design for Security of Block Cipher S-Boxes to Resist Differential Power Attacks","authors":"Bodhisatwa Mazumdar, Debdeep Mukhopadhyay, I. Sengupta","doi":"10.1109/VLSID.2012.56","DOIUrl":"https://doi.org/10.1109/VLSID.2012.56","url":null,"abstract":"This paper proposes an S-box construction of AES-128 block cipher which is more robust to differential power analysis (DPA) attacks than that of AES-128 implemented with Rijndael S-box while having similar cryptographic properties. The proposed S-box avoids use of countermeasures for thwarting DPA attacks thus consuming lesser area and power in the embedded hardware and still being more DPA resistive compared to Rijndael S-box. The design has been prototyped on Xilinx FPGA Spartan device XC3S400-4PQ208 and the power traces of the two different running AES-128 algorithms with the proposed and Rijndael S-boxes have been analyzed separately. The experimental results of the FPGA implementations show a lesser gate count consumption and increased throughput for the AES-128 with proposed S-box as that when implemented with Rijndael S-box on the same FPGA device. The requirement of higher number of power traces to perform DPA analysis on AES-128 with RAIN S-box as compared to that implemented with Rijndael S-box is an experimental validation of the theoretical claim of lower transparency order computed for RAIN S-box as being more DPA resistant than that of Rijndael S-box.","PeriodicalId":405021,"journal":{"name":"2012 25th International Conference on VLSI Design","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133872684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Sachid, Pallavi Paliwal, S. Joshi, M. Baghini, D. Sharma, V. Rao
With every new technology node, the scaling down of the device-to-interconnect capacitance ratio makes interconnect delay a bottleneck for circuit performance. To mitigate this effect, the on-chip interconnect routing area should be minimized for an improved power-delay product. In this respect, a FinFET with multiple fins per lithographic pitch has an advantage over a planar device, since such FinFET devices allow the electrical width to increase without increasing the device layout area, so the interconnect capacitance is comparatively lower. Therefore, minimum delay can be achieved with a smaller device width, and thus with lower power. This paper shows the performance enhancement obtained with such FinFET devices for a mux circuit and aims to find the optimum design space for the mux circuit at the 22nm technology node, with a practical value of interconnect capacitive load (extrapolated from the circuit layout in the current technology node).
{"title":"Circuit Optimization at 22nm Technology Node","authors":"A. Sachid, Pallavi Paliwal, S. Joshi, M. Baghini, D. Sharma, V. Rao","doi":"10.1109/VLSID.2012.91","DOIUrl":"https://doi.org/10.1109/VLSID.2012.91","url":null,"abstract":"With every new technology node, scaling down of Device-to-Interconnect Capacitance ratio causes Interconnect delay to become bottleneck for circuit performance. To miti-gate this effect, interconnect routing area on-chip should be minimized for improved power-delay product. In this aspect, Fin FET with multiple fins per lithographic pitch gains more advantage, in comparison to Planar Device, since, such Fin FET devices allow increase of electrical width without increasing device layout area and thus, interconnect capacitance is comparatively lower. Therefore, minimum delay could be achieved for lesser device width, and thus, with lower power. This paper proves the performance enhancement with such Fin FET Device for Mux Circuit, and aims to find out Optimum Design Space for Mux Circuit, at 22nm technology node, with practical value of Interconnect Capacitive load (extrapolated from circuit layout in current technology node).","PeriodicalId":405021,"journal":{"name":"2012 25th International Conference on VLSI Design","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129875830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unmesh D. Bordoloi, B. Suri, S. Nunna, S. Chakraborty, P. Eles, Zebo Peng
Many reconfigurable processors allow their instruction sets to be tailored according to the performance requirements of target applications. They have gained immense popularity in recent years because of this flexibility to add custom instructions. However, most design automation algorithms for instruction set customization (such as enumerating and selecting the optimal set of custom instructions) are computationally intractable. As such, existing tools to customize the instruction sets of extensible processors rely on approximation methods or heuristics. In contrast to such traditional approaches, we propose to use GPUs (Graphics Processing Units) to efficiently run the computationally expensive algorithms in design automation tools for extensible processors. To demonstrate our idea, we choose a custom instruction selection problem and accelerate it using CUDA (a GPU computing engine). Our CUDA implementation is devised to maximize the achievable speedups through various optimizations such as exploiting on-chip shared memory and register usage. Experiments conducted on well-known benchmarks show significant speedups over sequential CPU implementations as well as over multi-core implementations.
{"title":"Customizing Instruction Set Extensible Reconfigurable Processors Using GPUs","authors":"Unmesh D. Bordoloi, B. Suri, S. Nunna, S. Chakraborty, P. Eles, Zebo Peng","doi":"10.1109/VLSID.2012.107","DOIUrl":"https://doi.org/10.1109/VLSID.2012.107","url":null,"abstract":"Many reconfigurable processors allow their instruction sets to be tailored according to the performance requirements of target applications. They have gained immense popularity in recent years because of this flexibility of adding custom instructions. However, most design automation algorithms for instruction set customization (like enumerating and selecting the optimal set of custom instructions) are computationally intractable. As such, existing tools to customize instruction sets of extensible processors rely on approximation methods or heuristics. In contrast to such traditional approaches, we propose to use GPUs (Graphics Processing Units) to efficiently solve computationally expensive algorithms in the design automation tools for extensible processors. To demonstrate our idea, we choose a custom instruction selection problem and accelerate it using CUDA (CUDA is a GPU computing engine). Our CUDA implementation is devised to maximize the achievable speedups by various optimizations like exploiting on-chip shared memory and register usage. Experiments conducted on well known benchmarks show significant speedups over sequential CPU implementations as well as over multi-core implementations.","PeriodicalId":405021,"journal":{"name":"2012 25th International Conference on VLSI Design","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115409529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In high-speed link transmitters, one major contributor to jitter is the data-dependent switching of the transmitters. Such switching leads to oscillations in the supply R-L-C network. This paper presents an area-efficient way to reduce this supply noise by shifting the switching beyond the resonance frequency of the supply network, irrespective of the data pattern. The scheme is implemented in an HDMI transmitter in 65nm technology.
{"title":"Self-Induced Supply Noise Reduction Technique in GBPS Rate Transmitters","authors":"Nitin Gupta, Tapas Nandy, P. Bala","doi":"10.1109/VLSID.2012.52","DOIUrl":"https://doi.org/10.1109/VLSID.2012.52","url":null,"abstract":"In high speed link transmitters, one major contributor of jitter is the data-dependant switching of the transmitters. Such switching leads to oscillations in the supply R-L-C network. This paper presents an area-efficient way to reduce this supply noise by shifting the switching beyond the resonance frequency of the supply network, irrespective of the data-pattern. This scheme is implemented in HDMI transmitter in 65nm technology.","PeriodicalId":405021,"journal":{"name":"2012 25th International Conference on VLSI Design","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130347488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
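The resonance argument in the supply-noise abstract above can be illustrated with the standard formula for the natural frequency of an R-L-C network, f0 = 1/(2π√(LC)). The sketch below is not taken from the paper: the inductance, capacitance, and bit-rate values and both function names are hypothetical, chosen only to show how the fundamental frequency of a repeating data pattern compares with the supply resonance.

```python
import math


def resonance_hz(inductance_h, capacitance_f):
    """Natural frequency of an L-C network: 1 / (2 * pi * sqrt(L * C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def pattern_fundamental_hz(bit_rate_bps, run_length):
    """Fundamental of a pattern of run_length ones then run_length zeros."""
    return bit_rate_bps / (2.0 * run_length)


if __name__ == "__main__":
    supply_inductance = 1e-9     # hypothetical package inductance, 1 nH
    supply_capacitance = 100e-12 # hypothetical on-die decoupling, 100 pF
    bit_rate = 3.0e9             # hypothetical link rate, 3 Gb/s

    f0 = resonance_hz(supply_inductance, supply_capacitance)
    print(f"supply resonance: {f0 / 1e6:.1f} MHz")
    for run in (1, 2, 3, 4):
        f = pattern_fundamental_hz(bit_rate, run)
        side = "above" if f > f0 else "at or below"
        print(f"run length {run}: {f / 1e6:.1f} MHz ({side} resonance)")
```

Under these assumed values, longer runs of identical bits pull the switching frequency down toward the resonance, which is the data-dependent case the abstract's technique is meant to avoid.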
The need to integrate multiple wireless communication protocols into a single low-cost, low-power hardware platform is prompted by the increasing number of emerging communication protocols and applications. This paper presents an efficient methodology for integrating multiple wireless protocols in an ASIC while minimizing resource occupation. A hierarchical data path merging algorithm is developed to find common shareable components in two different communication circuits. The data path merging approach builds a combined generic circuit with inserted multiplexers (MUXes) that can provide the functionality of each individual circuit. The proposed method is orders of magnitude faster (well over 1000 times faster for realistic circuits) than the existing data path merging algorithm (with an overhead of 3% additional area) and can switch communication protocols on the fly (i.e., it can switch between protocols in a single clock cycle), which is a desirable feature for seemingly simultaneous multi-mode wireless communication. Wireless LAN (WLAN) 802.11a, WLAN 802.11b, and Ultra Wide Band (UWB) transmission circuits are merged to demonstrate the efficacy of our proposal.
{"title":"A Rapid Methodology for Multi-mode Communication Circuit Generation","authors":"L. Tang, Jorgen Peddersen, S. Parameswaran","doi":"10.1109/VLSID.2012.71","DOIUrl":"https://doi.org/10.1109/VLSID.2012.71","url":null,"abstract":"The need to integrate multiple wireless communication protocols into a single low-cost, low power hardware platform is prompted by the increasing number of emerging communication protocols and applications. This paper presents an efficient methodology for integrating multiple wireless protocols in an ASIC which minimizes resource occupation. A hierarchical data path merging algorithm is developed to find common shareable components in two different communication circuits. The data path merging approach will build a combined generic circuit with inserted multiplexers (MUXes) which can provide the same functionality of each individual circuit. The proposed method is orders of magnitude faster (well over 1000 times faster for realistic circuits) than the existing data path merging algorithm (with an overhead of 3% additional area) and can switch communication protocols on the fly (i.e. it can switch between protocols in a single clock cycle), which is a desirable feature for seemingly simultaneous multi-mode wireless communication. Wireless LAN (WLAN) 802.11a, WLAN802.11b and Ultra Wide Band (UWB) transmission circuits are merged to prove the efficacy of our proposal.","PeriodicalId":405021,"journal":{"name":"2012 25th International Conference on VLSI Design","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122058167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}