2012 25th International Conference on VLSI Design: Latest Papers

Hardware Efficient Architecture for Generating Sine/Cosine Waves
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.46
Supriya Aggarwal, K. Khare
This paper presents a hardware-efficient architecture for generating sine and cosine waves based on the CORDIC (Coordinate Rotation Digital Computer) algorithm. In its original form, CORDIC suffers from major drawbacks such as scale-factor calculation, latency, and the optimal selection of micro-rotations. The proposed algorithm overcomes all these drawbacks. We use a leading-one bit detection technique to identify the micro-rotations. The scale-free design of the proposed algorithm is based on the Taylor series expansion of the sine and cosine waves. The 16-bit iterative architecture achieves approximately 4.5% and 6.7% lower slice-delay product than other existing designs. The algorithm design and its VLSI implementation are detailed.
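For context, the drawbacks named in the abstract (the scale factor and the fixed sequence of micro-rotations) are easiest to see in the conventional rotation-mode CORDIC. The sketch below is that textbook baseline in floating point, not the scale-free architecture the paper proposes; the function name `cordic_sin_cos` and the default of 16 iterations are assumptions made for illustration.

```python
import math

def cordic_sin_cos(theta, iterations=16):
    """Conventional rotation-mode CORDIC (baseline, not the paper's design).

    Valid for |theta| <= pi/2 radians. Returns (sin, cos).
    """
    # Elementary micro-rotation angles atan(2^-i).
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Accumulated scale factor: each micro-rotation stretches the vector
    # by sqrt(1 + 2^-2i). This is the factor a scale-free design avoids.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Start from (1/gain, 0) so the final vector has unit length.
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        # Every micro-rotation is always applied; only its direction varies.
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x

sin_value, cos_value = cordic_sin_cos(math.pi / 6)
print(sin_value, cos_value)
```

In hardware the multiplications by `2.0 ** -i` become shifts, which is why the compensation for `gain` stands out as an extra cost in the conventional form.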
Citations: 17
Random Access Analog Memory (RA2M) for Video Signal Application
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.43
Nilanjan Chattaraj, A. Dhar
This paper proposes a novel memory architecture, Random Access Analog Memory (RA2M), that stores unquantized samples of a video signal of up to 5 MHz bandwidth for durations on the order of milliseconds by means of a periodic memory refresh mechanism. At a 16.5 MHz sampling frequency and a frame rate of 25 frames/s, the implemented design can store voltage samples of up to 200 mV for 40 ms with 8-bit resolution. The proposed architecture contains a unit RA2M cell of 250 fF capacitance occupying 21 μm × 21 μm, with 4.1 mW average power dissipation per cell, in a 0.18 μm standard CMOS fabrication process. Extending the signal storage time of an analog memory by introducing a periodic refresh mechanism in voltage mode is implemented here for the first time. The circuit implementation is based on the switched-capacitor technique and is compatible with conventional fabrication processes. The architecture supports random-location data access and rejects common-mode noise through its differential signal implementation.
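The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below only combines numbers stated above; treating 200 mV as the full-scale range when computing the 8-bit step size is an assumption, and the variable names are illustrative.

```python
# Values taken from the abstract.
SAMPLE_RATE_HZ = 16.5e6
FRAME_RATE_HZ = 25.0
BANDWIDTH_HZ = 5e6
RESOLUTION_BITS = 8

# Assumption: the 200 mV maximum sample is treated as the full-scale range.
FULL_SCALE_V = 0.200

# One frame at 25 frames/s lasts 40 ms, matching the quoted storage time.
frame_period_s = 1.0 / FRAME_RATE_HZ

# Number of samples taken during one frame period.
samples_per_frame = SAMPLE_RATE_HZ * frame_period_s

# The sampling rate must exceed twice the signal bandwidth (Nyquist).
nyquist_rate_hz = 2 * BANDWIDTH_HZ
meets_nyquist = SAMPLE_RATE_HZ > nyquist_rate_hz

# Voltage step corresponding to one 8-bit level of the assumed full scale.
lsb_v = FULL_SCALE_V / 2 ** RESOLUTION_BITS

print(f"frame period: {frame_period_s * 1e3:.1f} ms")
print(f"samples per frame: {samples_per_frame:,.0f}")
print(f"meets Nyquist: {meets_nyquist}")
print(f"8-bit step: {lsb_v * 1e3:.3f} mV")
```

Under that assumption, holding a sample to 8-bit accuracy for 40 ms means keeping its drift below roughly 0.78 mV, which gives a sense of why a refresh mechanism is needed.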
Citations: 3
Low-Overhead Maximum Power Point Tracking for Micro-Scale Solar Energy Harvesting Systems
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.73
Chao Lu, S. P. Park, V. Raghunathan, K. Roy
Environmental energy harvesting is a promising approach to achieving extremely long operational lifetimes in a variety of micro-scale electronic systems. Maximum power point tracking (MPPT) is a technique used in energy harvesting systems to maximize the amount of harvested power. Existing MPPT methods, originally intended for large-scale systems, incur high power overheads when used in micro-scale energy harvesting, where the output voltage of the transducers is very low (less than 500 mV) and the harvested power is minuscule (only hundreds of μW). This paper presents a low-overhead MPPT algorithm for micro-scale solar energy harvesting systems. The proposed algorithm is based on a negative feedback control loop and is particularly amenable to hardware-efficient implementation. We have used the proposed algorithm to design a micro-scale solar energy harvesting system, which has been implemented using IBM 45nm technology. Post-layout simulation results demonstrate that the proposed MPPT scheme successfully tracks the optimal operating point with a tracking error of less than 1% and incurs minimal power overheads.
Citations: 41
A 1.25GHz 0.8W C66x DSP Core in 40nm CMOS
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.85
R. Damodaran, T. Anderson, S. Agarwala, R. Venkatasubramanian, M. Gill, Dhileep Gopalakrishnan, A. Hill, A. Chachad, D. Balasubramanian, Naveen Bhoria, Jonathan Tran, Duc Bui, Mujibur Rahman, S. Moharil, Matthew D. Pierson, Steven Mullinnix, Hung Ong, D. Thompson, Krishna Gurram, O. Olorode, Nuruddin Mahmood, Jose Flores, A. Rajagopal, Soujanya Narnur, Daniel Wu, Alan Hales, Kyle Peavy, Robert Sussman
This paper presents the next-generation C66x DSP, an integrated fixed- and floating-point DSP implemented in a TSMC 40nm process. The DSP core runs at 1.25GHz at 0.9V and has a standby power consumption of 800mW. The core transistor count is 21.5 million. The DSP core features an 8-way VLIW floating-point data path and a two-level memory system, and delivers 40 GMACS or 10 GFLOPS floating-point MAC performance at 1.25GHz.
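The throughput figures can be turned into per-cycle numbers by dividing by the clock rate. This is plain arithmetic on the values quoted in the abstract; how the core distributes these operations across its functional units is not stated there, and the variable names are illustrative.

```python
# Values taken from the abstract.
CLOCK_HZ = 1.25e9
MACS_PER_SECOND = 40e9    # 40 GMACS
FLOPS_PER_SECOND = 10e9   # 10 GFLOPS

# Operations the core must complete every clock cycle to reach each figure.
macs_per_cycle = MACS_PER_SECOND / CLOCK_HZ
flops_per_cycle = FLOPS_PER_SECOND / CLOCK_HZ

print(f"MACs per cycle: {macs_per_cycle:.0f}")
print(f"Floating-point operations per cycle: {flops_per_cycle:.0f}")
```

The quoted rates therefore correspond to 32 MACs and 8 floating-point operations per clock cycle at 1.25GHz.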
Citations: 16
Tutorial T5: Advanced Analog-Mixed Signal System and Circuit Techniques
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.32
P. Hanumolu, U. Moon, T. Fiez
This tutorial begins with a broad overview of challenges in emerging mixed-signal systems. After describing the system-level requirements along with the architecture and circuit needs, specific circuit and system solutions will be discussed to highlight promising approaches. Design techniques for advanced analog and mixed-signal circuit blocks such as phase-locked loops and analog-to-digital converters will be covered in detail. Finally, the modeling and analysis of substrate noise coupling in mixed-signal integrated circuits is addressed. This day-long tutorial addresses both the system- and circuit-level aspects of emerging mixed-signal systems. Analysis and design techniques for implementing analog-to-digital converters and phase-locked loops, and the impact of substrate noise on these circuits in large systems-on-chip, will be discussed. The tutorial is categorized into the following four categories.
Citations: 0
Design for Security of Block Cipher S-Boxes to Resist Differential Power Attacks
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.56
Bodhisatwa Mazumdar, Debdeep Mukhopadhyay, I. Sengupta
This paper proposes an S-box construction for the AES-128 block cipher that is more robust to differential power analysis (DPA) attacks than AES-128 implemented with the Rijndael S-box, while having similar cryptographic properties. The proposed S-box avoids the use of countermeasures for thwarting DPA attacks, thus consuming less area and power in embedded hardware while still being more DPA-resistant than the Rijndael S-box. The design has been prototyped on a Xilinx Spartan FPGA device (XC3S400-4PQ208), and the power traces of AES-128 running with the proposed S-box and with the Rijndael S-box have been analyzed separately. The experimental results of the FPGA implementations show a lower gate count and increased throughput for AES-128 with the proposed S-box compared with the Rijndael S-box implementation on the same FPGA device. The higher number of power traces required to perform DPA on AES-128 with the RAIN S-box, compared with the Rijndael S-box implementation, experimentally validates the theoretical claim that the lower transparency order computed for the RAIN S-box makes it more DPA-resistant than the Rijndael S-box.
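As a point of reference, the Rijndael S-box used as the baseline above can be generated from its standard definition: a multiplicative inverse in GF(2^8) followed by a fixed affine transformation. The sketch below builds only that baseline table; the construction of the proposed S-box is not described in the abstract and is not reproduced here. The helper names are chosen for illustration.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    # The nonzero elements form a group of order 255, so a^254 = a^-1.
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def rijndael_sbox():
    """Build the 256-entry Rijndael S-box table."""
    table = []
    for value in range(256):
        inv = gf_inv(value)
        # Affine transformation over GF(2), then add the constant 0x63.
        out = inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63
        table.append(out)
    return table

sbox = rijndael_sbox()
print(hex(sbox[0x00]), hex(sbox[0x53]))
```

A DPA-oriented comparison such as the one in the paper would evaluate a metric like the transparency order over a table of this kind; that computation is beyond this sketch.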
Citations: 16
Circuit Optimization at 22nm Technology Node
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.91
A. Sachid, Pallavi Paliwal, S. Joshi, M. Baghini, D. Sharma, V. Rao
With every new technology node, the shrinking device-to-interconnect capacitance ratio makes interconnect delay a bottleneck for circuit performance. To mitigate this effect, the on-chip interconnect routing area should be minimized to improve the power-delay product. In this respect, FinFETs with multiple fins per lithographic pitch have an advantage over planar devices, since such FinFET devices allow the electrical width to be increased without increasing the device layout area, so the interconnect capacitance is comparatively lower. Minimum delay can therefore be achieved with a smaller device width and thus with lower power. This paper demonstrates the performance enhancement with such FinFET devices for a mux circuit, and aims to find the optimum design space for the mux circuit at the 22nm technology node, with a practical value of interconnect capacitive load (extrapolated from circuit layout in the current technology node).
Citations: 6
Customizing Instruction Set Extensible Reconfigurable Processors Using GPUs
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.107
Unmesh D. Bordoloi, B. Suri, S. Nunna, S. Chakraborty, P. Eles, Zebo Peng
Many reconfigurable processors allow their instruction sets to be tailored according to the performance requirements of target applications. They have gained immense popularity in recent years because of this flexibility of adding custom instructions. However, most design automation algorithms for instruction set customization (such as enumerating and selecting the optimal set of custom instructions) are computationally intractable. As such, existing tools to customize instruction sets of extensible processors rely on approximation methods or heuristics. In contrast to such traditional approaches, we propose to use GPUs (Graphics Processing Units) to efficiently solve computationally expensive algorithms in the design automation tools for extensible processors. To demonstrate our idea, we choose a custom instruction selection problem and accelerate it using CUDA (CUDA is a GPU computing engine). Our CUDA implementation is devised to maximize the achievable speedups through various optimizations such as exploiting on-chip shared memory and register usage. Experiments conducted on well-known benchmarks show significant speedups over sequential CPU implementations as well as over multi-core implementations.
Citations: 2
Self-Induced Supply Noise Reduction Technique in GBPS Rate Transmitters
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.52
Nitin Gupta, Tapas Nandy, P. Bala
In high-speed link transmitters, one major contributor to jitter is the data-dependent switching of the transmitters. Such switching leads to oscillations in the supply R-L-C network. This paper presents an area-efficient way to reduce this supply noise by shifting the switching beyond the resonance frequency of the supply network, irrespective of the data pattern. This scheme is implemented in an HDMI transmitter in 65nm technology.
Citations: 4
A Rapid Methodology for Multi-mode Communication Circuit Generation
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.71
L. Tang, Jorgen Peddersen, S. Parameswaran
The need to integrate multiple wireless communication protocols into a single low-cost, low-power hardware platform is prompted by the increasing number of emerging communication protocols and applications. This paper presents an efficient methodology for integrating multiple wireless protocols in an ASIC which minimizes resource occupation. A hierarchical data path merging algorithm is developed to find common shareable components in two different communication circuits. The data path merging approach builds a combined generic circuit with inserted multiplexers (MUXes) which can provide the same functionality as each individual circuit. The proposed method is orders of magnitude faster (well over 1000 times faster for realistic circuits) than the existing data path merging algorithm (with an overhead of 3% additional area) and can switch communication protocols on the fly (i.e. it can switch between protocols in a single clock cycle), which is a desirable feature for seemingly simultaneous multi-mode wireless communication. Wireless LAN (WLAN) 802.11a, WLAN 802.11b and Ultra Wide Band (UWB) transmission circuits are merged to prove the efficacy of our proposal.
Citations: 6