Pub Date : 2013-10-01 DOI: 10.1109/ASICON.2013.6811892
Yuzhong Xiao, Chixiao Chen, Rui Wei, Fan Jiang, Jun Xu, Junyan Ren
A third-order multi-bit feedforward-feedback (FF-FB) mixed continuous-time sigma-delta modulator (CTSDM) for WLAN receivers is presented. The comparators and switch drivers are simplified and can therefore be more power-efficient. The mixed FF-FB architecture yields a signal transfer function (STF) with little out-of-band peaking. Based on the signal bandwidth and anti-aliasing requirements of WLAN receivers, the modulator is implemented in an SMIC 0.18 μm CMOS mixed-signal process; the core achieves 80 dB SNR over a 10 MHz signal bandwidth while consuming 30 mW from a 1.8 V supply.
Title: A 80-dB DR, 10-MHz BW continuous-time sigma-delta modulator with low power comparators and switch drivers. Published in: 2013 IEEE 10th International Conference on ASIC.
Pub Date : 2013-10-01 DOI: 10.1109/ASICON.2013.6811833
Guoyue Jiang, Fang Wang, Zhaolin Li, Shaojun Wei
Stream processors have emerged as a mainstream solution for computation-intensive applications. This paper proposes a power-efficient network-on-chip (NoC) for multi-core stream processors, aiming to improve communication performance and reduce power consumption. The proposed NoC introduces specific stream paths tailored to the features of multi-core stream processing. These paths are built on a packet-switched NoC and provide fast, power-efficient transmission for stream communications. To support them, a modified router micro-architecture with negligible area overhead is proposed. A set of stream applications is used for evaluation. Experimental results show an average latency reduction of 16.0% and an average power saving of 35.9%.
Title: A power-efficient network-on-chip for multi-core stream processors
Pub Date : 2013-10-01 DOI: 10.1109/ASICON.2013.6811997
G. Hang, Yang Yang, P. Zhao, Xiaohui Hu, X. You
A novel differential dynamic CMOS logic family using multiple-input floating-gate MOS (FGMOS) transistors is proposed. In this circuit family, a pair of n-channel multiple-input FGMOS pull-down logic networks replaces the nMOS logic tree of the conventional dynamic differential cascode voltage switch logic circuit. A simple technique for synthesizing the n-channel multiple-input FGMOS logic tree using a summation signal is also discussed. Using multiple-input FGMOS significantly simplifies the logic tree. HSPICE simulations using TSMC 0.35 μm 2-poly 4-metal CMOS technology verify the effectiveness of the proposed design scheme.
Title: A clocked differential switch logic using floating-gate MOS transistors
Pub Date : 2013-10-01 DOI: 10.1109/ASICON.2013.6812042
Chao Liang
Mixed-signal design is increasingly common because designers must quickly integrate IP, control blocks, functional blocks, and analog modules, and run through the design flow to tape-out in a short time. Given the growing complexity of the latest designs, the increasing physical effects in advanced process nodes, and the demand for shorter time to market, a fast and accurate design flow is critical to project success. This paper briefly describes the mixed-signal verification methods used for the Freescale Kinetis MCU, which include behavioral modeling, AMS validation, connectivity verification, mixed-signal Verification IP (VIP), multi-power verification, SoC transistor-level simulation, and mixed-signal functional coverage. Engineering results are discussed to demonstrate the effectiveness of these methods.
Title: Mixed-signal verification methods for multi-power mixed-signal System-on-Chip (SoC) design
Pub Date : 2013-10-01 DOI: 10.1109/ASICON.2013.6811911
Weijing Shi, Yi Li, Jun Han, Xu Cheng, Xiaoyang Zeng
Compressive sensing theory, which allows sparse signals to be sampled at a sub-Nyquist rate, has been introduced to Wireless Body Area Networks (WBANs) to reduce the hardware requirements and energy consumption of signal acquisition. However, recovering the signal in software introduces delay that hinders real-time reconstruction. In this paper, we propose a high-speed hardware implementation of the orthogonal matching pursuit (OMP) reconstruction algorithm in SMIC 130 nm CMOS. Built on original multi-functional systolic arrays, it is highly parallel and extensible. Experimental results show that it reconstructs a 256-length ECG signal with sparsity 16 in 45 μs at a maximum operating frequency of 167 MHz. When the sparsity is 8, recovery takes 18 μs, which is 33% faster than the state-of-the-art design.
Title: An extensible and real-time compressive sensing reconstruction hardware for WBANs using OMP
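For readers unfamiliar with orthogonal matching pursuit, the following is a minimal software sketch of the textbook algorithm that the abstract refers to. It is not the authors' systolic-array design; the function names `solve_least_squares` and `omp`, the column-list representation of the sensing matrix, and the use of normal equations for the least-squares step are all assumptions made for illustration.

```python
def solve_least_squares(cols, y):
    """Least-squares fit of y by the given columns, via the normal equations.

    Assumes the columns are linearly independent (otherwise a pivot is zero).
    """
    k = len(cols)
    # Augmented matrix [A^T A | A^T y].
    aug = [
        [sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
        + [sum(a * b for a, b in zip(cols[i], y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def omp(columns, y, sparsity):
    """Textbook OMP: columns[j] is the j-th column of the sensing matrix."""
    residual = list(y)
    support, coeffs = [], []
    for _ in range(sparsity):
        # 1. Pick the unused column most correlated with the residual.
        scores = [abs(sum(a * r for a, r in zip(col, residual)))
                  for col in columns]
        for j in support:
            scores[j] = -1.0
        best = max(range(len(columns)), key=scores.__getitem__)
        support.append(best)
        # 2. Re-fit y on all selected columns (the "orthogonal" step).
        chosen = [columns[j] for j in support]
        coeffs = solve_least_squares(chosen, y)
        # 3. Update the residual with the new fit.
        residual = [
            yi - sum(c * col[i] for c, col in zip(coeffs, chosen))
            for i, yi in enumerate(y)
        ]
    x = [0.0] * len(columns)
    for j, c in zip(support, coeffs):
        x[j] = c
    return x
```

In the abstract's setting the recovered vector would have length 256 and `sparsity` would be 16 or 8. The per-iteration correlation and least-squares steps are the parts a hardware implementation would be expected to parallelize.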
Pub Date : 2013-10-01 DOI: 10.1109/ASICON.2013.6811821
Zao Liu, Xin Huang, S. Tan, Hai Wang, H. Tang
In this paper, we propose a new distributed task migration method to reduce thermal hot spots and on-chip temperature variance, which leads to better thermal reliability and reduced package costs for emerging many-core processors. The novelty of the algorithm is that task migration is performed in a fully distributed way while still maintaining some degree of global view to guide the process. This is enabled by a recently proposed distributed state tracking technique that dynamically estimates the average temperature of all the cores, providing the global view of the chip temperature needed to guide local task migration among cores efficiently. In addition, local task migration is carried out based on the power, temperature, and load influence of neighboring cores. Our experimental results on a 36-core microprocessor demonstrate that the proposed method removes 30% more thermal hot spots than the existing distributed thermal management method, leading to a more balanced temperature distribution in many-core microprocessor chips.
Title: Distributed task migration for thermal hot spot reduction in many-core microprocessors
Pub Date : 2013-10-01 DOI: 10.1109/ASICON.2013.6811912
Lei Li, Jian Wang, Jinmei Lai
This paper presents a completely unified interconnect unit (UINT) that includes unified input and output multiplexers (UIM and UOM), which are usually non-repeatable [1-3]. UINT ensures that different logic modules can use an identical interconnect circuit, providing higher scalability for FPGAs. Furthermore, a multi-Vt switch circuit combining low-threshold-voltage and high-threshold-voltage transistors is put forward to minimize the adverse effects of threshold voltage loss and reduced supply voltage in nanometer technology, giving the FPGA high speed performance. The proposed interconnect unit is applied to the in-house Fudan Programmable (FDP5) FPGA and realized in 65 nm technology. Post-layout simulation results indicate up to a 40% speed improvement over the prior work [3] at an equivalent technology, while power consumption and area are reduced by 12% and 35%, respectively.
Title: Improved unified interconnect unit for high speed and scalable FPGA
Pub Date : 2013-10-01 DOI: 10.1109/ASICON.2013.6811998
Chenxi Deng
In this paper, an area-efficient implementation of a multistage decimation filter for an audio ΣΔ ADC is presented. The decimator, with a decimation ratio of 256, has less than 0.005 dB passband ripple and 100 dB stopband attenuation. It has an audio passband of 0-20 kHz and outputs a 16-bit signal at 48 kHz. With an area-efficient architecture involving RAM and ROM, and dedicated instruction scheduling through 256 steps per cycle, the decimator is synthesized with fewer than 300 LUTs and fewer than 160 slices on a Xilinx Spartan3E FPGA. The ALU is designed with only one 32-bit processing register and one 16-bit output register. The computing (clock) rate equals the input sampling rate, which lowers power consumption and simplifies clock generation. A Matlab compiler is developed to automate generation of the ROM word bits from the instruction schedule. Finally, the simulation result of the RTL model in Modelsim is verified against Matlab/Simulink programs to ensure that the internal 32-bit register data is "bit true" while processing the 1-bit input stream.
Title: An area-efficient implementation of ΣΔ ADC multistage decimation filter
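The rates implied by the abstract can be checked with a short calculation. The sketch below only does arithmetic on the stated figures (decimation ratio 256, 48 kHz output, clock equal to the input sampling rate); the function names and the example stage split of 64, 2, 2 are hypothetical, since the abstract does not say how the 256 is divided among stages.

```python
def decimator_rates(output_rate_hz, decimation_ratio):
    """Return (input sampling rate, clock cycles available per output sample).

    Assumes, as the abstract states, that the clock rate equals the
    input sampling rate.
    """
    input_rate_hz = output_rate_hz * decimation_ratio
    clocks_per_output = input_rate_hz // output_rate_hz
    return input_rate_hz, clocks_per_output


def stage_rates(input_rate_hz, stage_ratios):
    """Sample rate after each stage of a multistage decimator."""
    rates = []
    rate = input_rate_hz
    for ratio in stage_ratios:
        rate //= ratio
        rates.append(rate)
    return rates


input_rate, clocks = decimator_rates(48_000, 256)
print(input_rate, clocks)

# Hypothetical stage split (not from the paper): 64 * 2 * 2 = 256.
print(stage_rates(input_rate, (64, 2, 2)))
```

With these figures the input rate is 12.288 MHz and there are 256 clock cycles per output sample, which is consistent with the 256-step instruction schedule mentioned in the abstract.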
Pub Date : 2013-10-01 DOI: 10.1109/ASICON.2013.6812052
Shengyou Zhong, L. Yao, Jiqing Zhang
Integrating an image sensor and an analog-to-digital converter (ADC) array on the same substrate to build a digital image sensor has become increasingly popular with the development of CMOS technologies. It reduces the complexity and cost of the whole system. More importantly, it allows more digital signal processing to be integrated than a traditional image sensor can support. Integrating an ADC array into the image sensor requires the ADC to have low power and small area. In this work, a 10-bit resistor-capacitor hybrid successive approximation register ADC (RC SAR ADC) is proposed, which uses a resistor ladder to generate 3 reference voltages for the last 6 least significant bits (LSBs) and shares the shift register of the SAR control logic across the ADC array to reduce area. Theoretical analysis and simulation show that the proposed RC SAR ADC is suitable for array applications.
Title: A small-area low-power ADC array for image sensor applications
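As background for the abstract above, successive approximation is a bit-by-bit binary search on the input voltage. The sketch below models an ideal 10-bit SAR conversion only; it does not model the paper's resistor-capacitor hybrid DAC, the 3 ladder reference voltages, or the shared shift register, and the function name `sar_convert` and its parameters are assumptions.

```python
def sar_convert(vin, vref, bits=10):
    """Ideal SAR conversion: binary search for the code, MSB first.

    Assumes an ideal DAC whose output for a code is code * vref / 2**bits.
    """
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)            # tentatively set this bit
        dac_voltage = trial * vref / (1 << bits)
        if vin >= dac_voltage:               # comparator decision
            code = trial                     # keep the bit
    return code


# A mid-scale input resolves to the mid-scale code in 10 comparisons.
print(sar_convert(0.5, 1.0))
```

Each conversion takes one comparison per bit, so the control logic steps through a fixed sequence of 10 states. A fixed sequence of this kind is what makes sharing the sequencing logic across an ADC array plausible.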
Pub Date : 2013-10-01 DOI: 10.1109/ASICON.2013.6811852
Renfeng Dou, Yifan Bo, Jun Han, Xiaoyang Zeng
The fast Fourier transform (FFT) is one of the key operations in digital communication systems and digital signal processing platforms. This paper presents the design of a high-throughput variable-length FFT processor based on a switch network (SN) architecture that also offers strong runtime configurability and scalability. To support variable-length FFTs and balance speed against cost, the mixed-radix (MR) technique and an in-place strategy are used. In addition, an auto-synchronization method is proposed to make the stage-pipelined mode work efficiently, and a batch processing mode is proposed to boost performance for small FFTs. The results show throughput improvements ranging from 5.8× for the 16-point FFT to 1.2× for the 256-point FFT. The processor supports 16- to 8192-point FFTs and, at 500 MHz, provides about 2 GSamples/s for FFT sizes of 256 or less using batch processing and 1 GSamples/s for larger FFTs. The core area is 2.04 mm² and the power consumption is 68 mW at 100 MHz for a 1k-point FFT.
Title: Design of a high throughput configurable variable-length FFT processor based on switch network architecture
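The throughput figures in the abstract translate directly into samples per clock cycle, and a mixed-radix plan for a power-of-two length can be sketched in a few lines. The abstract does not state which radices the processor uses, so the radix-4-then-radix-2 split below is an assumption for illustration, as are the function names.

```python
def samples_per_cycle(throughput_samples_per_s, clock_hz):
    """Average number of samples processed per clock cycle."""
    return throughput_samples_per_s / clock_hz


def radix_plan(n):
    """Split a power-of-two FFT length into radix-4 stages plus, if the
    exponent is odd, one radix-2 stage (an assumed mixed-radix scheme)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    exponent = n.bit_length() - 1
    return [4] * (exponent // 2) + [2] * (exponent % 2)


print(samples_per_cycle(1e9, 500e6))   # larger FFTs at 500 MHz
print(samples_per_cycle(2e9, 500e6))   # batch mode, sizes up to 256
print(radix_plan(16), radix_plan(8192))
```

Under these figures the processor sustains 2 samples per cycle for large FFTs and 4 per cycle in batch mode. Under the assumed split, a 16-point FFT needs two radix-4 stages and an 8192-point FFT needs six radix-4 stages plus one radix-2 stage.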