Pub Date : 2009-04-28 DOI: 10.1109/VDAT.2009.5158108
Kewei Zhu, Yici Cai, Qiang Zhou, Xianlong Hong
This paper presents a new detailed router for hierarchical field-programmable gate arrays (H-FPGAs). The primary objective of the proposed routing algorithm is to minimize routing run time, while also reducing wire length and critical-path delay as far as possible. Initially, nets are routed sequentially in order of criticality. Then, nets that violate routability constraints are resolved iteratively by a rip-up and reroute router based on the simulated evolution optimization technique: each net is evaluated with a rip-up priority function consisting of a timing part and a congestion part, and the result is compared with a random number to decide whether the net is ripped up and rerouted. Experimental results on a commercial H-FPGA show that the router achieves about a 26% improvement in run time and a 0.45% reduction in total wire length compared with a modified VPR.
Title : A detailed router for hierarchical FPGAs based on simulated evolution
Venue : 2009 International Symposium on VLSI Design, Automation and Test
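The rip-up decision described in the abstract above (a priority built from a timing part and a congestion part, compared against a random number) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear blend, the `timing_weight` parameter, the normalization of both scores to [0, 1], and all function names are assumptions made for the example.

```python
import random


def rip_up_priority(timing_score, congestion_score, timing_weight=0.5):
    """Blend a timing part and a congestion part into a priority in [0, 1].

    Hypothetical linear blend; both scores are assumed to be normalized
    to [0, 1]. The abstract does not give the actual function.
    """
    p = timing_weight * timing_score + (1.0 - timing_weight) * congestion_score
    return min(1.0, max(0.0, p))


def select_nets_to_rip_up(nets, rng=None, timing_weight=0.5):
    """Return the names of the nets chosen for rip-up and reroute.

    `nets` maps a net name to (timing_score, congestion_score).
    A net is selected when its priority exceeds a uniform random draw,
    so high-priority nets are ripped up more often but not always.
    """
    if rng is None:
        rng = random.Random()
    selected = []
    for name, (timing_score, congestion_score) in nets.items():
        priority = rip_up_priority(timing_score, congestion_score, timing_weight)
        if priority > rng.random():
            selected.append(name)
    return selected


if __name__ == "__main__":
    example_nets = {"clk_tree": (0.9, 0.8), "data_bus": (0.3, 0.2)}
    print(select_nets_to_rip_up(example_nets, random.Random(1)))
```

Comparing against a random draw rather than a fixed threshold is what gives simulated evolution its stochastic character: even a low-priority net occasionally gets rerouted, which helps the search escape a poor routing state.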
Pub Date : 2009-04-28 DOI: 10.1109/VDAT.2009.5158170
Yi-Ruei Wu, Yu-Sheng Chen, J. Shann
To improve program efficiency, it is important to reduce stalls caused by memory accesses. Traditional programs often spend much of their time stalled on accesses to lower-level memory, and Java programs are no exception. Prefetching is a feasible way to reduce memory stall time. We observed several clear properties of array accesses, analyzed them, and used them to design an array prefetching mechanism for embedded Java hardware accelerators. Our approach eliminates about 25% of array stall time on average, and up to 50% for some array-based programs.
Title : Prefetching for array data in embedded Java hardware accelerator
Pub Date : 2009-04-28 DOI: 10.1109/VDAT.2009.5158165
Lianq-Yu Lin, Huang-Kai Lin, Cheng-Yeh Wang, Lan-Da Van, Jing-Yang Jou
In this paper, we propose a hierarchical 2-D mesh Network-on-Chip (NoC) platform to support applications with several hundred tasks or large amounts of transmitted data. Moreover, a task binding method that considers communication amount, communication data contention, and bandwidth penalty is applied to enhance the overall system performance of the new architecture. The data transmission behavior of the NoC system is modeled at the system level to predict overall system performance, and an automatic NoC system performance simulation tool is also built. Architects and designers can therefore predict system performance and obtain all parameters of the designed platform at the system abstraction level. The experimental results show that overall system throughput, latency, and the saving of redundant transactions are improved by 27%, 14.4%, and 21.8%, respectively, in communication-dominated situations.
Title : Hierarchical architecture for network-on-chip platform
Pub Date : 2009-04-28 DOI: 10.1109/VDAT.2009.5158148
Jinwook Oh, Seungjin Lee, Joo-Young Kim, H. Yoo
This paper presents an area- and power-efficient cellular neural network (CNN) that enables real-time image processing. The proposed shared synapse architecture halves the number of required synapse multipliers, which are the main contributors to the area and power consumption of CNNs. To achieve this, a current holder circuit samples and holds the currents of non-changing synaptic circuit outputs. Compared with the conventional CNN architecture, power and area are reduced by 46% and 41%, respectively.
Title : An area efficient shared synapse cellular neural network for low power image processing
Pub Date : 2009-04-28 DOI: 10.1109/VDAT.2009.5158135
S. Jang, Chia‐Wei Tai, Cheng-Chen Liu
This paper proposes a 6-port 3-dimensional (3-D) transformer used to improve the performance of an injection-locked frequency divider (ILFD). The aim of the 3-D transformer is to reduce both chip size and power consumption. The CMOS LC-tank ILFD is implemented with a direct-injection nMOS between the differential outputs of an nMOS-core cross-coupled VCO. At a supply voltage of 0.6 V, the free-running frequency of the ILFD is tunable from 4.81 GHz to 5.3 GHz. At an incident power of 0 dBm and VDD = 0.6 V, the total locking range in the divide-by-2 mode is about 3 GHz, covering incident frequencies from 8.9 GHz to 11.9 GHz. The core power consumption is 1.02 mW. The die area is 0.394 × 0.623 mm².
Title : Implementation of 6-Port 3D transformer in injection-locked frequency divider
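The locking-range figures quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below only restates the reported numbers (8.9 GHz to 11.9 GHz input, divide-by-2); the function names and the choice to express the locking range relative to the band center are assumptions of this example, not definitions taken from the paper.

```python
def divider_output_range(f_in_low_ghz, f_in_high_ghz, ratio=2):
    """Output frequency range of an ideal divide-by-`ratio` stage, in GHz."""
    return f_in_low_ghz / ratio, f_in_high_ghz / ratio


def fractional_locking_range(f_in_low_ghz, f_in_high_ghz):
    """Locking range divided by the center of the input band (dimensionless)."""
    center = (f_in_low_ghz + f_in_high_ghz) / 2
    return (f_in_high_ghz - f_in_low_ghz) / center


if __name__ == "__main__":
    low, high = 8.9, 11.9  # incident frequency limits reported in the abstract
    out_low, out_high = divider_output_range(low, high)
    print(f"output range: {out_low:.2f} GHz to {out_high:.2f} GHz")
    print(f"fractional locking range: {fractional_locking_range(low, high):.1%}")
```

Under these assumptions the divided output spans 4.45 GHz to 5.95 GHz, which contains the reported free-running tuning range of 4.81 GHz to 5.3 GHz, and the 3 GHz locking range corresponds to roughly 29% of the 10.4 GHz band center.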
Pub Date : 2009-04-28 DOI: 10.1109/VDAT.2009.5158167
Li-Yi Lin, Hsin-chang Lin, Shih-Arn Hwang
SOC designs for consumer electronics often evolve from generation to generation in a very short time. Besides the need to merge more functionality, more and more enhancements are made to upgrade interfaces for new standards and to provide better or faster signal processing hardware engines for video/audio encoding and decoding. To shorten the time to market, physical designs for these enhanced chips can reuse large portions of the previous layout and do not need to be re-implemented from the ground up. However, the traditional incremental physical design method is becoming impractical, especially for flat designs, which usually have the advantage of a smaller die size compared with hierarchical designs. In this paper, we propose an incremental physical design method that takes advantage of the hierarchical design while maintaining the cost strength of the flat design. The proposed method has been successfully applied to our next-generation multimedia chip, and the results show that no design iteration is needed and the run time is at least 5 times faster than with the traditional method.
Title : Incremental physical design method for flat SOC design
Pub Date : 2009-04-28 DOI: 10.1109/VDAT.2009.5158094
Sying-Jyan Wang, Shun-Jie Huang, Katherine Shu-Min Li
Static power due to leakage current will become a major source of power consumption in the nanometer technology era. In this paper, we propose a simple yet effective technique to reduce both static and dynamic power consumption in the scan test process. Leakage current is restrained by selecting a good primary input vector to control the leakage paths during the scan shift process, and this vector can also be used to reduce dynamic power. The reverse, however, is not always true. A heuristic algorithm is presented to find such vectors. The proposed method is simulated in SPICE with the BPTM 22 nm transistor model, and the results show that it achieves an average total power reduction of 15%.
Title : Static and dynamic test power reduction in scan-based testing
Pub Date : 2009-04-28 DOI: 10.1109/VDAT.2009.5158131
Chen-Kang Ho, Hao-Chiao Hong
This paper demonstrates a 6-GS/s 6-bit flash ADC and current-steering DAC pair in 0.13 µm CMOS. Averaging and interpolating techniques are applied to reduce the offsets and to save power in the ADC. Current-mode logic is used to achieve high speed and to overcome the severe power bouncing issue. Design-for-testability circuits are added to conduct at-speed tests by internally cascading the ADC and DAC. The cascaded ADC and DAC pair clocked at 6 GHz achieves a 37.0 dB signal-to-noise ratio and a 26.0 dBc spurious-free dynamic range with a −1 dBFS, 502 MHz stimulus. The ADC and DAC consume 655 mW and 115 mW from a 1.2-V supply, respectively.
Title : A 6-GS/s, 6-bit, at-speed testable ADC and DAC pair in 0.13µm CMOS
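For context on the 37.0 dB signal-to-noise ratio reported in the abstract above, the textbook quantization-noise limit of an ideal N-bit converter with a full-scale sine input is about 6.02·N + 1.76 dB. The sketch below evaluates that limit for N = 6 and inverts it for the reported figure. It is only a reference calculation: the function names are made up for this example, and effective number of bits is conventionally derived from SINAD rather than SNR, so the inverted value is an upper-bound style estimate rather than a result from the paper.

```python
import math


def ideal_snr_db(bits):
    """Quantization-limited SNR of an ideal converter, full-scale sine input.

    Equals 20*log10(2**bits) + 10*log10(1.5), about 6.02*bits + 1.76 dB.
    """
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


def equivalent_bits(snr_db):
    """Invert ideal_snr_db: the resolution whose ideal SNR equals snr_db."""
    return (snr_db - 10 * math.log10(1.5)) / (20 * math.log10(2))


if __name__ == "__main__":
    print(f"ideal 6-bit SNR: {ideal_snr_db(6):.2f} dB")
    print(f"bits equivalent to 37.0 dB: {equivalent_bits(37.0):.2f}")
```

The ideal 6-bit limit comes out near 37.9 dB, so the reported 37.0 dB for the cascaded pair sits within about 1 dB of the quantization bound under these assumptions.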
Pub Date : 2009-04-28 DOI: 10.1109/VDAT.2009.5158114
A. Niknejad, D. Chowdhury
This paper highlights the design of RF and mm-wave building blocks that achieve high performance despite implementation in low-voltage digital CMOS technology. It is shown that integrated circuit transformers can greatly aid the design of such circuits by serving many roles, such as impedance matching, AC coupling, voltage combining, and converting signals from single-ended to differential form. Design, layout, and simulation strategies are reviewed, and several key power amplifier (PA) building blocks are demonstrated.
Title : Transforming RF and mm-Wave CMOS circuits
Pub Date : 2009-04-28 DOI: 10.1109/VDAT.2009.5158134
Jung-Yu Chang, Che-Wei Fan, Shen-Iuan Liu
A frequency synthesizer for Mode-1 MB-OFDM UWB applications is realized in 65 nm CMOS. By using a delay-locked loop (DLL) and the proposed multiply-by-2 circuit, the frequency synthesizer achieves an in-band spur of −40 dBc for three-band operation. The proposed multiply-by-2 circuit generates quadrature signals, and its input signals do not need a 50% duty cycle. A modified current-starving cell in the DLL is also proposed to reduce supply noise sensitivity. The measured switching time from 3.342 GHz to 4.488 GHz is around 1.1 ns. The area is 1.25 × 1.175 mm² including pads, and the power consumption is 19.2 mW from a 1.2 V supply.
Title : A frequency synthesizer for Mode-1 MB-OFDM UWB applications