This paper proposes three approaches for allocation of scratch-pad memory in non-preemptive fixed-priority multi-task systems. These approaches can reduce energy consumption of instruction memory. Each approach is formulated as an integer programming problem which simultaneously determines (1) partitioning of scratch-pad memory spaces for the tasks, and (2) allocation of functions to the scratch-pad memory space for each task. The experimental results show the effectiveness of the proposed approaches.
{"title":"Allocation of scratch-pad memory in priority-based multi-task systems","authors":"Hideki Takase, H. Tomiyama, H. Takada","doi":"10.2197/ipsjtsldm.2.180","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.2.180","url":null,"abstract":"This paper proposes three approaches for allocation of scratch-pad memory in non-preemptive fixed-priority multi-task systems. These approaches can reduce energy consumption of instruction memory. Each approach is formulated as an integer programming problem which simultaneously determines (1) partitioning of scratch-pad memory spaces for the tasks, and (2) allocation of functions to the scratch-pad memory space for each task. The experimental results show the effectiveness of the proposed approaches.","PeriodicalId":246670,"journal":{"name":"2009 International Symposium on VLSI Design, Automation and Test","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128235018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
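The joint decision described in the scratch-pad abstract above (how much scratch-pad space each task gets, and which functions go into it) can be illustrated on a toy instance. The sketch below is not the paper's integer programming formulation, which the abstract does not give. It assumes a static spatial partition of the scratch-pad, hypothetical `(size, energy_saving)` pairs per function, and it swaps the ILP solver for a small exact dynamic program. The names `best_saving_per_size` and `partition_spm` are illustrative only.

```python
def best_saving_per_size(functions, capacity):
    """0/1 knapsack: best[c] is the largest energy saving one task can
    reach when its scratch-pad region holds at most c bytes."""
    best = [0] * (capacity + 1)
    for size, saving in functions:
        for c in range(capacity, size - 1, -1):
            best[c] = max(best[c], best[c - size] + saving)
    return best


def partition_spm(tasks, capacity):
    """Choose a region size per task (assumed spatial partitioning) so that
    the summed saving is maximal. Returns (total_saving, region_sizes)."""
    total = [0] * (capacity + 1)   # best saving of tasks handled so far
    choices = []                   # choices[t][c]: region given to task t
    for functions in tasks:
        best = best_saving_per_size(functions, capacity)
        new_total = [0] * (capacity + 1)
        choice = [0] * (capacity + 1)
        for c in range(capacity + 1):
            for region in range(c + 1):
                value = total[c - region] + best[region]
                if value > new_total[c]:
                    new_total[c] = value
                    choice[c] = region
        total = new_total
        choices.append(choice)

    # Walk the recorded choices backwards to recover each task's region.
    regions = []
    c = capacity
    for choice in reversed(choices):
        regions.append(choice[c])
        c -= choice[c]
    regions.reverse()
    return total[capacity], regions


# Hypothetical instance: two tasks, each with two functions (size, saving).
tasks = [[(4, 10), (3, 7)], [(2, 6), (5, 9)]]
print(partition_spm(tasks, capacity=8))
```

An ILP solver would be needed for realistic sizes and for partitioning schemes other than the spatial one assumed here.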
Pub Date: 2009-04-28 DOI: 10.1109/VDAT.2009.5158155
S. Jang, Chuang-Jen Huang, Cheng-Chen Liu
A divide-by-3 CMOS LC-tank injection-locked frequency divider (ILFD) is proposed and implemented in a 0.35 µm CMOS process. The ILFD circuit is realized with a double cross-coupled complementary MOSFET LC-tank oscillator with two injection MOSFETs across the resonator inductors for signal injection. The self-oscillating VCO is injection-locked by a third-harmonic input to obtain the division factor of three. Measurement results show that at a supply voltage of 2.0 V, the free-running frequency ranges from 3.18 GHz to 3.316 GHz. At an incident power of 0 dBm, the total locking range spans incident frequencies from 9.41 GHz to 10.03 GHz. The power consumption of the ILFD core is 8 mW.
{"title":"A 0.35µm CMOS divide-by-3 LC injection-locked frequency divider","authors":"S. Jang, Chuang-Jen Huang, Cheng-Chen Liu","doi":"10.1109/VDAT.2009.5158155","DOIUrl":"https://doi.org/10.1109/VDAT.2009.5158155","url":null,"abstract":"A divide-by-3 CMOS LC-tank injection locked frequency divider (ILFD) is proposed and implemented in a 0.35µm CMOS process. The ILFD circuit is realized with a double cross-coupled complementary MOSFET LC-tank oscillator with two injection MOSFETs across the resonator inductors for signal injection. The self-oscillating VCO is injection-locked by third-harmonic input to obtain the division factor of three. Measurement results show that at the supply voltage of 2.0 V, the free-running frequency is from 3.18 GHz to 3.316 GHz. At the incident power of 0dBm, the total locking range is from the incident frequency 9.41 GHz to 10.03 GHz. The power consumption of the ILFD core is 8 mW.","PeriodicalId":246670,"journal":{"name":"2009 International Symposium on VLSI Design, Automation and Test","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128598523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
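The figures quoted in the ILFD abstract above can be cross-checked with simple arithmetic: dividing the input locking range by three should give an output range that covers the free-running range. The sketch below does only that. The function names are illustrative, and the percentage definition (range width over centre frequency) is an assumption, since the abstract reports the locking range only as two endpoint frequencies.

```python
def divided_range(f_in_low_ghz, f_in_high_ghz, division=3):
    """Output frequency range of a divider locked over the given input range."""
    return f_in_low_ghz / division, f_in_high_ghz / division


def locking_range_percent(f_low_ghz, f_high_ghz):
    """Range width relative to its centre frequency, in percent (assumed definition)."""
    centre = (f_low_ghz + f_high_ghz) / 2
    return 100 * (f_high_ghz - f_low_ghz) / centre


out_low, out_high = divided_range(9.41, 10.03)
print(f"output range: {out_low:.3f} GHz to {out_high:.3f} GHz")
print(f"input locking range: {locking_range_percent(9.41, 10.03):.2f} %")
# The reported free-running range (3.18 GHz to 3.316 GHz) should lie inside
# the divided output range if the numbers are mutually consistent.
print("free-running range covered:", out_low <= 3.18 and 3.316 <= out_high)
```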
Pub Date: 2009-04-28 DOI: 10.1109/VDAT.2009.5158163
Yun-Ching Tang, Dosheng Hu, Weiyi Wei, Wen-Chung Lin, Hongchin Lin
A memory-efficient Viterbi decoder (VD) architecture named modified state exchange (MSE) is proposed, using a pre-trace-back technique to obtain the decoded data in blocks. Since the MSE architecture records the “survival state number,” which also serves as the resulting decoded data, no decision bit is required during trace back and decoding. Therefore, the power and chip area of the survivor memory unit in the MSE method are smaller than those of existing trace-back approaches. The VD using the MSE approach for the (2, 1, 6) convolutional code was designed in TSMC 0.18 µm 1P6M CMOS technology. The core area is 0.69 mm² with a power consumption of 58 mW at 100 MHz.
{"title":"A memory-efficient architecture for low latency Viterbi decoders","authors":"Yun-Ching Tang, Dosheng Hu, Weiyi Wei, Wen-Chung Lin, Hongchin Lin","doi":"10.1109/VDAT.2009.5158163","DOIUrl":"https://doi.org/10.1109/VDAT.2009.5158163","url":null,"abstract":"A memory-efficient Viterbi decoder (VD) named modified state exchange (MSE) is proposed using pre-trace back technique to obtain the decoded data by blocks. Since the architecture of MSE can record the “survival state number,” which can also be the resulted decoded data, no decision bit is required during trace back and decoding. Therefore, the power and chip area of the survivor memory unit in the MSE method are smaller than those of the existing trace back approaches. The VD using MSE approach for (2, 1, 6) convolutional code was designed using TSMC 0.18µm 1P6M CMOS technology. The core area is 0.69mm2 with power consumption of 58mW at 100MHz.","PeriodicalId":246670,"journal":{"name":"2009 International Symposium on VLSI Design, Automation and Test","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127966984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
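For context on the Viterbi decoder abstract above, the sketch below shows a conventional hard-decision Viterbi decoder with full trace back, which is the baseline that MSE aims to improve on, not the MSE architecture itself. It assumes that (2, 1, 6) denotes a rate-1/2 code with six memory elements, that the generator polynomials are the common 171/133 (octal) pair, and that the message ends with six zero bits so the trellis terminates in state 0. None of these details is stated in the abstract.

```python
def _outputs(reg, polys):
    """Parity of the register bits selected by each generator polynomial."""
    return [bin(reg & p).count("1") & 1 for p in polys]


def encode(bits, polys=(0o171, 0o133), k=7):
    """Rate-1/len(polys) convolutional encoder with constraint length k."""
    state, out = 0, []
    for b in bits:
        reg = (b << (k - 1)) | state   # newest bit in the top position
        out.extend(_outputs(reg, polys))
        state = reg >> 1               # keep the k-1 most recent bits
    return out


def viterbi_decode(received, polys=(0o171, 0o133), k=7):
    """Hard-decision Viterbi decoding; assumes a zero-terminated message."""
    n_states, n_out = 1 << (k - 1), len(polys)
    inf = float("inf")
    metric = [0] + [inf] * (n_states - 1)   # encoder starts in state 0
    history = []                            # survivor (prev_state, bit) per step
    for t in range(0, len(received), n_out):
        symbol = received[t:t + n_out]
        new_metric = [inf] * n_states
        back = [None] * n_states
        for s in range(n_states):
            if metric[s] == inf:
                continue
            for b in (0, 1):
                reg = (b << (k - 1)) | s
                dist = sum(e != r for e, r in zip(_outputs(reg, polys), symbol))
                nxt = reg >> 1
                if metric[s] + dist < new_metric[nxt]:
                    new_metric[nxt] = metric[s] + dist
                    back[nxt] = (s, b)
        metric = new_metric
        history.append(back)

    # Trace back from state 0, following the stored survivor decisions.
    state, decoded = 0, []
    for back in reversed(history):
        state, b = back[state]
        decoded.append(b)
    return decoded[::-1]


message = [1, 0, 1, 1, 0, 0, 1, 0] + [0] * 6   # payload plus zero tail
noisy = encode(message)
noisy[3] ^= 1                                   # inject one channel error
print(viterbi_decode(noisy) == message)
```

The `history` list is the survivor memory whose size and trace-back cost the abstract targets.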
Pub Date: 2009-04-28 DOI: 10.1109/VDAT.2009.5158099
Aseem Gupta, S. Pasricha, N. Dutt, F. Kurdahi, K. Khouri, M. Abadir
In current Systems-on-Chip (SoC) designs, managing peak temperature is critical to ensure operation without failure. Our novel Communication Architecture Based Thermal Management (CBTM) scheme manages the thermal behavior of components by regulating the flow of data over the on-chip communication bus, which delays the execution of chosen IP blocks or components. This temperature-aware traffic flow over the bus is achieved by dynamically changing the communication priority table in response to thermal readings from sensors. With CBTM, the temperatures of individual components can be controlled selectively. In this paper we demonstrate the effectiveness of CBTM on four industrial-size SoC designs and also evaluate its performance impact. We observe that CBTM maintained thermal thresholds and reduced the peak temperature of an SoC by as much as 29°C.
{"title":"On chip Communication-Architecture Based Thermal Management for SoCs","authors":"Aseem Gupta, S. Pasricha, N. Dutt, F. Kurdahi, K. Khouri, M. Abadir","doi":"10.1109/VDAT.2009.5158099","DOIUrl":"https://doi.org/10.1109/VDAT.2009.5158099","url":null,"abstract":"In current Systems-on-Chip (SoC) designs, managing peak temperature is critical to ensure operation without failure. Our novel Communication Architecture Based Thermal Management (CBTM) scheme manages thermal behavior of components by delaying the execution of chosen IP-blocks or components by regulating the flow of data over the on-chip communication bus. This temperature aware traffic flow over the bus is achieved by dynamically changing the communication priority table in response to thermal readings from sensors. With CBTM, the temperatures of individual components can be controlled selectively. In this paper we demonstrate the effectiveness of CBTM on four industrial size SoC designs and also evaluate its performance impact. We observe that CBTM maintained thermal thresholds and reduced the peak temperature of an SoC by as much as 29°C.","PeriodicalId":246670,"journal":{"name":"2009 International Symposium on VLSI Design, Automation and Test","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133916922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
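The mechanism in the CBTM abstract above, changing the bus priority table in response to sensor readings, can be sketched as follows. The actual CBTM policy is not described in the abstract, so the rule used here (components at or above a threshold are moved to the bottom of the table, hottest last) is an assumption, and the component names, temperatures and the function name `update_priorities` are hypothetical.

```python
def update_priorities(priority_table, temperatures, threshold_c):
    """Return a new priority order (highest priority first).

    Assumed policy: components below the threshold keep their relative
    order; components at or above it are demoted, with the hottest one
    receiving the lowest bus priority.
    """
    cool = [c for c in priority_table if temperatures[c] < threshold_c]
    hot = [c for c in priority_table if temperatures[c] >= threshold_c]
    hot.sort(key=lambda c: temperatures[c])   # ascending: hottest ends up last
    return cool + hot


# Hypothetical sensor snapshot in degrees Celsius.
table = ["cpu", "dsp", "dma", "gpu"]
readings = {"cpu": 92.0, "dsp": 70.0, "dma": 60.0, "gpu": 88.0}
print(update_priorities(table, readings, threshold_c=85.0))
```

Demoting a hot component delays its bus transactions, which is how the abstract describes execution being slowed without touching the component itself.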
Pub Date: 2009-04-28 DOI: 10.1109/VDAT.2009.5158166
Sung-Che Li, Wei-Ting Liao, M. Lee, W. Hsieh, C. Liu
The communication architecture has become a major source of power consumption in complex System-on-Chip (SoC) designs. In this paper, a practical cycle-accurate power model for an on-chip communication architecture using the AMBA system is proposed to support high-level power analysis. According to the distinct properties of each bus component, different methods are adopted to build accurate power models. In addition, the proposed power model can be integrated easily into an RTL simulator, which allows power analysis to be performed at a high level. The experimental results show that the average error of the proposed power model is less than 5.14% and the simulation overhead is less than 8.7%.
{"title":"A practical power model of AMBA system for high-level power analysis","authors":"Sung-Che Li, Wei-Ting Liao, M. Lee, W. Hsieh, C. Liu","doi":"10.1109/VDAT.2009.5158166","DOIUrl":"https://doi.org/10.1109/VDAT.2009.5158166","url":null,"abstract":"Nowadays, the communication architecture has become a major source of power consumption in complicated System-on-Chip (SoC) designs. In this paper, a practical cycle-accurate power model for on-chip communication architecture using AMBA system is proposed to help high-level power analysis. According to the distinct properties of each bus component, different methods are adopted to build accurate power models. In addition, the proposed power model can be integrated into RTL simulator easily, which allows performing the power analysis at high level. The experiment results have shown that the average error of the proposed power model is less than 5.14% and the simulation overhead is less than 8.7%","PeriodicalId":246670,"journal":{"name":"2009 International Symposium on VLSI Design, Automation and Test","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133246339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2009-04-28 DOI: 10.1109/VDAT.2009.5158105
Víctor Reyes
ESL design methods and tools are being proposed to improve designer productivity and to bridge the design and verification gaps. The main area where ESL solutions are being successfully applied in current design flows is Virtual Prototyping. The success of these methods relies on the rapid adoption by the semiconductor industry and EDA vendors of standards such as SystemC and TLM 2.0. Ideally, TLM models must be accurate enough, fast enough and easy to create in order to fit all Virtual Prototype (VP) use-cases. However, reality shows that different requirements are met only by using different types of models (the right model for the right use-case). This is because TLM modeling is a multidimensional problem where the different dimensions (speed, timing accuracy and modeling effort) are orthogonal to each other. Having to create and maintain a separate model for each use-case drastically reduces the benefits of VP technology, due to the elevated cost of creating the models and keeping them consistent with each other. Therefore, model reuse and refinement is a must for the success of ESL technology. This paper describes modeling concepts that can be used to create speed-optimal models with low effort, which can be gradually refined with more timing accuracy and therefore reused for different VP use-cases.
{"title":"Refinement and reuse of TLM 2.0 models: The key for ESL success","authors":"Víctor Reyes","doi":"10.1109/VDAT.2009.5158105","DOIUrl":"https://doi.org/10.1109/VDAT.2009.5158105","url":null,"abstract":"ESL design methods and tools are being proposed to improve the productivity of the designers and to bridge the design and verification gaps. The main area where ESL solutions are being successfully applied on current desire flows is Virtual Prototyping. The success of these methods relies on the rapidly adoption from Semiconductor industry and EDA vendors of standards such as SystemC and TLM 2.0. Ideally. TLM models must be accurate enough, fast enough and easy to create in order to fit all Virtual Prototype use-cases. However reality shows that different requirements are achieved only by using different type of models (the right model for the right use-case). This is because TLM modeling is a multidimensional problem where the different dimensions (speed, timing accuracy and modeling effort) are orthogonal with each other. Having to create and maintain a separated model for each use-case is drastically reducing the benefits of VP technology, due to elevated cost of creating and maintaining the models consistent with each other. Therefore, model reuse and refinement is a must for the suceess of ESL technology. 
This paper describes modeling concepts that can be used to create speed optimal models with low effort, which can be gradually refined with more timing accuracy and therefore reused for different VP use-cases.","PeriodicalId":246670,"journal":{"name":"2009 International Symposium on VLSI Design, Automation and Test","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114674660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2009-04-28 DOI: 10.1109/VDAT.2009.5158120
Shyh-Chyi Yang, Hao-I Yang, C. Chuang, W. Hwang
The threshold voltage (VT) drifts caused by Negative-Bias Temperature Instability (NBTI) and Positive-Bias Temperature Instability (PBTI) degrade the stability, margin, and performance of nanoscale SRAM over its lifetime. Moreover, most state-of-the-art SRAMs employ a replica timing control scheme to mitigate the effects of excessive leakage and variation, and NBTI/PBTI-induced VT drifts can render the scheme ineffective or even useless. In this paper, we investigate the impacts of NBTI and PBTI on SRAM Write operations based on PTM 32 nm CMOS technology node poly-gate and high-k metal-gate models. We propose an NBTI/PBTI-tolerant Write-replica timing control scheme to mitigate Write margin and performance degradation. By using a multi-bank architecture and biasing the virtual supply line of inactive timing-critical circuits to GND to minimize the stress time and maximize the “Recovery” period, the NBTI/PBTI-induced SRAM Write performance degradation can be reduced by around 32–48%.
{"title":"Timing control degradation and NBTI/PBTI tolerant design for Write-replica circuit in nanoscale CMOS SRAM","authors":"Shyh-Chyi Yang, Hao-I Yang, C. Chuang, W. Hwang","doi":"10.1109/VDAT.2009.5158120","DOIUrl":"https://doi.org/10.1109/VDAT.2009.5158120","url":null,"abstract":"The threshold voltage (VT) drifts caused by Negative-Bias Temperature Instability (NBTI) and Positive-Bias Temperature Instability (PBTI) degrade stability, margin, and performance of nanoscale SRAM over the lifetime of usage. Moreover, most state-of-the-art SRAMs employ replica timing control scheme to mitigate the effects of excessive leakage and variation, and NBTI/PBTI induced VT drifts can render the scheme ineffective or even useless. In this paper, we investigate impacts of NBTI and PBTI on SRAM Write operations based on PTM 32nm CMOS technology node poly-gate and high-k metal-gate models. We propose an NBTI/PBTI tolerant Write-replica timing control scheme to mitigate Write margin and performance degradation. By using multi-bank architecture and biasing the virtual supply line of inactive timing-critical circuits to GND to minimize the stress time and maximize the “Recovery” period, the NBTI/PBTI induced SRAM Write performance degradation can be reduced by around 32–48%.","PeriodicalId":246670,"journal":{"name":"2009 International Symposium on VLSI Design, Automation and Test","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129494372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2009-04-28 DOI: 10.1109/VDAT.2009.5158152
Chung-Fu Lin, Jen-Chieh Ou, Meng-Hsueh Wang, Y. Ou, Ming-Hsin Ku
With the increasing functionality of modern SoC designs, the need for dense embedded memory is growing. Testing a high-density embedded DRAM (eDRAM) macro in a complex integration environment is becoming an important issue. In this work, we propose a single-instruction based programmable memory BIST for testing an eDRAM macro. Based on our BIST design, the supported memory testing algorithms are classified into five groups. Moreover, a compact instruction is proposed to encode the operation of each group, and a two-level address generator is adopted to produce all the required addressing indexes. The proposed architecture provides a better design tradeoff between area overhead and programmability than existing work.
{"title":"Single-instruction based programmable memory BIST for testing embedded DRAM","authors":"Chung-Fu Lin, Jen-Chieh Ou, Meng-Hsueh Wang, Y. Ou, Ming-Hsin Ku","doi":"10.1109/VDAT.2009.5158152","DOIUrl":"https://doi.org/10.1109/VDAT.2009.5158152","url":null,"abstract":"With the increasing functionalities in modern SoC design, the need for dense embedded memory is growing. The test issue for this high density embedded DRAM (eDRAM) macro in a complex integration environment is becoming an important issue. In this work, we propose a single-instruction based programmable memory BIST for testing an eDRAM macro. Based on our BIST design, the supported memory testing algorithms are classified into five groups. Moreover, a compact instruction is proposed to encode the operation of each group and a two-level address generator is adopted to produce all the required addressing indexes. The proposed architecture provides a better design tradeoff in terms of the area overhead and the programmability compared with the existing work.","PeriodicalId":246670,"journal":{"name":"2009 International Symposium on VLSI Design, Automation and Test","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128919816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
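The memory BIST abstract above does not say which test algorithms the five groups contain. As an illustration of the kind of algorithm a programmable memory BIST typically executes, the sketch below runs the well-known March C- test on a simulated memory with an injected stuck-at fault. The `FaultyMemory` class, the tuple encoding of march elements and the `run_march` function are illustrative assumptions and do not represent the paper's instruction format or its two-level address generator.

```python
class FaultyMemory:
    """Bit-wide memory model; stuck_at maps an address to its forced value."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}

    def write(self, addr, value):
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


# March C-: each element is (address order, [(operation, value), ...]).
MARCH_C_MINUS = [
    ("up", [("w", 0)]),
    ("up", [("r", 0), ("w", 1)]),
    ("up", [("r", 1), ("w", 0)]),
    ("down", [("r", 0), ("w", 1)]),
    ("down", [("r", 1), ("w", 0)]),
    ("up", [("r", 0)]),
]


def run_march(memory, size, elements=MARCH_C_MINUS):
    """Apply the march elements and return the sorted failing addresses."""
    failing = set()
    for direction, ops in elements:
        addrs = range(size) if direction == "up" else range(size - 1, -1, -1)
        for addr in addrs:
            for op, value in ops:
                if op == "w":
                    memory.write(addr, value)
                elif memory.read(addr) != value:   # read and compare
                    failing.add(addr)
    return sorted(failing)


# Hypothetical 16-cell memory with cell 5 stuck at 1.
print(run_march(FaultyMemory(16, stuck_at={5: 1}), 16))
```

Encoding each march element as data rather than as fixed control logic is what makes a BIST of this kind programmable.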
Pub Date: 2009-04-28 DOI: 10.1109/VDAT.2009.5158083
G. Konstadinidis
The free ride from process technology for CPU design has ended. Innovations in architecture, circuit design, and physical implementation are required to cope with increased challenges imposed by the lack of process scaling, increased variability and layout-dependent effects. In addition, power density is rising to prohibitive levels and has now become the predominant performance limiter. Extensive power management at both architectural and circuit levels is a major focus point in today's microprocessor design. This paper will give an overview of the issues, the potential solutions and the tool requirements to address the ever-increasing physical design and power management challenges.
{"title":"Challenges in microprocessor physical and power management design","authors":"G. Konstadinidis","doi":"10.1109/VDAT.2009.5158083","DOIUrl":"https://doi.org/10.1109/VDAT.2009.5158083","url":null,"abstract":"The free ride from process technology for CPU design has ended. Innovations in architecture, circuit design, and physical implementation are required to cope with increased challenges imposed by the lack of process scaling, increased variability and layout-dependent effects. In addition, power density is rising to prohibitive levels and has now become the predominant performance limiter. Extensive power management at both architectural and circuit levels is a major focus point in today's microprocessor design. This paper will give an overview of the issues, the potential solutions and the tool requirements to address the ever- increasing physical design and power management challenges.","PeriodicalId":246670,"journal":{"name":"2009 International Symposium on VLSI Design, Automation and Test","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126426194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2009-04-28 DOI: 10.1109/VDAT.2009.5158098
Shu-Yen Lin, Wen-Chung Shen, Chan-Cheng Hsu, Chih-Hao Chao, A. Wu
A fault-tolerant router design, the 20-path router (20PR), is proposed to reduce the impact of faulty routers in 2D-mesh based chip multiprocessor systems. In our experiments, on-chip networks (OCNs) using 20PRs reduce unreachable packets by 75.65% to 85.01% and latency by 7.78% to 26.59% in comparison with OCNs using generic XY routers.
{"title":"Fault-tolerant router with built-in self-test/self-diagnosis and fault-isolation circuits for 2D-mesh based chip multiprocessor systems","authors":"Shu-Yen Lin, Wen-Chung Shen, Chan-Cheng Hsu, Chih-Hao Chao, A. Wu","doi":"10.1109/VDAT.2009.5158098","DOIUrl":"https://doi.org/10.1109/VDAT.2009.5158098","url":null,"abstract":"A fault-tolerant router design (20-path router) is proposed to reduce the impacts of faulty routers for 2D-mesh based chip multiprocessor systems. In our experiments, the OCNs using 20PRs can reduce 75.65% ∼ 85.01% unreachable packets and 7.78% ∼ 26.59% latency in comparison with the OCNs using generic XY routers.","PeriodicalId":246670,"journal":{"name":"2009 International Symposium on VLSI Design, Automation and Test","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114148190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
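The baseline in the router abstract above is generic XY routing, and a short sketch shows why a single faulty router makes some packets unreachable under it. The code below implements only that deterministic baseline: a packet travels along X first, then along Y, and is counted as unreachable if any router on that one fixed path is faulty. The 20-path router itself is not modelled, and the coordinates, fault set and function names are hypothetical.

```python
def xy_route(src, dst):
    """Routers visited by deterministic XY routing on a 2D mesh, in order."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # travel along the X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then along the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def is_reachable(src, dst, faulty):
    """True if no router on the single XY path is in the faulty set."""
    return not any(node in faulty for node in xy_route(src, dst))


print(xy_route((0, 0), (2, 1)))
print(is_reachable((0, 0), (2, 1), faulty={(1, 0)}))
print(is_reachable((0, 0), (2, 1), faulty={(0, 1)}))
```

Because XY routing offers exactly one path per source and destination pair, any fault on that path blocks the packet even when a detour exists.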