Pub Date : 2004-01-27 DOI: 10.1109/ASPDAC.2004.1337530
P. Marwedel, L. Wehmeyer, Manish Verma, S. Steinke, Urs Helmig
The design of future high-performance embedded systems is hampered by two problems. First, the required hardware needs more energy than is available from batteries. Second, current cache-based approaches for bridging the increasing speed gap between processors and memories cannot guarantee predictable real-time behavior. This paper contributes to solving both problems by describing a comprehensive set of algorithms that can be applied at design time to maximally exploit scratch pad memories (SPMs). We show that the energy consumption and the computed worst case execution time (WCET) can be reduced by up to 80% and 48%, respectively, by establishing a strong link between the memory architecture and the compiler.
{"title":"Fast, predictable and low energy memory references through architecture-aware compilation","authors":"P. Marwedel, L. Wehmeyer, Manish Verma, S. Steinke, Urs Helmig","doi":"10.1109/ASPDAC.2004.1337530","DOIUrl":"https://doi.org/10.1109/ASPDAC.2004.1337530","url":null,"abstract":"The design of future high-performance embedded systems is hampered by two problems: First, the required hardware needs more energy than is available from batteries. Second, current cache-based approaches for bridging the increasing speed gap between processors and memories cannot guarantee predictable real-time behavior. A contribution to solving both problems is made which describes a comprehensive set of algorithms that can be applied at design time in order to maximally exploit scratch pad memories (SPMs). We show that both the energy consumption as well as the computed worst case execution time (WCET) can be reduced by up to to 80% and 48%, respectively, by establishing a strong link between the memory architecture and the compiler.","PeriodicalId":426349,"journal":{"name":"ASP-DAC 2004: Asia and South Pacific Design Automation Conference 2004 (IEEE Cat. No.04EX753)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123988110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
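The abstract above does not spell out its allocation algorithms. As a rough illustration of what a design-time scratch pad allocation decision can look like, the following sketch treats it as a 0/1 knapsack problem: each memory object has a size and an estimated energy saving if placed in the SPM, and the SPM capacity is the knapsack limit. The function name, the object names, and all numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# (name, size in bytes, estimated energy saving if placed in the SPM)
MemObject = Tuple[str, int, int]


def select_for_spm(objects: List[MemObject], capacity: int) -> Tuple[int, List[str]]:
    """Pick the subset of objects that fits in `capacity` bytes and
    maximizes the summed energy saving (classic 0/1 knapsack DP)."""
    # best[c] = (best saving, chosen names) using at most c bytes
    best: List[Tuple[int, List[str]]] = [(0, [])] * (capacity + 1)
    for name, size, saving in objects:
        # Iterate capacities downwards so each object is used at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + saving
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + [name])
    return best[capacity]


# Hypothetical profile data, for illustration only.
objects = [
    ("loop_kernel", 256, 900),
    ("lookup_table", 512, 1200),
    ("init_code", 384, 100),
    ("filter_coeffs", 128, 500),
]
saving, chosen = select_for_spm(objects, capacity=768)
print(saving, chosen)
```

Because the selection is fixed at design time, every access to a chosen object has a known latency, which is the property that makes WCET analysis simpler than with a cache.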
Pub Date : 2004-01-27 DOI: 10.1109/ASPDAC.2004.1337717
Shinobu Nagayama, Tsutomu Sasao
In this paper, we propose exact and heuristic algorithms for minimizing the memory size for heterogeneous Multivalued Decision Diagrams (MDDs). In a heterogeneous MDD, each multi-valued variable can take a different domain. To represent a binary logic function using a heterogeneous MDD, we partition the binary variables into groups, and treat the groups as multi-valued variables. Therefore, the memory size of a heterogeneous MDD depends on the partition of the binary variables. Our experimental results show that heterogeneous MDDs require smaller memory size than Reduced Ordered Binary Decision Diagrams (ROBDDs) and Free BDDs (FBDDs).
{"title":"Minimization of memory size for heterogeneous MDDs","authors":"Shinobu Nagayama, Tsutomu Sasao","doi":"10.1109/ASPDAC.2004.1337717","DOIUrl":"https://doi.org/10.1109/ASPDAC.2004.1337717","url":null,"abstract":"In this paper, we pmpose exact and heuristic algorithms for minimizing the memory size for heterogeneous Multivalued Decision Diagrams (MDh). In a heterogeneous MDD, each multi-valued variable can take a different domain. To represen1 a binary logic hnclion using a heterogeneous MDD, we partition the binary variables into gmnps, and treat the groups as multi-valued variables. Therefore, the memory size of a hetemgeneous MDD depends on the partition of the binary variables. Our experimental results show that heterogeneous MDDs repuim smaller memory size than Reduced Ordered Binary Decision Diagrams (ROBDDs) and Free BDDs (FBDDs).","PeriodicalId":426349,"journal":{"name":"ASP-DAC 2004: Asia and South Pacific Design Automation Conference 2004 (IEEE Cat. No.04EX753)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130780164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
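To make the notion of partitioning binary variables into multi-valued variables concrete, here is a small sketch. It assumes, as a simplification that the abstract does not state, that the variable order is fixed and that each group is a contiguous run of binary variables. A group of k binary variables then acts as one variable with 2^k values. The function names are hypothetical, and the sketch only enumerates the search space; it does not compute MDD memory sizes.

```python
from typing import Iterator, List


def contiguous_partitions(variables: List[str]) -> Iterator[List[List[str]]]:
    """Yield every way to split an ordered variable list into contiguous groups."""
    if not variables:
        yield []
        return
    for k in range(1, len(variables) + 1):
        head = variables[:k]
        for rest in contiguous_partitions(variables[k:]):
            yield [head] + rest


def domain_sizes(partition: List[List[str]]) -> List[int]:
    """A group of k binary variables behaves as one 2**k-valued variable."""
    return [2 ** len(group) for group in partition]


for partition in contiguous_partitions(["x1", "x2", "x3"]):
    print(partition, domain_sizes(partition))
```

Under this assumption an n-variable function has 2^(n-1) candidate partitions, which suggests why a heuristic is useful alongside an exact algorithm.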
Pub Date : 2004-01-27 DOI: 10.1109/ASPDAC.2004.1337642
Yi-Ming Wang, Jinn-Shyan Wang
A reliable low-power fast skew-compensation circuit is proposed. Because it operates on a clock with a 50% duty cycle, the new design is more reliable than conventional SMD-based circuits [1]-[3], which can operate only on a pulsed clock. The new circuit also achieves phase locking within two clock cycles. The test circuit works successfully between 600 MHz and 800 MHz with a power consumption of 25 μW/MHz to 26 μW/MHz. When measured at 616.9 MHz and 791.4 MHz, the static phase is 76.8 ps and 124.5 ps, respectively.
{"title":"A reliable low-power fast skew-compensation circuit","authors":"Yi-Ming Wang, Jinn-Shyan Wang","doi":"10.1109/ASPDAC.2004.1337642","DOIUrl":"https://doi.org/10.1109/ASPDAC.2004.1337642","url":null,"abstract":"A reliable low-power fast skew-compensation circuit is proposed. Operating on the clock with a 50% duty cycle, the new design is more reliable compared to conventional SMD-based circuits [1]-[3], which can operate only on the pulsed clock. This new circuit also gets phase locking within two clock cycles. The test circuit works successfully between 600-MHz ~ 800-MHz with a power consumption of 25-μW/MHz ~ 26-μW/MHz. When measured at 616.9-MHz and 791.4-MHz, the static phase is 76.8-ps and 124.5-ps, respectively.","PeriodicalId":426349,"journal":{"name":"ASP-DAC 2004: Asia and South Pacific Design Automation Conference 2004 (IEEE Cat. No.04EX753)","volume":"47 45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127531086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-01-27 DOI: 10.1109/ASPDAC.2004.1337588
M. Velev
We present an indirect method to automatically prove liveness for pipelined microprocessors. This is done by first proving safety-correctness for one step, starting from an arbitrary initial state that is possibly restricted by invariant constraints. By induction, the implementation will be correct for any number of steps; we then need to prove that for some fixed number of steps, n, the implementation will fetch at least one instruction that will be completed. This was proved efficiently by using the property of positive equality. Modeling restrictions made the method applicable to designs with exceptions and branch prediction. The indirect method and the modeling restrictions resulted in a speedup of 4 orders of magnitude, enabling the automatic liveness proof for dual-issue superscalar and VLIW designs.
{"title":"Using positive equality to prove liveness for pipelined microprocessors","authors":"M. Velev","doi":"10.1109/ASPDAC.2004.1337588","DOIUrl":"https://doi.org/10.1109/ASPDAC.2004.1337588","url":null,"abstract":"We present an indirect method to automatically prove liveness for pipelined microprocessors. This is done by first proving safety-correctness for one step, starting from an arbitrary initial state that is possibly restricted by invariant constraints. By induction, the implementation will be correct for any number of steps; we need to prove that for some fixed number of steps, n, the implementation will fetch at least one instruction that will be completed. This was proved efficiently by using the property of positive equality. Modeling restrictions made the method applicable to designs with exceptions and branch prediction. The indirect method and the modeling restrictions resulted in 4 orders of magnitude speedup, enabling the automatic live-ness proof for dual-issue superscalar and VLIW designs.","PeriodicalId":426349,"journal":{"name":"ASP-DAC 2004: Asia and South Pacific Design Automation Conference 2004 (IEEE Cat. No.04EX753)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127537403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-01-27 DOI: 10.1109/ASPDAC.2004.1337673
Dongkun Shin, Jihong Kim
We describe dynamic voltage scaling (DVS) algorithms for real-time systems with both periodic and aperiodic tasks. Although many DVS algorithms have been developed for real-time systems with periodic tasks, none of them can be used for systems with both periodic and aperiodic tasks because of the arbitrary temporal behaviors of aperiodic tasks. We propose an off-line DVS algorithm and on-line DVS algorithms that are based on existing DVS algorithms. The proposed algorithms utilize the execution behaviors of the scheduling server for aperiodic tasks. Experimental results show that the proposed algorithms reduce the energy consumption by 12% and 32% under the RM scheduling policy and the EDF scheduling policy, respectively.
{"title":"Dynamic voltage scaling of periodic and aperiodic tasks in priority-driven systems","authors":"Dongkun Shin, Jihong Kim","doi":"10.1109/ASPDAC.2004.1337673","DOIUrl":"https://doi.org/10.1109/ASPDAC.2004.1337673","url":null,"abstract":"We describe dynamic voltage scaling (DVS) algorithms for real-time systems with both periodic and aperiodic tasks. Although many DVS algorithms have been developed for real-time systems with periodic tasks, none of them can he used for the system with both periodic and aperiodic tasks because of arbitrary temporal hehaviors of aperiodic tasks. We propose an off-line DVS algorithm and on-line DVS algorithms that are based on existing DVS algorithms. The proposed algorithms utilize the execution behaviors of scheduling server for aperiodic tasks. Experimental results show that the proposed algorithms reduce the energy consumption by 12% and 32% under the RM scheduling policy and the EDF scheduling policy, respectively.","PeriodicalId":426349,"journal":{"name":"ASP-DAC 2004: Asia and South Pacific Design Automation Conference 2004 (IEEE Cat. No.04EX753)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126794469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
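The abstract above does not give the algorithms themselves. As background, the sketch below shows the simplest form of off-line voltage scaling under EDF, which is not the method proposed in the paper: periodic tasks and a bandwidth-reserving aperiodic server are schedulable at normalized speed s as long as their total utilization at full speed does not exceed s, so the lowest static speed equals the total utilization. The function names, the modeling of the server as a (budget, period) pair, and the assumption that supply voltage scales linearly with speed (so dynamic energy for a fixed workload scales with s squared) are all illustrative assumptions.

```python
from typing import List, Optional, Tuple

# (worst-case execution time at full speed, period), same time unit
Task = Tuple[float, float]


def edf_static_speed(tasks: List[Task], server: Optional[Task] = None) -> float:
    """Lowest normalized speed (0..1] that keeps EDF schedulable.

    `server` is a hypothetical aperiodic server modeled as (budget, period),
    i.e. it reserves budget/period of the processor for aperiodic work.
    """
    load = list(tasks) + ([server] if server is not None else [])
    utilization = sum(wcet / period for wcet, period in load)
    if utilization > 1.0:
        raise ValueError("task set is not schedulable even at full speed")
    return utilization


def relative_energy(speed: float) -> float:
    """Dynamic energy relative to full speed, assuming voltage scales
    linearly with speed (an idealized model)."""
    return speed ** 2


speed = edf_static_speed([(1.0, 4.0), (2.0, 10.0)], server=(1.0, 10.0))
print(speed, relative_energy(speed))
```

On-line algorithms of the kind the abstract mentions go further by reclaiming slack at run time, for example when the server has no pending aperiodic work; that is outside the scope of this sketch.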
Pub Date : 2004-01-27 DOI: 10.1109/ASPDAC.2004.1337692
N. Togawa, K. Tachikake, Yuichiro Miyaoka, M. Yanagisawa, T. Ohtsuki
This paper focuses on SIMD processor synthesis and proposes a SIMD instruction set/functional unit synthesis algorithm. Given an initial assembly code and a timing constraint, the proposed algorithm synthesizes an area-optimized processor core with optimal SIMD functional units. It also synthesizes a SIMD instruction set. The input initial assembly code is assumed to run on a full-resource SIMD processor (virtual processor) which has all the possible SIMD functional units. In our algorithm, we introduce the SIMD operation decomposition and apply it to the initial assembly code and the full-resource SIMD processor. By gradually reducing SIMD operations or decomposing SIMD operations, we can finally find a processor core with small area under the given timing constraint. Promising experimental results are also shown.
{"title":"Instruction set and functional unit synthesis for SIMD processor cores","authors":"N. Togawa, K. Tachikake, Yuichiro Miyaoka, M. Yanagisawa, T. Ohtsuki","doi":"10.1109/ASPDAC.2004.1337692","DOIUrl":"https://doi.org/10.1109/ASPDAC.2004.1337692","url":null,"abstract":"This paper focuses on SlMD processor synthesis and proposes a SIMD instruction setlfunctional unit synthesis algorithm. Given an initial assembly code and a timing constraint, the proposed algorithm synthesizes an area-optimized processor core with optimal SIMD functional units. It also synthesizes a SIMD instruction set. The input initial assemhly code is assumed to run on a full-resource SIMD processor (virtual processor) which has all the possible SIMD functional units. In our algorithm, we introduce the SIMD operation decomposition and apply it to the initial assembly code and the full-resource SIMD processor. By gradually reducing SIMD operations or decomposing SIMD operations, we can finally find a processor core with small area under the given timing constraint. The promising experimental results are also shown.","PeriodicalId":426349,"journal":{"name":"ASP-DAC 2004: Asia and South Pacific Design Automation Conference 2004 (IEEE Cat. No.04EX753)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114555760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-01-27 DOI: 10.1109/ASPDAC.2004.1337580
H. Huang, Cheng-Yeh Wang, Jing-Yang Jou
A novel strategy for designing the heterogeneous-tree multiplexer is proposed. We build the multiplexer delay model by curve fitting and then formulate the heterogeneous-tree multiplexer design problem as a special type of optimization problem called mixed-integer nonlinear programming (MINLP). A new design parameter, the switch size in each stage, is introduced to improve the speed of the heterogeneous-tree multiplexer. The proposed strategy can determine the multiplexer architecture and the switch size in each stage simultaneously. Three optimization methods are provided to synthesize the heterogeneous-tree multiplexer according to the design specifications.
{"title":"Optimal design of high fan-in multiplexers via mixed-integer nonlinear programming","authors":"H. Huang, Cheng-Yeh Wang, Jing-Yang Jou","doi":"10.1109/ASPDAC.2004.1337580","DOIUrl":"https://doi.org/10.1109/ASPDAC.2004.1337580","url":null,"abstract":"A novel strategy for designing the heterogeneous-tree multiplexer is proposed. We build the multiplexer delay model by curve fitting and then formulate the heterogeneous-tree multiplexer design problem as a special type of optimization problem called mixed-integer nonlinear programming (MINLP). A new design parameter, the switch size in each stage, is introduced to improve the speed of the heterogeneous-tree multiplexer. The proposed strategy can determine the multiplexer architecture and the switch size in each stage simultaneously. Three optimization methods are provided to synthesize the heterogeneous-tree multiplexer according to the design specifications.","PeriodicalId":426349,"journal":{"name":"ASP-DAC 2004: Asia and South Pacific Design Automation Conference 2004 (IEEE Cat. No.04EX753)","volume":"34 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114099942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-01-27 DOI: 10.1109/ASPDAC.2004.1337607
N. Jangkrajarng, S. Bhattacharya, R. Hartono, C. Shi
We present an automatic layout retargeting tool that generates analog and RF layouts incorporating new device sizes and geometries based on new circuit specifications. A graph-based symbolic template is automatically constructed from a practical layout such that expert designer knowledge embedded in the layout is preserved. The template can be solved for multiple layouts based on different device sizes and geometries, satisfying several different specifications. Symmetry conservation and passive device modification are also embedded in the tool. The retargeting tool is demonstrated on a voltage controlled oscillator to generate three layouts with different target goals. While manual redesign is known to take days to finish, the automatic layout retargeting tool takes a few hours to generate a reusable template and takes minutes to generate comparable layouts.
{"title":"Multiple specifications radio-frequency integrated circuit design with automatic template-driven layout retargeting","authors":"N. Jangkrajarng, S. Bhattacharya, R. Hartono, C. Shi","doi":"10.1109/ASPDAC.2004.1337607","DOIUrl":"https://doi.org/10.1109/ASPDAC.2004.1337607","url":null,"abstract":"We present an automatic layout retargeting tool that generates analog and RF layouts incorporating new device sizes and geometries based on new circuit specifications. A graph-based symbolic template is automatically constructed from a practical layout such that expert designer knowledge embedded in the layout is preserved. The template can be solved for multiple layouts based on different device sizes and geometries, satisfying several different specifications. Symmetry conservation and passive device modification are also embedded in the tool. The retargeting tool is demonstrated on a voltage controlled oscillator to generate three layouts with different target goals. While manual redesign is known to take days to finish, the automatic layout retargeting tool takes a few hours to generate a reusable template and takes minutes to generate comparable layouts.","PeriodicalId":426349,"journal":{"name":"ASP-DAC 2004: Asia and South Pacific Design Automation Conference 2004 (IEEE Cat. No.04EX753)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121543482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-01-27 DOI: 10.1109/ASPDAC.2004.1337605
Subhasish Banerjee, G. Surendra, S. Nandy
Processing streaming media comprises several program phases (often distinct) that are periodic and independent of application data. Here we characterize the execution of such programs into execution phases based on their dynamic IPC (instructions per cycle) profile. We show that program execution of selected phases can be dynamically boosted by activating additional standby functional units which are otherwise powered down to save energy. Through simulation we show that speedups ranging from 1.1 to 1.25 can be achieved while reducing the energy-delay product (EDP) for most of the media benchmarks evaluated. Additionally, we show that artificially introduced stalls during phases of processor underutilization reduce power by around 2 to 4%.
{"title":"Exploiting program execution phases to trade power and performance for media workload","authors":"Subhasish Banerjee, G. Surendra, S. Nandy","doi":"10.1109/ASPDAC.2004.1337605","DOIUrl":"https://doi.org/10.1109/ASPDAC.2004.1337605","url":null,"abstract":"Processing streaming media comprises several program phases (often distinct) that are periodic and independent of application data. Here we characterize execution of such programs into execution phases based on their dynamic IPC (instruction per cycle) profile. We show that program execution of selected phases can be dynamically boosted by activating additional standby functional units which are otherwise powered down for saving energy. Through simulation we show that speedup ranging from 1.1 to 1.25 can be achieved while reducing the energy-delay product (EDP) for most of the media benchmarks evaluated. Additionally we show that artificially introduced stalls during phases of processor underutilization reduces power by around 2 to 4%.","PeriodicalId":426349,"journal":{"name":"ASP-DAC 2004: Asia and South Pacific Design Automation Conference 2004 (IEEE Cat. No.04EX753)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114973170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
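One simple way to turn a dynamic IPC profile into execution phases, offered only as an illustrative sketch and not as the characterization used in the paper, is to label each sampling window by comparing its IPC against a threshold and then merge consecutive windows with the same label. The function name, the threshold value, and the sample data are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

# (label, first window index, one past the last window index)
Phase = Tuple[str, int, int]


def ipc_phases(ipc_samples: List[float], threshold: float) -> List[Phase]:
    """Split a per-window IPC trace into runs of 'high' and 'low' IPC."""
    labels = ["high" if ipc >= threshold else "low" for ipc in ipc_samples]
    phases: List[Phase] = []
    start = 0
    # groupby merges consecutive windows that share the same label.
    for label, run in groupby(labels):
        length = len(list(run))
        phases.append((label, start, start + length))
        start += length
    return phases


# Hypothetical IPC measured over six equal-length windows.
trace = [0.8, 0.9, 1.7, 1.9, 1.8, 0.7]
print(ipc_phases(trace, threshold=1.5))
```

A power manager could then consult the phase list to decide when to enable standby functional units or insert stalls; which phases are selected for boosting is a policy decision that the abstract does not detail.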
Pub Date : 2004-01-27 DOI: 10.1109/ASPDAC.2004.1337645
Chun-Pong Yu, O. Choy, Hao Min, C. Chan, K. Pun
This paper presents the design of a low power 16-bit asynchronous Java processor for contactless smart cards. It can directly execute the Java bytecodes in a subset of the instruction set defined in the Java Card Virtual Machine specification [1]. The remaining Java bytecodes are handled by software routines. We also use asynchronous circuit design techniques to reduce the power consumption of the Java processor core. The processor has been fabricated in a CMOS 0.35-μm technology, and the experimental results show that it is suitable for contactless smart cards.
{"title":"A low power asynchronous java processor for contactless smart card","authors":"Chun-Pong Yu, O. Choy, Hao Min, C. Chan, K. Pun","doi":"10.1109/ASPDAC.2004.1337645","DOIUrl":"https://doi.org/10.1109/ASPDAC.2004.1337645","url":null,"abstract":"This paper presents the design of a low power 16-hit asynchronous java processor for contactless smart card. It can directly execute the java bytecodes in a subset ofthe instruction set defined in the Java Card Virtual Machine specification 111 The remaining java bytecodes are handled by software routines. Also, we intend to use asynchronous circuit design technique to reduce the power Consumption of the java processor core. It has been fabricated in a CMOS 0.35-μm technology and the experimental result shows that it is suitable for the contactless smart card.","PeriodicalId":426349,"journal":{"name":"ASP-DAC 2004: Asia and South Pacific Design Automation Conference 2004 (IEEE Cat. No.04EX753)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115287006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}