Ant Colony Optimization directed program abstraction for software bounded model checking
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751839
Xueqi Cheng, M. Hsiao
The increasing complexity and size of software designs have made scalability a major bottleneck in software verification. Program abstraction has shown potential for alleviating this problem through selective search-space reduction. In this paper, we propose an Ant Colony Optimization (ACO)-directed program structure construction to formulate a novel under-approximation-based program abstraction (UAPA). By taking advantage of the resulting abstraction, a new software bounded model checking framework is built with the aim of improving the performance of property checking, especially property falsification. Experimental results on various programs show that the proposed ACO-directed program abstraction can dramatically improve the performance of software bounded model checking, with significant speedups.
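For readers unfamiliar with the heuristic itself, a generic ACO selection loop can be sketched as follows. This is only an illustrative skeleton for choosing which program branches to keep in an under-approximation, with a placeholder scoring function; it is not the authors' actual construction.

```python
import random

def aco_select_branches(branches, score, n_ants=10, n_iters=20,
                        evaporation=0.5, base_keep_prob=0.3):
    """Generic ACO skeleton: choose a subset of `branches` to retain in an
    under-approximated program.  `score(subset)` is a placeholder fitness
    function (e.g., an estimate of how useful the subset is for falsification)."""
    pheromone = {b: 1.0 for b in branches}
    best_subset, best_score = list(branches), float("-inf")

    for _ in range(n_iters):
        trials = []
        for _ in range(n_ants):
            avg = sum(pheromone.values()) / len(pheromone)
            # Keep each branch with probability scaled by its relative pheromone level.
            subset = [b for b in branches
                      if random.random() < min(1.0, base_keep_prob * pheromone[b] / avg)]
            trials.append((score(subset), subset))

        # Evaporation, then reinforcement of the branches used by this iteration's best ant.
        for b in pheromone:
            pheromone[b] *= (1.0 - evaporation)
        it_score, it_subset = max(trials, key=lambda t: t[0])
        for b in it_subset:
            pheromone[b] += 1.0

        if it_score > best_score:
            best_score, best_subset = it_score, it_subset
    return best_subset
```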
{"title":"Ant Colony Optimization directed program abstraction for software bounded model checking","authors":"Xueqi Cheng, M. Hsiao","doi":"10.1109/ICCD.2008.4751839","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751839","url":null,"abstract":"The increasing complexity and size of software designs has made scalability a major bottleneck in software verification. Program abstraction has shown potential in alleviating this problem through selective search space reduction. In this paper, we propose an Ant Colony Optimization (ACO)-directed program structure construction to formulate a novel under-approximation based program abstraction (UAPA). By taking advantage of the resulting abstraction, a new software bounded model checking framework is built with the aim of improving the performance of property checking, especially for property falsification. Experimental results on various programs showed that the proposed ACO-directed program abstraction can dramatically improve the performance of software bounded model checking with significant speedups.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"2017 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123318527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Acquiring an exhaustive, continuous and real-time trace from SoCs
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751885
C. Hochberger, A. Weiss
The amount of time and resources that must be spent on debugging embedded cores continues to increase. Approaches that were valid ten years ago can no longer be used, owing to the variety and complexity of the peripheral components of SoC solutions, which may even consist of multiple heterogeneous cores. Although there are some initiatives to standardize and leverage embedded debugging capabilities, current debugging solutions cover only a fraction of the problems in this area. In this contribution we present a new approach for debugging and tracing SoCs. The new approach, called hidICE (hidden ICE), delivers an exhaustive, continuous and real-time trace with much lower system interference than state-of-the-art solutions.
{"title":"Acquiring an exhaustive, continuous and real-time trace from SoCs","authors":"C. Hochberger, A. Weiss","doi":"10.1109/ICCD.2008.4751885","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751885","url":null,"abstract":"The amount of time and resources that have to be spent on debugging of embedded cores continuously increases. Approaches valid 10 years ago can no longer be used due to the variety and complexity of peripheral components of SoC solutions that even might consist of multiple heterogeneous cores. Although there are some initiatives to standardize and leverage the embedded debugging capabilities, current debugging solutions only cover a fraction of the problems present in that area. In this contribution we show a new approach for debugging and tracing SoCs. The new approach, called hidICE (hidden ICE), delivers an exhaustive, continuous and real-time trace with much lower system interference compared to state-of-the-art solutions.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115666830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling and reduction of complex timing constraints in high performance digital circuits
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751914
V. Nagbhushan, C. Y. Chen
Complex timing constraints that refer to multiple clocks and/or edges are often used in the design of modern high-performance processors. Such constraints complicate downstream algorithms such as logic synthesis. The complexity of the overall CAD system can be reduced considerably if the timing constraints can be optimally transformed so that they refer only to a single clock and edge. In this paper, we show how to model these multi-clock/edge timing constraints and describe algorithms to reduce the number of reference clocks/edges. We address the important problems of accurately handling signal transitions, sequential elements, input slope variations and timing overrides, which have not been addressed before.
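To make the idea of re-referencing concrete, the toy sketch below shifts a timing check defined relative to one clock's edge so that it is expressed relative to a single reference clock, using only the clocks' periods and phase offsets. It is purely illustrative and is not the reduction algorithm described in the paper, which also handles transitions, sequential elements, slope variation, and overrides.

```python
from dataclasses import dataclass

@dataclass
class ClockEdge:
    period: float   # ns
    offset: float   # ns, time of the first rising edge after t = 0
    # (falling edges could be modeled with an extra half-period shift)

def rereference(check_time, src: ClockEdge, ref: ClockEdge):
    """Express a timing check defined `check_time` ns after an edge of `src`
    as a delay after the nearest earlier edge of `ref`.  Illustrative only."""
    absolute = src.offset + check_time            # absolute time of the check
    k = (absolute - ref.offset) // ref.period     # index of the preceding ref edge
    return absolute - (ref.offset + k * ref.period)

# Example: a check 1.2 ns after a clkB edge, re-expressed relative to clkA.
clkA = ClockEdge(period=2.0, offset=0.0)
clkB = ClockEdge(period=3.0, offset=0.5)
print(rereference(1.2, src=clkB, ref=clkA))       # ~1.7 ns after a clkA edge
```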
{"title":"Modeling and reduction of complex timing constraints in high performance digital circuits","authors":"V. Nagbhushan, C. Y. Chen","doi":"10.1109/ICCD.2008.4751914","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751914","url":null,"abstract":"Complex timing constraints that refer to multiple clocks and/or edges are often used in the design of modern high performance processors. Such constraints complicate the design of downstream algorithms such as logic synthesis. The complexity of the overall CAD system can be reduced considerably if we can optimally transform the timing constraints so that they refer only to a single clock and edge. In this paper, we show how to model these multi clock/edge timing constraints and describe algorithms to reduce the number reference clocks/edges. We address the important problems of accurately handling signal transitions, sequential elements, input slope variations and timing overrides, which have not been addressed before.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131041310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RMA: A Read Miss-Based Spin-Down Algorithm using an NV Cache
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751910
Hyotaek Shim, Jaegeuk Kim, Dawoon Jung, Jin-Soo Kim, S. Maeng
Reducing the power consumption of the hard disk, which accounts for a large share of a computer system's power, is an important issue. As a new trend, an NV cache is used to keep the disk spun down longer by servicing read/write requests in place of the disk. During spin-down periods, write requests can be handled simply by write buffering, but read requests remain the main cause of spin-ups because of the low hit ratio in the NV cache. Even when there is no user activity, read requests can be generated frequently by running applications and system services, hindering spin-down. In this paper, we propose new NV cache policies: active write caching, which reduces or delays spin-ups caused by read misses during spin-down periods, and a read miss-based spin-down algorithm, which extends the spin-down periods by exploiting the NV cache effectively. Our policies reduce the power consumption of a hard disk by up to 50.1% with a 512 MB NV cache, compared with preceding approaches.
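The decision logic of a read miss-driven spin-down policy can be sketched roughly as below. The NV cache and disk interfaces, the breakeven interval, and the thresholds are illustrative assumptions, not the paper's actual parameters.

```python
class SpinDownController:
    """Toy sketch of a read miss-based spin-down policy with an NV cache.
    Writes are buffered in the NV cache while the disk is spun down; the disk
    is spun up only on an NV-cache read miss, and is spun back down once read
    misses have been absent for longer than an (assumed) breakeven interval."""

    def __init__(self, nv_cache, disk, breakeven_s=10.0):
        self.nv_cache, self.disk = nv_cache, disk
        self.breakeven_s = breakeven_s
        self.last_read_miss = 0.0

    def on_write(self, now, block, data):
        if self.disk.spun_down:
            self.nv_cache.buffer_write(block, data)   # absorb the write, keep the disk idle
        else:
            self.disk.write(block, data)

    def on_read(self, now, block):
        if self.nv_cache.contains(block):
            return self.nv_cache.read(block)          # hit: no spin-up needed
        self.last_read_miss = now
        if self.disk.spun_down:
            self.disk.spin_up()                       # unavoidable spin-up on a read miss
            self.nv_cache.flush_buffered_writes(self.disk)
        return self.disk.read(block)

    def tick(self, now):
        # Spin down again once read misses have been quiet long enough.
        if not self.disk.spun_down and now - self.last_read_miss > self.breakeven_s:
            self.disk.spin_down()
```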
{"title":"RMA: A Read Miss-Based Spin-Down Algorithm using an NV Cache","authors":"Hyotaek Shim, Jaegeuk Kim, Dawoon Jung, Jin-Soo Kim, S. Maeng","doi":"10.1109/ICCD.2008.4751910","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751910","url":null,"abstract":"It is an important issue to reduce the power consumption of a hard disk that takes a large amount of computer systempsilas power. As a new trend, an NV cache is used to make a disk spin down longer by servicing read/write requests instead of the disk. During the spin-down periods, write requests can be simply handled by write buffering, but read requests are still the main cause of initiating spin-ups because of a low hit ratio in the NV cache. Even when there is no user activity, read requests can be frequently generated by running applications and system services, hindering the spin-down. In this paper, we propose new NV cache policies: active write caching to reduce or to delay spin-ups caused by read misses during spin-down periods and a read miss-based spin-down algorithm to extend the spin-down periods, exploiting the NV cache effectively. Our policies reduce the power consumption of a hard disk by up to 50.1% with a 512 MB NV cache, compared with preceding approaches.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127207312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accelerating search and recognition with a TCAM functional unit
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751844
Atif Hashmi, Mikko H. Lipasti
World data is increasing rapidly, doubling almost every three years [1][2]. To comprehend and use this data effectively, search and recognition (SR) applications will demand more computational power in the future. The inherent speedups that these applications gain from frequency scaling will no longer exist as processor vendors move away from frequency scaling and towards multi-core architectures. Thus, modifications to both the structure of SR applications and current processor architectures are required to meet the computational needs of these workloads. This paper describes a novel hardware acceleration scheme to improve the performance of SR applications. The hardware accelerator relies on Ternary Content-Addressable Memory (TCAM) and some straightforward ISA extensions to deliver a promising speedup of 3.0-4.0 for SR workloads such as Template Matching, BLAST, and multi-threaded applications using Software Transactional Memory (STM).
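For readers unfamiliar with ternary matching, the snippet below emulates a TCAM lookup in software: each stored entry carries a value and a mask, and "don't care" bits are simply masked out before comparison. It is only a functional model of what such a unit does, not the proposed hardware or its ISA extensions.

```python
class TCAM:
    """Software model of a ternary CAM: entries are (value, mask) pairs,
    where mask bits set to 0 mark "don't care" positions."""

    def __init__(self):
        self.entries = []                 # list of (value, mask, tag)

    def insert(self, value, mask, tag):
        self.entries.append((value, mask, tag))

    def search(self, key):
        # Return the tag of the first entry whose cared-about bits match the key.
        for value, mask, tag in self.entries:
            if (key & mask) == (value & mask):
                return tag
        return None

tcam = TCAM()
tcam.insert(0b1010_0000, 0b1111_0000, "pattern A")   # low nibble is "don't care"
print(tcam.search(0b1010_0110))                      # -> "pattern A"
```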
{"title":"Accelerating search and recognition with a TCAM functional unit","authors":"Atif Hashmi, Mikko H. Lipasti","doi":"10.1109/ICCD.2008.4751844","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751844","url":null,"abstract":"World data is increasing rapidly, doubling almost every three years[1][2]. To comprehend and use this data effectively, search and recognition (SR) applications will demand more computational power in the future. The inherent speedups that these applications get due to frequency scaling will no longer exist as processor vendors move away from frequency scaling and towards multi-core architectures. Thus, modifications to both the structure of SR applications and current processor architectures are required to meet the computational needs of these workloads. This paper describes a novel hardware acceleration scheme to improve the performance of SR applications. The hardware accelerator relies on Ternary Content-Addressable Memory and some straightforward ISA extensions to deliver a promising speedup of 3.0-4.0 for SR workloads like Template Matching, BLAST, and multi-threaded applications using Software Transactional Memory (STM).","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126216246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical simulation-based verification of Anton, a special-purpose parallel machine
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751883
J. P. Grossman, J. Salmon, C. R. Ho, D. Ierardi, Brian Towles, Brannon Batson, Jochen Spengler, Stanley C. Wang, Rolf Mueller, Michael Theobald, C. Young, Joseph Gagliardo, Martin M. Deneroff, R. Dror, D. Shaw
One of the major design verification challenges in the development of Anton, a massively parallel special-purpose machine for molecular dynamics, was to provide evidence that computations spanning more than a quadrillion clock cycles would produce valid scientific results. Our verification methodology addressed this problem by using a hierarchy of RTL, architectural, and numerical simulations. Block- and chip-level RTL models were verified by means of extensive co-simulation with a detailed C++ architectural simulator, ensuring that the RTL models could perform the same molecular dynamics computations as the architectural simulator. The output of the architectural simulator was compared to a parallelized numerical simulator that produces bitwise identical results to Anton and is fast enough to verify the long-term numerical stability of computations on Anton. These explicit couplings between adjacent levels of the simulation hierarchy created a continuous verification chain from molecular dynamics to individual logic gates.
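The essence of co-simulation between adjacent levels of such a hierarchy is a lockstep comparison of observable state. The sketch below shows that pattern in generic form; the simulator interfaces (`step`, `snapshot`) and the checkpoint interval are illustrative assumptions, not Anton's actual verification environment.

```python
def cosimulate(rtl_sim, arch_sim, n_cycles, checkpoint_every=1000):
    """Run two simulators of the same design in lockstep and flag the first
    divergence.  Both simulators are assumed to expose step(cycles) and
    snapshot() methods; the names are illustrative."""
    for cycle in range(0, n_cycles, checkpoint_every):
        rtl_sim.step(checkpoint_every)
        arch_sim.step(checkpoint_every)
        rtl_state, arch_state = rtl_sim.snapshot(), arch_sim.snapshot()
        if rtl_state != arch_state:
            raise AssertionError(f"divergence detected at cycle {cycle + checkpoint_every}")
```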
{"title":"Hierarchical simulation-based verification of Anton, a special-purpose parallel machine","authors":"J. P. Grossman, J. Salmon, C. R. Ho, D. Ierardi, Brian Towles, Brannon Batson, Jochen Spengler, Stanley C. Wang, Rolf Mueller, Michael Theobald, C. Young, Joseph Gagliardo, Martin M. Deneroff, R. Dror, D. Shaw","doi":"10.1109/ICCD.2008.4751883","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751883","url":null,"abstract":"One of the major design verification challenges in the development of Anton, a massively parallel special-purpose machine for molecular dynamics, was to provide evidence that computations spanning more than a quadrillion clock cycles will produce valid scientific results. Our verification methodology addressed this problem by using a hierarchy of RTL, architectural, and numerical simulations. Block- and chip-level RTL models were verified by means of extensive co-simulation with a detailed C++ architectural simulator, ensuring that the RTL models could perform the same molecular dynamics computations as the architectural simulator. The output of the architectural simulator was compared to a parallelized numerical simulator that produces bitwise identical results to Anton, and is fast enough to verify the long-term numerical stability of computations on Anton. These explicit couplings between adjacent levels of the simulation hierarchy created a continuous verification chain from molecular dynamics to individual logic gates.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124183673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Global bus route optimization with application to microarchitectural design exploration
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751931
Daehyun Kim, S. Lim
Circuit and processor designs will continue to increase in complexity for the foreseeable future. With these increasing sizes comes the use of wide buses to move large amounts of data from one place to another. Bus routing has therefore become increasingly important. In this paper, we present a new bus routing algorithm that globally optimizes both the floorplan and the bus routes themselves. Our algorithm is based on creating a range of feasible bus positions and then using Linear Programming to optimally solve for bus locations. We present this algorithm for use in microarchitectures and explore several different optimization objectives, including performance, floorplan area, and power consumption. Our results demonstrate that this algorithm is effective for efficiently generating feasible routes for complex modern designs and provides better results than previous approaches.
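A stripped-down version of the "feasible range plus linear program" idea can be written with an off-the-shelf LP solver. The sketch below places each bus at a single coordinate within a precomputed feasible interval while minimizing total displacement from preferred positions; the objective and intervals are placeholders, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import linprog

def place_buses(feasible, preferred):
    """Pick one coordinate per bus inside its feasible interval, minimizing
    total |x_i - preferred_i|.  `feasible` is a list of (lo, hi) pairs.
    Toy LP illustration of range-based bus placement, not the paper's model."""
    n = len(feasible)
    # Variables: x_0..x_{n-1} (positions), t_0..t_{n-1} (surrogates for |x_i - p_i|)
    c = np.concatenate([np.zeros(n), np.ones(n)])
    A_ub, b_ub = [], []
    for i, p in enumerate(preferred):
        row = np.zeros(2 * n); row[i] = 1.0; row[n + i] = -1.0
        A_ub.append(row); b_ub.append(p)          #  x_i - t_i <= p_i
        row = np.zeros(2 * n); row[i] = -1.0; row[n + i] = -1.0
        A_ub.append(row); b_ub.append(-p)         # -x_i - t_i <= -p_i
    bounds = list(feasible) + [(0, None)] * n
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
    return res.x[:n]

print(place_buses([(0, 4), (5, 9)], preferred=[6, 6]))   # positions ~ [4, 6]
```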
{"title":"Global bus route optimization with application to microarchitectural design exploration","authors":"Daehyun Kim, S. Lim","doi":"10.1109/ICCD.2008.4751931","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751931","url":null,"abstract":"Circuit and processor designs will continue to increase in complexity for the foreseeable future. With these increasing sizes comes the use of wide buses to move large amounts of data from one place to another. Bus routing has therefore become increasingly important. In this paper, we present a new bus routing algorithm that globally optimizes both the floorplan and the bus routes themselves. Our algorithm is based on creating a range of feasible bus positions and then using Linear Programming to optimally solve for bus locations. We present this algorithm for use in microarchitectures and explore several different optimization objectives, including performance, floorplan area, and power consumption. Our results demonstrate that this algorithm is effective for efficiently generating feasible routes for complex modern designs and provides better results than previous approaches.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"227 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124291231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ZZ-HVS: Zig-zag horizontal and vertical sleep transistor sharing to reduce leakage power in on-chip SRAM peripheral circuits
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751937
H. Homayoun, Avesta Sasan, A. Veidenbaum
Recent studies indicate that peripheral circuits (including decoders, wordline drivers, and input and output drivers) constitute a large portion of cache leakage. In addition, as technology migrates to smaller geometries, the leakage contribution to total power consumption increases faster than dynamic power, making leakage the largest power consumption factor. This paper proposes zig-zag share, a circuit technique to reduce leakage in SRAM peripheral circuits. Using architectural control of zig-zag share, an integrated technique called Sleep-Share is proposed and applied to L1 and L2 caches. The results show leakage reductions of up to 40X in deeply pipelined SRAM peripheral circuits, with only a 4% area overhead and a small additional delay.
{"title":"ZZ-HVS: Zig-zag horizontal and vertical sleep transistor sharing to reduce leakage power in on-chip SRAM peripheral circuits","authors":"H. Homayoun, Avesta Sasan, A. Veidenbaum","doi":"10.1109/ICCD.2008.4751937","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751937","url":null,"abstract":"Based on Recent studies peripheral circuit (including decoders, wordline drivers, input and output drivers) constitutes a large portion of the cache leakage. In addition as technology migrate to smaller geometries, leakage contribution to total power consumption increases faster than dynamic power, promoting leakage as the largest power consumption factor. This paper proposes zig-zag share, a circuit technique to reduce leakage in SRAM peripheral. Using architectural control of zig-zag share, an integrated technique called Sleep-Share is proposed and applied in L1 and L2 caches. The results show leakage reduction by up to 40X in deeply pipelined SRAM peripheral circuits, with only a 4% area overhead and small additional delay.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123661605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Techniques for increasing effective data bandwidth
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751909
C. Nitta, M. Farrens
In this paper we examine techniques for increasing the effective bandwidth of the microprocessor off-chip interconnect. We focus on mechanisms that are orthogonal to other techniques currently being studied (3-D fabrication, optical interconnect, etc.). Using a range of full-system simulations, we study the distribution of values being transferred to and from memory and find that (as expected) high-entropy data such as floating-point numbers have limited compressibility, but that other data types offer more potential for compression. By using a simple heuristic to classify the contents of a cache line and providing different compression schemes for each classification, we show it is possible to provide overall compression at cache-line granularity comparable to that obtained by using a much more complex Lempel-Ziv-Welch algorithm.
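The flavor of a per-class compression scheme can be illustrated as below: a cheap heuristic inspects a 64-byte line and routes it to a class-specific encoder, with zlib standing in as the heavyweight dictionary-based baseline. The classes and encoders here are simplified stand-ins, not those evaluated in the paper.

```python
import struct
import zlib

LINE_BYTES = 64

def classify(line: bytes) -> str:
    """Cheap heuristic over a 64-byte cache line (illustrative classes only)."""
    if line == b"\x00" * LINE_BYTES:
        return "zero"
    words = struct.unpack("<16I", line)             # view the line as 16 x 32-bit words
    if all(w < 256 for w in words):
        return "narrow"                             # small integers: keep low byte only
    return "other"

def compress_line(line: bytes) -> bytes:
    kind = classify(line)
    if kind == "zero":
        return b"Z"                                 # 1-byte code for an all-zero line
    if kind == "narrow":
        words = struct.unpack("<16I", line)
        return b"N" + bytes(w & 0xFF for w in words)  # 1 + 16 bytes
    return b"O" + zlib.compress(line)               # fall back to dictionary compression

line = struct.pack("<16I", *range(16))
print(classify(line), len(compress_line(line)))     # -> narrow 17
```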
{"title":"Techniques for increasing effective data bandwidth","authors":"C. Nitta, M. Farrens","doi":"10.1109/ICCD.2008.4751909","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751909","url":null,"abstract":"In this paper we examine techniques for increasing the effective bandwidth of the microprocessor off-chip interconnect. We focus on mechanisms that are orthogonal to other techniques currently being studied (3-D fabrication, optical interconnect, etc.) Using a range of full-system simulations we study the distribution of values being transferred to and from memory, and find that (as expected) high entropy data such as floating point numbers have limited compressibility, but that other data types offer more potential for compression. By using a simple heuristic to classify the contents of a cache line and providing different compression schemes for each classification, we show it is possible to provide overall compression at a cache line granularity comparable to that obtained by using a much more complex Lempel-Ziv-Welch algorithm.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"242 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117121737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reliability-aware Dynamic Voltage Scaling for energy-constrained real-time embedded systems
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751927
Baoxian Zhao, Hakan Aydin, Dakai Zhu
The dynamic voltage scaling (DVS) technique is the basis of numerous state-of-the-art energy management schemes proposed for real-time embedded systems. However, recent research has illustrated the alarmingly negative impact of DVS on task and system reliability. In this paper, we consider the problem of processing frequency assignment to a set of real-time tasks in order to maximize the overall reliability, under given time and energy constraints. First, we formulate the problem as a non-linear optimization problem and show how to obtain the static optimal solution. Then, we propose on-line (dynamic) algorithms that detect early completions and adjust the task frequencies at run-time, to improve overall reliability. Our simulation results indicate that our algorithms perform comparably to a clairvoyant optimal scheduler that knows the exact workload in advance.
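As a hedged illustration of what such a non-linear formulation can look like, assume each task i has worst-case execution cycles c_i and runs at a normalized frequency f_i in [f_min, 1], with the exponential fault-rate model that is common in the reliability-aware DVS literature. The symbols, model, and constraints below are assumptions for illustration and may differ from the paper's exact formulation.

```latex
\begin{align*}
\max_{f_1,\dots,f_n}\ & \prod_{i=1}^{n} R_i(f_i)
    = \prod_{i=1}^{n} \exp\!\Bigl(-\lambda(f_i)\,\frac{c_i}{f_i}\Bigr),
  \qquad \lambda(f) = \lambda_0\, 10^{\,d\,\frac{1-f}{1-f_{\min}}} \\
\text{s.t. }\ & \sum_{i=1}^{n} \frac{c_i}{f_i} \le D
  \qquad \text{(timing constraint)} \\
  & \sum_{i=1}^{n} P(f_i)\,\frac{c_i}{f_i} \le E
  \qquad \text{(energy budget)} \\
  & f_{\min} \le f_i \le 1, \qquad i = 1,\dots,n
\end{align*}
```

Here lambda_0 is the fault rate at the maximum frequency, d > 0 captures how quickly faults increase as voltage and frequency are lowered, and P(f) is the power drawn at frequency f; the on-line variants described in the abstract then adjust the f_i at run-time as tasks complete early.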
{"title":"Reliability-aware Dynamic Voltage Scaling for energy-constrained real-time embedded systems","authors":"Baoxian Zhao, Hakan Aydin, Dakai Zhu","doi":"10.1109/ICCD.2008.4751927","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751927","url":null,"abstract":"The dynamic voltage scaling (DVS) technique is the basis of numerous state-of-the-art energy management schemes proposed for real-time embedded systems. However, recent research has illustrated the alarmingly negative impact of DVS on task and system reliability. In this paper, we consider the problem of processing frequency assignment to a set of real-time tasks in order to maximize the overall reliability, under given time and energy constraints. First, we formulate the problem as a non-linear optimization problem and show how to obtain the static optimal solution. Then, we propose on-line (dynamic) algorithms that detect early completions and adjust the task frequencies at run-time, to improve overall reliability. Our simulation results indicate that our algorithms perform comparably to a clairvoyant optimal scheduler that knows the exact workload in advance.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133772497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}