Voltage emergencies have become a major challenge for multi-core processors because core-to-core resonance may put all cores in danger, which jeopardizes system reliability. We observed that applications following the SPMD (Single Program, Multiple Data) programming model tend to spark domain-wide voltage resonance because multiple threads sharing the same function body exhibit similar power activity. When threads are judiciously relocated among the cores, the voltage droops can be greatly reduced. We propose “Orchestrator”, a sensor-free, non-intrusive scheme for multi-core architectures to smooth the voltage droops. Orchestrator focuses on inter-core voltage interactions and maximally leverages thread diversity to avoid voltage-droop synergy among cores. Experimental results show that Orchestrator can reduce voltage emergencies by up to 64% on average while also improving performance.
{"title":"Orchestrator: A low-cost solution to reduce voltage emergencies for multi-threaded applications","authors":"Xing Hu, Guihai Yan, Yu Hu, Xiaowei Li","doi":"10.7873/DATE.2013.056","DOIUrl":"https://doi.org/10.7873/DATE.2013.056","journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"24 1","pages":"208-213"},"publicationDate":"2013-03-18","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"86653379"}
The dramatic increase in the number of processors, memories, and other components on the same chip calls for resource-aware mechanisms to improve performance. This paper proposes four resource mapping policies for NoC-based MPSoCs that leverage distinct aspects of the parallel nature of the applications and architecture constraints, such as off-chip memory latency. Results show that these policies can improve performance by up to 22.5% on average and, in some cases, depending on the parallel programming model of each application, by up to 32%.
{"title":"Exploring resource mapping policies for dynamic clustering on NoC-based MPSoCs","authors":"Gustavo Girão, Thiago Santini, F. Wagner","doi":"10.7873/DATE.2013.147","DOIUrl":"https://doi.org/10.7873/DATE.2013.147","journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"132 1","pages":"681-684"},"publicationDate":"2013-03-18","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"86340845"}
We present the novel concept of Pipelets: self-organizing stages of software pipelines that monitor their computational demands and communication patterns and interact to optimize the performance of the application they belong to. They enable dynamic task remapping and exploit application-specific properties. Our experiments show that they improve performance by up to 31.2% compared to the state of the art when the resource demands of applications change at runtime, as is the case for many complex applications.
{"title":"Pipelets: Self-organizing software Pipelines for many-core architectures","authors":"J. Jahn, J. Henkel","doi":"10.7873/DATE.2013.308","DOIUrl":"https://doi.org/10.7873/DATE.2013.308","journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"9 1","pages":"1516-1521"},"publicationDate":"2013-03-18","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"88286694"}
S. Ahmadyan, Jayanand Asok Kumar, Shobha Vasudevan
Because of the complexity of analog circuits, their verification presents many challenges. We propose a runtime verification algorithm to verify design properties of nonlinear analog circuits. Our algorithm is based on performing exploratory simulations in the state-time space using the Time-augmented Rapidly Exploring Random Tree (TRRT) algorithm. The proposed runtime verification methodology consists of i) incremental construction of the TRRT to explore the state-time space and ii) use of an incremental online monitoring algorithm to check whether the incremented TRRT satisfies or violates specification properties at each iteration. Compared to Monte Carlo simulation, we use a logarithmic order of memory and time to provide the same state-space coverage.
{"title":"Runtime verification of nonlinear analog circuits using incremental Time-augmented RRT algorithm","authors":"S. Ahmadyan, Jayanand Asok Kumar, Shobha Vasudevan","doi":"10.7873/DATE.2013.019","DOIUrl":"https://doi.org/10.7873/DATE.2013.019","journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"25 1","pages":"21-26"},"publicationDate":"2013-03-18","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"83457029"}
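The two steps named in the abstract above (growing a tree in state-time space, then checking only the newly added node against a property) can be illustrated with a much-simplified sketch. This is not the authors' algorithm: the circuit model `derivative`, the forward-Euler step, the random input range, and the bound property `|x| <= bound` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import random


def derivative(x, u):
    # Hypothetical nonlinear first-order circuit model: dx/dt = -x**3 + u.
    return -x ** 3 + u


def grow_trrt(x0, t_end, dt, iterations, bound, seed=0):
    """Grow a tree of (time, state) nodes and monitor each new node.

    Returns (nodes, parents, violation), where violation is the index of
    the first node with |state| > bound, or None if no node violates it.
    """
    rng = random.Random(seed)
    nodes = [(0.0, x0)]   # each node is a (time, state) pair
    parents = {0: None}   # tree structure: child index -> parent index
    for _ in range(iterations):
        # Sample a random point in the state-time space.
        t_s = rng.uniform(0.0, t_end)
        x_s = rng.uniform(-bound, bound)
        # Only nodes no later than the sample are candidates, because a
        # trajectory can only move forward in time. The root (t = 0)
        # always qualifies, so the list is never empty.
        candidates = [i for i, (t, _) in enumerate(nodes) if t <= t_s]
        nearest = min(
            candidates,
            key=lambda i: (nodes[i][0] - t_s) ** 2 + (nodes[i][1] - x_s) ** 2,
        )
        t_n, x_n = nodes[nearest]
        if t_n + dt > t_end:
            continue
        # Extend the tree by one forward-Euler step under a random input.
        u = rng.uniform(-1.0, 1.0)
        x_new = x_n + dt * derivative(x_n, u)
        nodes.append((t_n + dt, x_new))
        new_index = len(nodes) - 1
        parents[new_index] = nearest
        # Incremental monitor: only the node just added is checked.
        if abs(x_new) > bound:
            return nodes, parents, new_index
    return nodes, parents, None


nodes, parents, violation = grow_trrt(0.5, 1.0, 0.05, 200, 2.0)
```

Because the monitor inspects one node per iteration, the cost of checking does not grow with the size of the tree; a temporal-logic property would need a richer monitor than this simple bound check.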
Dark silicon is an emerging problem in multi-core processors, where it is not possible to enable all cores simultaneously because of either insufficient parallelism in software applications or high spatial power densities that generate hot-spot constraints. Superlattice-based thermoelectric cooling (TEC) is a promising technology that offers large heat pumping capability and the ability to target hot spots of each core independently. In this paper, we devise novel system-level methods that address the two main sources of dark silicon using superlattice TECs. Our methods leverage the TECs in conjunction with dynamic voltage and frequency scaling and the number of threads to maximize the performance of a multi-core processor under thermal and power constraints. Using an experimental setup based on a quad-core processor, we provide an evaluation of the trade-offs among performance, temperature, and power consumption arising from the use of superlattice-based TECs. Our results demonstrate the potential of this emerging cooling technology in mitigating dark silicon problems and in improving the performance of multi-core processors.
{"title":"Mitigating dark-silicon problems using superlattice-based thermoelectric coolers","authors":"Francesco Paterna, S. Reda","doi":"10.7873/DATE.2013.284","DOIUrl":"https://doi.org/10.7873/DATE.2013.284","journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"55 7 1","pages":"1391-1394"},"publicationDate":"2013-03-18","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"83610981"}
This paper describes a system-level approach to improve the latency of FPGA designs by performing optimization of the design specification on a functional level prior to high-level synthesis. The approach uses Taylor Expansion Diagrams (TEDs), a functional graph-based design representation, as a vehicle to optimize the dataflow graph (DFG) used as input to the subsequent synthesis. The optimization focuses on critical path compaction in the functional representation before translating it into a structural DFG representation. Our approach engages several passes of a traditional high-level synthesis (HLS) process in a simulated annealing-based loop to make efficient cost tradeoffs. The algorithm is time efficient and can be used for fast design space exploration. The results indicate a latency performance improvement of 22% on average versus HLS with the initial DFG for a series of designs mapped to Altera Stratix II devices.
{"title":"FPGA latency optimization using system-level transformations and DFG restructuring","authors":"D. Gomez-Prado, M. Ciesielski, R. Tessier","doi":"10.7873/DATE.2013.316","DOIUrl":"https://doi.org/10.7873/DATE.2013.316","journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"73 1","pages":"1553-1558"},"publicationDate":"2013-03-18","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"76499443"}
Soft errors have become a critical reliability issue in nanoscale integrated circuits, especially in sequential circuits, where a latched error will be propagated for many cycles and affect many outputs at different times. Retiming is a structural operation that relocates registers in a circuit without changing its functionality. In this paper, the effect of retiming on the soft error rate (SER) of a sequential circuit is investigated, considering both logic masking and timing masking. A minimum-observability retiming problem under error-latching window constraints is formulated to reduce the SER of the circuit, and an efficient algorithm is proposed to solve the problem optimally. Experimental results show on average a 32.7% reduction in SER from the original circuits and a 15% improvement over the existing method.
{"title":"Retiming for soft error minimization under error-latching window constraints","authors":"Yinghai Lu, H. Zhou","doi":"10.7873/DATE.2013.210","DOIUrl":"https://doi.org/10.7873/DATE.2013.210","journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"82 1","pages":"1008-1013"},"publicationDate":"2013-03-18","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"83922909"}
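The retiming operation described in the abstract above (relocating registers without changing functionality) can be illustrated with the classical graph formulation, in which each gate v receives an integer lag r(v) and an edge u -> v carrying w registers ends up with w + r(v) - r(u) registers. The sketch below assumes that textbook formulation only; it does not model the paper's observability objective or its error-latching window constraints, and the circuit and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def retime(weights: Dict[Edge, int], lag: Dict[str, int]) -> Dict[Edge, int]:
    """Return per-edge register counts after applying lag r(v) to each node.

    Classical rule: w_r(u, v) = w(u, v) + r(v) - r(u).
    A retiming is legal only if no edge ends up with a negative count.
    """
    retimed = {}
    for (u, v), w in weights.items():
        new_w = w + lag.get(v, 0) - lag.get(u, 0)
        if new_w < 0:
            raise ValueError(f"illegal retiming: edge {u}->{v} would hold {new_w} registers")
        retimed[(u, v)] = new_w
    return retimed


def cycle_registers(weights: Dict[Edge, int], cycle: List[str]) -> int:
    """Sum the register counts around a cycle given as an ordered node list."""
    return sum(
        weights[(cycle[i], cycle[(i + 1) % len(cycle)])]
        for i in range(len(cycle))
    )


# Hypothetical three-gate loop a -> b -> c -> a with two registers on c -> a.
circuit = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 2}

# Lagging b and c by one moves one register from edge c -> a to edge a -> b.
moved = retime(circuit, {"a": 0, "b": 1, "c": 1})
```

The lags cancel around any cycle, so the number of registers on each loop is unchanged; this is why retiming preserves functionality while changing where errors can be latched.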
The SystemC/TLM technologies are widely accepted in the industry for fast system-level simulation. An important limitation of SystemC regarding performance is that the reference implementation is sequential, and the official semantics makes parallel execution difficult. As the number of cores in computers increases quickly, the ability to take advantage of host parallelism during a simulation is becoming a major concern. Most existing work on parallelization of SystemC targets cycle-accurate simulation and would be inefficient on loosely timed systems, since it cannot parallelize processes that do not execute simultaneously. We propose an approach that explicitly targets loosely timed systems and offers the user a set of primitives to express tasks with duration, as opposed to the notion of time in SystemC, which allows only instantaneous computations and time elapsing without computation. Our tool exploits this notion of duration to run the simulation in parallel. It runs on top of any (unmodified) SystemC implementation, which lets legacy SystemC code continue running as is. This allows the user to focus on the performance-critical parts of the program that need to be parallelized.
{"title":"Parallel programming with SystemC for loosely timed models: A non-intrusive approach","authors":"M. Moy","doi":"10.7873/DATE.2013.017","DOIUrl":"https://doi.org/10.7873/DATE.2013.017","journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"62 1","pages":"9-14"},"publicationDate":"2013-03-18","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"88392837"}
S. Beer, R. Ginosar, Jerome Cox, Tom Chaney, D. Zar
Recent synchronizer metastability measurements indicate degradation of MTBF with technology scaling, calling for measurement and calibration circuits in 65nm and below. Degradation of parameters can be even worse if the system is operated at extreme supply voltages and temperature conditions. In this work we study the behavior of synchronizers in a broad range of supply voltage and temperature corners. A digital on-chip measurement system is presented that helps to characterize synchronizers in future technologies and a new calibrating system is shown that accounts for changes in delay values due to supply voltage and temperature changes. We present a detailed comparison of measurements and simulations for a fabricated 65nm bulk CMOS circuit and discuss implications of the measurements for synchronization systems in 65nm and beyond. We propose an adaptive self-calibrating synchronizer to account for supply voltage, temperature, global process variations and DVFS.
{"title":"Metastability challenges for 65nm and beyond; simulation and measurements","authors":"S. Beer, R. Ginosar, Jerome Cox, Tom Chaney, D. Zar","doi":"10.7873/DATE.2013.268","DOIUrl":"https://doi.org/10.7873/DATE.2013.268","journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"8 1","pages":"1297-1302"},"publicationDate":"2013-03-18","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"77176899"}
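The MTBF degradation discussed in the abstract above can be made concrete with the widely used synchronizer model MTBF = exp(S / tau) / (T_W * f_C * f_D), where S is the time allowed for resolution, tau the resolution time constant, T_W the metastability window, f_C the clock frequency, and f_D the data toggle rate. The abstract does not state this formula, so using it here is an assumption, and all numeric values below are hypothetical and not taken from the paper's measurements.

```python
import math


def synchronizer_mtbf(settle_time_s, tau_s, window_s, f_clock_hz, f_data_hz):
    """Mean time between synchronizer failures, in seconds.

    Standard model: MTBF = exp(S / tau) / (T_W * f_C * f_D).
    """
    return math.exp(settle_time_s / tau_s) / (window_s * f_clock_hz * f_data_hz)


# Hypothetical operating point: 1 ns of settling time, 20 ps window,
# 1 GHz clock, 100 MHz data rate.
nominal = synchronizer_mtbf(1e-9, 50e-12, 20e-12, 1e9, 1e8)

# Same circuit with tau doubled, e.g. at a low-voltage, low-temperature corner.
degraded = synchronizer_mtbf(1e-9, 100e-12, 20e-12, 1e9, 1e8)

print(f"nominal MTBF:  {nominal:.3g} s")
print(f"degraded MTBF: {degraded:.3g} s")
```

Because tau sits inside the exponent, a modest change in tau across supply voltage or temperature shifts MTBF by orders of magnitude, which is the motivation for on-chip measurement and calibration.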
Shanker Shreejith, Kizheppatt Vipin, Suhaib A. Fahmy, M. Lukasiewycz
Safety-critical in-vehicle electronic control units (ECUs) demand high levels of determinism and isolation, since they directly influence vehicle behaviour and passenger safety. As modern vehicles incorporate more complex computational systems, ensuring the safety of critical systems becomes paramount. One-to-one redundant units have been previously proposed as measures for evolving critical functions like x-by-wire. However, these may not be viable solutions for power-constrained systems like next-generation electric vehicles. Reconfigurable architectures offer alternative approaches to implementing reliable safety-critical systems using more efficient hardware. In this paper, we present an approach for implementing redundancy in safety-critical in-car systems that uses FPGA partial reconfiguration and a customised bus controller to offer fast recovery from faults. Results show that such an integrated design is better than alternatives that use discrete bus interface modules.
{"title":"An approach for redundancy in FlexRay networks using FPGA partial reconfiguration","authors":"Shanker Shreejith, Kizheppatt Vipin, Suhaib A. Fahmy, M. Lukasiewycz","doi":"10.7873/DATE.2013.155","DOIUrl":"https://doi.org/10.7873/DATE.2013.155","journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"105 1","pages":"721-724"},"publicationDate":"2013-03-18","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"77422572"}