Pub Date : 2010-03-08DOI: 10.1109/DATE.2010.5456952
R. Pellizzoni, A. Schranzhofer, Jian-Jia Chen, M. Caccamo, L. Thiele
Employing COTS components in real-time embedded systems leads to timing challenges. When multiple CPU cores and DMA peripherals run simultaneously, contention for access to main memory can greatly increase a task's WCET. In this paper, we introduce an analysis methodology that computes upper bounds to task delay due to memory contention. First, an arrival curve is derived for each core representing the maximum memory traffic produced by all tasks executed on it. Arrival curves are then combined with a representation of the cache behavior for the task under analysis to generate a delay bound. Based on the computed delay, we show how tasks can be feasibly scheduled according to assigned time slots on each core.
{"title":"Worst case delay analysis for memory interference in multicore systems","authors":"R. Pellizzoni, A. Schranzhofer, Jian-Jia Chen, M. Caccamo, L. Thiele","doi":"10.1109/DATE.2010.5456952","DOIUrl":"https://doi.org/10.1109/DATE.2010.5456952","url":null,"abstract":"Employing COTS components in real-time embedded systems leads to timing challenges. When multiple CPU cores and DMA peripherals run simultaneously, contention for access to main memory can greatly increase a task's WCET. In this paper, we introduce an analysis methodology that computes upper bounds to task delay due to memory contention. First, an arrival curve is derived for each core representing the maximum memory traffic produced by all tasks executed on it. Arrival curves are then combined with a representation of the cache behavior for the task under analysis to generate a delay bound. Based on the computed delay, we show how tasks can be feasibly scheduled according to assigned time slots on each core.","PeriodicalId":432902,"journal":{"name":"2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)","volume":"223 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115147715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-08DOI: 10.1109/DATE.2010.5456943
M. Wieckowski, D. Sylvester, D. Blaauw, V. Chandra, Sachin Idgunji, C. Pietrzyk, R. Aitken
Static noise margin analysis using butterfly curves has traditionally played a leading role in the sizing and optimization of SRAM cell structures. Heightened variability and reduced supply voltages have resulted in increased attention being paid to new methods for characterizing dynamic robustness. In this work, a technique based on vector field analysis is presented for quickly extracting both static and dynamic stability characteristics of arbitrary SRAM topologies. It is shown that the traditional butterfly curve simulation for 6T cells is actually a special case of the proposed method. The proposed technique not only allows for standard SNM “smallest-square” measurements, but also enables tracing of the state-space separatrix, an operation critical for quantifying dynamic stability. It is established via importance sampling that cell characterization using a combination of both separatrix tracing and butterfly SNM measurements is significantly more correlated to cell failure rates then using SNM measurements alone. The presented technique is demonstrated to be thousands of times faster than the brute force transient approach and can be implemented with widely available, standard design tools.
{"title":"A black box method for stability analysis of arbitrary SRAM cell structures","authors":"M. Wieckowski, D. Sylvester, D. Blaauw, V. Chandra, Sachin Idgunji, C. Pietrzyk, R. Aitken","doi":"10.1109/DATE.2010.5456943","DOIUrl":"https://doi.org/10.1109/DATE.2010.5456943","url":null,"abstract":"Static noise margin analysis using butterfly curves has traditionally played a leading role in the sizing and optimization of SRAM cell structures. Heightened variability and reduced supply voltages have resulted in increased attention being paid to new methods for characterizing dynamic robustness. In this work, a technique based on vector field analysis is presented for quickly extracting both static and dynamic stability characteristics of arbitrary SRAM topologies. It is shown that the traditional butterfly curve simulation for 6T cells is actually a special case of the proposed method. The proposed technique not only allows for standard SNM “smallest-square” measurements, but also enables tracing of the state-space separatrix, an operation critical for quantifying dynamic stability. It is established via importance sampling that cell characterization using a combination of both separatrix tracing and butterfly SNM measurements is significantly more correlated to cell failure rates then using SNM measurements alone. The presented technique is demonstrated to be thousands of times faster than the brute force transient approach and can be implemented with widely available, standard design tools.","PeriodicalId":432902,"journal":{"name":"2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127670459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-08DOI: 10.1109/DATE.2010.5457081
D. Borodin, B. Juurlink
Fault tolerance (FT) has become a major concern in computing systems. Instruction duplication has been proposed to verify application execution at run time. Two techniques, instruction memoization and precomputation, have been shown to improve the performance and fault coverage of duplication. This work shows that the combination of these two techniques is much more powerful than either one in isolation. In addition to performance, it improves the long-lasting transient and permanent fault coverage upon the memoization scheme. Compared to the precomputation scheme, it reduces the long-lasting transient and permanent fault coverage of 10.6% of the instructions, but covers 2.6 times as many instructions against shorter transient faults. On a system with 2 integer ALUs, the combined scheme reduces the performance degradation due to duplication by on average 27.3% and 22.2% compared to the precomputation and memoization-based techniques, respectively, with similar hardware requirements.
{"title":"Instruction precomputation with memoization for fault detection","authors":"D. Borodin, B. Juurlink","doi":"10.1109/DATE.2010.5457081","DOIUrl":"https://doi.org/10.1109/DATE.2010.5457081","url":null,"abstract":"Fault tolerance (FT) has become a major concern in computing systems. Instruction duplication has been proposed to verify application execution at run time. Two techniques, instruction memoization and precomputation, have been shown to improve the performance and fault coverage of duplication. This work shows that the combination of these two techniques is much more powerful than either one in isolation. In addition to performance, it improves the long-lasting transient and permanent fault coverage upon the memoization scheme. Compared to the precomputation scheme, it reduces the long-lasting transient and permanent fault coverage of 10.6% of the instructions, but covers 2.6 times as many instructions against shorter transient faults. On a system with 2 integer ALUs, the combined scheme reduces the performance degradation due to duplication by on average 27.3% and 22.2% compared to the precomputation and memoization-based techniques, respectively, with similar hardware requirements.","PeriodicalId":432902,"journal":{"name":"2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127718037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-08DOI: 10.1109/DATE.2010.5456909
Ming Liu, Zhonghai Lu, W. Kuehn, A. Jantsch
In conventional static implementations for correlated streaming applications, computing resources may be in-efficiently utilized since multiple stream processors may supply their sub-results at asynchronous rates for result correlation or synchronization. To enhance the resource utilization efficiency, we analyze multi-streaming models and implement an adaptive architecture based on FPGA Partial Reconfiguration (PR) technology. The adaptive system can intelligently schedule and manage various processing modules during run-time. Experimental results demonstrate up to 78.2% improvement in throughput-per-unit-area on unbalanced processing of correlated streams, as well as only 0.3% context switching overhead in the overall processing time in the worst-case.
{"title":"FPGA-based adaptive computing for correlated multi-stream processing","authors":"Ming Liu, Zhonghai Lu, W. Kuehn, A. Jantsch","doi":"10.1109/DATE.2010.5456909","DOIUrl":"https://doi.org/10.1109/DATE.2010.5456909","url":null,"abstract":"In conventional static implementations for correlated streaming applications, computing resources may be in-efficiently utilized since multiple stream processors may supply their sub-results at asynchronous rates for result correlation or synchronization. To enhance the resource utilization efficiency, we analyze multi-streaming models and implement an adaptive architecture based on FPGA Partial Reconfiguration (PR) technology. The adaptive system can intelligently schedule and manage various processing modules during run-time. Experimental results demonstrate up to 78.2% improvement in throughput-per-unit-area on unbalanced processing of correlated streams, as well as only 0.3% context switching overhead in the overall processing time in the worst-case.","PeriodicalId":432902,"journal":{"name":"2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132561958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-08DOI: 10.1109/DATE.2010.5457061
Pramod Subramanyan, Virendra Singh, K. Saluja, E. Larsson
Continued CMOS scaling is expected to make future microprocessors susceptible to transient faults, hard faults, manufacturing defects and process variations causing fault tolerance to become important even for general purpose processors targeted at the commodity market.
{"title":"Multiplexed redundant execution: A technique for efficient fault tolerance in chip multiprocessors","authors":"Pramod Subramanyan, Virendra Singh, K. Saluja, E. Larsson","doi":"10.1109/DATE.2010.5457061","DOIUrl":"https://doi.org/10.1109/DATE.2010.5457061","url":null,"abstract":"Continued CMOS scaling is expected to make future microprocessors susceptible to transient faults, hard faults, manufacturing defects and process variations causing fault tolerance to become important even for general purpose processors targeted at the commodity market.","PeriodicalId":432902,"journal":{"name":"2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)","volume":"603 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132593443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-08DOI: 10.1109/DATE.2010.5456913
Doochul Shin, S. Gupta
Error tolerance formally captures the notion that - for a wide variety of applications including audio, video, graphics, and wireless communications - a defective chip that produces erroneous values at its outputs may be acceptable, provided the errors are of certain types and their severities are within application-specified thresholds. All previous research on error tolerance has focused on identifying such defective but acceptable chips during post-fabrication testing to improve yield. In this paper, we explore a completely new approach to exploit error tolerance based on the following observation: If certain deviations from the nominal output values are acceptable, then we can exploit this flexibility during circuit design to reduce circuit area and delay as well as to increase yield. The specific metric of error tolerance we focus on is error rate, i.e., how often the circuit produces erroneous outputs. We propose a new logic synthesis approach for the new problem of identifying how to exploit a given error rate threshold to maximally reduce the area of the synthesized circuit. Experiment results show that for an error rate threshold within 1%, our approach provides 9.43% literal reductions on average for all the benchmarks that we target.
{"title":"Approximate logic synthesis for error tolerant applications","authors":"Doochul Shin, S. Gupta","doi":"10.1109/DATE.2010.5456913","DOIUrl":"https://doi.org/10.1109/DATE.2010.5456913","url":null,"abstract":"Error tolerance formally captures the notion that - for a wide variety of applications including audio, video, graphics, and wireless communications - a defective chip that produces erroneous values at its outputs may be acceptable, provided the errors are of certain types and their severities are within application-specified thresholds. All previous research on error tolerance has focused on identifying such defective but acceptable chips during post-fabrication testing to improve yield. In this paper, we explore a completely new approach to exploit error tolerance based on the following observation: If certain deviations from the nominal output values are acceptable, then we can exploit this flexibility during circuit design to reduce circuit area and delay as well as to increase yield. The specific metric of error tolerance we focus on is error rate, i.e., how often the circuit produces erroneous outputs. We propose a new logic synthesis approach for the new problem of identifying how to exploit a given error rate threshold to maximally reduce the area of the synthesized circuit. Experiment results show that for an error rate threshold within 1%, our approach provides 9.43% literal reductions on average for all the benchmarks that we target.","PeriodicalId":432902,"journal":{"name":"2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132660152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-08DOI: 10.1109/DATE.2010.5456954
Kedar Karmarkar, S. Tragoudas
Inductive and capacitive coupling are responsible for slowing down signals. Existing bus encoding techniques tackle the issue by avoiding certain types of transitions. This work proposes a codeword generation method for such techniques that is scalable to very wide buses. Experimentation on a recent encoding technique confirms that the conventional method is limited to 16-bit bus while the proposed method is easily extended beyond 128-bits.
{"title":"Scalable codeword generation for coupled buses","authors":"Kedar Karmarkar, S. Tragoudas","doi":"10.1109/DATE.2010.5456954","DOIUrl":"https://doi.org/10.1109/DATE.2010.5456954","url":null,"abstract":"Inductive and capacitive coupling are responsible for slowing down signals. Existing bus encoding techniques tackle the issue by avoiding certain types of transitions. This work proposes a codeword generation method for such techniques that is scalable to very wide buses. Experimentation on a recent encoding technique confirms that the conventional method is limited to 16-bit bus while the proposed method is easily extended beyond 128-bits.","PeriodicalId":432902,"journal":{"name":"2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132748812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-08DOI: 10.1109/DATE.2010.5457049
K. Hao, Fei Xie, S. Ray, Jin Yang
Behavioral synthesis is the compilation of an Electronic system-level (ESL) design into an RTL implementation. We present a suite of optimizations for equivalence checking of RTL generated through behavioral synthesis. The optimizations exploit the high-level structure of the ESL description to ameliorate verification complexity. Experiments on representative benchmarks indicate that the optimizations can handle equivalence checking of synthesized designs with tens of thousands of lines of RTL.
{"title":"Optimizing equivalence checking for behavioral synthesis","authors":"K. Hao, Fei Xie, S. Ray, Jin Yang","doi":"10.1109/DATE.2010.5457049","DOIUrl":"https://doi.org/10.1109/DATE.2010.5457049","url":null,"abstract":"Behavioral synthesis is the compilation of an Electronic system-level (ESL) design into an RTL implementation. We present a suite of optimizations for equivalence checking of RTL generated through behavioral synthesis. The optimizations exploit the high-level structure of the ESL description to ameliorate verification complexity. Experiments on representative benchmarks indicate that the optimizations can handle equivalence checking of synthesized designs with tens of thousands of lines of RTL.","PeriodicalId":432902,"journal":{"name":"2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133495682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-08DOI: 10.1109/DATE.2010.5457232
Kebin Zeng, Yu Guo, C. Angelov
Model Driven Software Development has offered a faster way to design and implement embedded real-time software by moving the design to a model level, and by transforming models to code. However, the testing of embedded systems has remained at the code level. This paper presents a Graphical Model Debugger Framework, providing an auxiliary avenue of analysis of system models at runtime by executing generated code and updating models synchronously, which allows embedded developers to focus on the model level. With the model debugger, embedded developers can graphically test their design model and check the running status of the system, which offers a debugging capability on a higher level of abstraction. The framework intends to contribute a tool to the Eclipse society, especially suitable for model-driven development of embedded systems.
{"title":"Graphical Model Debugger Framework for embedded systems","authors":"Kebin Zeng, Yu Guo, C. Angelov","doi":"10.1109/DATE.2010.5457232","DOIUrl":"https://doi.org/10.1109/DATE.2010.5457232","url":null,"abstract":"Model Driven Software Development has offered a faster way to design and implement embedded real-time software by moving the design to a model level, and by transforming models to code. However, the testing of embedded systems has remained at the code level. This paper presents a Graphical Model Debugger Framework, providing an auxiliary avenue of analysis of system models at runtime by executing generated code and updating models synchronously, which allows embedded developers to focus on the model level. With the model debugger, embedded developers can graphically test their design model and check the running status of the system, which offers a debugging capability on a higher level of abstraction. The framework intends to contribute a tool to the Eclipse society, especially suitable for model-driven development of embedded systems.","PeriodicalId":432902,"journal":{"name":"2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134376260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-08DOI: 10.1109/DATE.2010.5456987
Rauf Salimi Khaligh, M. Radetzki
We present a set of modeling constructs accompanied by a high performance simulation kernel for accuracy adaptive transaction level models. In contrast to traditional, fixed accuracy TLMs, accuracy of adaptive TLMs can be changed during simulation to the level which is most suitable for a given use case and scenario. Ad-hoc development of adaptive models can result in complex models, and the implementation detail of adaptivity mechanisms can obscure the actual logic of a model. To simplify and enable systematic development of adaptive models, we have identified several mechanisms which are applicable to a wide variety of models. The proposed constructs relieve the modeler from low level implementation details of those mechanisms. We have developed an efficient, light-weight simulation kernel optimized for the proposed constructs, which enables parallel simulation of large models on widely available, low-cost multi-core simulation hosts. The modeling constructs and the kernel have been evaluated using industrial benchmark applications.
{"title":"Modeling constructs and kernel for parallel simulation of accuracy adaptive TLMs","authors":"Rauf Salimi Khaligh, M. Radetzki","doi":"10.1109/DATE.2010.5456987","DOIUrl":"https://doi.org/10.1109/DATE.2010.5456987","url":null,"abstract":"We present a set of modeling constructs accompanied by a high performance simulation kernel for accuracy adaptive transaction level models. In contrast to traditional, fixed accuracy TLMs, accuracy of adaptive TLMs can be changed during simulation to the level which is most suitable for a given use case and scenario. Ad-hoc development of adaptive models can result in complex models, and the implementation detail of adaptivity mechanisms can obscure the actual logic of a model. To simplify and enable systematic development of adaptive models, we have identified several mechanisms which are applicable to a wide variety of models. The proposed constructs relieve the modeler from low level implementation details of those mechanisms. We have developed an efficient, light-weight simulation kernel optimized for the proposed constructs, which enables parallel simulation of large models on widely available, low-cost multi-core simulation hosts. The modeling constructs and the kernel have been evaluated using industrial benchmark applications.","PeriodicalId":432902,"journal":{"name":"2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134578548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}