Fault tolerance on multicore processors using deterministic multithreading
Pub Date: 2013-12-01 | DOI: 10.1109/IDT.2013.6727107
Venue: 2013 8th IEEE Design and Test Symposium
Hamid Mushtaq, Z. Al-Ars, K. Bertels
This paper describes a software-based fault tolerance approach for multithreaded programs running on multicore processors. Redundant multithreaded processes are used to detect soft errors and recover from them. Our scheme ensures that the execution of the redundant processes is identical even in the presence of non-determinism due to shared memory accesses. It does this by making the redundant processes acquire the locks that guard shared memory in the same order. Instead of using a record/replay technique, our scheme is based on deterministic multithreading: for the same input, a multithreaded program always has the same lock interleaving. Unlike record/replay systems, this eliminates the need for communication between the redundant processes. Moreover, our scheme is implemented entirely in software and requires no special hardware, which makes it very portable. It is also implemented entirely at user level, requiring no modification of the kernel. For selected benchmarks, our scheme adds an average overhead of 49% for 4 threads.
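The key idea, that redundant replicas stay in lockstep if every lock is granted in an order that depends only on the program and not on timing, can be illustrated with a minimal Python sketch. It assumes a deliberately simple policy (strict round-robin turns by thread index) in place of the paper's deterministic multithreading runtime; `RoundRobinLock` and `run_replica` are hypothetical names, and seeded random sleeps stand in for timing non-determinism.

```python
import random
import threading
import time


class RoundRobinLock:
    """Hypothetical deterministic lock: threads enter the critical section only
    in round-robin order of their index, so the lock order ignores timing."""

    def __init__(self, n_threads):
        self.n = n_threads
        self.turn = 0                      # index of the thread allowed to lock next
        self.active = [True] * n_threads   # finished threads are skipped
        self.cond = threading.Condition()

    def _advance(self):
        # Hand the turn to the next unfinished thread (caller holds self.cond).
        for _ in range(self.n):
            self.turn = (self.turn + 1) % self.n
            if self.active[self.turn]:
                return

    def acquire(self, tid):
        with self.cond:
            self.cond.wait_for(lambda: self.turn == tid)

    def release(self):
        with self.cond:
            self._advance()
            self.cond.notify_all()

    def finish(self, tid):
        # A thread with no more lock requests must give up its future turns.
        with self.cond:
            self.active[tid] = False
            if self.turn == tid:
                self._advance()
            self.cond.notify_all()


def run_replica(n_threads=4, iters=3, seed=None):
    """Run one replica; seeded random sleeps vary timing between replicas."""
    rng = random.Random(seed)
    delays = [[rng.random() / 1000 for _ in range(iters)] for _ in range(n_threads)]
    lock = RoundRobinLock(n_threads)
    order = []  # shared state, only modified by the thread holding the turn

    def worker(tid):
        for i in range(iters):
            time.sleep(delays[tid][i])     # timing differs from replica to replica
            lock.acquire(tid)
            order.append(tid)              # critical section
            lock.release()
        lock.finish(tid)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


# Two replicas with different timing still produce the same lock order,
# so their outputs can be compared to detect a soft error.
replica_a = run_replica(seed=1)
replica_b = run_replica(seed=2)
print(replica_a == replica_b)  # True
```

Strict turns make the interleaving deterministic at the price of making threads wait for one another, which hints at where overhead of the kind reported above comes from; deterministic-multithreading runtimes generally use less restrictive ordering rules (for example, rules based on per-thread logical clocks).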
On the impact of fault list partitioning in parallel implementations for dynamic test compaction considering multicore systems
Pub Date: 2013-12-01 | DOI: 10.1109/IDT.2013.6727082
Venue: 2013 8th IEEE Design and Test Symposium
Stelios N. Neophytou, Stavros Hadjitheophanous, M. Michael
Modern multicore systems have multiplied the processing power of computing systems, increasing the potential for solving difficult EDA problems. At the same time, the problem must be decomposed carefully in order to exploit the parallelism without compromising the quality of the result relative to existing non-parallel solutions. Test set compaction is one of the major EDA problems; it is NP-hard and a crucial component of any ATPG methodology. This paper presents a study of the effect of fault list partitioning on a dynamic test set compaction algorithm that has been shown to give very good results when the entire fault list is considered. The serial algorithm is executed on different subsets of the considered fault list, and the results are evaluated in terms of both the compaction achieved and the execution time. The experimental results demonstrate that the partitioning technique used strongly affects the compaction quality, while the execution time is significantly reduced.
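Why partitioning can hurt compaction quality is visible even in a toy setting. The Python sketch below swaps in a simple static greedy set-cover compaction (not the paper's dynamic algorithm) and a hypothetical pool of four tests: compacted over the whole fault list it keeps two tests, but compacted independently per partition and then merged it keeps three.

```python
def greedy_compact(tests, faults):
    """Keep the test that detects the most still-uncovered faults until all are covered."""
    uncovered = set(faults)
    selected = []
    while uncovered:
        # max() returns the first maximal item, so ties go to the alphabetically first test
        best = max(sorted(tests), key=lambda t: len(tests[t] & uncovered))
        gain = tests[best] & uncovered
        if not gain:
            break  # remaining faults are not detected by any test in the pool
        selected.append(best)
        uncovered -= gain
    return selected


def partitioned_compact(tests, partitions):
    """Compact each fault-list partition independently (e.g. one per core), then merge."""
    kept = set()
    for part in partitions:
        kept.update(greedy_compact(tests, part))
    return sorted(kept)


# Hypothetical toy pool: test name -> set of detected fault IDs
TESTS = {
    "a": {1, 2, 5, 6},
    "b": {3, 4, 7, 8},
    "c": {1, 2, 3},
    "d": {5, 6, 7},
}
ALL_FAULTS = set(range(1, 9))

print(greedy_compact(TESTS, ALL_FAULTS))                          # ['a', 'b']
print(partitioned_compact(TESTS, [{1, 2, 3, 4}, {5, 6, 7, 8}]))  # ['b', 'c', 'd']
```

Each partition greedily picks the test that looks best locally (c or d), which is exactly the choice the global view would avoid.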
Memory controller architectures: A comparative study
Pub Date: 2013-12-01 | DOI: 10.1109/IDT.2013.6727083
Venue: 2013 8th IEEE Design and Test Symposium
K. Khalifa, H. Fawzy, Sameh El-Ashry, K. Salah
This paper clarifies the differences between six memory architectures: Flex-OneNAND, Open NAND Flash Interface (ONFI 3.1), Embedded Multi-Media Card (eMMC v.5.0), Hybrid Memory Cube (HMC v.1.0), WideIO, and Universal Flash Storage (UFS). The paper shows how these distinguishing differences affect the choice of the most suitable architecture for a given application. The comparison covers the features that matter most from the microelectronics industry's point of view. It shows that the highest speed is given by HMC v.1.0, which reaches 15 GB/s and supports per-link power management. Flex-OneNAND, on the other hand, provides a single flash chip that combines the ultra-high density of NAND with the simplified interface of NOR, using the simplest architecture at very attractive price points. WideIO offers more bandwidth at lower power. For the lowest power consumption, eMMC stands out. UFS combines the speed of an SSD with the slim form factor and low power of eMMC. ONFI increases performance through parallelism, using multiple logical units and interleaved addressing. This comparison helps designers decide which memory controller suits their applications and satisfies their requirements.
High radix Montgomery modular multiplication on FPGA
Pub Date: 2013-12-01 | DOI: 10.1109/IDT.2013.6727148
Venue: 2013 8th IEEE Design and Test Symposium
A. Mohamed, Anane Nadjia
Improving the speed and area of Montgomery modular multiplication (MMM) is crucial for public key cryptography applications. This paper presents an efficient hardware algorithm for a high-radix MMM method that exploits the features available in the Xilinx Virtex-5 FPGA. Our main contribution is the development of hardware algorithms for the radix-2^16 number system in the FPGA to speed up the MMM. The design multiplies two 1024-bit numbers in 64 iterations. The carry-save (CS) representation is used to avoid carry propagation, which makes the datapath of an iteration cycle independent of the operand length. Special effort went into designing the 6:2 compressor at the LUT level; it is the key feature of our design. The resulting architecture can run with a clock period equal to the total delay of an embedded 18×18-bit multiplier and two LUT6s.
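For readers unfamiliar with high-radix MMM, a word-serial model of the arithmetic may help. This Python sketch is a software model under stated assumptions, not the paper's carry-save FPGA datapath or its 6:2 compressor: it consumes one radix-2^16 digit of `a` per iteration, so 1024-bit operands take 1024/16 = 64 iterations, matching the iteration count above. The function name, parameters and test modulus are illustrative.

```python
import random


def montgomery_multiply(a, b, n, w=16, k=64):
    """Digit-serial Montgomery product a * b * R^-1 mod n, with R = 2**(w*k).

    Assumes n is odd, a < R and b < n < R. With w=16 and k=64, a 1024-bit
    operand a is consumed one 16-bit digit per iteration: 64 iterations.
    """
    r = 1 << w
    mask = r - 1
    n_prime = (-pow(n, -1, r)) % r          # -n^-1 mod 2**w, precomputed once
    s = 0
    for i in range(k):
        a_i = (a >> (w * i)) & mask          # i-th radix-2**w digit of a
        s += a_i * b
        q = ((s & mask) * n_prime) & mask    # chosen so s + q*n is divisible by 2**w
        s = (s + q * n) >> w                 # exact division by the radix
    return s - n if s >= n else s            # s < 2n, so one subtraction suffices


rng = random.Random(0)
n = rng.getrandbits(1024) | (1 << 1023) | 1  # hypothetical odd 1024-bit modulus
a, b = rng.randrange(n), rng.randrange(n)
R = 1 << 1024
print(montgomery_multiply(a, b, n) == a * b * pow(R, -1, n) % n)  # True
```

In hardware, the wide additions inside the loop are where carry propagation would limit the clock period; the carry-save representation described in the abstract targets exactly that, whereas this model simply uses exact Python integers (three-argument `pow` with exponent -1 needs Python 3.8+).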
The optimum Booth radix for low power integer multipliers
Pub Date: 2013-12-01 | DOI: 10.1109/IDT.2013.6727119
Venue: 2013 8th IEEE Design and Test Symposium
H. Saleh, B. Mohammad, E. Swartzlander
This paper investigates the optimum Booth integer multiplier for low-power applications. Booth radix-4, radix-8 and radix-16 multipliers were compared for area, speed and power using a standard-cell ASIC design flow and 28nm CMOS technology. All of the investigated designs were implemented in RTL, fully verified, and then synthesized using 28nm standard-cell libraries with low-leakage slow cells, regular-leakage average-speed cells and high-leakage fast cells. The area, speed and power were compared to determine the best choice for low-power designs. Among the three investigated designs, Booth radix-4 was the best choice: it had the lowest area and power and the fastest execution speed. Notably, when implemented with LVT cells, radix-8 had the lowest leakage power and the lowest overall power of the three designs. For power-sensitive, high-speed applications, radix-8 could therefore be a better choice, at a cost of about 18% more area and 3% lower speed.
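The radix trade-off follows from Booth recoding: a radix-2^k recoder turns an n-bit multiplier into about n/k signed digits, so fewer partial products must be summed, but radix-8 needs a precomputed 3X multiple and radix-16 needs 3X, 5X and 7X. The behavioral Python model below uses hypothetical function names and says nothing about the paper's 28nm gate-level results; it only shows both effects for 32-bit operands.

```python
def booth_digits(y, n_bits, k):
    """Radix-2**k Booth recoding of a signed n_bits-wide integer y.

    Each digit lies in [-2**(k-1), 2**(k-1)], and y == sum(d * 2**(k*i)).
    """
    width = -(-n_bits // k) * k                # round up to a multiple of k (sign extension)
    u = y & ((1 << width) - 1)                 # two's-complement bit pattern of y

    def bit(j):
        return (u >> j) & 1 if j >= 0 else 0   # implicit 0 to the right of the LSB

    digits = []
    for i in range(width // k):
        base = k * i
        # overlap bit from the previous group minus the weighted top bit of this group
        d = bit(base - 1) - (bit(base + k - 1) << (k - 1))
        for j in range(k - 1):
            d += bit(base + j) << j
        digits.append(d)
    return digits


def booth_multiply(x, y, n_bits=32, k=2):
    """Sum one partial product per Booth digit: (d * x) shifted left by k*i bits."""
    return sum((d * x) << (k * i) for i, d in enumerate(booth_digits(y, n_bits, k)))


def hard_multiples(k):
    """Odd digit magnitudes above 1 cannot be formed by shifting x alone."""
    return list(range(3, 2 ** (k - 1) + 1, 2))


for k, name in [(2, "radix-4"), (3, "radix-8"), (4, "radix-16")]:
    print(name, len(booth_digits(-12345, 32, k)), hard_multiples(k))
# radix-4 16 []
# radix-8 11 [3]
# radix-16 8 [3, 5, 7]
```

Each hard multiple costs an extra carry-propagate adder before the partial-product array, which is one reason a higher radix does not automatically win on area or speed.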
On the design of a high-performance digital radar system
Pub Date: 2013-12-01 | DOI: 10.1109/IDT.2013.6727099
Venue: 2013 8th IEEE Design and Test Symposium
H. Mir, L. Albasha
The design of a wideband digital radar system is presented. The system operates at S-band and uses a unique stretch-processing based architecture. The two fully digital receiver channels enhance the system dynamic range and enable the application of DSP algorithms. Experimental results verify that the system can achieve an in-band dynamic range of 60 dB across 600 MHz of instantaneous bandwidth.
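Stretch processing mixes the received linear-FM echo with a replica of the transmitted chirp, so each target's round-trip delay τ becomes a constant beat frequency of magnitude (B/T)·τ that can be sampled far more slowly than the RF bandwidth. The idealized Python sketch below assumes a noiseless point target whose echo overlaps the whole window, a 10 µs chirp, and a 51.2 MHz sampling rate for the dechirped signal; only the 600 MHz bandwidth comes from the abstract, `dechirp_range` is a hypothetical name, and this is not the paper's receiver architecture.

```python
import cmath
import math

C = 299_792_458.0  # speed of light, m/s


def dechirp_range(tau, bandwidth=600e6, duration=10e-6, fs=51.2e6, n=512):
    """Estimate target range from an ideal, noiseless dechirped LFM echo.

    tau: round-trip delay (s). The chirp is s(t) = exp(j*pi*mu*t^2) with
    mu = bandwidth / duration; s(t - tau) * conj(s(t)) is a tone at -mu*tau.
    """
    mu = bandwidth / duration                           # chirp rate, Hz/s

    def chirp(t):
        return cmath.exp(1j * math.pi * mu * t * t)

    # Stretch processing: mix the delayed echo with the conjugate reference chirp
    beat = [chirp(m / fs - tau) * chirp(m / fs).conjugate() for m in range(n)]
    # Dependency-free DFT magnitude using a precomputed twiddle table
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    spectrum = [abs(sum(x * twiddle[(f * m) % n] for m, x in enumerate(beat)))
                for f in range(n)]
    peak = max(range(n), key=spectrum.__getitem__)
    f_beat = (peak - n if peak > n // 2 else peak) * fs / n   # signed beat frequency
    tau_est = -f_beat / mu
    return C * tau_est / 2                              # round trip -> one-way range


print(round(dechirp_range(200e-9), 2))  # 29.98
```

A 200 ns delay maps to a 12 MHz beat tone, well inside the 51.2 MHz sampling rate even though the chirp spans 600 MHz; the range resolution set by the bandwidth is c/(2B), roughly 0.25 m.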
Performance comparison between air-gap based coaxial TSV and conventional circular TSV in 3D-ICs
Pub Date: 2013-12-01 | DOI: 10.1109/IDT.2013.6727102
Venue: 2013 8th IEEE Design and Test Symposium
K. Salah
In this paper, a performance comparison between air-gap-based coaxial TSVs and conventional circular TSVs is presented. The comparison shows that air-gap TSVs reduce the overall parasitic capacitance and the overall energy loss compared with conventional circular TSVs or conventional coaxial TSVs.
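A first-order reason why an air gap lowers TSV capacitance: the capacitance of a coaxial structure is set by its dielectric, C = 2π·ε0·εr·L / ln(r_out/r_in), and concentric dielectric layers add in series, so replacing part of a SiO2 liner (εr ≈ 3.9) with air (εr ≈ 1) reduces C. The Python sketch uses hypothetical dimensions and ignores the silicon substrate, fringing fields and frequency effects that a full TSV model would include.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m


def coax_capacitance(length, layers):
    """Capacitance of a coaxial conductor pair with concentric dielectric layers.

    layers: (inner_radius, outer_radius, relative_permittivity) tuples, innermost
    first. Each layer is a cylindrical capacitor; they add in series:
    1/C = sum(ln(r_out / r_in) / (2 * pi * EPS0 * eps_r * length)).
    """
    inverse = sum(math.log(r_out / r_in) / (2 * math.pi * EPS0 * eps_r * length)
                  for r_in, r_out, eps_r in layers)
    return 1.0 / inverse


# Hypothetical geometry: 50 um long TSV, 2.5 um core radius, shield at 3.5 um
oxide_only = coax_capacitance(50e-6, [(2.5e-6, 3.5e-6, 3.9)])
air_gap = coax_capacitance(50e-6, [(2.5e-6, 3.0e-6, 1.0), (3.0e-6, 3.5e-6, 3.9)])
print(f"oxide liner: {oxide_only * 1e15:.1f} fF, air gap + oxide: {air_gap * 1e15:.1f} fF")
```

Because the low-permittivity air layer sits next to the core, where the field is strongest, it dominates the series sum even though it is only half of the dielectric thickness here.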
Grammar-based program generation based on model finding
Pub Date: 2013-12-01 | DOI: 10.1109/IDT.2013.6727084
Venue: 2013 8th IEEE Design and Test Symposium
Mathias Soeken, R. Drechsler
This paper presents an algorithm that uses formal methods to generate test programs for testing programming languages and domain-specific languages. The novelty of the approach is that it is embedded in a model-driven engineering environment and is formulated as a model finding problem. The grammar of the language and the respective test programs are represented as meta-models and models, respectively. As a result, model finders can be used to generate test programs based on user constraints while also ensuring the embedded constraints of the programming languages. An experimental evaluation demonstrates the applicability of the approach.
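The general idea of deriving test programs from a grammar while enforcing both user constraints and the language's own constraints can be illustrated with bounded exhaustive enumeration in Python. This is a simpler stand-in, not the paper's meta-model/model-finder formulation: the toy grammar, the depth bound (playing the role of a model finder's bounded scope), the define-before-use rule and the statement-count constraint are all hypothetical.

```python
GRAMMAR = {
    "prog": [["stmt"], ["stmt", ";", "prog"]],
    "stmt": [["var", "=", "expr"]],
    "expr": [["var"], ["1"], ["var", "+", "1"]],
    "var": [["x"], ["y"]],
}
VARIABLES = {"x", "y"}


def derive(symbol, depth):
    """Yield every token list derivable from symbol within depth nested expansions."""
    if symbol not in GRAMMAR:
        yield [symbol]          # terminal symbol
        return
    if depth == 0:
        return                  # bound reached: no further expansion (bounded scope)
    for production in GRAMMAR[symbol]:
        yield from expand_seq(production, depth - 1)


def expand_seq(symbols, depth):
    """Yield every concatenation of derivations of the given symbol sequence."""
    if not symbols:
        yield []
        return
    for head in derive(symbols[0], depth):
        for tail in expand_seq(symbols[1:], depth):
            yield head + tail


def defined_before_use(tokens):
    """Embedded language constraint: a variable may be read only after it is assigned."""
    defined = set()
    for stmt in " ".join(tokens).split(" ; "):
        lhs, rhs = stmt.split(" = ")
        if not (set(rhs.split()) & VARIABLES) <= defined:
            return False
        defined.add(lhs)
    return True


def generate(max_depth, n_statements):
    """Keep programs meeting a user constraint (statement count) and the language rule."""
    return [" ".join(t) for t in derive("prog", max_depth)
            if t.count(";") == n_statements - 1 and defined_before_use(t)]


programs = generate(max_depth=6, n_statements=2)
print(len(programs))  # 12
```

A model finder explores the same bounded space symbolically instead of by brute force, which is what lets richer constraints scale beyond toy grammars like this one.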
Traffic-based virtual channel activation for low-power NoC
Pub Date: 2013-12-01 | DOI: 10.1109/IDT.2013.6727077
Venue: 2013 8th IEEE Design and Test Symposium
S. Muhammad, M. El-Moursy, Ali A. Ei-Moursy, A. M. Refaat
A NoC with low leakage power and sustained high throughput is achieved. A Traffic-based Virtual channel Activation algorithm (TVA) is proposed to activate and deactivate virtual channels in a NoC. The proposed algorithm implements the Adaptive Virtual Channel technique of switching off idle virtual channels in the NoC according to the traffic load. TVA is an efficient and flexible algorithm that provides a set of parameters that can be tuned and combined to achieve both high performance and high power savings. The average NoC leakage power is reduced by 73.5% with a negligible throughput degradation of less than 1%.
Performance of different wavelet families using DWT and DWPT-channel equalization using ZF and MMSE
Pub Date: 2013-12-01 | DOI: 10.1109/IDT.2013.6727136
Venue: 2013 8th IEEE Design and Test Symposium
R. Asif, A. Hussaini, R. Abd‐Alhameed, S. R. Jones, J. Noras, E. Elkhazmi, Jonathan Rodriguez
We have studied the performance of multidimensional signaling techniques using wavelet-based modulation within an orthogonally multiplexed communication system. Discrete wavelet transform (DWT) and wavelet packet modulation (WPM) techniques have been studied using Daubechies 2 and 8, biorthogonal 1.5 and 3.1, and reverse biorthogonal 1.5 and 3.1 wavelets in the presence of Rayleigh multipath fading channels with AWGN. The results showed that DWT-based systems outperform WPM systems both in BER-versus-SNR performance and in processing. The performance of two equalization techniques, zero forcing (ZF) and minimum mean square error (MMSE), was also compared using DWT. When the channel is modeled with Rayleigh multipath fading, AWGN and ISI, both techniques yield similar performance.
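For a single sub-channel with complex gain h, the two equalizers compared above reduce to one-tap weights: ZF uses w = 1/h, and MMSE uses w = h*/(|h|² + N0/Es). The Python sketch below assumes this simplified flat-fading per-sub-channel model (the paper's DWT-based system is more involved) and shows that MMSE approaches ZF as the noise vanishes, while on a deep fade ZF amplifies the noise far more.

```python
def zf_weight(h):
    """Zero-forcing one-tap equalizer: invert the channel gain exactly."""
    return 1 / h


def mmse_weight(h, noise_var, symbol_energy=1.0):
    """MMSE one-tap equalizer: conj(h) / (|h|^2 + N0/Es)."""
    return h.conjugate() / (abs(h) ** 2 + noise_var / symbol_energy)


NOISE_VAR = 0.01  # hypothetical N0 with Es = 1 (20 dB SNR)
for h in (1.0 + 0.5j, 0.1 + 0.05j):  # hypothetical strong and deeply faded sub-channels
    for name, w in (("ZF", zf_weight(h)), ("MMSE", mmse_weight(h, NOISE_VAR))):
        # |w*h| is the residual signal gain, |w|^2 * N0 the noise power after equalization
        print(f"h={h}: {name} gain={abs(w * h):.3f} noise={abs(w) ** 2 * NOISE_VAR:.3f}")
```

On the strong sub-channel the two weights nearly coincide; the difference is concentrated on faded sub-channels, where MMSE trades a slightly biased signal gain for much less noise enhancement.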