Pub Date: 2013-12-01, DOI: 10.1109/IDT.2013.6727130
B. K. Mohanty, S. Al-Maadeed, A. Amira
In this paper, we present an efficient poly-phase decomposition scheme for the implementation of 2-D non-separable filter banks. Poly-phase decomposition allows the filter bank computations to be time-multiplexed, the data clocking rate to be reduced, or both, without affecting the overall throughput rate. Either feature can be used depending on resource availability or processor technology. Time-multiplexing suits resource-constrained applications. A slower clocking rate can be chosen if processor technology is the constraint; in that case, the design can be realized with cheaper and slower processor technology. A time-multiplexed design needs proper data scheduling to interleave the filter bank computations without data overlap. With this in mind, we have derived a systolic architecture for hardware realization of the time-multiplexed filter bank, using a novel data buffering scheme for the filter coefficients. Comparison results show that the proposed structure requires almost J times less hardware than the non-poly-phase filter bank structure while providing the same throughput rate, where J is the filter bank size. The hardware saving is significant for large filter banks such as Gabor filter banks. The proposed structure could be a good candidate for efficient hardware implementation of the non-separable filter banks used in various image processing applications, such as biometric systems.
Title: Systolic architecture for hardware implementation of two-dimensional non-separable filter-bank. In: 2013 8th IEEE Design and Test Symposium.
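The abstract does not spell out the decomposition itself, so the following is only a generic textbook illustration of 2-D poly-phase decomposition, not the authors' architecture. It assumes a decimation factor M in both dimensions (an assumption; the abstract does not say whether the filter bank is decimated) and checks that M x M sub-filters, each working on one phase of the input at the lower rate, reproduce the result of filtering at full rate and then downsampling. All function names are hypothetical.

```python
def conv2d_full(x, h):
    """Direct full-size 2-D convolution; h may be non-separable."""
    R, C = len(x), len(x[0])
    P, Q = len(h), len(h[0])
    y = [[0] * (C + Q - 1) for _ in range(R + P - 1)]
    for r in range(R):
        for c in range(C):
            for p in range(P):
                for q in range(Q):
                    y[r + p][c + q] += x[r][c] * h[p][q]
    return y


def decimate(y, M):
    """Keep every M-th sample in both dimensions."""
    return [row[::M] for row in y[::M]]


def polyphase_decimated(x, h, M):
    """Same result as decimate(conv2d_full(x, h), M), computed branch by branch.

    Branch (i, j) uses the sub-filter e[a][b] = h[M*a + i][M*b + j] and the
    input phase x[M*n1 - i][M*n2 - j] (zero outside the image), so every
    multiply-accumulate happens at the low (decimated) rate.
    """
    R, C = len(x), len(x[0])
    P, Q = len(h), len(h[0])
    out_r = (R + P - 2) // M + 1
    out_c = (C + Q - 2) // M + 1
    z = [[0] * out_c for _ in range(out_r)]
    for i in range(M):
        for j in range(M):
            sub = [row[j::M] for row in h[i::M]]  # poly-phase sub-filter
            for m1 in range(out_r):
                for m2 in range(out_c):
                    acc = 0
                    for a, sub_row in enumerate(sub):
                        for b, coef in enumerate(sub_row):
                            r = M * (m1 - a) - i
                            c = M * (m2 - b) - j
                            if 0 <= r < R and 0 <= c < C:
                                acc += coef * x[r][c]
                    z[m1][m2] += acc
    return z
```

Because each branch only touches its own subset of coefficients, the branches can in principle share one set of multipliers in turn, which is the kind of time-multiplexing the abstract refers to.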
Pub Date: 2013-12-01, DOI: 10.1109/IDT.2013.6727109
S. Butt, L. Lavagno
Design teams are increasingly looking for design flows that can rapidly lead to high-performance, low-power implementations of DSP algorithms. Model-based design can satisfy this requirement, but it must be (1) coupled with efficient high-level synthesis support in order to provide good Quality of Results, and (2) controlled to derive the desired area/performance/throughput trade-off. We present a semi-automatic design flow for rapid, high-level-synthesis-based hardware design space exploration starting from Simulink digital signal processing models. We illustrate our flow with a realistically complex signal processing algorithm for estimating the direction of arrival of a sound source. We show how one can start from a functionally validated fixed-point model in Simulink and then go through a relatively simple design flow for hardware synthesis and automatic design space exploration, obtaining a very efficient hardware implementation that is competitive with the RTL implementation generated by another commercial model-based design tool.
Title: Design space exploration and synthesis for digital signal processing algorithms from Simulink models. In: 2013 8th IEEE Design and Test Symposium.
Pub Date: 2013-12-01, DOI: 10.1109/IDT.2013.6727108
Sezer Gören, Ozgur Ozkurt, Yusuf Turk, Abdullah Yildiz, H. F. Ugurdag
This paper presents a new Dynamic Partial Self Reconfiguration (DPSR) flow for Xilinx FPGAs. Leveraging the Xilinx FPGA Editor and PlanAhead tools, we provide two implementation approaches that enable partial reconfiguration for large configuration changes without Xilinx's paid tool. The flow is difference-based but still allows a modular design, which is made up of Partial Reconfiguration (PR) modules and a static design. It works regardless of the amount of difference between PR modules. We call this flow DPSR-LD, where LD stands for Large Differences. DPSR-LD is an enabler especially for the Spartan-6 FPGA family, as Xilinx currently supports PR on Spartan-6 only through the difference-based flow and only for small differences. DPSR-LD also includes an ICAP controller that makes DPSR possible and offers bitstream compression.
Title: Enabling difference-based dynamic partial self reconfiguration for large differences. In: 2013 8th IEEE Design and Test Symposium.
Pub Date: 2013-12-01, DOI: 10.1109/IDT.2013.6727103
S. Majzoub, Z. Al-Ars, S. Hamdioui
In this paper, we propose a novel technique that uses multi-Vt design to reduce the impact of random process variation on delay and power in a many-core platform. Random variation is mostly attributed to the random-dopant fluctuation. The proposed technique reduces this fluctuation by lowering the dopant density and then compensating the threshold voltage using a footer transistor. The results show a reduction of the total standard deviation from 25% down to 17% using the proposed method.
Title: Reducing random-dopant fluctuation impact on core-speed and power variability in many-core platforms. In: 2013 8th IEEE Design and Test Symposium.
Pub Date: 2013-12-01, DOI: 10.1109/IDT.2013.6727080
Hamid Mushtaq, Z. Al-Ars, K. Bertels
Parallel systems were for a long time confined to high-performance computing. However, with the increasing popularity of multicore processors, parallelization has also become important for other computing domains, such as desktops and embedded systems. Mission-critical embedded software, like that used in the avionics and automotive industries, also needs to guarantee real-time behavior. For that purpose, tools are needed to calculate the worst-case execution time (WCET) of tasks running on a processor, so that the real-time system can ensure that its timing guarantees are met. However, the shared resources present in a multicore system make this task much more difficult than finding the WCET for a single-core processor. In this paper, we discuss how recent research has tried to solve this problem and what the open research problems are.
Title: Accurate and efficient identification of worst-case execution time for multicore processors: A survey. In: 2013 8th IEEE Design and Test Symposium.
Pub Date: 2013-12-01, DOI: 10.1109/IDT.2013.6727100
Jose Pedro Cardoso, J. M. D. Silva
Timing is a critical issue in communication systems, especially for synchronous communications. These show a high dependence on the purity of the clock signal, because clock impairments can introduce errors into the decision process. This paper addresses the design, in a 130 nm CMOS process, of a Radiation Tolerant Voltage Controlled Quartz Crystal Oscillator (VCXO), including techniques to reduce the influence of radiation and noise on its performance. The VCXO is included in a PLL designed to work within High Energy Physics (HEP) experiments.
Title: Design tradeoffs for voltage controlled crystal oscillators with built-in calibration mechanisms. In: 2013 8th IEEE Design and Test Symposium.
Pub Date: 2013-12-01, DOI: 10.1109/IDT.2013.6727145
C. Preschern, N. Kajtazovic, Andrea Höller, C. Steger, Christian Kreiner
In this paper we present generic CPU self-test programs and we check if the test programs conform to the IEC 61508 safety standard. We use processor architecture independent test programs to indirectly test the CPU components. We present a fault injection framework which we use to verify the fault detection ratio of the self-tests through simulation on a Plasma/MIPS and on a LEON3 processor.
Title: Verifying generic IEC 61508 CPU self-tests with fault injection. In: 2013 8th IEEE Design and Test Symposium.
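To make the notion of a fault detection ratio concrete, here is a toy sketch that is not the authors' framework. It assumes a hypothetical 8-bit adder model, injects single stuck-at faults on its result bits, and reports the fraction of injected faults that a given set of self-test patterns flags. The fault model and all names are illustrative assumptions.

```python
def faulty_add(a, b, fault=None):
    """8-bit adder model; `fault` is (bit, stuck_value) on the result, or None."""
    result = (a + b) & 0xFF
    if fault is not None:
        bit, stuck = fault
        if stuck:
            result |= 1 << bit            # stuck-at-1 on this result bit
        else:
            result &= ~(1 << bit) & 0xFF  # stuck-at-0 on this result bit
    return result


def self_test_detects(patterns, fault):
    """True if any test pattern yields a result different from the fault-free one."""
    return any(
        faulty_add(a, b, fault) != ((a + b) & 0xFF)
        for a, b in patterns
    )


def detection_ratio(patterns):
    """Fraction of the 16 single stuck-at faults that the patterns detect."""
    faults = [(bit, stuck) for bit in range(8) for stuck in (0, 1)]
    detected = sum(self_test_detects(patterns, f) for f in faults)
    return detected / len(faults)
```

With only the pattern (0, 0), every stuck-at-0 fault is masked because the fault-free result is already zero; adding a pattern whose result is all ones, such as (0xFF, 0), exposes them as well.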
Pub Date: 2013-12-01, DOI: 10.1109/IDT.2013.6727126
E. Awad, Theodora Rezk, A. Abou-Auf
We present a theoretical model of an interdigitated MSM photodetector and RF electro-optic self-mixer based on standard CMOS technology. The model allows for simulation and analysis of the photodetection performance and RF self-mixing capabilities of the MSM device. Si and GaAs materials are compared in both steady-state and transient operation.
Title: Silicon CMOS interdigitated-MSM photodetector and self-mixer for low-cost crash-avoidance Ladar system. In: 2013 8th IEEE Design and Test Symposium.
Pub Date: 2013-12-01, DOI: 10.1109/IDT.2013.6727120
P. Bernardi, D. Boyang, Lyl M. Ciganda Brasca, E. Sánchez, M. Reorda, M. Grosso, O. Ballan
When the result of a previous instruction is needed in the pipeline before it is available, a “data hazard” occurs. Register Forwarding and Pipeline Interlock (RF&PI) are mechanisms suitable to avoid data corruption and to limit the performance penalty caused by data hazards in pipelined microprocessors. Data hazard handling is part of the microprocessor control logic; it can hardly be tested with a functional approach unless a specific test algorithm is adopted. In this paper we analyze the causes of the low functional testability of the RF&PI logic and propose some techniques able to test it effectively. In particular, we describe a strategy to perform Software-Based Self-Test (SBST) on the RF&PI unit. The general structure of the unit is analyzed, a suitable test algorithm is proposed, and the strategy to observe the test responses is explained. The method can be exploited for testing both at the end of manufacturing and in the operational phase. Feasibility and effectiveness of the proposed approach are demonstrated on both an academic MIPS-like processor and an industrial System-on-Chip based on the Power Architecture™.
Title: A functional test algorithm for the register forwarding and pipeline interlocking unit in pipelined microprocessors. In: 2013 8th IEEE Design and Test Symposium.
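As a rough illustration of the data hazards the abstract describes, the sketch below classifies read-after-write dependencies in a short instruction sequence. It assumes a classic five-stage pipeline in which results can be forwarded to the next two instructions and a load followed immediately by a consumer needs a one-cycle interlock. That pipeline model, the instruction encoding, and every name here are assumptions for illustration, not the processors or the test algorithm from the paper; the sketch also ignores how an earlier stall shifts later distances.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                 # mnemonic, e.g. "lw", "add", "sw"
    dest: Optional[str]     # register written, or None
    srcs: Tuple[str, ...]   # registers read


def classify_hazards(program: List[Instr]) -> List[Tuple[int, str, int, str]]:
    """Return (consumer index, register, producer index, kind) for each hazard.

    kind is "interlock" for a load-use dependency at distance 1 and
    "forward" for any other dependency at distance 1 or 2.
    """
    events = []
    for i, ins in enumerate(program):
        for src in ins.srcs:
            for dist in (1, 2):          # nearest producer wins
                j = i - dist
                if j >= 0 and program[j].dest == src:
                    load_use = program[j].op == "lw" and dist == 1
                    kind = "interlock" if load_use else "forward"
                    events.append((i, src, j, kind))
                    break
    return events


example = [
    Instr("lw", "r1", ("r9",)),
    Instr("add", "r2", ("r1", "r3")),   # load-use on r1
    Instr("sub", "r4", ("r2", "r1")),   # r2 and r1 both forwarded
    Instr("sw", None, ("r4", "r5")),    # r4 forwarded
]
```

A functional test for such logic has to excite each of these paths deliberately, which is why the distance between producer and consumer instructions matters when a test program is written.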
Pub Date: 2013-12-01, DOI: 10.1109/IDT.2013.6727074
Carlos Ivan Castro Marquez, M. Strum, J. Wang
Formal techniques allow exhaustive verification on circuit design (at least in theory), but due to actual computational limitations, workarounds must always be adopted to check only a portion of the design at a time. Sequential equivalence checking is an effective approach, but it can only be applied between circuit descriptions where a one-to-one correspondence for states, as well as for memory elements, is expected. This paper presents a formal methodology to verify RTL descriptions through direct comparison with high-level reference models. By doing so, there is no need to specify or analyze formal properties, as the complete behavior is already contained in the reference model. We also consider the natural discrepancies between system level and RTL code, including non-matching interface and memory elements, and state mapping. In this manner, we are able to prove the functional coherence for the overall sequential behavior of the design under verification.
Title: Functional verification of complete sequential behaviors: A formal treatment of discrepancies between system-level and RTL descriptions. In: 2013 8th IEEE Design and Test Symposium.
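The idea of checking an RTL description directly against a reference model, even when the two use different state encodings, can be sketched with a small product-machine search. This is a generic illustration under assumed toy models (a mod-4 counter as the reference and a two-bit register stepping through a Gray-like sequence as the "RTL"), not the methodology of the paper; all names are hypothetical.

```python
from collections import deque


def ref_step(count, en):
    """Reference model: mod-4 counter, output 1 when it wraps around."""
    out = 1 if (en and count == 3) else 0
    nxt = (count + 1) % 4 if en else count
    return nxt, out


def rtl_step(state, en):
    """RTL-like model: two flip-flops stepping 00 -> 01 -> 11 -> 10 -> 00."""
    b1, b0 = state
    out = 1 if (en and b1 == 1 and b0 == 0) else 0
    if en:
        b1, b0 = b0, 1 - b1
    return (b1, b0), out


def equivalent(step_a, init_a, step_b, init_b, inputs):
    """Explore all reachable state pairs; fail on the first output mismatch."""
    seen = {(init_a, init_b)}
    queue = deque(seen)
    while queue:
        sa, sb = queue.popleft()
        for x in inputs:
            na, oa = step_a(sa, x)
            nb, ob = step_b(sb, x)
            if oa != ob:
                return False
            if (na, nb) not in seen:
                seen.add((na, nb))
                queue.append((na, nb))
    return True
```

No state mapping is supplied by hand here: the reachable pairs of the product machine define it implicitly, and no separate property needs to be written because the reference model carries the expected behavior.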