Pub Date : 2019-04-01DOI: 10.1109/DDECS.2019.8724671
Hsiang-Chih Hsiao, Chun-Wei Chen, Jonas Wang, Ming-Der Shieh, Pei-Yin Chen
Cascaded classifier based object detectors are popular for many applications because of their high efficiency. Many researches have been devoted to developing the corresponding hardware accelerators. To reduce the circuit complexity while maintaining sufficient throughput, on-chip memories are commonly partitioned into several banks for parallel data access. However, since the coefficients of feature extraction are irregular, memory access conflict would frequently occur without proper scheduling. The proposed scheme explicitly schedules the access sequence as a post-processing for managing the coefficient memory. By formulating the desired sequence as a graph model, the classical graph coloring theory can then be adopted to solve the scheduling problem. In addition, the proposed graph model also considers the resource constraint on intermediate storage. Experimental results show that the throughput and area-efficiency of the target cascaded classifier can be greatly improved by adopting the proposed scheme as compared to the related work.
{"title":"Architecture-aware Memory Access Scheduling for High-throughput Cascaded Classifiers","authors":"Hsiang-Chih Hsiao, Chun-Wei Chen, Jonas Wang, Ming-Der Shieh, Pei-Yin Chen","doi":"10.1109/DDECS.2019.8724671","DOIUrl":"https://doi.org/10.1109/DDECS.2019.8724671","url":null,"abstract":"Cascaded classifier based object detectors are popular for many applications because of their high efficiency. Many researches have been devoted to developing the corresponding hardware accelerators. To reduce the circuit complexity while maintaining sufficient throughput, on-chip memories are commonly partitioned into several banks for parallel data access. However, since the coefficients of feature extraction are irregular, memory access conflict would frequently occur without proper scheduling. The proposed scheme explicitly schedules the access sequence as a post-processing for managing the coefficient memory. By formulating the desired sequence as a graph model, the classical graph coloring theory can then be adopted to solve the scheduling problem. In addition, the proposed graph model also considers the resource constraint on intermediate storage. Experimental results show that the throughput and area-efficiency of the target cascaded classifier can be greatly improved by adopting the proposed scheme as compared to the related work.","PeriodicalId":197053,"journal":{"name":"2019 IEEE 22nd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS)","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124330358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-04-01DOI: 10.1109/DDECS.2019.8724657
L. Lopacinski, M. Eissa, G. Panic, A. Hasani, R. Kraemer
In this paper, we demonstrate a modular baseband and modular data link layer processors for wireless communication, which has been designed for a 200 GHz frontend. Although the individual system elements are well known, we combine the performance of parallel baseband and data link layer cores to cover a larger bandwidth. We combine three cores and achieve a single 1.5 GHz channel $( 3 times 500$ MHz).This paper is focused on the digital elements of the demonstrator, especially on the data link layer aspects and field-programmable gate array (FPGA) processing. We discuss the performance of the back-to-back connected demonstrator, with the focus on the data link layer implementation that is included in the baseband chip. The peak data rate achieved by the presented demonstrator is 1920 Mbps. The solution uses forward error correction mechanisms based on convolutional codes at the code rate equal to 3/4 and accepts bit error rate (BER) up to $10^{mathbf {-2}}$. Point-to-point and mesh network topologies are supported.
{"title":"Modular Data Link Layer Processing for THz communication","authors":"L. Lopacinski, M. Eissa, G. Panic, A. Hasani, R. Kraemer","doi":"10.1109/DDECS.2019.8724657","DOIUrl":"https://doi.org/10.1109/DDECS.2019.8724657","url":null,"abstract":"In this paper, we demonstrate a modular baseband and modular data link layer processors for wireless communication, which has been designed for a 200 GHz frontend. Although the individual system elements are well known, we combine the performance of parallel baseband and data link layer cores to cover a larger bandwidth. We combine three cores and achieve a single 1.5 GHz channel $( 3 times 500$ MHz).This paper is focused on the digital elements of the demonstrator, especially on the data link layer aspects and field-programmable gate array (FPGA) processing. We discuss the performance of the back-to-back connected demonstrator, with the focus on the data link layer implementation that is included in the baseband chip. The peak data rate achieved by the presented demonstrator is 1920 Mbps. The solution uses forward error correction mechanisms based on convolutional codes at the code rate equal to 3/4 and accepts bit error rate (BER) up to $10^{mathbf {-2}}$. Point-to-point and mesh network topologies are supported.","PeriodicalId":197053,"journal":{"name":"2019 IEEE 22nd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122042106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-04-01DOI: 10.1109/DDECS.2019.8724668
M. Kovác, D. Arbet, V. Stopjaková, Michal Sovcík, L. Nagy
This paper deals with cross-implementation of analytical and physical fundamentals of ultra low-voltage charge pumps. The analysis is based on precise, general formulas including characteristic parasitic effects valid for linear charge pumps. The parasitic effects are extended by non-linear parasitic capacitances represented as equivalent linear model of a switched transistor itself. The discussion about non-linear and linear behaviour of these parasitics is also included and demonstrated using cross-coupled, dynamic threshold implementation, where the EKV model of transistors has been utilized. The paper also introduced a new design rule for design of charge pumps based on transistors working in sub-threshold region to maximize the power throughput. This is achieved by tuning the operation conditions to the boundary case.
{"title":"Investigation of Low-Voltage, Sub-threshold Charge Pump with Parasitics Aware Design Methodology","authors":"M. Kovác, D. Arbet, V. Stopjaková, Michal Sovcík, L. Nagy","doi":"10.1109/DDECS.2019.8724668","DOIUrl":"https://doi.org/10.1109/DDECS.2019.8724668","url":null,"abstract":"This paper deals with cross-implementation of analytical and physical fundamentals of ultra low-voltage charge pumps. The analysis is based on precise, general formulas including characteristic parasitic effects valid for linear charge pumps. The parasitic effects are extended by non-linear parasitic capacitances represented as equivalent linear model of a switched transistor itself. The discussion about non-linear and linear behaviour of these parasitics is also included and demonstrated using cross-coupled, dynamic threshold implementation, where the EKV model of transistors has been utilized. The paper also introduced a new design rule for design of charge pumps based on transistors working in sub-threshold region to maximize the power throughput. This is achieved by tuning the operation conditions to the boundary case.","PeriodicalId":197053,"journal":{"name":"2019 IEEE 22nd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS)","volume":"206 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116242840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-04-01DOI: 10.1109/ddecs.2019.8724631
{"title":"[DDECS 2019 Breaker page]","authors":"","doi":"10.1109/ddecs.2019.8724631","DOIUrl":"https://doi.org/10.1109/ddecs.2019.8724631","url":null,"abstract":"","PeriodicalId":197053,"journal":{"name":"2019 IEEE 22nd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126640731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-04-01DOI: 10.1109/DDECS.2019.8724669
Andreas Krinke, T. Horst, G. Glaeser, Martin Grabmann, Tobias Markus, Benjamin Prautsch, U. Hatnik, J. Lienig
The effort in designing analog/mixed-signal (AMS) integrated circuits is characterized by the largely manual work involved in the design of analog cells and their integration into the overall circuit. This inequality in effort between analog and digital cells increases with the use of modern, more complex technology nodes. To mitigate this problem, this paper presents four methods to improve existing mixed-signal design flows: (1) automatic schematic generation from a system-level model, (2) flexible automatic analog layout generation, (3) constraint propagation and budget calculation for dependency resolution, and (4) verification of nonfunctional effects. The implementation of these steps results in a novel AMS design flow with a significantly higher degree of automation.
{"title":"From Constraints to Tape-Out: Towards a Continuous AMS Design Flow","authors":"Andreas Krinke, T. Horst, G. Glaeser, Martin Grabmann, Tobias Markus, Benjamin Prautsch, U. Hatnik, J. Lienig","doi":"10.1109/DDECS.2019.8724669","DOIUrl":"https://doi.org/10.1109/DDECS.2019.8724669","url":null,"abstract":"The effort in designing analog/mixed-signal (AMS) integrated circuits is characterized by the largely manual work involved in the design of analog cells and their integration into the overall circuit. This inequality in effort between analog and digital cells increases with the use of modern, more complex technology nodes. To mitigate this problem, this paper presents four methods to improve existing mixed-signal design flows: (1) automatic schematic generation from a system-level model, (2) flexible automatic analog layout generation, (3) constraint propagation and budget calculation for dependency resolution, and (4) verification of nonfunctional effects. The implementation of these steps results in a novel AMS design flow with a significantly higher degree of automation.","PeriodicalId":197053,"journal":{"name":"2019 IEEE 22nd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131421710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-04-01DOI: 10.1109/DDECS.2019.8724659
L. Kohútka, L. Nagy, V. Stopjaková
This paper presents a novel hardware architecture of dynamic memory manager providing memory allocation and deallocation operations. Due to very low and constant latency of these operations with respect to the actual number and location of free blocks of memory, the proposed solution is suitable for hard real-time and mixed-criticality systems. The proposed hardware-accelerated memory manager implements Worst-Fit algorithm for selection of a suitable free block of memory that can be used by the external environment, e.g. CPU or any custom hardware. The proposed solution uses hardware-accelerated max queue, which is a data structure that continuously provides the largest free memory block in two clock cycles regardless of the actual number or constellation of available free blocks. The proposed memory manager was verified using simplified version of UVM and applying billions of randomly generated instructions as test inputs. A synthesis into Intel FPGA Cyclone V was performed, and the synthesis results are presented as well. The memory manager was also synthesized into 28 nm technology with 1 GHz clock frequency and the power supply voltage of 0.9 V. The ASIC synthesis results show that the proposed memory manager consumes additional chip area from 35% to 70% of the managed memory.
{"title":"Low Latency Hardware-Accelerated Dynamic Memory Manager for Hard Real-Time and Mixed-Criticality Systems","authors":"L. Kohútka, L. Nagy, V. Stopjaková","doi":"10.1109/DDECS.2019.8724659","DOIUrl":"https://doi.org/10.1109/DDECS.2019.8724659","url":null,"abstract":"This paper presents a novel hardware architecture of dynamic memory manager providing memory allocation and deallocation operations. Due to very low and constant latency of these operations with respect to the actual number and location of free blocks of memory, the proposed solution is suitable for hard real-time and mixed-criticality systems. The proposed hardware-accelerated memory manager implements Worst-Fit algorithm for selection of a suitable free block of memory that can be used by the external environment, e.g. CPU or any custom hardware. The proposed solution uses hardware-accelerated max queue, which is a data structure that continuously provides the largest free memory block in two clock cycles regardless of the actual number or constellation of available free blocks. The proposed memory manager was verified using simplified version of UVM and applying billions of randomly generated instructions as test inputs. A synthesis into Intel FPGA Cyclone V was performed, and the synthesis results are presented as well. The memory manager was also synthesized into 28 nm technology with 1 GHz clock frequency and the power supply voltage of 0.9 V. The ASIC synthesis results show that the proposed memory manager consumes additional chip area from 35% to 70% of the managed memory.","PeriodicalId":197053,"journal":{"name":"2019 IEEE 22nd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132169635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-04-01DOI: 10.1109/DDECS.2019.8724636
Karel Szurman, Z. Kotásek
Redundancy (TMR). SRAM FPGAs are susceptible to Single Event Upsets (SEUs) which are the most common transient faults induced by the cosmic radiation. Therefore, SEU mitigation strategy is required when SRAM FPGAs are integrated into safety-critical systems. An essential requirement for these systems is often to remain fail-operational and thus, to perform implemented functionality after the occurrence of a fault. In this paper, we propose a run-time reconfigurable FT architecture based on coarse-grained TMR with triplicated soft-core processor NEO430 core, PDR for removing all transient SEU faults and the state synchronization allowing smooth state recovery from the inconsistent state when the reconfiguration of a failed processor instance was finished into the state where all three processors operate synchronously. The paper describes implemented FT architecture and run-time fault recovery strategy performing all necessary steps without additional blocking of the system functionality. The state synchronization for the soft-core processor NEO430 architecture is described in a further detail. Moreover, the paper presents developed PDR framework used for validation and testing of proposed fault recovery strategy.
{"title":"Run-Time Reconfigurable Fault Tolerant Architecture for Soft-Core Processor NEO430","authors":"Karel Szurman, Z. Kotásek","doi":"10.1109/DDECS.2019.8724636","DOIUrl":"https://doi.org/10.1109/DDECS.2019.8724636","url":null,"abstract":"Redundancy (TMR). SRAM FPGAs are susceptible to Single Event Upsets (SEUs) which are the most common transient faults induced by the cosmic radiation. Therefore, SEU mitigation strategy is required when SRAM FPGAs are integrated into safety-critical systems. An essential requirement for these systems is often to remain fail-operational and thus, to perform implemented functionality after the occurrence of a fault. In this paper, we propose a run-time reconfigurable FT architecture based on coarse-grained TMR with triplicated soft-core processor NEO430 core, PDR for removing all transient SEU faults and the state synchronization allowing smooth state recovery from the inconsistent state when the reconfiguration of a failed processor instance was finished into the state where all three processors operate synchronously. The paper describes implemented FT architecture and run-time fault recovery strategy performing all necessary steps without additional blocking of the system functionality. The state synchronization for the soft-core processor NEO430 architecture is described in a further detail. Moreover, the paper presents developed PDR framework used for validation and testing of proposed fault recovery strategy.","PeriodicalId":197053,"journal":{"name":"2019 IEEE 22nd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128393690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-04-01DOI: 10.1109/ddecs.2019.8724633
{"title":"DDECS 2019 Committees","authors":"","doi":"10.1109/ddecs.2019.8724633","DOIUrl":"https://doi.org/10.1109/ddecs.2019.8724633","url":null,"abstract":"","PeriodicalId":197053,"journal":{"name":"2019 IEEE 22nd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS)","volume":"2005 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127628579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-04-01DOI: 10.1109/DDECS.2019.8724632
Stanislav Jerabek, Jan Schmidt
The dummy rounds protection scheme, intended to offer resistance against Side Channel Attacks to Feistel and SP ciphers, has been introduced in earlier work. Its experimental evaluation revealed weaknesses, most notably in the first and last round. In this contribution, we show that the situation can be greatly improved by controlling the transition probabilities in the state space of the algorithm. We derived necessary and sufficient conditions for the round execution probabilities to be uniform and hence the minimum possible. The optimum trajectories over the state space are regular and easy to implement.
{"title":"Analyzing and Optimizing the Dummy Rounds Scheme","authors":"Stanislav Jerabek, Jan Schmidt","doi":"10.1109/DDECS.2019.8724632","DOIUrl":"https://doi.org/10.1109/DDECS.2019.8724632","url":null,"abstract":"The dummy rounds protection scheme, intended to offer resistance against Side Channel Attacks to Feistel and SP ciphers, has been introduced in earlier work. Its experimental evaluation revealed weaknesses, most notably in the first and last round. In this contribution, we show that the situation can be greatly improved by controlling the transition probabilities in the state space of the algorithm. We derived necessary and sufficient conditions for the round execution probabilities to be uniform and hence the minimum possible. The optimum trajectories over the state space are regular and easy to implement.","PeriodicalId":197053,"journal":{"name":"2019 IEEE 22nd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116736867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}