Pub Date: 2017-09-01 DOI: 10.1109/SOCC.2017.8226067
Mingmin Bai, Dan Zhao, M. Bayoumi
When an on-chip interconnection network scales to integrate more processing elements, the average end-to-end latency increases sharply because of the long average hop distance. Although most communication in large-scale networks has been found to occur between nodes within a short range, the small portion of data delivery between distant nodes consumes most of the network bandwidth. Hierarchical NoCs offer an attractive solution to the distant data transmission problem by taking advantage of the network hierarchy. However, they introduce a new, severe congestion challenge because traffic is distributed unevenly across the hierarchy. In previous work, we applied a detouring scheme to a layered hierarchical NoC. When congestion forms on the access link to the adjacent hierarchical layer, the detouring scheme finds a nearby node and reroutes packets through it to reach the next adjacent network layer. That work revealed that the links that bridge packets up to higher layers are particularly essential for distributing traffic and avoiding congestion between hierarchy levels. In this paper, we propose dynamic schemes to solve the congestion problem introduced by region-based hierarchical routing on a hierarchical NoC. The results show that the dynamic approaches manage congestion efficiently under heavier long-range traffic loads, yielding significant reductions in average network latency and increases in throughput under mixed synthetic traffic patterns.
Title: Router-level performance driven dynamic management in hierarchical networks-on-chip
Venue: 2017 30th IEEE International System-on-Chip Conference (SOCC)
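The detouring idea described in the abstract above (reroute to a nearby node when the access link to the next layer is congested) can be sketched as follows. This is only an illustrative sketch, not the authors' scheme: the 2D coordinates, the buffer-occupancy metric, the `threshold` value, and the names `manhattan` and `pick_access_node` are all assumptions introduced here.

```python
def manhattan(a, b):
    """Hop distance between two (x, y) nodes on a 2D mesh (assumed topology)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def pick_access_node(src, default, uplinks, occupancy, threshold=0.8):
    """Choose the node through which a packet climbs to the next layer.

    src       -- (x, y) of the router currently holding the packet
    default   -- (x, y) of the access node the static routing would use
    uplinks   -- all nodes in this layer that own a link to the upper layer
    occupancy -- hypothetical congestion metric in [0, 1] per access node
    threshold -- occupancy at or above which a link counts as congested
    """
    # No congestion on the default access link: keep the static route.
    if occupancy[default] < threshold:
        return default
    # Otherwise look for uncongested alternatives.
    candidates = [n for n in uplinks if occupancy[n] < threshold]
    if not candidates:
        return default  # every access link is congested; no detour helps
    # Prefer the nearest alternative, breaking ties by lower occupancy.
    return min(candidates, key=lambda n: (manhattan(src, n), occupancy[n]))


if __name__ == "__main__":
    uplinks = [(0, 0), (0, 3), (3, 0)]
    occupancy = {(0, 0): 0.95, (0, 3): 0.40, (3, 0): 0.10}
    print(pick_access_node((1, 1), (0, 0), uplinks, occupancy))
```

Choosing the nearest uncongested access node keeps the extra hop count small, which matters because the detour itself adds latency inside the lower layer.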
Pub Date: 2017-09-01 DOI: 10.1109/SOCC.2017.8226048
Jinglan Liu, Yukun Ding, Jianlei Yang, Ulf Schlichtmann, Yiyu Shi
The relentless efforts towards power reduction of integrated circuits have led to the prevalence of near-threshold computing paradigms. With the significantly reduced noise margin, it is no longer possible to fully assure power integrity at design time. As a result, designers seek to contain noise violations, commonly known as voltage emergencies, through various runtime techniques. All these techniques require accurate capture of voltage emergencies through noise sensors. Although existing approaches have explored the optimal placement of noise sensors, they all rely on statistical modeling of noise, which requires a large number of samples in a high-dimensional space. For large-scale power grids, these techniques may not work because of the very long simulation time required to obtain the samples. In this paper, we explore a novel approach based on a generative adversarial network (GAN), which requires only a small number of samples to train. Experimental results show that, compared with a simple heuristic that takes in the same number of samples, our approach can reduce the miss rate of voltage emergency detection by up to 65.3% on an industrial design.
Title: Generative adversarial network based scalable on-chip noise sensor placement
Venue: 2017 30th IEEE International System-on-Chip Conference (SOCC)
Pub Date: 2017-09-01 DOI: 10.1109/SOCC.2017.8225999
A. Alshammari, M. Sobhy, P. Lee
A new cryptosystem approach based on Lorenz chaotic systems is presented for secure data transmission. The system uses a stream cipher in which the encryption key varies continuously. Furthermore, one or more of the parameters of the Lorenz generator is controlled by an auxiliary chaotic generator for increased security. The system is implemented on two separate Spartan 6 FPGA boards. Security analysis (Section VII) shows the system to have a high degree of security compared to other communication systems.
Title: Secure digital communication based on Lorenz stream cipher
Venue: 2017 30th IEEE International System-on-Chip Conference (SOCC)
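To make the idea of a chaos-driven stream cipher concrete, here is a minimal software sketch in which a Lorenz trajectory produces a keystream that is XORed with the data. It is an illustration under stated assumptions, not the authors' FPGA design: the forward-Euler integration, the step size, the byte-extraction rule, and the function names are all hypothetical choices made here, and the auxiliary chaotic generator mentioned in the abstract is not modeled. The sketch is not suitable for real security use.

```python
def lorenz_keystream(n_bytes, x0=0.1, y0=0.0, z0=0.0,
                     sigma=10.0, rho=28.0, beta=8.0 / 3.0,
                     dt=0.01, warmup=1000):
    """Return n_bytes keystream bytes from a forward-Euler Lorenz trajectory.

    The initial state and the parameters play the role of the shared key.
    """
    x, y, z = x0, y0, z0
    stream = bytearray()
    for step in range(warmup + n_bytes):
        # Classic Lorenz equations.
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        # Skip the transient, then quantize x into one byte per step
        # (an illustrative extraction rule, assumed for this sketch).
        if step >= warmup:
            stream.append(int(abs(x) * 1e6) % 256)
    return bytes(stream)


def xor_with_keystream(data, **key):
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    keystream = lorenz_keystream(len(data), **key)
    return bytes(d ^ k for d, k in zip(data, keystream))


if __name__ == "__main__":
    message = b"secure data transmission"
    ciphertext = xor_with_keystream(message, x0=0.1234)
    print(xor_with_keystream(ciphertext, x0=0.1234) == message)
```

Both ends must integrate the same equations from the same initial state with identical arithmetic, which is why the sketch keeps every step deterministic; in hardware this corresponds to both boards running the same fixed-point generator.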
Pub Date: 2017-09-01 DOI: 10.1109/SOCC.2017.8226081
N. C. Laurenciu, S. Cotofana
In this paper we introduce and evaluate Haar-based, codec-assisted medium- and long-range data transport structures (e.g., bus segments, Network-on-Chip interconnects) that can deal with technology-scaling-related phenomena (e.g., increased susceptibility to proximity coupling noise and transmission delay variability). They target energy savings at the expense of a reasonably small overhead, i.e., 1 extra wire, a 2-gate encoder, and a 2-gate decoder for every pair of uncoded wires. For practical evaluation we employed a 45 nm commercial CMOS technology and different random, uncorrelated workload profiles. For 5 mm and 10 mm long 8-bit buses (without repeaters), we obtain energy savings of 55% and 34% and a transmission frequency increase of 35% and 41%, respectively, at the expense of less than 1% area overhead with respect to the reference system (i.e., an 8-wire synchronous uncoded bus), which demonstrates energy and delay effectiveness. We further augment our proposal with a Single Error Correction and Double Error Detection (SECDED) scheme particularly adapted to its structure, in order to cope with transmission errors induced by very deep sub-micron noise (e.g., supply voltage variations, electromagnetic interference). When compared to the reference system (not SECDED protected), for 10 mm long buses, our Haar-tailored SECDED approach consumes 27% less energy at the expense of 2% area overhead.
Title: Haar-based interconnect coding for energy effective medium/long range data transport
Venue: 2017 30th IEEE International System-on-Chip Conference (SOCC)
Pub Date: 2017-09-01 DOI: 10.1109/SOCC.2017.8226005
Z. Yan, M. Atef, Guoxing Wang, Y. Lian
This paper presents the design and implementation of an 8-channel low-noise chopper-stabilized analog front-end (AFE) for an electroencephalogram (EEG) acquisition system. Each channel of the AFE is composed of an AC-coupled chopper instrumentation amplifier (ACCIA), a programmable gain amplifier (PGA), and a buffer. A positive feedback loop is adopted to boost the input impedance, while the low-pass property suppresses the chopping ripple. The proposed AFE is implemented in 0.35 μm CMOS technology together with the ADC, MUX, digital part, and other control blocks. Post-layout simulation results show that the AFE achieves 46/52/58/64 dB programmable gain, 108 dB CMRR, and 0.32 μVrms input-referred noise for a bandwidth of 0.5–150 Hz. Each channel consumes 7.5 μA from a 3 V supply.
Title: Low-noise high input impedance 8-channels chopper-stabilized EEG acquisition system
Venue: 2017 30th IEEE International System-on-Chip Conference (SOCC)
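The figures quoted in the abstract above can be turned into a few derived quantities with simple arithmetic. The sketch below converts the dB gain settings to linear voltage gain, computes per-channel power from the quoted current and supply, and estimates an average noise density. The last step assumes the input-referred noise is spectrally flat over 0.5–150 Hz, which the abstract does not state; the constant and function names are introduced here for illustration only.

```python
import math

GAIN_SETTINGS_DB = (46, 52, 58, 64)   # programmable gain steps from the abstract
SUPPLY_V = 3.0                        # supply voltage
CHANNEL_CURRENT_A = 7.5e-6            # current drawn per channel
NUM_CHANNELS = 8
NOISE_VRMS = 0.32e-6                  # input-referred noise over the band
BAND_HZ = (0.5, 150.0)                # signal bandwidth


def db_to_linear(gain_db):
    """Voltage gain in V/V for a gain expressed in dB."""
    return 10 ** (gain_db / 20)


def channel_power_w():
    """Power of one AFE channel (excludes ADC, MUX and digital blocks)."""
    return SUPPLY_V * CHANNEL_CURRENT_A


def average_noise_density():
    """Average noise density in V/sqrt(Hz), assuming a flat spectrum."""
    bandwidth = BAND_HZ[1] - BAND_HZ[0]
    return NOISE_VRMS / math.sqrt(bandwidth)


if __name__ == "__main__":
    for g in GAIN_SETTINGS_DB:
        print(f"{g} dB = {db_to_linear(g):.1f} V/V")
    print(f"per-channel power: {channel_power_w() * 1e6:.1f} uW")
    print(f"all channels: {NUM_CHANNELS * channel_power_w() * 1e6:.1f} uW")
    print(f"noise density: {average_noise_density() * 1e9:.1f} nV/sqrt(Hz)")
```

Each 6 dB step roughly doubles the voltage gain, so the four settings span about a factor of eight.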
Pub Date: 2017-09-01 DOI: 10.1109/SOCC.2017.8226032
N. Sharma, J. Bird, P. Dowben, A. Marshall
The magneto-electric magnetic tunnel junction (ME-MTJ) is a voltage-controlled beyond-CMOS device based on the principle of ME anti-ferromagnetic (AFM) exchange biasing of chromia (Cr2O3) and the tunneling magnetoresistance (TMR) of a magnetic tunnel junction (fixed/free ferromagnet (FM) stack). These devices have previously been demonstrated for the implementation of digital logic and memory applications. Here we demonstrate their analog capabilities with a variety of analog functions adapted specifically to the characteristics of ME-MTJ-based devices. The novel circuit options proposed in this paper include an ME-MTJ-based analog comparator and two variations of an 8-level analog-to-digital converter (ADC) using serial and parallel ME-MTJ circuit configurations.
Title: Magneto-electric magnetic tunnel junction based analog circuit options
Venue: 2017 30th IEEE International System-on-Chip Conference (SOCC)
Pub Date: 2017-09-01 DOI: 10.1109/SOCC.2017.8226015
Marco Pagani, Alessio Balsini, Alessandro Biondi, Mauro Marinoni, G. Buttazzo
Heterogeneous computing platforms including both processors and field programmable gate arrays (FPGAs) represent an attractive solution for balancing software flexibility with high performance and energy efficiency of custom hardware modules. Furthermore, the dynamic partial reconfiguration (DPR) capabilities of modern FPGAs allow virtualizing the available area to support several hardware modules in time sharing, hence making them even more attractive. Such a feature is exploited by the FRED framework, recently proposed to support the development of real-time applications upon such platforms. This paper presents an implementation of the FRED framework for the Linux operating system over the Zynq-7000 platform produced by Xilinx. Design solutions for managing hardware accelerators are first discussed. Then, a software architecture for Linux is presented, which comprises (i) support for shared-memory communication with hardware accelerators, (ii) an improved driver to handle the FPGA reconfiguration and (iii) a scheduler for requests of hardware acceleration. The proposed solution allows exploiting the enormous number of software systems available for Linux (such as drivers, libraries, communication stacks, etc.) and the typical programming flexibility of software, while relying on predictable hardware acceleration of heavy computations.
Title: A Linux-based support for developing real-time applications on heterogeneous platforms with dynamic FPGA reconfiguration
Venue: 2017 30th IEEE International System-on-Chip Conference (SOCC)
Pub Date: 2017-09-01 DOI: 10.1109/SOCC.2017.8226002
Youngtae Yang, Jaehoon Jun, Suhwan Kim
This paper presents a low-noise, low-power CMOS interface circuit for a MEMS gyroscope readout ASIC. Our interface circuit is composed of a continuous-time delta-sigma modulator, an anti-aliasing filter, and an on-chip reference generator. Because a low-pass delta-sigma modulator is used instead of a band-pass delta-sigma modulator, a frequency matching circuit is unnecessary, which enables wideband operation. A switched-capacitor resistor digital-to-analog converter is exploited to reduce the clock jitter sensitivity of the modulator. An anti-aliasing filter rejects the out-of-band signal, and a low-noise on-chip reference generator is embedded for miniaturization. The proposed circuit is realized in a 0.18 μm CMOS process. It achieves a 70.3 dB signal-to-noise ratio in a signal bandwidth from 29.5 kHz to 30.5 kHz with only a 0.2 V differential peak-to-peak input. It dissipates 2.6 mW from a 3.3 V supply.
Title: A low-pass continuous-time delta-sigma interface circuit for wideband MEMS gyroscope readout ASIC
Venue: 2017 30th IEEE International System-on-Chip Conference (SOCC)
Pub Date: 2017-09-01 DOI: 10.1109/SOCC.2017.8226080
Vinicius Fochi, L. L. Caimi, Marcelo Ruaro, E. Wächter, F. Moraes
Advances in silicon technology have led to systems with hundreds of processors, the NoC-based MPSoCs. However, the higher fault probability in deep sub-micron technologies shortens the lifetime of integrated circuits. Operating systems enable the execution of distributed applications on the MPSoC processing elements (PEs). Large systems require PEs dedicated to management purposes, for example, to execute the task mapping, handle monitoring data, and run self-awareness adaptation. This paper addresses a hierarchically organized MPSoC: PEs with an embedded operating system execute the applications (SPEs), and dedicated PEs manage the system resources at runtime (MPEs). A rich literature presents fault-tolerant proposals for the hardware and software components of the MPSoC, but there is a significant gap related to fault-tolerant approaches at the system level, i.e., related to the PEs whose function is to manage the system. Consider, for example, an MPE responsible for managing a set of SPEs. A fault in the MPE prevents access to that set of SPEs for executing new applications. The goal of this paper is to present a method to determine when an MPE becomes faulty and to propose a protocol to migrate the management software safely to an SPE. The management data is preserved without saving the context in redundant structures. The proposal is transparent to the applications executing in the system, with a small execution overhead observed during the management migration, as presented in the results section.
Title: System management recovery protocol for MPSoCs
Venue: 2017 30th IEEE International System-on-Chip Conference (SOCC)
Pub Date: 2017-09-01 DOI: 10.1109/SOCC.2017.8226013
A. Nannarelli, M. Re, G. Cardarilli, L. Nunzio, M. Brunella, R. Fazzolari, F. Carbonari
Reducing the configuration time of portions of an FPGA at run time is crucial in contemporary FPGA-based accelerators. In this work, we propose a method to increase the throughput of FPGA dynamic partial reconfiguration by using standard IP blocks. The throughput is increased by over-clocking the configuration bitstream circuitry beyond the limits stated in the specifications of these standard blocks. The experimental results show that the most power-efficient implementation can reach a throughput of about 780 MB/s, corresponding to a configuration latency of about 670 microseconds for bitstreams of 1.2 MB. We also investigate alternatives to boost the reconfiguration throughput and sketch a methodology to achieve the most power-efficient implementation of FPGA-based accelerators.
Title: Robust throughput boosting for low latency dynamic partial reconfiguration
Venue: 2017 30th IEEE International System-on-Chip Conference (SOCC)