Pub Date : 2018-09-01 DOI: 10.1109/MCSoC2018.2018.00021
S. Duong, Anh Vu Trinh, T. Dinh
In this age of IoT (Internet of Things), Indoor Positioning (IPS) is considered one of the most popular topics and has been researched widely around the world because of the variety of applications it enables. However, IPS is also a challenging topic with a number of stringent requirements, such as cost, energy efficiency, availability, and accuracy. The development of Bluetooth Low Energy (BLE) iBeacon has opened great opportunities for researchers to address these challenges. In this paper, we present our iBeacon-based positioning system, which we built as an application running on the iOS platform. We also present fingerprinting, the main positioning technique used in our system, in which we configure the fingerprints to improve accuracy. A machine learning algorithm called k-Nearest Neighbor (kNN) is then applied to extract the most probable user location. In addition, we use a Kalman filter to enhance the stability of the iBeacon signal. Our system achieves an accuracy rate of 60% to 71.4% and an error of up to 1.6 m, which is acceptable for IPS.
{"title":"Bluetooth Low Energy Based Indoor Positioning on iOS Platform","authors":"S. Duong, Anh Vu Trinh, T. Dinh","doi":"10.1109/MCSoC2018.2018.00021","DOIUrl":"https://doi.org/10.1109/MCSoC2018.2018.00021","url":null,"abstract":"In this age of IoT (Internet of Things), Indoor Positioning (IPS) is considered as one of the most popular topics and has been researched widely all around the world, as the result of various applications it can provide. However, IPS is also a challenging topic that has a number of stringent requirements, such as cost, energy efficiency, availability and accuracy. The development of Bluetooth Low Energy (BLE) iBeacon has opened great opportunities for researchers to solve those challenges. In this paper, we present our iBeacon based positioning system, which we built as an application running on iOS platform. We also present Fingerprinting - the main positioning technique used in our system, in which we configure its fingerprints to improve accuracy. With that, a machine learning algorithm called k-Nearest Neighbor (kNN) is applied to extract the most probable user location. In addition, we also use Kalman Filter in order to enhance the stability of iBeacon's signal. Our system results in a 60% - 71.4% accuracy rate and an error of up to 1.6 m, which is acceptable in IPS.","PeriodicalId":413836,"journal":{"name":"2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127769830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
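The pipeline named in the abstract above (Kalman-filtered beacon signal, then kNN over stored fingerprints) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the scalar filter model, the noise variances, the room labels, the RSSI values, and the function names `kalman_smooth` and `knn_locate`. The paper's system runs on iOS and its actual fingerprint configuration is not described in this listing.

```python
import math
from collections import Counter


def kalman_smooth(rssi_samples, process_var=0.05, measurement_var=4.0):
    """Smooth RSSI readings (dBm) with a scalar Kalman filter.

    Assumes a static signal model; both variances are illustrative guesses.
    """
    estimate = rssi_samples[0]  # initial state: first reading
    error_var = 1.0             # initial estimate variance (assumed)
    smoothed = []
    for z in rssi_samples:
        error_var += process_var                       # predict step
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)              # update with reading z
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


def knn_locate(fingerprints, observed, k=3):
    """Return the majority label among the k fingerprints nearest to `observed`.

    fingerprints: list of (label, [rssi per beacon]) pairs.
    observed: one smoothed RSSI value per beacon, in the same beacon order.
    """
    nearest = sorted(fingerprints, key=lambda fp: math.dist(fp[1], observed))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical fingerprint database: two rooms, two beacons.
fingerprints = [
    ("room_a", [-60.0, -80.0]),
    ("room_a", [-62.0, -78.0]),
    ("room_b", [-82.0, -58.0]),
    ("room_b", [-79.0, -61.0]),
]
beacon1 = kalman_smooth([-61, -65, -59, -60, -63])
beacon2 = kalman_smooth([-79, -77, -83, -80, -78])
location = knn_locate(fingerprints, [beacon1[-1], beacon2[-1]], k=3)
print(location)
```

Filtering each beacon separately before the distance computation keeps a single noisy reading from moving the observation across a room boundary, which is the stability benefit the abstract attributes to the Kalman filter.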
Pub Date : 2018-09-01 DOI: 10.1109/MCSoC2018.2018.00020
A. Kostrov, V. Stempitsky, A. Borovik, V. Tchekhovsky
An 8-channel mixed-signal application-specific test IC was implemented in a TSMC 0.18 µm CMOS MS/RF 1.8/3.3 V process. A single IC channel comprises a charge-sensitive preamplifier/shaper with a semi-Gaussian response, a shaping amplifier with ion tail cancellation circuitry, a differential output baseline restorer, and a 10-bit 10 MSPS ADC. The structural scheme and specification of the IC are presented, and the algorithm and features of chip operation are described. The results of simulating an IC test channel in the Cadence software package are presented.
{"title":"Design Features of Analog-to-Digital Solutions for the Tracking Detector Readout Electronics","authors":"A. Kostrov, V. Stempitsky, A. Borovik, V. Tchekhovsky","doi":"10.1109/MCSoC2018.2018.00020","DOIUrl":"https://doi.org/10.1109/MCSoC2018.2018.00020","url":null,"abstract":"8-channel mixed-signal application specific test IC was implemented in a TSMC 0.18 µm CMOS MS/RF 1.8/3.3 V process. A single IC channel is comprised of a chargesensitive preamplifier/shaper with a semi-Gaussian response, shaping amplifier with ion tail cancellation circuitry, differential output baseline restorer, a 10bit 10MSPS ADC. The structural scheme and specification of the IC are presented, the algorithm and features of chip functioning are described. The results of IC test channel simulation in the Cadence software package are presented.","PeriodicalId":413836,"journal":{"name":"2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124895487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the appearance of on-line big data stream computation, the explosive growth of mobile devices, the development of broadband cellular networks, and the widespread use of WiFi in recent years, the VM allocation problem has shifted gradually from batch processing to real-time processing. As the streaming workflows to be allocated become very large, the problem has become far more difficult. First, in this paper, we model a new network based on mobile cloud computing and mobile edge computing schemes for the real-time streaming workflow allocation problem. Our proposed network, called the Heterogeneous Node Network (HNN), consists of three types of computing node: a conventional data center (DC), a cloudlet (CL) located between the edge server (ES) and the DC, and an ES consisting of mobile devices. In HNN, the DC is the conventional placement destination of virtual machines (VMs) and has more computing resources than the other nodes; the CL is a new computing resource whose performance is lower than that of the DC, but data transmission between the CL and the ES is faster than between the DC and the ES; and the ES is a cluster of mobile devices with the fewest computing resources, whose advantage is reducing the amount of raw data for crucial processes of the streaming workflow. Second, we propose a heuristic streaming workflow allocation algorithm that adapts to changes in the real-time availability of the streaming workflow and the HNN environment to minimize cost. Our algorithm is a hybrid of a bin-packing algorithm and a shortest-path algorithm, based on the VM placement problem and the shortest-path problem in a graph network, respectively. Finally, our algorithm is compared with the result of linear programming (LP). In the performance evaluation, the experimental results show that our approach leads to a solution close to the optimal solution generated by LP, with a reduced execution time.
{"title":"On-Line Cost-Aware Workflow Allocation in Heterogeneous Computing Environments","authors":"Incheon Paik, Yuji Ishizuka, Quang-Minh Do, Wuhui Chen","doi":"10.1109/MCSoC2018.2018.00042","DOIUrl":"https://doi.org/10.1109/MCSoC2018.2018.00042","url":null,"abstract":"With the appearance of on-line big data stream computation, the explosive growth of mobile devices, the development of broadband cellular network, and widespread use of WiFi in recent years, the VM allocation problem has shifted gradually from batch processing to real-time processing. As the processing streaming workflow allocation becomes very large, it has become far more difficult. First, in this paper, we have modeled new network based on mobile cloud computing and mobile edge computing scheme for the real-time streaming workflow allocation problem. Our proposed network called Heterogeneous Node Network (HNN) consists of three types of computing node. HNN has a conventional data center (DC), a cloudlet (CL) located between edge server (ES) and DC, and ES consisting of mobile devices. In HNN, DC is the conventional placement destination of virtual machine (VM) and has high computing resource compared to other nodes; CL is a new computing resource, whose performance is lower than DC, but data transmission between CL and ES is faster than between DC and ES, and ES is a cluster of mobile devices with the lowest computing resource and its advantage is reducing the amount of data from raw data for crucial processes of streaming workflow. Second, we propose a heuristic streaming workflow allocation algorithm, which is flexible according to change of real-time availability for streaming workflow and HNN environment to achieve cost minimization. Our algorithm is the hybrid of a bin-packing algorithm and a shortest path algorithm based on the VM placement problem and the shortest path problem in graph network respectively. 
Finally, our developed algorithm has been compared with the result of linear programming (LP). In performance evaluation, the experimental results show our approach leads to a solution close to an optimal solution generated by LP and its execution time is reduced.","PeriodicalId":413836,"journal":{"name":"2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"25 19","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114052911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
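The abstract above names only "a bin-packing algorithm and a shortest path algorithm" without saying which ones. As a rough illustration of the two ingredients, the Python sketch below uses first-fit decreasing and Dijkstra's algorithm as stand-ins; these choices, the node capacities, the task demands, the link costs, and all function names are assumptions for illustration and should not be read as the authors' heuristic.

```python
import heapq


def first_fit_decreasing(demands, capacities):
    """Place each task on the first node with enough remaining capacity.

    demands: {task: demand}; capacities: {node: capacity}, tried in the
    order given. Returns {task: node}; raises if a task fits nowhere.
    """
    remaining = dict(capacities)
    placement = {}
    for task, demand in sorted(demands.items(), key=lambda kv: -kv[1]):
        for node in remaining:
            if remaining[node] >= demand:
                remaining[node] -= demand
                placement[task] = node
                break
        else:
            raise ValueError(f"no node can host {task}")
    return placement


def dijkstra(graph, source):
    """Return the cheapest path cost from `source` to every reachable node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return dist


# Hypothetical HNN instance: ES is smallest, DC largest; ES-CL link is cheapest.
capacities = {"ES": 2, "CL": 4, "DC": 16}
demands = {"filter": 1, "aggregate": 3, "train": 8}
links = {
    "ES": {"CL": 1.0, "DC": 5.0},
    "CL": {"ES": 1.0, "DC": 2.0},
    "DC": {"CL": 2.0, "ES": 5.0},
}
placement = first_fit_decreasing(demands, capacities)
transfer_cost = dijkstra(links, "ES")
print(placement, transfer_cost)
```

In this toy instance the lightest task lands on the edge server and the heaviest on the data center, and the cheapest ES-to-DC route goes through the cloudlet, which mirrors the roles the abstract assigns to the three node types.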
Pub Date : 2018-09-01 DOI: 10.1109/MCSoC2018.2018.00014
Y. Ben-Asher, V. Tartakovsky, Katrina Portman, Orr Zilberman, Avishi Hadar
Viterbi decoders are an essential component in many embedded systems, used for decoding streams of N data symbols over noisy channels. Decoding is a sequential process in which the decoder builds a trellis for N received symbols and then traverses the trellis backward, computing the path that implies the minimal number of corrections to the bits of the N received symbols. Several techniques have been developed to increase the parallelism of Viterbi decoders, showing that building the trellis can be parallelized; selecting the minimal path, however, has proved harder to parallelize. In this work, we show that both building the trellis and computing the minimal path can be parallelized as a sequence of matrix multiplications. This yields a parallel implementation with a linear speedup of order N/P+P, where P is any desired amount of parallelism in the circuit. We implemented a Verilog generator that, for any set of parameters, generates an optimized sequential decoder and an optimized parallel decoder. We were thus able to verify that the parallel version can obtain linear speedups.
{"title":"An FPGA Scalable Parallel Viterbi Decoder","authors":"Y. Ben-Asher, V. Tartakovsky, Katrina Portman, Orr Zilberman, Avishi Hadar","doi":"10.1109/MCSoC2018.2018.00014","DOIUrl":"https://doi.org/10.1109/MCSoC2018.2018.00014","url":null,"abstract":"Viterbi decoders are an essential component in many embedded systems used for decoding streams of N data symbols over noisy channels. The decoding process is a sequential process wherein the decoder builds a trellis for N received symbols and then it traverses the trellis back computing the path in the trellis that implies the minimal amount of corrections in the bits of the N received symbols. Several techniques have been developed to increase the amount of parallelism of Viterbi decoders, showing building the trellis can be parallelized however to the selecting the minimal path proved harder to parallelize. In this work, we show that both building the Trellis and computing the minimal path can be parallelized as a sequence of matrix multiplications. This yields a parallel implementation with a linear speedup of order N/P+P where P is any amount of the desired parallelism in the circuit. We implemented a Verilog-generator that for any set of parameters generates an optimized sequential decoder and an optimized parallel decoder. 
We thus able to verify that the parallel version can obtain linear speedups.","PeriodicalId":413836,"journal":{"name":"2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133212968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
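One common way to express minimal-path computation as matrix multiplication is the min-plus (tropical) product, which is associative and can therefore be evaluated in independent blocks. The Python sketch below shows that idea under stated assumptions: the abstract above does not say which algebra the authors use, the 2-state cost matrices are invented, and the functions `min_plus`, `chain_sequential`, and `chain_blocked` are hypothetical names. It computes only minimal path costs, not the traceback that recovers decoded bits.

```python
from functools import reduce


def min_plus(a, b):
    """Min-plus product: c[i][j] = min over k of a[i][k] + b[k][j]."""
    inner = len(b)
    return [
        [min(a[i][k] + b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def chain_sequential(mats):
    """Multiply the per-step cost matrices left to right, one after another."""
    return reduce(min_plus, mats)


def chain_blocked(mats, blocks):
    """Reduce `blocks` contiguous chunks independently, then combine them.

    Each chunk could be handled by a separate unit; here they run in a loop.
    """
    size = -(-len(mats) // blocks)  # ceiling division
    partials = [
        reduce(min_plus, mats[i:i + size]) for i in range(0, len(mats), size)
    ]
    return reduce(min_plus, partials)


# Invented per-step costs: steps[t][i][j] is the cost of moving from
# state i to state j at step t (for example, a Hamming distance).
steps = [
    [[0, 2], [1, 1]],
    [[1, 0], [2, 1]],
    [[0, 1], [1, 0]],
    [[2, 1], [0, 2]],
]
sequential = chain_sequential(steps)
blocked = chain_blocked(steps, blocks=2)
print(sequential, blocked)
```

With P blocks, each block reduces about N/P matrices and the final combine touches P partial results, which is one way to read the N/P+P term in the abstract, though the paper's actual construction may differ.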
Pub Date : 2018-09-01 DOI: 10.1109/MCSoC2018.2018.00019
Ghazanfar Ali, J. Pathrose, H. Kerkhoff
The technology and complexity of many-processor Systems-on-Chip are developing at a very rapid pace, as is their introduction into safety-critical applications, for instance in the transport sector. The inherent decrease in dependability of these complex nanosystems must be compensated by countermeasures. One promising approach is the use of IJTAG-compatible embedded instruments in and around cores to monitor the "health" of target processors. It has been anticipated that these instruments will be used primarily for reducing the cost of final testing. In case of degradation during the lifetime, however, they can be reused, and counteractions such as run-time remapping can be carried out. In this paper, the on-line data of two types of embedded instruments, a slack-delay monitor and an IDDX monitor, are used for prognostics. Their (correlated) data are fused, which enables a more accurate lifetime prediction than a single-monitor approach. However, the computational requirements for the embedded dependability manager will increase in order to handle embedded instrument data fusion and/or multi-parameter lifetime prediction.
{"title":"On-Chip Lifetime Prediction for Dependable Many-Processor SoCs Based on Data Fusion","authors":"Ghazanfar Ali, J. Pathrose, H. Kerkhoff","doi":"10.1109/MCSoC2018.2018.00019","DOIUrl":"https://doi.org/10.1109/MCSoC2018.2018.00019","url":null,"abstract":"The developments in technology and complexity of many-processor Systems-on-Chips emerge at a very rapid pace as is their introduction in safety-critical applications, for instance the transport sector. The inherent decrease in dependability of these complex nanosystems must be compensated by counter measures. One promising approach is the usage of IJTAG-compatible embedded instruments in and around cores, monitoring the \"health\" of target processors. It has been anticipated that these instruments will be (primarily) used for reducing the cost of final testing. In case of degradation during life time, however, they can be reused and counteractions like run-time remapping can be carried out. In this paper, the on-line data of two types of embedded instruments will be used for the prognostics, a slack-delay monitor and an IDDX monitor. Their (correlated) data is being fused which enables a more accurate life-time prediction as compared to a single monitor approach. 
However, the computational requirements for the embedded dependability manager will increase to enable handling embedded instrument data fusion and/or multi-parameter life-time prediction","PeriodicalId":413836,"journal":{"name":"2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125789844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2018-09-01 DOI: 10.1109/mcsoc2018.2018.00023
Thi Hue Le Dao, Pham Van Giap, Hoang Van Xiem
{"title":"Adaptive Long-Term Reference Selection for Efficient Scalable Surveillance Video Coding","authors":"Thi Hue Le Dao, Pham Van Giap, Hoang Van Xiem","doi":"10.1109/mcsoc2018.2018.00023","DOIUrl":"https://doi.org/10.1109/mcsoc2018.2018.00023","url":null,"abstract":"","PeriodicalId":413836,"journal":{"name":"2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127294406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2018-09-01 DOI: 10.1109/MCSoC2018.2018.00027
Phuc-Vinh Nguyen, T. Tran, Phuoc-Loc Diep, Duc-Hung Le
In this paper, a hierarchical low-power design flow is proposed. Low-power design techniques for digital ASIC design are implemented within this flow for power optimization, such as clock gating at the RTL synthesis stage and multi-threshold voltage and power switching at the back-end stage. The flow and techniques are applied to the open-source RTL of an OpenSPARC T1 processor core. First, the core is taken through synthesis and place-and-route without applying any low-power optimization techniques from the front-end to the back-end stage. Second, the core is implemented using the low-power design techniques. This work is implemented on an open 90 nm CMOS process with EDA tools.
{"title":"A Low-Power ASIC Implementation of Multi-Core OpenSPARC T1 Processor on 90nm CMOS Process","authors":"Phuc-Vinh Nguyen, T. Tran, Phuoc-Loc Diep, Duc-Hung Le","doi":"10.1109/MCSoC2018.2018.00027","DOIUrl":"https://doi.org/10.1109/MCSoC2018.2018.00027","url":null,"abstract":"In this paper, a hierarchy low-power design flow has been proposed. Low-power design techniques for digital ASIC design have been implemented with this proposed flow such as clock gating technique at RTL synthesis stage, multi-threshold voltage and power switching technique at back-end stage for power optimization. These low-power flow and techniques are implemented on an open source RTL of OpenSPARC T1 processor core. Firstly, the core is run synthesis and place-and-route without applying any low-power optimization techniques from front-end to back-end stage. Secondly, the core is completed by using the low-power design techniques. This work is implemented on open 90nm CMOS process with the EDA tools.","PeriodicalId":413836,"journal":{"name":"2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129614217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2018-09-01 DOI: 10.1109/MCSoC2018.2018.00033
Leonard Masing, A. Srivatsa, Fabian Kreß, Nidhi Anantharajaiah, A. Herkersdorf, J. Becker
Scalable communication and low-latency memory accesses are the deciding factors for future manycore performance. An efficient hardware infrastructure is required, since raw performance must be balanced with area and power constraints. In distributed shared-memory (DSM) architectures, caches help in reducing costly remote accesses but must be kept coherent. To enable scalable coherence in manycore systems, the recently proposed region-based cache coherence defines configurable regions, i.e. cache-coherent sub-sections of a manycore architecture. In this paper, a technique for supporting the region-based cache coherence mechanism by using so-called in-NoC circuits (INCs) in a hybrid network-on-chip is proposed. These circuits are automatically established, based on traffic monitoring and traffic analysis, to connect nodes (i.e. routers) in the network and provide a shortcut for packets, reducing their latency. In contrast to traditional end-to-end circuits, the INCs can be used by packets stemming from different sources and targeting different destinations. Depending on the coherence region, our evaluations of several benchmarks show a latency reduction of up to 45% on average in a 4x4 mesh, which increases further with the mesh size. The FPGA synthesis of a router from a scientific DSM architecture that was extended with the presented features shows additional costs of up to 31% more LUTs and 20% more flip-flops.
{"title":"In-NoC Circuits for Low-Latency Cache Coherence in Distributed Shared-Memory Architectures","authors":"Leonard Masing, A. Srivatsa, Fabian Kreß, Nidhi Anantharajaiah, A. Herkersdorf, J. Becker","doi":"10.1109/MCSoC2018.2018.00033","DOIUrl":"https://doi.org/10.1109/MCSoC2018.2018.00033","url":null,"abstract":"Scalable communication and low latency memory accesses are the deciding factors for future manycore performance. An efficient hardware infrastructure is required, since raw performance must be balanced with area and power constraints. In distributed shared-memory (DSM) architectures, caches help in reducing costly remote accesses but must be kept coherent. To enable scalable coherence in manycore systems, the recently proposed region-based cache coherence defines configurable regions, i.e. cache coherent sub-sections of a manycore architecture. In this paper, a technique for supporting the regionbased cache coherence mechanism by using so called in-NoC circuits (INCs) in a hybrid networks-on-chip is proposed. These circuits are automatically established based on traffic monitoring and traffic analysis to connect nodes (i.e. routers) in the network to enable a shortcut for packets, reducing their latency. The INCs can be used by packets stemming from different sources and targeting different destinations in contrast to traditional end-toend circuits. Depending on the coherence region, our evaluations of several benchmarks show a latency reduction of up to 45% on average in a 4x4 mesh that further increases with the mesh size. 
The FPGA synthesis of a router from a scientific DSM architecture that was extended with the presented features shows additional costs of up to 31% more LUTs and 20% more Flip Flops.","PeriodicalId":413836,"journal":{"name":"2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"184 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123211535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2018-09-01 DOI: 10.1109/MCSoC2018.2018.00040
Moises Urbina, R. Obermaisser
The AUTOSAR standard does not provide an approach for mapping its ECU software architecture to a message-based multicore system. In this work, we present an analysis of the performance and fault containment of a novel TIme-triggered MEssage-based multi-core architecture for AUTOSAR (TIMEA). The TIMEA platform is intended to bring the advantages of network-on-a-chip architectures to the AUTOSAR software, which leads to multiple benefits for fail-operational real-time systems, such as temporal predictability and fault isolation. We introduce a fault hypothesis consisting of multiple fault assumptions and the definition of the fault containment regions, and we describe the algorithms for integrating a multicore monitoring service into the AUTOSAR Basic Software. A set of experiments was carried out to evaluate the performance of the system using an anti-lock braking use case in a simulation scenario with failure occurrences. The results demonstrate how the TIMEA platform remains operational in the presence of failures.
{"title":"Evaluation of Performance and Fault Containment in AUTOSAR Micro-ECUs on a Multi-Core Processor","authors":"Moises Urbina, R. Obermaisser","doi":"10.1109/MCSoC2018.2018.00040","DOIUrl":"https://doi.org/10.1109/MCSoC2018.2018.00040","url":null,"abstract":"The AUTOSAR standard does not provide an approach for the mapping of its ECU software architecture to a message-based multicore system. In this work we present an analysis of performance and fault containment of a novel TIme-triggered MEssage-based multi-core architecture for AUTOSAR (TIMEA). The TIMEA platform is intended to bring the advantages of network-on-a-chip architectures to the AUTOSAR software, which lead to multiple benefits for fail operational real time systems such as temporal predictability and fault isolation. We introduce a fault hypothesis consisting of multiple fault assumptions and the definition of the fault containment regions and we describe the algorithms for the integration of a multicore monitoring service into the AUTOSAR Basic Software. A set of experiments were carried out to evaluate the performance of the system using an anti-lock braking use case in a simulation scenario under failure occurrences. The obtained results demonstrate how the TIMEA platform remains operational in the presence of failures.","PeriodicalId":413836,"journal":{"name":"2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"218 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115103064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2018-09-01 DOI: 10.1109/MCSoC2018.2018.00034
Akram Ben Ahmed, Hayate Okuhara, Hiroki Matsutani, M. Koibuchi, H. Amano
Over the past decade, power consumption has been one of the main design challenges in Networks-on-Chip (NoCs), as it significantly affects the performance of a given Chip-Multiprocessor (CMP). Body bias control is one of the solutions that provide an efficient trade-off between leakage power and performance. However, employing such a method is not straightforward, since several factors should be taken into consideration, especially when it is implemented adaptively on-chip. In this paper, we propose a new router design and on-chip body bias control mechanism to adaptively control the body bias voltage supply in ultra low-power NoC systems. With the help of a lightweight monitoring circuit, the proposed router predicts the traffic load at each input port and adjusts its pipeline depth accordingly in a fine-grained fashion. To satisfy the timing constraints, the router adaptively supplies each of its input ports with the appropriate body bias voltages, either to boost performance or to reduce leakage power in the standby state. The evaluation results, using the SOTB 65 nm Fully Depleted Silicon On Insulator (FD-SOI) technology, show that the proposed router reduces both dynamic and static energy. Compared with two fixed-pipeline baseline routers (3-stage and 2-stage), the total energy reduction reaches up to 67% and 59%, respectively. At the same time, a reasonable performance tendency can be obtained with less than 6% area overhead.
{"title":"Adaptive Body Bias Control Scheme for Ultra Low-Power Network-on-Chip Systems","authors":"Akram Ben Ahmed, Hayate Okuhara, Hiroki Matsutani, M. Koibuchi, H. Amano","doi":"10.1109/MCSoC2018.2018.00034","DOIUrl":"https://doi.org/10.1109/MCSoC2018.2018.00034","url":null,"abstract":"Over the past decade, the power consumption has been one of the main design challenges in Network-on-Chips (NoCs) as it significantly defines the performance of a given Chip-Multiprocessor (CMP). Body bias control is one of the solutions that provide an efficient trade-off between leakage power and performance. However, employing such a method is not straightforward since several factors should be taken into consideration, especially when adaptively implemented on-chip. In this paper, we propose a new router design and on-chip body bias control mechanism to adaptively control the body bias voltages supply in ultra low-power NoC systems. With the help of a light-weight monitoring circuit, the proposed router predicts the traffic load at each input-port and accordingly adjusts its pipeline depth in a fine-grained fashion. To satisfy the timing constraints, the router adaptively supplies each one of its input-ports with the appropriate body bias voltages to either boost the performance or to reduce the leakage power at the standby state. The evaluation results, using the SOTB 65nm Fully Depleted Silicon On Insulator (FD-SOI) technology, shows the ability of the proposed router in reducing both dynamic and static energies. When compared to two fixed-pipeline baseline routers (3-stages and 2-stages), the total energy reduction could reach up to 67% and 59%, respectively. 
At the same time, a reasonable performance tendency can be obtained with less than 6% area overhead.","PeriodicalId":413836,"journal":{"name":"2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128879139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}