Cost-Effective Reconfiguration for Multi-Cloud Applications
N. Parlavantzas, L. Pham, Arnab Sinha, C. Morin
2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). Pub Date: 2018-03-21. DOI: 10.1109/PDP2018.2018.00088
Abstract: Applications are increasingly being deployed on resources delivered by Infrastructure-as-a-Service (IaaS) cloud providers. A major challenge for application owners is continually managing the application deployment in order to satisfy the performance requirements of application users while reducing the charges paid to IaaS providers. This paper proposes an approach for adaptive application deployment that explicitly considers adaptation costs and benefits in making deployment decisions. The approach builds on the PaaSage open-source platform, thus enabling automatic deployment and execution over multiple clouds. The paper describes experiments in a real cloud testbed that demonstrate that the approach enables multi-cloud adaptation while increasing the total value of the application for its owner.
ParallNormal: An Efficient Variant Calling Pipeline for Unmatched Sequencing Data
Laura Follia, Fabio Tordini, S. Pernice, G. Romano, G. Piaggeschi, G. Ferrero
Pub Date: 2018-03-21. DOI: 10.1109/PDP2018.2018.00074
Abstract: Next-generation sequencing is moving ever closer to clinical application in the field of oncology. Indeed, it allows the identification of tumor-specific mutations acquired during cancer development, progression, and resistance to therapy. In parallel with evolving sequencing technology, novel computational approaches are needed to cope with the requirement of rapidly processing sequencing data into a list of clinically relevant genomic variants. Since sequencing data from both tumors and their matched normal samples are not always available (unmatched data), a computational pipeline for variant calling on unmatched data is needed. Despite the availability of many accurate and precise variant calling algorithms, an efficient approach is still lacking. Here, we propose a parallel pipeline (ParallNormal) designed to efficiently identify genomic variants from whole-exome sequencing data in the absence of matched normal samples. ParallNormal integrates well-known algorithms such as BWA and GATK, a novel tool for duplicate removal (DuplicateRemove), and the FreeBayes variant calling algorithm. A re-engineered implementation of FreeBayes, optimized for execution on modern multi-core architectures, is also proposed. ParallNormal was applied to whole-exome sequencing data of pancreatic cancer samples without considering their matched normal samples. Its robustness was tested against results from the same dataset analyzed with matched normal samples, considering genes involved in pancreatic carcinogenesis. Our pipeline was able to confirm most of the variants identified using matched normal data.
Implementation of Bayesian Inference in Distributed Neural Networks
Zhaofei Yu, Tiejun Huang, Jian K. Liu
Pub Date: 2018-03-21. DOI: 10.1109/PDP2018.2018.00111
Abstract: Numerous neuroscience experiments have suggested that the cognitive processes of the human brain are realized as probabilistic reasoning and can further be modeled as Bayesian inference. It remains unclear how Bayesian inference could be implemented by neural underpinnings in the brain. Here we present a novel Bayesian inference algorithm based on importance sampling. By distributed sampling through a deep tree structure built from simple, stackable basic motifs for any given neural circuit, one can perform local inference while guaranteeing the accuracy of global inference. We show that these task-independent motifs can be used in parallel for fast inference without iteration or scale limitation. Furthermore, experimental simulations with a small-scale neural network demonstrate that our distributed sampling-based algorithm, consistent with our theoretical analysis, can approximate Bayesian inference. Taken together, we provide a proof of principle for using distributed neural networks to implement Bayesian inference, which gives a roadmap for large-scale Bayesian network implementation based on spiking neural networks in computer hardware, including neuromorphic chips.
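The abstract above rests on importance sampling as the basic inference primitive. As a minimal sketch of that primitive only (not the paper's distributed tree-structured algorithm), the following draws samples from a prior and reweights them by the likelihood to approximate a posterior expectation; the toy Gaussian model and all constants are illustrative assumptions.

```python
import math
import random

def importance_sampling_posterior(likelihood, prior_sample, n=100_000, seed=0):
    """Approximate the posterior mean E[x | data] by importance sampling:
    draw x_i from the prior and weight each sample by its likelihood."""
    rng = random.Random(seed)
    xs = [prior_sample(rng) for _ in range(n)]
    ws = [likelihood(x) for x in xs]
    total = sum(ws)
    # Self-normalized importance-sampling estimate of the posterior mean.
    return sum(w * x for w, x in zip(ws, xs)) / total

# Toy model (assumed for illustration): prior x ~ N(0, 1),
# observation y = 1.0 with y | x ~ N(x, 1).
# The exact posterior is N(y/2, 1/2), so the posterior mean is 0.5.
y = 1.0
post_mean = importance_sampling_posterior(
    likelihood=lambda x: math.exp(-0.5 * (y - x) ** 2),
    prior_sample=lambda rng: rng.gauss(0.0, 1.0),
)
```

Because the weights depend only on local likelihood evaluations, each sample can be drawn and weighted independently, which is what makes the scheme amenable to the parallel, motif-based decomposition the paper describes.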
Lazy Allocation and Transfer Fusion Optimization for GPU-Based Heterogeneous Systems
Lu Li, C. Kessler
Pub Date: 2018-03-21. DOI: 10.1109/PDP2018.2018.00054
Abstract: We present two memory optimization techniques that improve the efficiency of data transfer over the PCIe bus for GPU-based heterogeneous systems: lazy allocation and transfer fusion optimization. Both are based on merging data transfers so that less overhead is incurred, thereby increasing transfer throughput and making accelerator usage profitable also for smaller operand sizes. We provide the design and a prototype implementation of the two techniques in CUDA. Microbenchmarking results show that significant speedups can be achieved, especially for small and medium-sized operands. We also prove that our transfer fusion optimization algorithm is optimal.
Accelerating the RICH Particle Detector Algorithm on Intel Xeon Phi
C. Quast, Angela Pohl, Biagio Cosenza, B. Juurlink, R. Schwemmer
Pub Date: 2018-03-21. DOI: 10.1109/PDP2018.2018.00066
Abstract: At the LHC, particles are collided in order to understand how the universe was created. These collisions, called events, generate large quantities of data that have to be pre-filtered before they are stored to hard disks. This paper presents a parallel implementation of the pre-filtering algorithms that is specifically designed for the Intel Xeon Phi Knights Landing platform, exploiting its 64 cores and AVX-512 instruction set. It shows that a linear speedup up to approximately 64 threads is attainable when vectorization is used, data is aligned to cache-line boundaries, program execution is pinned to MCDRAM, mathematical expressions are transformed into more efficient equivalent formulations, and OpenMP is used for parallelization. The code was transformed from being compute bound to memory bound. Overall, a speedup of 36.47x was reached while obtaining an error smaller than the detector resolution.
Reducing Message Latency and CPU Utilization in the CAF Actor Framework
M. Torquati, Tullio Menga, T. D. Matteis, D. D. Sensi, G. Mencagli
Pub Date: 2018-03-21. DOI: 10.1109/PDP2018.2018.00028
Abstract: In this work, we consider the C++ Actor Framework (CAF), a recent proposal that has revamped interest in building concurrent and distributed applications using the actor programming model in C++. CAF has been optimized for high-throughput computing, but message latency between actors is greatly influenced by the message data rate: at low and moderate rates, latency is higher than at high data rates. To this end, we propose a modification of the polling strategies in the work-stealing CAF scheduler that can reduce message latency at low and moderate data rates by up to two orders of magnitude without compromising overall throughput or message latency at maximum pressure. The proposed technique uses a lightweight event notification protocol that is general enough to be used to optimize the runtime of other frameworks experiencing similar issues.
A Dynamic Multi-Core Multicast Approach for Delay and Delay Variation Multicast Routing
Hovhannes A. Harutyunyan, Meghrig Terzian
Pub Date: 2018-03-21. DOI: 10.1109/PDP2018.2018.00037
Abstract: Multicast communication constrained by end-to-end delay and inter-destination delay variation is known as Delay and Delay Variation Bounded Multicast (DVBM). In this paper, we propose a dynamic multi-core multicast approach to solve the DVBM problem. The proposed three-phase algorithm, Multi-core DVBM Trees (MCDVBMT), semi-matches group members to core nodes. The message is disseminated to group members using trees rooted at the designated core nodes. MCDVBMT dynamically reorganizes the rooted trees in response to changes in multicast group membership. On average, only 5.2% of the total requests trigger re-executions, and 53.6% of the graphs generated by MCDVBMT undergo re-execution before receiving all dynamic requests.
Predicting the Price of Bitcoin Using Machine Learning
S. McNally, Jason Roche, Simon Caton
Pub Date: 2018-03-21. DOI: 10.1109/PDP2018.2018.00060
Abstract: The goal of this paper is to ascertain with what accuracy the direction of the Bitcoin price in USD can be predicted. The price data is sourced from the Bitcoin Price Index. The task is achieved with varying degrees of success through the implementation of a Bayesian-optimised recurrent neural network (RNN) and a Long Short-Term Memory (LSTM) network. The LSTM achieves the highest classification accuracy of 52% and an RMSE of 8%. The popular ARIMA model for time-series forecasting is implemented as a comparison to the deep learning models. As expected, the non-linear deep learning methods outperform the ARIMA forecast, which performs poorly. Finally, both deep learning models are benchmarked on a GPU and a CPU, with training time on the GPU outperforming the CPU implementation by 67.7%.
A New Execution Model for Improving Performance and Flexibility of CAPE
Van Long Tran, É. Renault, Xuan Huyen Do, Viet Hai Ha
Pub Date: 2018-03-21. DOI: 10.1109/PDP2018.2018.00039
Abstract: Checkpointing-Aided Parallel Execution (CAPE) is a framework based on checkpointing techniques that automatically translates and executes OpenMP programs on distributed-memory architectures. In comparisons with MPI, CAPE has demonstrated high performance and the potential for full compatibility with OpenMP on distributed-memory systems. However, its performance, flexibility, portability, and capability still need improvement. This paper presents a new execution model for CAPE that improves its performance and makes CAPE even more flexible.
RVNoC: A Framework for Generating RISC-V NoC-Based MPSoC
Mahmoud A. Elmohr, A. Eissa, Moamen Ibrahim, Mostafa Khamis, Sameh El-Ashry, A. Shalaby, Mohamed Abdelsalam, M. El-Kharashi
Pub Date: 2018-03-21. DOI: 10.1109/PDP2018.2018.00103
Abstract: With the increase in the number of cores embedded on a chip, the main challenge for Multiprocessor System-on-Chip (MPSoC) platforms is the interconnection between that massive number of cores. The Network-on-Chip (NoC) paradigm was introduced to address this challenge by providing a scalable and modular solution for communication between the cores. In this paper, we introduce a configurable MPSoC framework called RVNoC that generates synthesizable RTL usable in both ASIC and FPGA implementations. The proposed framework is based on the open-source RISC-V Instruction Set Architecture (ISA) and an open-source configurable flit-based router for interconnection between cores, with a core network interface of our own design to connect each core with its designated router. A benchmarking environment is developed to evaluate various parameters of the generated MPSoC. Synthesis of a single building block containing one core without any peripherals, a router, and a core network interface, using 45 nm technology, shows an area of 102.34 kilo gate equivalents (kGE), a maximum frequency of 250 MHz, and a power consumption of 9.9 μW/MHz.