
Latest publications: 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)

Cost-Effective Reconfiguration for Multi-Cloud Applications
N. Parlavantzas, L. Pham, Arnab Sinha, C. Morin
Applications are increasingly being deployed on resources delivered by Infrastructure-as-a-Service (IaaS) cloud providers. A major challenge for application owners is continually managing the application deployment in order to satisfy the performance requirements of application users while reducing the charges paid to IaaS providers. This paper proposes an approach for adaptive application deployment that explicitly considers adaptation costs and benefits in making deployment decisions. The approach builds on the PaaSage open-source platform, thus enabling automatic deployment and execution over multiple clouds. The paper describes experiments in a real cloud testbed that demonstrate that the approach enables multi-cloud adaptation while increasing the total value of the application for its owner.
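The core decision the paper describes — adapt the deployment only when the expected benefit outweighs the adaptation cost — can be sketched as a simple value comparison. The names and the linear per-hour cost model below are illustrative assumptions, not PaaSage's actual interfaces:

```python
# Hypothetical sketch of a cost-aware adaptation decision. Plans are plain
# dicts; "utility_per_hour" stands in for the value delivered to users,
# "cost_per_hour" for IaaS charges, "adaptation_cost" for the one-off price
# of reconfiguring. None of these names come from the paper.

def net_value(plan, current, horizon_hours):
    """Expected gain of switching to `plan` over the remaining horizon,
    minus the one-off adaptation cost."""
    gain = (plan["utility_per_hour"] - current["utility_per_hour"]) * horizon_hours
    gain -= (plan["cost_per_hour"] - current["cost_per_hour"]) * horizon_hours
    return gain - plan["adaptation_cost"]

def choose_deployment(current, candidates, horizon_hours=24):
    """Pick the candidate with the highest net value; keep the current
    deployment if no candidate is worth its adaptation cost."""
    best = max(candidates, key=lambda p: net_value(p, current, horizon_hours))
    return best if net_value(best, current, horizon_hours) > 0 else current
```

The key point mirrored from the abstract is the final guard: a plan that is better on paper is still rejected when the reconfiguration itself costs more than the improvement it buys over the planning horizon.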
Citations: 3
ParallNormal: An Efficient Variant Calling Pipeline for Unmatched Sequencing Data
Laura Follia, Fabio Tordini, S. Pernice, G. Romano, G. Piaggeschi, G. Ferrero
Next-generation sequencing is moving closer to clinical application in the field of oncology. Indeed, it allows the identification of tumor-specific mutations acquired during cancer development, progression, and resistance to therapy. In parallel with the evolving sequencing technology, novel computational approaches are needed to rapidly process sequencing data into a list of clinically relevant genomic variants. Since sequencing data from both tumors and their matched normal samples are not always available (unmatched data), a computational pipeline is needed that performs variant calling on unmatched data. Despite the availability of many accurate and precise variant calling algorithms, an efficient approach is still lacking. Here, we propose a parallel pipeline (ParallNormal) designed to efficiently identify genomic variants from whole-exome sequencing data in the absence of matched normal samples. ParallNormal integrates well-known algorithms such as BWA and GATK, a novel tool for duplicate removal (DuplicateRemove), and the FreeBayes variant calling algorithm. A re-engineered implementation of FreeBayes, optimized for execution on modern multi-core architectures, is also proposed. ParallNormal was applied to whole-exome sequencing data of pancreatic cancer samples without considering their matched normal samples. Its robustness was tested against results from the same dataset analyzed with matched normal samples, considering genes involved in pancreatic carcinogenesis. Our pipeline was able to confirm most of the variants identified using matched normal data.
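The pipeline's stage structure — alignment, duplicate removal, variant calling — can be sketched as a chain of functions. The tool invocations are replaced by placeholders here, and since DuplicateRemove's algorithm is not given in the abstract, the coordinate-based deduplication below is only a generic example:

```python
# Illustrative sketch of a staged variant-calling pipeline. Reads are modeled
# as dicts; real stages would shell out to BWA, DuplicateRemove, FreeBayes.

def remove_duplicates(reads):
    """Generic duplicate removal: keep one read per
    (chromosome, position, sequence) triple."""
    seen, unique = set(), []
    for read in reads:
        key = (read["chrom"], read["pos"], read["seq"])
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique

def run_pipeline(data, stages):
    """Feed the output of each stage into the next."""
    for stage in stages:
        data = stage(data)
    return data
```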
Citations: 0
Implementation of Bayesian Inference In Distributed Neural Networks
Zhaofei Yu, Tiejun Huang, Jian K. Liu
Numerous neuroscience experiments have suggested that the cognitive processes of the human brain are realized as probabilistic reasoning and can be further modeled as Bayesian inference. It is still unclear how Bayesian inference could be implemented by neural underpinnings in the brain. Here we present a novel Bayesian inference algorithm based on importance sampling. By distributed sampling through a deep tree structure with simple and stackable basic motifs for any given neural circuit, one can perform local inference while guaranteeing the accuracy of global inference. We show that these task-independent motifs can be used in parallel for fast inference without iteration and scale limitation. Furthermore, experimental simulations with a small-scale neural network demonstrate that our distributed sampling-based algorithm, consistent with our theoretical analysis, can approximate Bayesian inference. Taken all together, we provide a proof of principle for using distributed neural networks to implement Bayesian inference, which gives a road map for large-scale Bayesian network implementation based on spiking neural networks with computer hardware, including neuromorphic chips.
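Importance sampling, the primitive the paper builds on, estimates a posterior expectation by drawing from a tractable proposal and reweighting. This standalone sketch is the textbook single-node version, not the authors' tree-structured distributed algorithm:

```python
import random

def importance_mean(weight, proposal_draw, n=20000, seed=1):
    """Self-normalized importance-sampling estimate of E[x] under a target
    proportional to weight(x) * proposal density."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = proposal_draw(rng)
        w = weight(x)          # unnormalized target/proposal ratio
        num += w * x
        den += w
    return num / den

# Posterior mean of a coin's bias after 7 heads / 3 tails, uniform prior:
# proposal = prior, weight = likelihood; exact answer is Beta(8,4) mean = 2/3.
est = importance_mean(lambda t: t**7 * (1 - t)**3, lambda rng: rng.random())
```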
Citations: 5
Lazy Allocation and Transfer Fusion Optimization for GPU-Based Heterogeneous Systems
Lu Li, C. Kessler
We present two memory optimization techniques which improve the efficiency of data transfer over the PCIe bus for GPU-based heterogeneous systems, namely lazy allocation and transfer fusion optimization. Both are based on merging data transfers so that less overhead is incurred, thereby increasing transfer throughput and making accelerator usage profitable also for smaller operand sizes. We provide the design and prototype implementation of the two techniques in CUDA. Microbenchmarking results show that especially for smaller and medium-sized operands significant speedups can be achieved. We also prove that our transfer fusion optimization algorithm is optimal.
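Why merging transfers pays off follows from simple arithmetic: each transfer carries a fixed setup overhead plus a size-proportional term, so fusing several small operands into one staging buffer pays the overhead once. The cost constants below are made-up illustrative numbers, not PCIe measurements from the paper:

```python
# Toy cost model for transfer fusion: fixed per-transfer overhead plus a
# per-byte term. The constants are invented for illustration only.

PER_TRANSFER_OVERHEAD_US = 10.0   # assumed setup latency per transfer
US_PER_BYTE = 0.001               # assumed bandwidth term

def cost_unfused(sizes):
    """One transfer per operand: overhead paid once per operand."""
    return sum(PER_TRANSFER_OVERHEAD_US + s * US_PER_BYTE for s in sizes)

def cost_fused(sizes):
    """All operands packed into one staging buffer: overhead paid once."""
    return PER_TRANSFER_OVERHEAD_US + sum(sizes) * US_PER_BYTE
```

Under this model the fused cost is strictly lower whenever there is more than one operand, and the advantage is largest for small operands — matching the abstract's observation that fusion helps smaller operand sizes most.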
Citations: 3
Accelerating the RICH Particle Detector Algorithm on Intel Xeon Phi
C. Quast, Angela Pohl, Biagio Cosenza, B. Juurlink, R. Schwemmer
At the LHC, particles are collided in order to understand how the universe was created. Those collisions are called events and generate large quantities of data, which have to be pre-filtered before they are stored to disk. This paper presents a parallel implementation of these filtering algorithms that is specifically designed for the Intel Xeon Phi Knights Landing platform, exploiting its 64 cores and AVX-512 instruction set. It shows that a linear speedup up to approximately 64 threads is attainable when vectorization is used, data is aligned to cache-line boundaries, program execution is pinned to MCDRAM, mathematical expressions are transformed into a more efficient equivalent formulation, and OpenMP is used for parallelization. The code was transformed from being compute bound to memory bound. Overall, a speedup of 36.47x was reached while obtaining an error smaller than the detector resolution.
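One of the listed optimizations is rewriting mathematical expressions into a cheaper equivalent formulation. A generic example of that idea (not the RICH algorithm's actual expressions) is comparing two ratios without performing any division:

```python
# Rewriting an expression into an equivalent, cheaper form: a/b > c/d can be
# tested as a*d > c*b when both denominators are positive, trading two
# divisions for two multiplications.

def ratio_greater_naive(a, b, c, d):
    return a / b > c / d

def ratio_greater_fast(a, b, c, d):
    """Equivalent for positive b and d: avoids division entirely."""
    return a * d > c * b
```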
Citations: 0
Reducing Message Latency and CPU Utilization in the CAF Actor Framework
M. Torquati, Tullio Menga, T. D. Matteis, D. D. Sensi, G. Mencagli
In this work, we consider the C++ Actor Framework (CAF), a recent proposal that revamped interest in building concurrent and distributed applications using the actor programming model in C++. CAF has been optimized for high-throughput computing, whereas message latency between actors is greatly influenced by the message data rate: at low and moderate rates, the latency is higher than at high data rates. To this end, we propose a modification of the polling strategies in the work-stealing CAF scheduler, which can reduce message latency at low and moderate data rates by up to two orders of magnitude without compromising the overall throughput and message latency at maximum pressure. The proposed technique uses a lightweight event notification protocol that is general enough to be used to optimize the runtime of other frameworks experiencing similar issues.
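The latency/CPU trade-off the paper targets comes from how a scheduler waits for messages. A common baseline is a polling loop with exponential backoff: it burns little CPU when idle, but a message arriving during a long sleep waits out the sleep, which is exactly the low-rate latency penalty described above. This sketch shows that baseline strategy, not CAF's event-notification replacement:

```python
import time

def poll(mailbox_pop, max_sleep_s=0.001):
    """Spin on a mailbox, doubling the sleep interval while it stays empty.
    Long sleeps save CPU at low message rates but add latency: a message
    arriving mid-sleep waits up to max_sleep_s before being picked up."""
    sleep = 1e-6
    while True:
        msg = mailbox_pop()
        if msg is not None:
            return msg
        time.sleep(sleep)
        sleep = min(sleep * 2, max_sleep_s)  # back off: less CPU, more latency
```

An event-notification protocol like the one the paper proposes removes this trade-off by letting the sender wake the sleeping scheduler immediately instead of waiting for the next poll.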
Citations: 5
A Dynamic Multi-Core Multicast Approach for Delay and Delay Variation Multicast Routing
Hovhannes A. Harutyunyan, Meghrig Terzian
Multicast communication constrained by end-to-end delay and inter-destination delay variation is known as Delay and Delay Variation Bounded Multicast (DVBM). In this paper, we propose a dynamic multi-core multicast approach to solve the DVBM problem. The proposed three-phase algorithm, Multi-core DVBM Trees (MCDVBMT), semi-matches group members to core nodes. The message is disseminated to group members using trees rooted at the designated core nodes. MCDVBMT dynamically reorganizes the rooted trees in response to changes to multicast group members. On average, only 5.2% of the total requests trigger re-executions and 53.6% of the graphs generated by MCDVBMT suffer from re-execution before receiving all dynamic requests.
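The two constraints that define DVBM can be checked directly from the source-to-member delays of a candidate multicast tree: every end-to-end delay must stay under the delay bound, and the spread between the fastest and slowest member must stay under the variation bound. A minimal checker:

```python
def satisfies_dvbm(delays, delay_bound, variation_bound):
    """DVBM feasibility check for one multicast tree.
    `delays`: list of source-to-member delays along the tree."""
    return (max(delays) <= delay_bound
            and max(delays) - min(delays) <= variation_bound)
```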
Citations: 0
Predicting the Price of Bitcoin Using Machine Learning
S. McNally, Jason Roche, Simon Caton
The goal of this paper is to ascertain with what accuracy the direction of Bitcoin price in USD can be predicted. The price data is sourced from the Bitcoin Price Index. The task is achieved with varying degrees of success through the implementation of a Bayesian optimised recurrent neural network (RNN) and a Long Short Term Memory (LSTM) network. The LSTM achieves the highest classification accuracy of 52% and a RMSE of 8%. The popular ARIMA model for time series forecasting is implemented as a comparison to the deep learning models. As expected, the non-linear deep learning methods outperform the ARIMA forecast which performs poorly. Finally, both deep learning models are benchmarked on both a GPU and a CPU with the training time on the GPU outperforming the CPU implementation by 67.7%.
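The two metrics the paper reports — direction-classification accuracy and RMSE — are straightforward to compute on any forecast series; the model code itself (LSTM, RNN, ARIMA) is omitted here:

```python
import math

def direction_accuracy(prices, predicted):
    """Fraction of steps where the predicted price move has the same sign
    as the actual move (the classification task in the abstract)."""
    hits = sum(
        ((predicted[i] - prices[i - 1]) >= 0) == ((prices[i] - prices[i - 1]) >= 0)
        for i in range(1, len(prices))
    )
    return hits / (len(prices) - 1)

def rmse(actual, predicted):
    """Root-mean-square error of the point forecasts."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```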
Citations: 415
A New Execution Model for Improving Performance and Flexibility of CAPE
Van Long Tran, É. Renault, Xuan Huyen Do, Viet Hai Ha
Checkpointing-Aided Parallel Execution (CAPE) is a framework based on checkpointing techniques that automatically translates and executes OpenMP programs on distributed-memory architectures. In comparisons with MPI, CAPE has demonstrated high performance and the potential for full compatibility with OpenMP on distributed-memory systems. However, its performance, flexibility, portability, and capability can still be improved. This paper presents a new execution model for CAPE that improves its performance and makes CAPE even more flexible.
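A checkpointing-based execution model rests on two operations: extracting the difference between two memory images (an incremental checkpoint) and merging such a difference back into a base image. A toy version on dict-shaped memory images, purely illustrative of the mechanism rather than CAPE's binary checkpoint format:

```python
def diff(before, after):
    """Incremental checkpoint: only the entries that changed or appeared."""
    return {k: v for k, v in after.items() if before.get(k) != v}

def merge(base, delta):
    """Apply an incremental checkpoint to a base image."""
    merged = dict(base)
    merged.update(delta)
    return merged
```

The payoff is that only `diff(before, after)` has to cross the network after a remote worker finishes its chunk, rather than the whole memory image.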
Citations: 1
RVNoC: A Framework for Generating RISC-V NoC-Based MPSoC
Mahmoud A. Elmohr, A. Eissa, Moamen Ibrahim, Mostafa Khamis, Sameh El-Ashry, A. Shalaby, Mohamed Abdelsalam, M. El-Kharashi
With the increase in the number of cores embedded on a chip, the main challenge for Multiprocessor System-on-Chip (MPSoC) platforms is the interconnection between that massive number of cores. Networks-on-Chip (NoC) were introduced to solve that challenge by providing a scalable and modular solution for communication between the cores. In this paper, we introduce a configurable MPSoC framework called RVNoC that generates synthesizable RTL usable in both ASIC and FPGA implementations. The proposed framework is based on the open-source RISC-V Instruction Set Architecture (ISA) and an open-source configurable flit-based router for interconnection between cores, with a core network interface of our design to connect each core with its designated router. A benchmarking environment is developed to evaluate various parameters of the generated MPSoC. Synthesis of a single building block containing a single core without any peripherals, a router, and a core network interface, using 45nm technology, shows an area of 102.34 kilo gate equivalents (kGE), a maximum frequency of 250 MHz, and a power consumption of 9.9 μW/MHz.
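A flit-based NoC router forwards packets hop by hop toward their destination. The abstract does not state which routing algorithm RVNoC's router uses, so dimension-ordered XY routing — the classic deadlock-free choice on a 2D mesh — is shown here only as an assumed example of the mechanism:

```python
def xy_route(src, dst):
    """Hop sequence from src to dst on a 2D mesh using XY routing:
    travel fully along the X dimension first, then along Y."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path
```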
Citations: 13