2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP): Latest Papers

The Multi-Layer Graph Based Technique for Proactive Automatic Response Against Cyber Attacks
E. Doynikova, Igor Kotenko
The paper develops an approach to proactive, automatic response to cyber security incidents. The approach is based on data from open sources, analytical modeling, and a hierarchical, integrated set of heterogeneous security metrics. The paper outlines the features of the analytical models that are crucial for countermeasure selection and defines a set of security metrics for that selection. It specifies the algorithms that implement the proposed multi-layer countermeasure selection technique. Introducing layers makes it possible to obtain a result at any time, with the maximum accuracy that the available data allow. Experiments demonstrating the efficiency of the technique are outlined.
DOI: https://doi.org/10.1109/PDP2018.2018.00081 | Published: 2018-03-01
Citations: 3
High-Resolution Numerical Relativity Simulations of Spinning Binary Neutron Star Mergers
T. Dietrich, S. Bernuzzi, B. Brügmann, W. Tichy
The recent detection of gravitational waves and electromagnetic counterparts emitted during and after the collision of two neutron stars marks a breakthrough in multi-messenger astronomy. Numerical relativity simulations are the only tool for describing the binary's merger dynamics in the regime where speeds are largest and gravity is strongest. In this work we report state-of-the-art binary neutron star simulations for irrotational (non-spinning) and spinning configurations. The main use of these simulations is to model the gravitational-wave signal. The key numerical requirements are an understanding of the convergence properties of the numerical data and a detailed error budget. The simulations were performed on different HPC clusters, use multiple grid resolutions, and are based on eccentricity-reduced quasi-circular initial data. We obtain convergent waveforms with phase errors of 0.5-1.5 rad accumulated over ~12 orbits to merger. The waveforms have been used to construct a phenomenological waveform model, which has been applied to the analysis of the recent binary neutron star detection. Additionally, we show that the data can be used to test other state-of-the-art semi-analytical waveform models.
DOI: https://doi.org/10.1109/PDP2018.2018.00113 | Published: 2018-03-01
Citations: 10
GPU-Accelerated Differential Dependency Network Analysis
G. Speyer, Juan Rodriguez, T. Bencomo, Seungchan Kim
EDDY (Evaluation of Differential DependencY) interrogates transcriptomic data to identify differential genetic dependencies within a biological pathway. Through its probabilistic framework with resampling and permutation, aided by the incorporation of annotated gene sets, EDDY has demonstrated sensitivity superior to that of other methods. However, this statistical rigor incurs considerable computational cost, limiting its application to larger datasets. Ample, independent computation coupled with a manageable memory footprint makes EDDY a strong candidate for graphics processing unit (GPU) implementation. Custom kernels decompose the independence test loop, network construction, network enumeration, and Bayesian network scoring to accelerate the computation. GPU-accelerated EDDY consistently delivers a performance improvement of two orders of magnitude, allowing the statistical rigor of the EDDY algorithm to be applied to larger datasets.
DOI: https://doi.org/10.1109/PDP2018.2018.00072 | Published: 2018-03-01
Citations: 0
ParallelHashClone: A Parallel Implementation of HashClone Suite for Clonality Assessment from NGS Data
G. Romano, E. Genuardi, R. Calogero, S. Ferrero
In recent years, B/T cell clonality assessment and Minimal Residual Disease (MRD) monitoring have acquired strong predictive value in evaluating therapy response in haematological B-cell disorders, improving the prediction of patient outcomes. Polymerase Chain Reaction (PCR) based methods are the most standardized and widely used techniques, allowing risk stratification in a variable proportion of patients, depending on the disease analyzed. Since its recent introduction, Next Generation Sequencing (NGS) technology has offered a way to increase the number of patients whose disease can be traced during the clinical course. This depends strictly on appropriate computational analysis of the huge volume of complex data obtained by NGS. In this context, we recently presented HashClone, an easy-to-use and reliable bioinformatics tool that simultaneously provides clonality assessment and MRD detection over time in patients affected by Mantle Cell Lymphoma (MCL). The original HashClone strategy is organized in three steps that analyze a set of sample reads simultaneously and return the corresponding list of clonotypes, in which each clone is characterized by its read frequency and by the aligned target nomenclature with respect to the reference database [1]. HashClone is composed of four C++ applications that together implement B-cell clonality assessment in patient samples. Following its successful preliminary application, in this paper we present ParallelHashClone, an improved version with a parallel implementation of the HashClone suite. In detail, parallelizing two of the suite's applications allows the samples from the same patient to be analyzed more efficiently in parallel. Moreover, we integrated ParallelHashClone into a Docker container platform, which makes the application easy to install and run, since Docker packages ParallelHashClone with all its dependencies and libraries. We tested ParallelHashClone in four MCL-NGS data analyses, showing performance comparable to that of the original HashClone version in B-lymphoproliferative molecular clonality assessment.
DOI: https://doi.org/10.1109/PDP2018.2018.00073 | Published: 2018-03-01
Citations: 1
SAWS: Simple and Adaptive Warp Scheduling for Improved Performance in Throughput Processors
Francisco Muñoz-Martínez, M. Acacio
In this work, we address the challenge of designing an efficient warp scheduler for throughput processors by proposing SAWS (Simple and Adaptive Warp Scheduler). Unlike previous approaches, which target a particular type of application, SAWS considers several simple scheduling algorithms and tries to use the one that best fits each application or each phase within an application. Through detailed simulations, we demonstrate that a practical implementation of SAWS can obtain IPC values that closely match those of the best scheduling algorithm in each case.
DOI: https://doi.org/10.1109/PDP2018.2018.00061 | Published: 2018-03-01
Citations: 1
Saving Energy for Cloud Applications in Mobile Devices Using Nearby Resources
Anas Toma, Alexander Starinow, J. E. Lenssen, Jian-Jia Chen
In this paper, we present a middleware that saves energy in mobile computing devices that offload tasks to a remote server in the cloud. Saving energy in these devices is very important for prolonging battery life and avoiding overheating. The middleware uses an available nearby device, called an auxiliary server, either as a surrogate for the remote server or as a proxy that passes data between the mobile device and the remote server. The main idea is to reduce the energy consumed by communication with the remote server by using a high-speed or low-power local connection to the auxiliary server instead. The paper also analyzes when it is beneficial to use the auxiliary server, based on the response time of the remote server and the bandwidth of the remote connection. The proposed middleware is evaluated using different benchmarks, including applications commonly used on mobile devices, and simulations. Furthermore, it is compared to state-of-the-art approaches in this area. The experiments show that the middleware is energy-efficient, especially when the bandwidth of the remote connection is relatively low or the server is overloaded.
DOI: https://doi.org/10.1109/PDP2018.2018.00091 | Published: 2018-03-01
Citations: 1
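
The trade-off described in the abstract above (remote server versus nearby auxiliary server) can be illustrated with a very rough energy estimate. The sketch below is not the paper's model: the `Link` class, the function names, the linear energy formula, and every number are assumptions chosen only to show how bandwidth and response time could enter such a decision.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical link model (an assumption, not taken from the paper)."""
    bytes_per_s: float    # usable throughput of the connection
    radio_power_w: float  # power the mobile radio draws while transferring


def offload_energy(data_bytes, link, response_time_s, idle_power_w):
    """Joules the mobile device spends transferring data and then waiting for the reply."""
    transfer_time_s = data_bytes / link.bytes_per_s
    return link.radio_power_w * transfer_time_s + idle_power_w * response_time_s


def choose_target(data_bytes, remote, local, remote_response_s, aux_response_s,
                  idle_power_w=0.3):
    """Return (target, energy) for the cheaper of the two offloading options."""
    e_remote = offload_energy(data_bytes, remote, remote_response_s, idle_power_w)
    e_aux = offload_energy(data_bytes, local, aux_response_s, idle_power_w)
    return ("auxiliary", e_aux) if e_aux < e_remote else ("remote", e_remote)


# Slow remote connection: the nearby device wins even though it computes more slowly.
slow_remote = Link(bytes_per_s=1e6, radio_power_w=1.2)
nearby = Link(bytes_per_s=10e6, radio_power_w=0.8)
print(choose_target(5e6, slow_remote, nearby, remote_response_s=2.0, aux_response_s=4.0))
```

With these made-up numbers the slow remote link costs roughly 6.6 J against roughly 1.6 J for the nearby device. A faster remote link or a shorter remote response time reverses the choice, which matches the abstract's observation that the gain is largest when remote bandwidth is low or the server is overloaded.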
Exploiting Task-Based Parallelism for Parallel Discrete Event Simulation
Yizhuo Wang, Zhiwei Gao, Weixing Ji, Han Zhang, Duzheng Qing
Large-scale simulation applications are becoming common in research and industry, and a significant fraction of them run on multi-core clusters. Current parallel simulation kernels use multiple processes and multiple threads to exploit inter-node and intra-node parallelism on multi-core clusters. We exploit task-based parallelism in parallel discrete event simulation (PDES) kernels, which is more fine-grained than thread-level and process-level parallelism. In our system, every simulation event is wrapped in a task. A work-stealing task scheduling scheme provides dynamic load balancing among the cores, and a graph partitioning approach partitions simulation entities among the cluster nodes. Experimental results show that our PDES kernel outperforms existing PDES kernels by fully exploiting task parallelism.
DOI: https://doi.org/10.1109/PDP2018.2018.00095 | Published: 2018-03-01
Citations: 0
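
The work-stealing idea mentioned in the PDES abstract above can be sketched as a single-threaded simulation. This is an illustration under stated assumptions, not the authors' kernel: the function name `run_work_stealing` is hypothetical, events are plain labels rather than real tasks, and the timestamp ordering and causality checks that an actual PDES kernel needs are left out entirely.

```python
import random
from collections import deque


def run_work_stealing(events_per_worker, seed=0):
    """Simulate work stealing round by round; returns (executed_per_worker, steals)."""
    rng = random.Random(seed)
    queues = [deque(events) for events in events_per_worker]
    executed = [[] for _ in queues]
    steals = 0
    while any(queues):                      # some worker still has queued tasks
        for wid, queue in enumerate(queues):
            if queue:
                task = queue.pop()          # owner takes from the tail of its own deque
            else:
                victims = [q for q in queues if q]
                if not victims:
                    break                   # nothing left to steal anywhere
                task = rng.choice(victims).popleft()  # thief takes from the victim's head
                steals += 1
            executed[wid].append(task)
    return executed, steals


# All eight events start on worker 0; the three idle workers steal from it.
executed, steals = run_work_stealing([list(range(8)), [], [], []])
print(executed, steals)
```

Owners and thieves work on opposite ends of each deque, which is the usual way to keep them from contending for the same task. In this example the load ends up evenly spread (two events per worker) even though the initial placement was maximally unbalanced.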
Exploring Energy Efficiency Model Generalization on Multicore Embedded Platforms
Hergys Rexha, S. Lafond
In this paper we investigate the relation between an energy efficiency model and the type of workload executed on modern embedded architectures. From the energy efficiency model obtained in our previous work, we select a few configuration points to verify that the predicted relative energy efficiency holds across different workload scenarios. A configuration point is defined as a set of tunable platform metrics, such as DVFS point, DPM level, and utilization rate. As workloads, we use a combination of synthetic generators and real-world applications from the embedded domain. To test the generality of the model, our experiments use two different architectures that serve as examples of real systems. First, we compare the efficiency obtained by the two architecturally different chips (ARM and Intel) at different configuration points and in different workload scenarios. Second, we try to explain the differing results through the thermal management performed by the two chips. Finally, we show that the results from the two architectures converge only for workloads composed largely of integer instructions, and that a specific model trained with integer operations is needed.
DOI: https://doi.org/10.1109/PDP2018.2018.00084 | Published: 2018-03-01
Citations: 0
A Secure Distributed Framework for Agglomerative Hierarchical Clustering Construction
M. Hamidi, M. Alishahi, F. Martinelli
This paper presents a general framework for constructing any agglomerative hierarchical clustering algorithm over partitioned data. The data is assumed to be horizontally distributed between two (or more) parties: for mutual benefit, the participating parties are willing to identify the cluster structure of their data as a whole, but because of privacy restrictions they avoid sharing the original datasets. To this end, we propose general algorithms based on secure scalar product and secure Hamming distance computation to securely compute the criteria needed to shape the clustering scheme. The proposed approach covers all possible secure agglomerative hierarchical clustering constructions when data is distributed between two (or more) parties, for both numerical and categorical data.
DOI: https://doi.org/10.1109/PDP2018.2018.00075 | Published: 2018-03-01
Citations: 3
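
To make the "secure scalar product" building block named in the clustering abstract above more concrete, here is a minimal sketch of one well-known style of protocol, in which a helper (a "commodity server") hands out correlated random masks. The abstract does not say which protocol the authors use, so this is a stand-in. All names are hypothetical, the three roles run in one process purely for illustration, and masking over plain integers is a simplification: a real deployment would run across separate machines and work in a finite ring or field.

```python
import secrets


def _rand_vec(n, bound=1 << 32):
    """Vector of n non-negative random integers below bound."""
    return [secrets.randbelow(bound) for _ in range(n)]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def secure_scalar_product(x, y):
    """Return additive shares (v1, v2) such that v1 + v2 == x . y.

    Alice holds x and ends with v1; Bob holds y and ends with v2.
    """
    n = len(x)
    # Helper: random masks that do not depend on x or y, with ra + rb == Ra . Rb.
    ra_vec, rb_vec = _rand_vec(n), _rand_vec(n)
    ra = secrets.randbelow(1 << 64)
    rb = _dot(ra_vec, rb_vec) - ra
    # Alice sends her masked vector to Bob; Bob sends his masked vector to Alice.
    x_masked = [a + r for a, r in zip(x, ra_vec)]
    y_masked = [b + r for b, r in zip(y, rb_vec)]
    # Bob picks his random share v2 and sends u to Alice.
    v2 = secrets.randbelow(1 << 64)
    u = _dot(x_masked, y) + rb - v2
    # Alice removes the masks she knows about to obtain her share.
    v1 = u - _dot(ra_vec, y_masked) + ra
    return v1, v2


v1, v2 = secure_scalar_product([1, 2, 3], [4, 5, 6])
print(v1 + v2)  # the shares recombine to the plain dot product, 32
```

A distance criterion can be built from such shares: for numerical records, the squared Euclidean distance is x·x + y·y - 2(x·y), where each party computes its own norm locally and only the cross term needs the protocol.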
Optimizing Machine Learning Algorithms on Multi-Core and Many-Core Architectures Using Thread and Data Mapping
M. Serpa, Arthur M. Krause, E. Cruz, P. Navaux, Marcelo Pasin, P. Felber
Driven by the development of new technologies such as personal assistants and autonomous cars, machine learning has rapidly become one of the most active fields in computer science. The algorithms at the core of machine learning are notoriously resource-intensive, so optimizing their execution on modern processors is of paramount importance. Several approaches have been proposed to accelerate machine learning on GPUs and massively parallel computers, as well as on dedicated ASICs. In this paper, we focus on Intel's multi-core Xeon and many-core accelerator Xeon Phi Knights Landing (KNL), which can host several hundred threads on the same CPU. On such architectures, thread and data mapping are key to performance. We study the impact of mapping strategies and show that smart mapping policies can significantly speed up machine learning applications on many-core architectures. Execution time was reduced by up to 25.2% and 18.5% on Intel Xeon and Xeon Phi KNL, respectively.
DOI: https://doi.org/10.1109/PDP2018.2018.00058 (published 2018-03-01)
Citations: 13
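The abstract above does not say which thread mapping policies were evaluated. As a rough illustration of what a thread mapping is, the sketch below computes two commonly used placements, often called compact and scatter, for a hypothetical machine described only by its number of sockets and cores per socket. The function names, the `(socket, core)` output format, and the topology model are all assumptions made for this example, and data mapping is not covered.

```python
from typing import List, Tuple

Placement = List[Tuple[int, int]]  # one (socket, core-within-socket) pair per thread


def compact(n_threads: int, n_sockets: int, cores_per_socket: int) -> Placement:
    """Fill all cores of one socket before moving to the next socket."""
    total = n_sockets * cores_per_socket
    # Threads beyond the core count wrap around and share cores.
    return [divmod(t % total, cores_per_socket) for t in range(n_threads)]


def scatter(n_threads: int, n_sockets: int, cores_per_socket: int) -> Placement:
    """Spread consecutive threads round-robin across sockets."""
    total = n_sockets * cores_per_socket
    placement = []
    for t in range(n_threads):
        slot = t % total
        placement.append((slot % n_sockets, slot // n_sockets))
    return placement
```

With four threads on two sockets of four cores each, `compact` keeps all threads on socket 0, which favors threads that share data, whereas `scatter` alternates between sockets, which gives each thread more cache and memory bandwidth.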