2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP): Latest Publications

Memory-Aware Genetic Algorithms for Task Mapping on Hard Real-Time Networks-on-Chip
Lloyd Robert Still, L. Indrusiak
The problem of mapping hard real-time tasks onto networks-on-chip has previously been addressed successfully by genetic algorithms. However, none of the existing problem formulations consider memory constraints. State-of-the-art genetic mappers can therefore find fully schedulable mappings that are incompatible with the memory limitations of realistic platforms. In this paper, we extend the problem formulation and devise a memory architecture in the form of private local memories. We then propose three memory models of increasing complexity and realism, and evaluate the impact these additional constraints have on the genetic search. We conduct extensive experiments using tasks and communications from a realistic benchmark application, and compare the proposed approach against a state-of-the-art baseline mapper.
DOI: 10.1109/PDP2018.2018.00101 (https://doi.org/10.1109/PDP2018.2018.00101); published 2018-03-21
Citations: 7
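The abstract does not spell out the three memory models. As a minimal sketch of the simplest kind of constraint a genetic mapper might add, assume each task has one fixed memory footprint and each core has a private local memory of known capacity; any mapping that overflows a core's memory is then penalised in the fitness function. The function names, the list-of-core-indices chromosome encoding, and the linear penalty weight are all hypothetical, not the authors' formulation.

```python
from collections import defaultdict


def memory_overflow(mapping, footprint, capacity):
    """Total bytes by which mapped tasks exceed the private memory of their cores.

    mapping[i]:   core index that task i is mapped to (hypothetical chromosome encoding)
    footprint[i]: memory required by task i, in bytes (assumed fixed and known)
    capacity[c]:  size of core c's private local memory, in bytes
    """
    used = defaultdict(int)
    for task, core in enumerate(mapping):
        used[core] += footprint[task]
    # Only cores that are over capacity contribute to the overflow.
    return sum(max(0, used[c] - capacity[c]) for c in used)


def penalised_fitness(sched_score, mapping, footprint, capacity, weight=1.0):
    """Hypothetical GA fitness: a schedulability score minus a weighted memory penalty."""
    return sched_score - weight * memory_overflow(mapping, footprint, capacity)
```

A hard-constraint variant would simply discard any chromosome with nonzero overflow; a soft penalty like this keeps infeasible mappings in the population so that crossover and mutation can still repair them.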
Studying Victim Caches in GPUs
E. Taylor, D. W. Chang
Today, Graphics Processing Units (GPUs) are used for more than traditional graphics processing. Large supercomputers such as Titan use GPUs to solve problems that bear little resemblance to their original purpose. To improve the performance of these applications, GPU architects are increasing cache sizes to lower the latency of the non-uniform memory references found in these programs. In this paper, we investigate an alternative approach in which a victim buffer is added to the first-level cache. Our studies show that a 256-line victim cache can increase the L1 hit rate by 15% and improve IPC by 7.5% over the baseline. This victim cache outperforms enlarging the cache by 400%, while being a less costly solution in terms of area.
DOI: 10.1109/PDP2018.2018.00069 (https://doi.org/10.1109/PDP2018.2018.00069); published 2018-03-21
Citations: 1
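The victim-cache idea can be illustrated with a small trace-driven model. This sketch assumes a direct-mapped L1 indexed by line address modulo the number of sets, backed by a fully associative victim buffer with LRU replacement, in the classic CPU-style arrangement. The GPU L1 organisation and replacement policy evaluated in the paper are not given in the abstract, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class VictimCachedL1:
    """Direct-mapped L1 backed by a small fully associative LRU victim buffer."""

    def __init__(self, num_sets, victim_lines):
        self.num_sets = num_sets
        self.lines = [None] * num_sets   # one resident line address per set
        self.victim = OrderedDict()      # line address -> None, oldest first
        self.victim_lines = victim_lines
        self.l1_hits = self.victim_hits = self.misses = 0

    def _evict_to_victim(self, line):
        """Place a line evicted from L1 into the victim buffer (if there is one)."""
        if line is None or self.victim_lines == 0:
            return
        self.victim[line] = None
        self.victim.move_to_end(line)
        if len(self.victim) > self.victim_lines:
            self.victim.popitem(last=False)  # drop the least recently used line

    def access(self, line):
        idx = line % self.num_sets
        if self.lines[idx] == line:
            self.l1_hits += 1
            return "l1"
        if line in self.victim:
            # Victim hit: swap the buffered line with the current L1 occupant.
            self.victim_hits += 1
            del self.victim[line]
            self._evict_to_victim(self.lines[idx])
            self.lines[idx] = line
            return "victim"
        # Full miss: fetch into L1; the displaced line goes to the victim buffer.
        self.misses += 1
        self._evict_to_victim(self.lines[idx])
        self.lines[idx] = line
        return "miss"
```

Two lines that map to the same set and are accessed alternately thrash a direct-mapped cache. With 4 sets and a 2-line victim buffer, the trace `[0, 4, 0, 4, 0, 4]` produces 2 misses followed by 4 victim hits; with `victim_lines=0` every access misses.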
Variable Batched DGEMM
Pedro Valero-Lara, I. Martínez-Pérez, Sergi Mateo, R. Sirvent, Vicenç Beltran, X. Martorell, Jesús Labarta
Many scientific applications need to solve a large number of small, independent problems. Individually, these problems do not expose enough parallelism, so they must be computed as a batch. Today, vendors such as Intel and NVIDIA are developing their own suites of batch routines. Although most existing work focuses on computing fixed-size batches, real applications cannot assume a uniform size across all problems. We explore and analyze different strategies based on the OpenMP parallel for, task, and taskloop pragmas. Although these strategies are straightforward from a programmer's point of view, they differ in their impact on performance. We also analyze a new prototype provided by Intel (MKL) for batch operations (cblas_dgemm_batch). We propose a new approach called grouping, which groups problems together until a limit on memory occupancy or number of operations is reached. In this way, groups composed of different numbers of problems are distributed across cores, achieving a more balanced distribution of computational cost. This strategy is up to 6× faster than the Intel (MKL) batch routine.
DOI: 10.1109/PDP2018.2018.00065 (https://doi.org/10.1109/PDP2018.2018.00065); published 2018-03-21
Citations: 13
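The grouping strategy can be sketched as a greedy packing pass. This version assumes the limit is expressed as a floating-point operation budget (2*m*n*k per DGEMM) and that problems are packed in input order; the memory-occupancy variant, the ordering, and the function name `group_problems` are assumptions rather than details from the paper.

```python
def group_problems(shapes, max_flops):
    """Greedily pack DGEMM problems into groups whose total cost stays within max_flops.

    shapes:    list of (m, n, k) tuples, one per product C = A @ B (A is m x k, B is k x n)
    max_flops: per-group budget; a single problem larger than the budget gets its own group
    Returns a list of groups, each a list of indices into shapes.
    """
    groups, current, current_flops = [], [], 0
    for i, (m, n, k) in enumerate(shapes):
        flops = 2 * m * n * k  # one multiply and one add per inner-product term
        # Close the current group if this problem would push it over the budget.
        if current and current_flops + flops > max_flops:
            groups.append(current)
            current, current_flops = [], 0
        current.append(i)
        current_flops += flops
    if current:
        groups.append(current)
    return groups
```

Each resulting group could then be handed to a core as one unit of work (for example, one OpenMP task), so that many small problems and a few large ones end up with comparable per-group cost. For instance, with a budget of 64 flops, shapes `(2,2,2), (2,2,2), (4,4,4), (2,2,2)` are grouped as `[[0, 1], [2], [3]]`.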
Local and Global Shared Memory for Task Based HPC Applications on Heterogeneous Platforms
Chao Liu, M. Leeser
With the prevalence of multicore and manycore processors, developing parallel applications that benefit from massively parallel resources is important. In this work, we introduce a hybrid shared memory mechanism based on a high-level task design. We implement task-scoped global shared data based on the one-sided communication feature of MPI-3, and enable users to implement and create multi-threaded tasks that can execute either on a single node or on multiple nodes. Task threads on distributed nodes can share data sets through global shared data objects using one-sided remote memory access (RMA). We ported and developed a set of benchmark applications and tested them on a cluster platform. The high-level task design and hybrid shared memory help users develop and maintain parallel programs easily. The results show that the global shared data can deliver good RMA performance, and that the multi-threaded task implementations run up to 20% faster than ordinary OpenMP programs and scale better than MPI programs on multiple nodes.
DOI: 10.1109/PDP2018.2018.00055 (https://doi.org/10.1109/PDP2018.2018.00055); published 2018-03-21
Citations: 0
Resizing of Heterogeneous Platforms and the Optimization of Parallel Applications
Moussa Beji, Sami Achour
With the emergence of multi-cluster platforms, scheduling and finding the optimal number of resources (clusters, processors) to execute an application have become critical problems. In this paper, we address the need for scheduling techniques for parallel task applications on such platforms, and we propose a new strategy for scheduling sequential task graphs based on existing heuristics that have proved efficient in homogeneous environments. The contribution of this paper lies in determining the appropriate clusters that participate in computing a given application. Our solution consists of three steps: first, determining the computing clusters; second, determining the optimal number of processors in each cluster; and finally, placing the tasks on the appropriate processors. Simulation results, based on both randomly generated graphs and real platform configurations, show that the proposed approach provides an interesting trade-off between makespan and resource consumption.
DOI: 10.1109/PDP2018.2018.00029 (https://doi.org/10.1109/PDP2018.2018.00029); published 2018-03-21
Citations: 1
Distributed Heuristics for Optimizing Cohesive Groups: A Support for Clinical Patient Engagement in Social Network Analysis
I. Zoppis, R. Dondi, Davide Coppetti, Alessandro Beltramo, G. Mauri
Social interaction can support disease management through online spaces where patients interact with clinicians and share experiences with other patients. Promoting targeted communication in online social spaces is therefore a means to group patients around shared goals, offer emotional support, and ultimately engage patients in their healthcare decision-making process. In this paper, we approach the topic from a theoretical perspective: we design an optimization problem aimed at encouraging the creation of (induced) sub-networks of recently diagnosed patients who wish to deepen their knowledge of their medical treatment with other similarly profiled patients who have already been followed up by specific (including alternative) care centers. Because the proposed problem is computationally hard, we provide approximate solutions based on distributed heuristics (i.e., genetic algorithms). Results are given for simulated data using Erdős-Rényi random graphs.
DOI: 10.1109/PDP2018.2018.00044 (https://doi.org/10.1109/PDP2018.2018.00044); published 2018-03-21
Citations: 1
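The simulated inputs mentioned in the abstract are Erdős-Rényi random graphs. As a sketch, the G(n, p) model includes each possible undirected edge independently with probability p, and one simple cohesiveness measure for an induced sub-network is its edge density. The density objective and the helper names are assumptions for illustration; the abstract does not state the paper's exact objective function or its genetic-algorithm encoding.

```python
import itertools
import random


def erdos_renyi(n, p, seed=None):
    """G(n, p): each of the n*(n-1)/2 possible undirected edges is kept with probability p.

    Edges are returned as (u, v) tuples with u < v.
    """
    rng = random.Random(seed)
    return {(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p}


def induced_density(edges, nodes):
    """Fraction of possible edges present in the subgraph induced by `nodes` (hypothetical objective)."""
    nodes = sorted(set(nodes))  # sorting keeps pairs in the same (u < v) form as `edges`
    k = len(nodes)
    if k < 2:
        return 0.0
    present = sum((u, v) in edges for u, v in itertools.combinations(nodes, 2))
    return present / (k * (k - 1) / 2)
```

A heuristic search such as a genetic algorithm could then evaluate candidate patient subsets with a function like `induced_density`, possibly combined with profile-similarity terms that the abstract mentions but does not define.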
How do Loop Transformations Affect the Energy Consumption of Multi-Threaded Runge-Kutta Methods? 循环变换如何影响多线程龙格-库塔方法的能耗?
T. Rauber, G. Rünger
Runge-Kutta methods are popular and widely used solution methods for scientific simulations based on differential equations; thus, their efficient execution is crucial for many applications. Energy consumption is also becoming increasingly important in high-performance computing. In this article, we investigate the performance and energy consumption of Runge-Kutta methods solving systems of ordinary differential equations on recent Intel processors. Our specific interest is the study of different program versions of multithreaded Runge-Kutta methods that result from loop transformations within the nested loops over the stage vectors and the system size. Four program versions of the Runge-Kutta method DOPRI5 are chosen and applied to systems of ordinary differential equations with different workloads. Experiments are performed for different numbers of threads, and the resulting performance, power, and energy consumption are reported and analyzed.
DOI: 10.1109/PDP2018.2018.00085 (https://doi.org/10.1109/PDP2018.2018.00085); published 2018-03-21
Citations: 3
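To make "loop transformations within the nested loops over stage vectors and system size" concrete, the following sketch computes the argument of one explicit Runge-Kutta stage, w = y + h * sum_i a[l][i] * v[i], with the two loops in either order. It assumes `a_row` holds row l of the Butcher coefficient matrix and `v` the previously computed stage vectors; these two functions are illustrative, not the four DOPRI5 program versions studied in the paper.

```python
def stage_argument_jfirst(y, h, a_row, v):
    """w[j] = y[j] + h * sum_i a_row[i] * v[i][j], with the system-size loop j outermost."""
    n = len(y)
    w = [0.0] * n
    for j in range(n):                 # one pass over the components
        acc = 0.0
        for i, a in enumerate(a_row):  # inner loop walks across all stage vectors
            acc += a * v[i][j]
        w[j] = y[j] + h * acc
    return w


def stage_argument_ifirst(y, h, a_row, v):
    """Same quantity with the loops interchanged: stage loop i outermost."""
    w = list(y)
    for i, a in enumerate(a_row):      # one full sweep over w per previous stage
        vi = v[i]
        for j in range(len(y)):
            w[j] += h * a * vi[j]
    return w
```

In pure Python the loop order has no meaningful performance effect. In a compiled, multithreaded implementation, the first form reads every stage vector once per component, while the second sweeps the whole of w once per stage, and this difference in memory access pattern can change cache behaviour and, in turn, runtime and energy. Because the summation order differs, the two versions may also disagree in the last bits of floating-point results.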
Computing Empirical P-Values for Estimating Gene-Gene Interactions in Genome-Wide Association Studies: A Parallel Computing Approach
Valentina Giansanti, D. D'Agostino, C. Maj, S. Beretta, I. Merelli
For complex phenotypes (e.g., psychiatric diseases), single-locus tests, commonly performed in genome-wide association studies, have proven limited in discovering strong gene associations. A growing body of evidence suggests that epistatic non-linear effects may be responsible for complex phenotypes arising from the interaction of different biological factors. A major issue in epistasis analysis is the computational burden caused by the huge number of statistical tests required when considering all potential genotype combinations. In this work, we developed a computationally efficient approach to compute empirical p-values for the presence of epistasis at a genome-wide scale in bipolar disorder, a typical example of a complex phenotype with a relevant but unexplained genetic background. Using this approach, we identified 13 epistatic interactions between variants located in genes potentially involved in biological processes associated with the analyzed phenotype.
DOI: 10.1109/PDP2018.2018.00071 (https://doi.org/10.1109/PDP2018.2018.00071); published 2018-03-21
Citations: 0
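Empirical p-values are commonly obtained by permutation: the phenotype labels are shuffled many times, the test statistic is recomputed on each shuffled data set, and the p-value is the fraction of permutations scoring at least as high as the observed data. This is a minimal serial sketch assuming a user-supplied `statistic` function (a placeholder for an epistasis test on a SNP pair); the paper's actual statistic and its parallel decomposition are not described in the abstract. Because every permutation is independent, the loop is a natural candidate for parallelisation across permutation batches or SNP pairs.

```python
import random


def empirical_p_value(statistic, genotypes, phenotypes, n_perm=1000, seed=0):
    """Permutation-based empirical p-value.

    statistic(genotypes, phenotypes) -> float, where larger means stronger association (assumed).
    Phenotype labels are shuffled n_perm times to build the null distribution.
    The +1 terms count the observed labelling as one of the permutations,
    so the result always lies in [1 / (n_perm + 1), 1].
    """
    rng = random.Random(seed)
    observed = statistic(genotypes, phenotypes)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # in-place; each call yields a uniformly random permutation
        if statistic(genotypes, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

The smallest attainable value is 1 / (n_perm + 1), so the number of permutations bounds how small a p-value can be resolved; with many SNP pairs and a multiple-testing correction, this is what drives the computational burden the abstract describes.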
CSSMT: Compiler Based Software Simultaneous Multithreading (SMT) CSSMT:基于编译器的软件同步多线程(SMT)
Yuanfang Chen, Qingchuan Shi, Xiaoming Li
Simultaneous multithreading (SMT) is a unique computer architecture feature that increases pipeline utilization and, therefore, instruction throughput. It improves instruction-level throughput by simultaneously filling the vertical and horizontal superscalar pipeline slots left unfilled by native threads. So far, SMT has been implemented in hardware. However, hardware SMT implementations have their limitations. First, SMT is complex and expensive to implement: only higher-end processors are equipped with it, even though lower-end processors have the same pipeline design as the higher-end variants and might also benefit from it. SMT also introduces significant power/energy and area overheads. Moreover, being a hardware feature, SMT is limited by the range and depth of instruction analysis it can afford at execution time; it is therefore unlikely to benefit from high-level software knowledge about the instruction mix and may miss many improvement opportunities. In this paper, we address the limitations of hardware-based SMT and introduce CSSMT: Compiler based Software Simultaneous Multithreading (SMT). The main contribution of CSSMT is that it exploits high-level program profiles to purposefully "re-mix" instructions from multiple programs to better fill vertical and horizontal superscalar pipeline slots, so that overall throughput is improved. Furthermore, CSSMT is a software transformation technique that enables SMT at the software level at compile time. It can therefore help overcome the limitations of hardware-based SMT implementations and is more portable. We test CSSMT with programs from the SPEC2006 and NAS benchmarks and achieve an execution-time speedup of up to 12% (a 30.7% improvement in multi-program throughput).
DOI: 10.1109/PDP2018.2018.00017 (https://doi.org/10.1109/PDP2018.2018.00017); published 2018-03-21
Citations: 0
CARS: Context Aware Reputation Systems to Evaluate Vehicles' Behaviour
Gianpiero Costantino, F. Martinelli, I. Matteucci, A. Bertolino, Antonello Calabrò, E. Marchetti
The introduction of new-generation ICT systems into vehicles makes them highly connected with the external world. As a drawback, vehicles become potentially vulnerable to security attacks. Here, we consider a scenario in which Vehicular Networks and an Urban Network work together to realize a defence mechanism based on Reputation Systems. In this way, we are able to identify and isolate potentially malicious vehicles that send messages aimed at reducing the availability of the network. We propose Context Aware Reputation Systems (CARS), which identify insider attackers and isolate them, taking into account contextual conditions derived from sensors spread across the entire urban network. We then experimentally evaluate CARS on a real dataset of taxi mobility traces in Rome, comparing the proposed systems with existing ones that do not consider contextual conditions. The preliminary results are promising and show the feasibility and potential of CARS.
DOI: 10.1109/PDP2018.2018.00078 (https://doi.org/10.1109/PDP2018.2018.00078), published 2018-03-21
Citations: 5
Venue: 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)