
Latest publications: 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)

VMeter: Power modelling for virtualized clouds
Ata E. H. Bohra, V. Chaudhary
Datacenters have seen unprecedented growth in recent years. The energy required to operate these large-scale facilities is increasing significantly, both in terms of operating cost and in terms of indirect ecological impact through high carbon emissions. Several ongoing research efforts target an integrated cloud management system that provides comprehensive online monitoring of resource utilization along with power-aware policies to reduce total energy consumption. However, most of these techniques provide online power monitoring based on the power consumption of a physical node running one or more Virtual Machines (VMs); they lack a fine-grained mechanism to profile the power of an individual hosted VM. In this work we present a novel power modelling technique, VMeter, based on online monitoring of system resources that correlate highly with total power consumption. The monitored system sub-components include the CPU, cache, disk, and DRAM. The proposed model predicts the instantaneous power consumption of an individual VM hosted on a physical node, in addition to the full system power consumption. Our model is validated using computationally diverse, industry-standard benchmark programs. Our evaluation results show that the model predicts instantaneous power with a mean and median accuracy of 93% and 94%, respectively, against the actual power measured with an externally attached power meter.
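The abstract does not specify the model's functional form; a common choice for this kind of counter-based power model is a linear fit over the monitored sub-component utilizations, sketched below with synthetic data (the utilization samples, coefficient values, and idle power are all hypothetical, not VMeter's actual parameters):

```python
import numpy as np

# Hypothetical per-interval utilization samples for the monitored
# sub-components (CPU, cache, disk, DRAM), plus measured total power.
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=(200, 4))   # utilization in [0, 1]
true_coeffs = np.array([45.0, 10.0, 8.0, 12.0])  # watts per unit load (made up)
idle_power = 70.0                                # baseline watts (made up)
power = idle_power + samples @ true_coeffs       # "measured" total power

# Fit power = idle + sum_i(c_i * u_i) by least squares.
X = np.hstack([np.ones((len(samples), 1)), samples])
coeffs, *_ = np.linalg.lstsq(X, power, rcond=None)

def predict_power(utilization):
    """Predict instantaneous full-system power from a 4-vector of utilizations."""
    return coeffs[0] + np.asarray(utilization) @ coeffs[1:]

def vm_power(vm_utilization):
    """Attribute power to one VM: apply the fitted per-component
    coefficients to that VM's share of each resource."""
    return np.asarray(vm_utilization) @ coeffs[1:]
```

Because the synthetic data is noise-free, the least-squares fit recovers the generating coefficients exactly; with real power-meter samples the fit would only approximate them.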
DOI: 10.1109/IPDPSW.2010.5470907 · Published: 2010-04-19
Citations: 169
The resource locating strategy based on sub-domain hybrid P2P network model
Yuhua Liu, Yuling Li, L. Yang, N. Xiong, Longquan Zhu, Kaihua Xu
P2P networks are an important part of the next-generation Internet, and P2P file sharing has become one of the most important Internet application systems in the world. However, P2P networks face a challenge in resource location: unstructured P2P networks support complex search functions, but their resource-locating efficiency is low, whereas structured P2P networks locate resources efficiently but do not support fuzzy queries and incur higher network maintenance costs. How to search resources in P2P networks efficiently has therefore become a key problem. In traditional network structures, it is difficult to simultaneously meet users' requirements for resource-locating efficiency and recall ratio. This paper therefore proposes a sub-domain hybrid P2P network model (SHPNM), which makes full use of the unstructured network's support for fuzzy queries while retaining the efficient resource location of structured P2P networks. It then gives a detailed analysis of the design of SHPNM and of the method of resource location in this model. Nodes are gathered into a domain according to the proximity of their physical locations, and the nodes in a domain form a structured P2P network. Each domain designates its two nodes with the highest comprehensive performance value as a boundary node and a backup node, and the boundary nodes are connected together in the form of an unstructured P2P network. Simulation results show that SHPNM both improves resource-locating efficiency and raises the recall ratio.
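The domain construction described above (gather nodes by physical proximity, then elect the two nodes with the highest comprehensive performance value as boundary and backup) can be sketched as follows; the node positions, performance scores, greedy grouping radius, and the clustering heuristic itself are illustrative assumptions, not the paper's algorithm:

```python
import math

# Hypothetical node records: (node_id, (x, y) position, performance score).
nodes = [
    ("a", (0.1, 0.2), 5.0), ("b", (0.2, 0.1), 9.0),
    ("c", (0.15, 0.25), 7.0), ("d", (5.0, 5.1), 4.0),
    ("e", (5.2, 4.9), 8.0), ("f", (4.9, 5.3), 6.0),
]

def group_by_proximity(nodes, radius=1.0):
    """Greedily gather nodes into domains by physical closeness."""
    domains = []
    for nid, pos, perf in nodes:
        for dom in domains:
            if math.dist(pos, dom["center"]) <= radius:
                dom["members"].append((nid, pos, perf))
                break
        else:  # no nearby domain found: start a new one
            domains.append({"center": pos, "members": [(nid, pos, perf)]})
    return domains

def elect_boundary(domain):
    """Pick the two best-performing nodes: boundary node and backup node."""
    ranked = sorted(domain["members"], key=lambda m: m[2], reverse=True)
    return ranked[0][0], ranked[1][0]

domains = group_by_proximity(nodes)
# Nodes inside each domain would form a structured P2P network; the
# boundary nodes of all domains are then linked as an unstructured overlay.
boundaries = [elect_boundary(d) for d in domains]
```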
DOI: 10.1109/IPDPSW.2010.5470710 · Published: 2010-04-19
Citations: 4
Storage space reduction for the solution of systems of ordinary differential equations by pipelining and overlapping of vectors
Matthias Korch, T. Rauber
Systems of ordinary differential equations (ODEs) arise from the mathematical modeling of time-dependent processes. Many sequential and parallel numerical methods have been proposed that can simulate processes described by ODE systems with a known initial state. One disadvantage common to the proposed methods is the large amount of storage space required if the ODE system consists of many equations: they must keep in memory not only the solution of the ODE system at the current time step, but also several intermediate solutions or results of evaluations of the right-hand-side function of the ODE system. In this paper, we present an approach based on pipelining and overlapping of vectors which can reduce the storage space of typical ODE solution methods such as Runge-Kutta (RK) and extrapolation methods. We analyze and compare the scalability of different implementation variants of embedded and iterated RK methods on several modern parallel computer systems. Our experiments show that, due to an increased locality of memory references, our approach achieves good scalability even on large numbers of processors.
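As a minimal illustration of the storage-saving principle only (not the authors' pipelined RK scheme), consider a right-hand side with limited access distance, here a tridiagonal stencil: a time step can then overwrite the solution vector in place, carrying just one old scalar across the sweep, instead of allocating a second full-length vector. The sketch uses a plain Euler step for brevity:

```python
import numpy as np

def rhs(y):
    """Tridiagonal example RHS: f_j = y_{j-1} - 2*y_j + y_{j+1}, zero boundaries."""
    f = -2.0 * y
    f[1:] += y[:-1]   # add y_{j-1}
    f[:-1] += y[1:]   # add y_{j+1}
    return f

def euler_step_two_vectors(y, h):
    """Conventional step: allocates a second full-length vector."""
    return y + h * rhs(y)

def euler_step_in_place(y, h):
    """Exploit the limited access distance: overwrite y front to back,
    carrying only the not-yet-overwritten old y_{j-1} in a scalar."""
    n = len(y)
    prev = 0.0                        # old y_{j-1}; zero boundary
    for j in range(n):
        right = y[j + 1] if j + 1 < n else 0.0   # y_{j+1} is still the old value
        new = y[j] + h * (prev - 2.0 * y[j] + right)
        prev = y[j]                   # save old y_j before overwriting it
        y[j] = new
    return y

y0 = np.linspace(0.0, 1.0, 8)
ref = euler_step_two_vectors(y0.copy(), 0.1)
inplace = euler_step_in_place(y0.copy(), 0.1)
```

Both variants produce identical results; the in-place sweep simply trades the extra vector for a single saved scalar, which is the kind of working-space reduction the paper generalizes to RK stage vectors.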
DOI: 10.1109/IPDPSW.2010.5470768 · Published: 2010-04-19
Citations: 2
How Algorithm Definition Language (ADL) improves the performance of SmartGridSolve applications
M. Guidolin, Thomas Brady, Alexey L. Lastovetsky
In this paper, we study the importance of languages for the specification of algorithms in high-performance Grid computing. We present one such language, the Algorithm Definition Language (ADL), designed and implemented for use in conjunction with SmartGridSolve. We demonstrate that the use of this type of language can significantly improve the performance of Grid applications. We discuss how ADL can be used to improve the execution of some typical algorithms that use conditional statements, iterative computations, and adaptive methods, and we present experimental results demonstrating significant performance gains from the use of ADL.
DOI: 10.1109/IPDPSW.2010.5470921 · Published: 2010-04-19
Citations: 1
Performance analysis and evaluation of random walk algorithms on wireless networks
Keqin Li
We propose a model of dynamically evolving random networks and give an analytical result of the cover time of the simple random walk algorithm on a dynamic random symmetric planar point graph. Our dynamic network model considers random node distribution and random node mobility. We analyze the cover time of the parallel random walk algorithm on a complete network and show by numerical data that k parallel random walks reduce the cover time by almost a factor of k. We present simulation results for four random walk algorithms on random asymmetric planar point graphs. These algorithms include the simple random walk algorithm, the intelligent random walk algorithm, the parallel random walk algorithm, and the parallel intelligent random walk algorithm. Our random network model considers random node distribution and random battery transmission power.
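A minimal simulation of the parallel random walk idea, run on an n-node cycle rather than the paper's random planar point graphs, can illustrate how k simultaneous walks reduce the cover time (the graph choice, common start node, and step rule are assumptions for illustration):

```python
import random

def cover_time(n, k, seed=0):
    """Steps until k simultaneous random walks on an n-cycle
    have jointly visited every node."""
    rng = random.Random(seed)
    walkers = [0] * k                 # all k walks start at node 0
    visited = {0}
    steps = 0
    while len(visited) < n:
        steps += 1
        for i in range(k):
            walkers[i] = (walkers[i] + rng.choice((-1, 1))) % n
            visited.add(walkers[i])
    return steps

single = cover_time(64, k=1, seed=1)
parallel = cover_time(64, k=8, seed=1)
```

On a cycle, a single walk needs on the order of n^2 steps to cover all nodes, so eight parallel walks typically finish in a small fraction of that time, in line with the paper's "almost a factor of k" observation.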
DOI: 10.1142/S0129054112400369 · Published: 2010-04-19
Citations: 21
To upgrade or not to upgrade? Catamount vs. Cray Linux Environment
S. Hammond, G. Mudalige, J. A. Smith, Jim Davis, S. Jarvis, J. Holt, I. Miller, J. Herdman, A. Vadgama
Modern supercomputers are growing in diversity and complexity: the arrival of technologies such as multi-core processors, general-purpose GPUs and specialised compute accelerators has increased the potential scientific delivery possible from such machines. This does not, however, come without cost, including significant increases in the sophistication and complexity of the supporting operating systems and software libraries. This paper documents the development and application of methods to assess the potential performance of one hardware, operating system (OS) and software stack combination against another. This is of particular interest to supercomputing centres, which routinely examine prospective software/architecture combinations and possible machine upgrades. A case study is presented that assesses the potential performance of a particle transport code on AWE's Cray XT3 8,000-core supercomputer running images of the Catamount and the Cray Linux Environment (CLE) operating systems. This work demonstrates that by running a number of small benchmarks on a test machine and network, and observing factors such as operating system noise, it is possible to estimate the performance impact on the system as a whole of upgrading from one operating system to another. This use of performance modelling represents an inexpensive method of examining the likely behaviour of a large supercomputer before and after an operating system upgrade; it is also attractive when system downtime must be minimised while software-system upgrades are explored. The results show that benchmark tests run on fewer than 256 cores would suggest that the impact (overhead) of upgrading the operating system to CLE is less than 10%; model projections suggest that this is not the case at scale.
DOI: 10.1109/IPDPSW.2010.5470885 · Published: 2010-04-19
Citations: 6
An adaptive I/O load distribution scheme for distributed systems
Xin Chen, J. Langston, Xubin He, Fengjiang Mao
A fundamental issue in a large-scale distributed system consisting of heterogeneous machines that vary in both I/O and computing capabilities is to distribute workloads according to the capabilities of each node to achieve optimal performance. However, node capabilities are often not stable, due to various factors, so a static workload distribution scheme may not match the capability of each node well. To address this issue, we distribute workload adaptively in response to changes in system node capability. In this paper we present an adaptive I/O load distribution scheme that dynamically captures the I/O capabilities of system nodes and predictively determines a suitable load distribution pattern. A case study is conducted by applying our load distribution scheme to the popular distributed file system PVFS2. Experimental results show that our adaptive load distribution scheme can dramatically improve performance: up to a 70% performance gain for writes and 80% for reads, and up to 63% of overall performance loss can be avoided in the presence of an unstable Object Storage Device (OSD).
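One simple way to realize such adaptation (a sketch, not the paper's actual scheme) is to maintain an exponentially weighted estimate of each node's I/O capability from observed throughput and to split incoming load proportionally to those estimates; the node names, throughput numbers, and smoothing factor below are hypothetical:

```python
def update_capability(current, observed_throughput, alpha=0.3):
    """Exponentially weighted moving estimate of a node's I/O capability."""
    return (1 - alpha) * current + alpha * observed_throughput

def distribute(load, capabilities):
    """Split a total load proportionally to estimated node capabilities."""
    total = sum(capabilities.values())
    return {node: load * cap / total for node, cap in capabilities.items()}

# Three nodes start with equal estimated capability (MB/s, hypothetical).
caps = {"n1": 100.0, "n2": 100.0, "n3": 100.0}

# Node n3 degrades (e.g., an unstable OSD); new measurements pull its
# estimate down, so subsequent load assignments shift to n1 and n2.
for measured in (40.0, 35.0, 30.0):
    caps["n3"] = update_capability(caps["n3"], measured)

shares = distribute(900.0, caps)
```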
DOI: 10.1109/IPDPSW.2010.5470787 · Published: 2010-04-19
Citations: 2
Collaborative execution environment for heterogeneous parallel systems
A. Ilic, L. Sousa
Nowadays, commodity computers are complex heterogeneous systems that provide a huge amount of computational power. To take advantage of this power, however, we have to orchestrate the use of processing units with different characteristics. Such distributed-memory systems rely on relatively slow interconnection networks, such as system buses, so most of the time only the central processing unit (CPU) or the processing accelerators, which are simpler homogeneous subsystems, are exploited individually. In this paper we propose a collaborative execution environment for exploiting data parallelism in a heterogeneous system. We show that this environment can be applied to program both the CPU and graphics processing units (GPUs) to collaboratively compute matrix multiplication and the fast Fourier transform (FFT). Experimental results show that significant performance benefits are achieved when both the CPU and the GPU are used.
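A common way to split such data-parallel work is a static partition proportional to relative device speeds. The sketch below divides the rows of a matrix product this way; both slices are actually computed on the CPU here, since real GPU dispatch is out of scope, and the speed ratio is a hypothetical input:

```python
import numpy as np

def split_rows(m, speeds):
    """Row index at which to cut m rows between two devices,
    proportionally to their relative speeds."""
    total = sum(speeds)
    return round(m * speeds[0] / total)

def collaborative_matmul(A, B, cpu_speed=1.0, gpu_speed=3.0):
    """Each 'device' computes a horizontal slice of A @ B.
    In a real system the second slice would be dispatched to the GPU."""
    cut = split_rows(A.shape[0], (cpu_speed, gpu_speed))
    top = A[:cut] @ B          # CPU share
    bottom = A[cut:] @ B       # GPU share (simulated on the CPU here)
    return np.vstack([top, bottom])

A = np.arange(12.0).reshape(4, 3)
B = np.arange(6.0).reshape(3, 2)
```

Stacking the two independently computed slices reproduces the full product exactly, which is what makes row-wise splitting attractive for heterogeneous execution.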
DOI: 10.1109/IPDPSW.2010.5470835 · Published: 2010-04-19
Citations: 4
Multicore-aware reuse distance analysis
Derek L. Schuff, Benjamin S. Parsons, Vijay S. Pai
This paper presents and validates methods to extend reuse distance analysis of application locality characteristics to shared-memory multicore platforms by accounting for invalidation-based cache-coherence and inter-core cache sharing. Existing reuse distance analysis methods track the number of distinct addresses referenced between reuses of the same address by a given thread, but do not model the effects of data references by other threads. This paper shows several methods to keep reuse stacks consistent so that they account for invalidations and cache sharing, either as references arise in a simulated execution or at synchronization points. These methods are evaluated against a Simics-based coherent cache simulator running several OpenMP and transaction-based benchmarks. The results show that adding multicore-awareness substantially improves the ability of reuse distance analysis to model cache behavior, reducing the error in miss ratio prediction (relative to cache simulation for a specific cache size) by an average of 70% for per-core caches and an average of 90% for shared caches.
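The core bookkeeping can be sketched with an LRU stack: the reuse distance of a reference is the number of distinct addresses touched since the previous reference to the same address, and, following the paper's idea, a coherence invalidation removes the address from the stack so its next reference counts as a miss (the trace and event encoding are illustrative):

```python
def reuse_distances(trace):
    """Reuse distance via an LRU stack. Each entry in `trace` is
    ('ref', addr) or ('inval', addr); an invalidation (e.g., a remote
    write under invalidation-based coherence) removes the address, so
    its next reference gets an infinite distance (a certain miss)."""
    stack = []                                   # most recent at the end
    dists = []
    for kind, addr in trace:
        if kind == "inval":
            if addr in stack:
                stack.remove(addr)
            continue
        if addr in stack:
            pos = stack.index(addr)
            dists.append(len(stack) - pos - 1)   # distinct addrs since last use
            stack.remove(addr)
        else:
            dists.append(float("inf"))           # first touch or invalidated
        stack.append(addr)                       # addr becomes most recent
    return dists

trace = [("ref", "A"), ("ref", "B"), ("ref", "A"),
         ("inval", "A"), ("ref", "A")]
```

In this trace the third reference to A has distance 1 (only B was touched in between), while the final reference to A misses because the invalidation evicted it, which is exactly the coherence effect a single-threaded reuse stack would miss.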
DOI: 10.1109/IPDPSW.2010.5470780, published 2010-04-19.
Citations: 55
GridP2P: Resource usage in Grids and Peer-to-Peer systems
Sérgio Esteves, L. Veiga, P. Ferreira
The last few years have witnessed huge growth in computer technology and available resources throughout the Internet. These resources can be used to run CPU-intensive applications requiring long periods of processing time. Grid systems allow us to take advantage of available resources lying over a network. However, these systems impose several difficulties to their usage (e.g. heavy authentication and configuration management); in order to overcome them, Peer-to-Peer systems provide open access making the Grid available to any user. Our solution consists of a platform for distributed cycle sharing which attempts to combine Grid and Peer-to-Peer models. A major goal is to allow any ordinary user to use remote idle cycles in order to speedup commodity applications. On the other hand, users can also provide spare cycles of their machines when they are not using them. Our solution encompasses the following functionalities: application management, job creation and scheduling, resource discovery, security policies, and overlay network management. The simple and modular organization of this system allows that components can be changed at minimum cost. In addition, the use of history-based policies provides powerful usage semantics concerning the resource management.
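As an illustration of the history-based policies mentioned in the abstract above, here is a toy dispatcher that sends jobs to the idle peer with the best track record. All names (`CycleSharingScheduler`, `advertise`, `dispatch`, `report`) are hypothetical, not GridP2P's actual interfaces:

```python
class CycleSharingScheduler:
    """Toy sketch of history-based peer selection for a cycle-sharing
    overlay: peers advertise idle cycles, completed jobs raise a
    peer's score, failures lower it, and new jobs go to the idle
    peer with the best history."""

    def __init__(self):
        self.idle = set()
        self.score = {}   # peer -> job-completion history score

    def advertise(self, peer, is_idle):
        """A peer announces whether it currently has spare cycles."""
        self.score.setdefault(peer, 0)
        if is_idle:
            self.idle.add(peer)
        else:
            self.idle.discard(peer)

    def dispatch(self):
        """Hand the next job to the best-scoring idle peer, or None."""
        if not self.idle:
            return None
        peer = max(self.idle, key=self.score.__getitem__)
        self.idle.discard(peer)   # peer is now busy
        return peer

    def report(self, peer, success):
        """Record the job outcome and return the peer to the idle pool."""
        self.score[peer] += 1 if success else -1
        self.idle.add(peer)
```

A production system would layer resource discovery and security policies on top of this (as the abstract lists), but the scoring loop captures what "history-based usage semantics" means in practice.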
DOI: 10.1109/IPDPSW.2010.5470917, published 2010-04-19.
Citations: 4
Journal
2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)