
Latest publications from the 2008 IEEE International Conference on Cluster Computing

Load-balancing methods for parallel and distributed constraint solving
Pub Date : 2008-10-31 DOI: 10.1109/CLUSTR.2008.4663786
Carl Christian Rolf, K. Kuchcinski
Program parallelization and distribution become increasingly important as new multi-core architectures and cheaper cluster technology provide ways to improve performance. Using declarative languages, such as constraint programming, can make the transition to parallelism easier for the programmer. In this paper, we address parallel and distributed search in constraint programming (CP) by proposing several load-balancing methods. We show how these methods improve the execution-time scalability of constraint programs. Scalability is the greatest challenge of parallelism, and it is particularly an issue in constraint programming, where load-balancing is difficult. We address this problem by proposing CP-specific load-balancing methods and evaluating them on a cluster using benchmark problems. Our experimental results show that how well the methods perform depends on the type of problem and the type of search. This gives the programmer the opportunity to optimize the performance for a particular problem.
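To make the decomposition concrete, the following is a minimal Python sketch (not the authors' load-balancing methods): it splits a small stand-in constraint problem, n-queens, into independent subtrees by fixing the first few variables, then farms the subtrees out to a process pool. The board size and split depth are illustrative assumptions.

    # Minimal sketch: static decomposition of a constraint search tree.
    # Not the paper's load-balancing methods; board size and split depth
    # are illustrative assumptions.
    from concurrent.futures import ProcessPoolExecutor

    N = 10            # n-queens board size (stand-in constraint problem)
    SPLIT_DEPTH = 2   # fix the first SPLIT_DEPTH queens to create subtrees

    def consistent(assignment):
        # Standard n-queens constraints: no shared column or diagonal.
        for i in range(len(assignment)):
            for j in range(i + 1, len(assignment)):
                if (assignment[i] == assignment[j]
                        or abs(assignment[i] - assignment[j]) == j - i):
                    return False
        return True

    def count_solutions(prefix):
        # Exhaustive depth-first search of the subtree rooted at `prefix`.
        if len(prefix) == N:
            return 1
        return sum(count_solutions(prefix + (col,))
                   for col in range(N) if consistent(prefix + (col,)))

    def subtree_roots(prefix=()):
        # All consistent prefixes of length SPLIT_DEPTH: the work units.
        if len(prefix) == SPLIT_DEPTH:
            return [prefix]
        roots = []
        for col in range(N):
            if consistent(prefix + (col,)):
                roots.extend(subtree_roots(prefix + (col,)))
        return roots

    if __name__ == "__main__":
        roots = subtree_roots()
        with ProcessPoolExecutor() as pool:
            counts = list(pool.map(count_solutions, roots))
        # Subtree sizes vary widely, which is exactly the load-balancing problem.
        print(f"{len(roots)} subtrees, {sum(counts)} solutions, "
              f"largest subtree holds {max(counts)} solutions")

With such a static split, a few subtrees contain most of the work while others are nearly empty; dynamic, CP-specific load-balancing methods of the kind the paper proposes are aimed at exactly this imbalance.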
Citations: 20
A novel model for synthesizing parallel I/O workloads in scientific applications
Pub Date : 2008-10-31 DOI: 10.1109/CLUSTR.2008.4663778
D. Feng, Qiang Zou, Hong Jiang, Yifeng Zhu
One of the challenging issues in performance evaluation of parallel storage systems through synthetic-trace-driven simulation is to accurately characterize the I/O demands of data-intensive scientific applications. This paper analyzes several I/O traces collected from different distributed systems and concludes that correlations in parallel I/O inter-arrival times are inconsistent, either with little correlation or with evident and abundant correlations. Thus conventional Poisson or Markov arrival processes are inappropriate to model I/O arrivals in some applications. Instead, a new and generic model based on the alpha-stable process is proposed and validated in this paper to accurately model parallel I/O burstiness in both workloads with little and strong correlations. This model can be used to generate reliable synthetic I/O sequences in simulation studies. Experimental results presented in this paper show that this model can capture the complex I/O behaviors of real storage systems more accurately and faithfully than conventional models, particularly for the burstiness characteristics in the parallel I/O workloads.
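As an illustration of the modeling idea, the sketch below (not the paper's calibrated model) draws synthetic inter-arrival times from a heavy-tailed alpha-stable distribution via SciPy and contrasts them with an exponential (Poisson) stream of the same mean; all parameter values are assumed rather than fitted to any trace.

    # Minimal sketch: alpha-stable vs. Poisson inter-arrival times.
    # Parameter values are illustrative assumptions, not fitted values.
    import numpy as np
    from scipy.stats import levy_stable, expon

    n = 10_000
    # alpha < 2 gives heavy tails: bursts of closely spaced requests
    # separated by long quiet periods. np.abs is a crude truncation to
    # keep gaps non-negative for this illustration.
    stable_gaps = np.abs(levy_stable.rvs(alpha=1.5, beta=1.0,
                                         loc=0.0, scale=1.0,
                                         size=n, random_state=0))
    poisson_gaps = expon.rvs(scale=stable_gaps.mean(),
                             size=n, random_state=0)

    for name, gaps in (("alpha-stable", stable_gaps),
                       ("Poisson", poisson_gaps)):
        print(f"{name:>12}: mean={gaps.mean():.2f}  "
              f"p99={np.percentile(gaps, 99):.2f}  max={gaps.max():.2f}")

The heavy-tailed stream shows a far larger 99th percentile and maximum than the Poisson stream of the same mean, which is the burstiness the authors argue Poisson and Markov arrival models miss.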
Citations: 13
A comparison of search heuristics for empirical code optimization
Pub Date : 2008-10-31 DOI: 10.1109/CLUSTR.2008.4663803
Keith Seymour, Haihang You, J. Dongarra
This paper describes the application of various search techniques to the problem of automatic empirical code optimization. The search process is a critical aspect of auto-tuning systems because the large size of the search space and the cost of evaluating candidate implementations make it infeasible to find the true optimum by brute force. We evaluate the effectiveness of Nelder-Mead Simplex, Genetic Algorithms, Simulated Annealing, Particle Swarm Optimization, Orthogonal search, and Random search in terms of the performance of the best candidate found under varying time limits.
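The comparison can be pictured with a small budgeted experiment like the one below (a sketch, not the paper's auto-tuner): each strategy gets the same number of objective evaluations on a stand-in function and reports the best value found; a real tuner would instead time generated code variants.

    # Minimal sketch: comparing search heuristics under an equal evaluation
    # budget. The Rosenbrock objective and the budget are stand-in assumptions.
    import numpy as np
    from scipy.optimize import minimize

    def objective(x):
        # Rosenbrock function as a stand-in for "run time of a code variant".
        return float(sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1 - x[:-1]) ** 2))

    rng = np.random.default_rng(42)
    budget, dim = 200, 4
    x0 = rng.uniform(-2, 2, size=dim)

    # Nelder-Mead simplex, capped at `budget` objective evaluations.
    nm = minimize(objective, x0, method="Nelder-Mead",
                  options={"maxfev": budget})

    # Pure random search with the same budget.
    samples = rng.uniform(-2, 2, size=(budget, dim))
    rand_best = min(objective(s) for s in samples)

    print(f"Nelder-Mead best:   {nm.fun:.4f} after {nm.nfev} evaluations")
    print(f"Random search best: {rand_best:.4f} after {budget} evaluations")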
Citations: 64
A HyperTransport-based personal parallel computer
Pub Date : 2008-10-31 DOI: 10.1109/CLUSTR.2008.4663763
Xiaojun Yang, Fei Chen, Hailiang Cheng, Ninghui Sun
Rather than relying entirely on commodity components, this paper presents an approach that builds a personal parallel computer on top of a non-coherent HyperTransport (HT) fabric. The advantage is that it provides both lower cost and higher performance compared with the existing approach. An HT switch is designed and implemented to interconnect a set of AMD Opteron processors and build an in-a-box cluster. Evaluation experiments on our prototype system show that this approach gives better performance.
Citations: 4
Environmental-aware optimization of MPI checkpointing intervals
Pub Date : 2008-10-31 DOI: 10.1109/CLUSTR.2008.4663790
H. Jitsumoto, Toshio Endo, S. Matsuoka
Fault tolerance for HPC systems with long-running applications of massive and growing scale is now essential. Although checkpointing with rollback recovery is a popular technique, automated checkpointing is becoming troublesome in real systems because of the extremely large size of collective application memory. Automated optimization of the checkpoint interval is therefore essential, but the optimal point depends on hardware failure rates and I/O bandwidth. Our new model and algorithm, an extension of Vaidya's model, solve the problem by taking such parameters into account. A prototype implementation on our fault-tolerant MPI framework ABARIS showed approximately 5.5% improvement over statically user-determined intervals.
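For context, the fragment below computes the classic first-order approximation of the optimal checkpoint interval (Young's formula). It is not the extended Vaidya-style model the paper proposes, which also folds in observed failure rates and I/O bandwidth; the cost and MTBF numbers are illustrative assumptions.

    # Minimal baseline (Young's approximation), not the paper's model.
    # Checkpoint cost and MTBF values are illustrative assumptions.
    import math

    def young_interval(checkpoint_cost_s, mtbf_s):
        # Optimal interval ~ sqrt(2 * C * MTBF) when C is small relative to MTBF.
        return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

    checkpoint_cost = 300.0      # seconds to write one checkpoint (assumed)
    mtbf = 24 * 3600.0           # mean time between failures in seconds (assumed)

    interval = young_interval(checkpoint_cost, mtbf)
    print(f"checkpoint every {interval / 60:.1f} minutes "
          f"(cost={checkpoint_cost:.0f}s, MTBF={mtbf / 3600:.0f}h)")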
Citations: 0
Are nonblocking networks really needed for high-end-computing workloads?
Pub Date : 2008-10-31 DOI: 10.1109/CLUSTR.2008.4663766
N. Desai, P. Balaji, P. Sadayappan, Mohammad Islam
High-speed interconnects are frequently used to provide scalable communication on increasingly large high-end computing systems. Often, these networks are nonblocking: there exist independent paths between all pairs of nodes in the system, allowing simultaneous communication with zero network contention. This performance, however, comes at a heavy cost, as the number of components needed (and hence the cost) increases superlinearly with the number of nodes in the system. In this paper, we study the behavior of real and synthetic supercomputer workloads to understand the impact of the network's nonblocking capability on overall performance. Starting from a fully nonblocking network, we begin by assessing the worst-case performance degradation caused by removing interstage communication links, resulting in over-subscription and hence potential blocking in the communication network. We also study the impact of several factors on this behavior, including system workloads, multicore processors, and switch crossbar sizes. Our observations show that a significant reduction in the number of interstage links can be tolerated on all of the workloads analyzed, causing less than 5% overall loss of performance.
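The cost side of the trade-off can be seen with a back-of-the-envelope calculation like the one below (illustrative port counts, not the paper's systems): fewer uplinks per leaf switch means fewer interstage links and spine ports, at the price of a higher oversubscription ratio.

    # Minimal sketch: down-link/up-link ratio per leaf switch sets the
    # oversubscription factor. A 36-port switch is an assumed example.
    def leaf_summary(ports_per_switch, uplinks):
        downlinks = ports_per_switch - uplinks
        return downlinks, downlinks / uplinks

    for uplinks in (18, 12, 6):
        hosts, ratio = leaf_summary(36, uplinks)
        print(f"{uplinks:2d} uplinks, {hosts:2d} hosts per leaf switch "
              f"-> {ratio:.1f}:1 oversubscription")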
Citations: 13
Active storage using object-based devices
Pub Date : 2008-10-31 DOI: 10.1109/CLUSTR.2008.4663810
Tina Miriam John, Anuradharthi Thiruvenkata Ramani, J. Chandy
The increasing performance and decreasing cost of processors and memory are causing system intelligence to move from the CPU to peripherals such as disk drives. Storage system designers are using this trend toward excess computation capability to perform more complex processing and optimizations directly inside the storage devices. Until now, such optimizations have been performed only at low levels of the storage protocol. Another factor to consider is the current trend in storage density, mechanics, and electronics, which is eliminating the bottleneck encountered while moving data off the media and putting pressure on interconnects and host processors to move data more efficiently. Previous work on active storage has taken advantage of the extra processing power on individual disk drives to run application-level code. Moving portions of an application's processing to run directly at disk drives can dramatically reduce data traffic and take advantage of the parallel storage already present in large systems today. This paper demonstrates active storage on an iSCSI OSD standards-based, object-oriented framework.
Citations: 46
Performance models for dynamic tuning of parallel applications on Computational Grids
Pub Date : 2008-10-31 DOI: 10.1109/CLUSTR.2008.4663798
Genaro Costa, Josep Jorba, A. Sikora, T. Margalef, E. Luque
Performance is a main concern in parallel application development. Dynamic tuning is a technique that acts on application parameters to improve execution performance indexes. To perform it, measurements must be collected, application behavior analyzed using a performance model, and tuning actions carried out. Computational Grids are prone to dynamic changes in their characteristics during application execution, so dynamic tuning tools are indispensable to reach the expected performance indexes in those environments. A particular problem that provokes performance bottlenecks is the load distribution in master/worker applications. This paper addresses the performance modeling of such applications on Computational Grids from the perspective of dynamic tuning. It is inferred that grain size and number of workers are critical parameters for reducing execution time while raising the efficiency of resource usage. A heuristic to dynamically tune the granularity and number of workers is proposed. Experimental results from simulating a matrix multiplication application in a heterogeneous Grid environment are shown.
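The following is a minimal analytical sketch of such a model (assumed cost terms, not the paper's): predicted run time is the evenly divided computation, plus a per-task dispatch cost paid serially by the master, plus up to one grain of end-of-run imbalance, so grain size trades master overhead against straggler time.

    # Minimal sketch of a master/worker cost model; all constants are
    # illustrative assumptions, not measured values.
    def run_time(total_work_s, grain_s, n_workers, per_task_comm_s):
        n_tasks = total_work_s / grain_s
        compute = total_work_s / n_workers      # perfectly divided useful work
        dispatch = n_tasks * per_task_comm_s    # master handles every task serially
        imbalance = grain_s                     # stragglers finish up to one grain late
        return compute + dispatch + imbalance

    total_work, comm, workers = 3600.0, 0.05, 16
    for grain in (1.0, 5.0, 15.0, 60.0):
        t = run_time(total_work, grain, workers, comm)
        print(f"grain {grain:5.1f}s -> predicted {t:7.1f}s on {workers} workers")

Under these assumed numbers an intermediate grain size wins, which is the kind of trade-off the proposed heuristic tunes at run time.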
Citations: 4
Towards an understanding of the performance of MPI-IO in Lustre file systems
Pub Date : 2008-10-31 DOI: 10.1109/CLUSTR.2008.4663791
Jeremy S. Logan, P. Dickens
Lustre is becoming an increasingly important file system for large-scale computing clusters. The problem, however, is that many data-intensive applications use MPI-IO for their I/O requirements, and MPI-IO performs poorly in a Lustre file system environment. While this poor performance has been well documented, the reasons for such performance are currently not well understood. Our research suggests that the primary performance issues have to do with the assumptions underpinning most of the parallel I/O optimizations implemented in MPI-IO, which do not appear to hold in a Lustre environment. Perhaps the most important assumption is that optimal performance is obtained by performing large, contiguous I/O operations. However, the research results presented in this poster show that this is often the worst approach to take in a Lustre file system. In fact, we found that the best performance is often achieved when each process performs a series of smaller, non-contiguous I/O requests. In this poster, we provide experimental results supporting these non-intuitive ideas, and provide alternative approaches that significantly enhance the performance of MPI-IO in a Lustre file system.
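As one concrete way to act on this observation (a sketch with assumed values, not the poster's exact method), the mpi4py fragment below has each process issue its own stripe-sized write at a stripe-aligned offset, so requests map cleanly onto separate Lustre stripes instead of one large contiguous region written by a single aggregator; the 1 MiB stripe size is an assumption, since a real run would query the file layout.

    # Minimal mpi4py sketch: independent, stripe-aligned writes.
    # The 1 MiB stripe size is an assumed value, and this is not the
    # poster's method. Run with: mpiexec -n 4 python this_script.py
    import numpy as np
    from mpi4py import MPI

    STRIPE = 1 << 20                      # assumed Lustre stripe size in bytes

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Each rank fills exactly one stripe-sized buffer with its own rank id.
    buf = np.full(STRIPE // 8, rank, dtype=np.float64)

    amode = MPI.MODE_WRONLY | MPI.MODE_CREATE
    fh = MPI.File.Open(comm, "striped_output.dat", amode)
    fh.Write_at(rank * STRIPE, buf)       # stripe-aligned offset per rank
    fh.Close()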
Citations: 16
Supporting storage resources in Urgent Computing Environments
Pub Date : 2008-10-31 DOI: 10.1109/CLUSTR.2008.4663794
Jason Cope, H. Tufo
The special priority and urgent computing environment (SPRUCE) provides on-demand access to high-performance computing resources for time-critical applications. While SPRUCE supports computationally intensive applications, it does not fully support high-priority, data intensive applications. To support data intensive applications in urgent computing environments, we developed the urgent computing environment data resource manager (CEDAR). CEDAR provides storage resource provisioning capabilities that manage the availability and quality of service of storage resources used by urgent computing applications. In this paper, we describe the architecture of CEDAR, illustrate how CEDAR will integrate with urgent computing environments, and evaluate the capabilities of CEDAR in simulated urgent computing environments.
Citations: 7