
Latest publications: 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

High performance multivariate visual data exploration for extremely large data
O. Rübel, Prabhat, Kesheng Wu, H. Childs, J. Meredith, C. Geddes, E. Cormier-Michel, Sean Ahern, G. Weber, P. Messmer, H. Hagen, B. Hamann, E. W. Bethel
One of the central challenges in modern science is the need to quickly derive knowledge and understanding from large, complex collections of data. We present a new approach that deals with this challenge by combining and extending techniques from high performance visual data analysis and scientific data management. This approach is demonstrated within the context of gaining insight from complex, time-varying datasets produced by a laser wakefield accelerator simulation. Our approach leverages histogram-based parallel coordinates for both visual information display as well as a vehicle for guiding a data mining operation. Data extraction and subsetting are implemented with state-of-the-art index/query technology. This approach, while applied here to accelerator science, is generally applicable to a broad set of science applications, and is implemented in a production-quality visual data analysis infrastructure. We conduct a detailed performance analysis and demonstrate good scalability on a distributed memory Cray XT4 system.
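The data extraction step can be illustrated with a toy bin-based index (this is only a sketch of the general idea, not the authors' FastBit/parallel-coordinates implementation): a variable is binned once, and a range query then selects whole interior bins and exactly checks only the boundary bins.

```python
# Illustrative sketch of histogram/index-accelerated subsetting.
# Bin layout and function names are our own, not the paper's API.
import bisect

def build_index(values, edges):
    """Assign each value to a bin; return bin -> list of record ids."""
    bins = {i: [] for i in range(len(edges) - 1)}
    for rid, v in enumerate(values):
        i = bisect.bisect_right(edges, v) - 1
        if 0 <= i < len(edges) - 1:
            bins[i].append(rid)
    return bins

def range_query(values, edges, bins, lo, hi):
    """Record ids with lo <= value < hi; touch raw data only at the edges."""
    first = bisect.bisect_right(edges, lo) - 1
    last = bisect.bisect_right(edges, hi) - 1
    hits = []
    for b in range(max(first, 0), min(last, len(edges) - 2) + 1):
        if first < b < last:
            hits.extend(bins[b])          # bin fully inside the range
        else:
            # boundary bin: fall back to an exact per-record check
            hits.extend(r for r in bins[b] if lo <= values[r] < hi)
    return sorted(hits)
```

Only the two boundary bins require reading raw values; interior bins are answered from the index alone, which is what makes bin-based queries cheap on very large data.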
DOI: https://doi.org/10.1145/1413370.1413422
Cited by: 76
High-radix crossbar switches enabled by Proximity Communication
H. Eberle, P. García, J. Flich, J. Duato, R. Drost, N. Gura, D. Hopkins, W. Olesinski
We describe a novel way to implement high-radix crossbar switches. Our work is enabled by a new chip interconnect technology called proximity communication (PxC) that offers unparalleled chip IO density. First, we show how a crossbar architecture is topologically mapped onto a PxC-enabled multi-chip module (MCM). Then, we describe a first prototype implementation of a small-scale switch based on a PxC MCM. Finally, we present a performance analysis of two large-scale switch configurations with 288 ports and 1,728 ports, respectively, contrasting a 1-stage PxC-enabled switch and a multi-stage switch using conventional technology. Our simulation results show that (a) arbitration delays in a large 1-stage switch can be considerable, (b) multi-stage switches are extremely susceptible to saturation under non-uniform traffic, a problem that becomes worse for higher radices (1-stage switches, in contrast, are not affected by this problem).
DOI: https://doi.org/10.1109/SC.2008.5219754
Cited by: 12
An adaptive cut-off for task parallelism
A. Duran, J. Corbalán, E. Ayguadé
In task parallel languages, an important factor for achieving good performance is the use of a cut-off technique to reduce the number of tasks created. Using a cut-off to avoid an excessive number of tasks helps the runtime system to reduce the total overhead associated with task creation, particularly if the tasks are fine grained. Unfortunately, the best cut-off technique is usually dependent on the application structure or even the input data of the application. We propose a new cut-off technique that, using information from the application collected at runtime, decides which tasks should be pruned to improve the performance of the application. This technique does not rely on the programmer to determine the cut-off technique that is best suited for the application. We have implemented this cut-off in the context of the new OpenMP tasking model. Our evaluation, with a variety of applications, shows that our adaptive cut-off is able to make good decisions and most of the time matches the optimal cut-off that could be set by hand by a programmer.
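The mechanism being tuned here can be sketched with a fixed depth cut-off (the paper's contribution is choosing such a cut-off adaptively at runtime; this minimal Python example only shows what a cut-off does): above the cut-off depth, recursive calls are spawned as tasks; below it, calls run serially so fine-grain work pays no task-creation overhead.

```python
# Sketch of a depth-based cut-off for a recursive task-parallel
# computation. Uses Python threads purely for illustration.
from concurrent.futures import ThreadPoolExecutor

def fib(n, depth=0, cutoff=3, pool=None):
    if n < 2:
        return n
    if pool is not None and depth < cutoff:
        # still shallow: create a real task for one branch
        fut = pool.submit(fib, n - 1, depth + 1, cutoff, pool)
        right = fib(n - 2, depth + 1, cutoff, pool)
        return fut.result() + right
    # past the cut-off: plain serial recursion, no task created
    return fib(n - 1) + fib(n - 2)

with ThreadPoolExecutor(max_workers=16) as pool:
    result = fib(10, pool=pool)
```

With `cutoff=3`, at most a handful of tasks are ever created regardless of how deep the recursion goes; picking that value well per application and input is exactly the problem the paper automates.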
DOI: https://doi.org/10.1145/1413370.1413407
Cited by: 112
A novel domain oriented approach for scientific Grid workflow composition
Jun Qin, T. Fahringer
Existing knowledge based grid workflow languages and composition tools require sophisticated expertise of domain scientists in order to automate the process of managing workflows and their components (activities). So far, semantic workflow specification and management have not been addressed from a general and integrated perspective. This paper presents a novel domain oriented approach which features separations of concerns between data meaning and data representation and between activity function (semantic description of workflow activities) and activity type (syntactic description of workflow activities). These separations are implemented as part of abstract grid workflow language (AGWL) which supports the development of grid workflows at a high level (semantic) of abstraction. The corresponding workflow composition tool simplifies grid workflow composition by (i) enabling users to compose grid workflows at the level of data meaning and activity function that shields the complexity of the grid, any specific implementation technology (e.g. Web or Grid service) and any specific data representation, (ii) semi-automatic data flow composition, and (iii) automatic data conversions. We have implemented our approach as part of the ASKALON grid application development and computing environment. We demonstrate the effectiveness of our approach by applying it to a real world meteorology workflow application and report some preliminary results. Our approach can also be adapted to other scientific domains by developing the corresponding ontologies for those domains.
DOI: https://doi.org/10.1109/SC.2008.5214432
Cited by: 23
Scalable load-balance measurement for SPMD codes
T. Gamblin, B. Supinski, M. Schulz, R. Fowler, D. Reed
Good load balance is crucial on very large parallel systems, but the most sophisticated algorithms introduce dynamic imbalances through adaptation in domain decomposition or use of adaptive solvers. To observe and diagnose imbalance, developers need system-wide, temporally-ordered measurements from full-scale runs. This potentially requires data collection from multiple code regions on all processors over the entire execution. Doing this instrumentation naively can, in combination with the application itself, exceed available I/O bandwidth and storage capacity, and can induce severe behavioral perturbations. We present and evaluate a novel technique for scalable, low-error load balance measurement. This uses a parallel wavelet transform and other parallel encoding methods. We show that our technique collects and reconstructs system-wide measurements with low error. Compression time scales sublinearly with system size, and the compressed data volume is several orders of magnitude smaller than the raw data. The overhead is low enough for online use in a production environment.
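The compression principle behind wavelet encoding can be shown with a one-level Haar transform (a toy sketch; the paper uses a parallel, multi-level transform): smooth measurement signals concentrate into a few coefficients, and small detail coefficients can be zeroed with little reconstruction error.

```python
# One-level Haar wavelet transform with thresholding, as a toy model
# of wavelet-based compression of per-process load measurements.
def haar_forward(x):
    """Split an even-length signal into pairwise averages and details."""
    avg = [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]
    det = [(a - b) / 2 for a, b in zip(x[0::2], x[1::2])]
    return avg, det

def haar_inverse(avg, det):
    """Exactly invert haar_forward."""
    out = []
    for a, d in zip(avg, det):
        out += [a + d, a - d]
    return out

def compress(x, eps):
    """Zero out small detail coefficients; smooth regions compress well."""
    avg, det = haar_forward(x)
    det = [d if abs(d) > eps else 0.0 for d in det]
    return avg, det
```

After thresholding, only the nonzero coefficients need to be stored or sent, which is where the orders-of-magnitude data reduction comes from on nearly balanced (hence smooth) measurements.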
DOI: https://doi.org/10.1145/1413370.1413417
Cited by: 48
Proactive process-level live migration in HPC environments
Chao Wang, F. Mueller, C. Engelmann, S. Scott
As the number of nodes in high-performance computing environments keeps increasing, faults are becoming commonplace. Reactive fault tolerance (FT) often does not scale due to massive I/O requirements and relies on manual job resubmission. This work complements reactive with proactive FT at the process level. Through health monitoring, a subset of node failures can be anticipated when a node's health deteriorates. A novel process-level live migration mechanism supports continued execution of applications during much of the process migration. This scheme is integrated into an MPI execution environment to transparently sustain health-inflicted node failures, which eradicates the need to restart and requeue MPI jobs. Experiments indicate that 1-6.5 seconds of prior warning are required to successfully trigger live process migration, while similar operating system virtualization mechanisms require 13-24 seconds. This self-healing approach complements reactive FT by nearly cutting the number of checkpoints in half when 70% of the faults are handled proactively.
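The "anticipate failure from health data" step might look like the following hypothetical monitor (the metric, window, and threshold are illustrative only; the paper's monitoring infrastructure is not specified here): migration is triggered once a moving average of a health reading crosses a limit.

```python
# Hypothetical health monitor that triggers proactive migration when a
# windowed average temperature exceeds a limit. All values illustrative.
from collections import deque

class HealthMonitor:
    def __init__(self, limit=70.0, window=5):
        self.readings = deque(maxlen=window)
        self.limit = limit

    def observe(self, temperature_c):
        """Record a reading; return True when migration should start."""
        self.readings.append(temperature_c)
        avg = sum(self.readings) / len(self.readings)
        # only trigger once the window is full, to ignore startup noise
        return len(self.readings) == self.readings.maxlen and avg > self.limit
```

The few seconds between such a trigger and the actual failure is exactly the 1-6.5 second warning budget the paper measures for process-level live migration.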
DOI: https://doi.org/10.1109/SC.2008.5222634
Cited by: 173
A dynamic scheduler for balancing HPC applications
C. Boneti, R. Gioiosa, F. Cazorla, M. Valero
Load imbalance causes significant performance degradation in High Performance Computing applications. In our previous work we showed that load imbalance can be alleviated by modern MT processors that provide mechanisms for controlling the allocation of the processors' internal resources. In that work, we applied static, hand-tuned resource allocations to balance HPC applications, providing improvements for benchmarks and real applications. In this paper we propose a dynamic process scheduler for the Linux kernel that automatically and transparently balances HPC applications according to their behavior. We tested our new scheduler on an IBM POWER5 machine, which provides a software-controlled prioritization mechanism that allows us to bias the processor resource allocation. Our experiments show that the scheduler reduces the imbalance of HPC applications, achieving results similar to the ones obtained by hand-tuning the applications (up to 16%). Moreover, our solution reduces the application's execution time by combining the effects of load balancing and highly responsive scheduling.
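The balancing policy can be caricatured as follows (our own simplification, not the paper's kernel scheduler: it assumes measured per-process work and a small set of discrete hardware priority levels, as on POWER5): processes with more remaining work get a proportionally higher priority so they receive more internal processor resources.

```python
# Sketch: map measured per-process work to discrete priority levels,
# giving the most loaded (slowest) process the highest priority.
# The level set and the linear mapping are illustrative assumptions.
def assign_priorities(work, levels=(1, 2, 3, 4, 5, 6)):
    """Return one priority level per process, proportional to its work."""
    lo, hi = min(work), max(work)
    span = hi - lo or 1          # avoid division by zero when balanced
    return [levels[round((w - lo) / span * (len(levels) - 1))] for w in work]
```

When all processes report equal work, every process lands on the lowest level and the scheduler is effectively neutral, which matches the intent of only intervening when imbalance is observed.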
DOI: https://doi.org/10.1145/1413370.1413412
Cited by: 65
Efficient auction-based grid reservations using dynamic programming
A. Mutz, R. Wolski
Auction mechanisms have been proposed as a means to efficiently and fairly schedule jobs in high-performance computing environments. The generalized Vickrey auction has long been known to produce efficient allocations while exposing users to truth-revealing incentives, but the algorithms used to compute its payments can be computationally intractable. In this paper we present a novel implementation of the generalized Vickrey auction that uses dynamic programming to schedule jobs and compute payments in pseudo-polynomial time. Additionally, we have built a version of the PBS scheduler that uses this algorithm to schedule jobs, and in this paper we present the results of our tests using this scheduler.
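The combination of a knapsack-style DP with Vickrey (VCG) payments can be sketched under simplifying assumptions (a single divisible pool of nodes and one time slot; the paper schedules reservations over time): the winning set maximizes total bids, and each winner pays the welfare loss its presence imposes on the other bidders.

```python
# Sketch of generalized Vickrey pricing over a 0/1 knapsack allocation.
# Jobs are (name, nodes_needed, bid); capacity is the node pool size.
def best_allocation(jobs, capacity):
    """Pseudo-polynomial DP over used capacity: maximize total bid."""
    best = {0: (0, frozenset())}      # used nodes -> (welfare, winners)
    for name, nodes, bid in jobs:
        for used, (welfare, winners) in list(best.items()):
            u, w = used + nodes, welfare + bid
            if u <= capacity and w > best.get(u, (-1,))[0]:
                best[u] = (w, winners | {name})
    return max(best.values())

def vcg_payments(jobs, capacity):
    """Winner i pays: (others' best welfare without i) - (others' welfare with i)."""
    welfare, winners = best_allocation(jobs, capacity)
    bids = {name: bid for name, nodes, bid in jobs}
    payments = {}
    for name in winners:
        others = [j for j in jobs if j[0] != name]
        without, _ = best_allocation(others, capacity)
        payments[name] = without - (welfare - bids[name])
    return payments
```

For example, with a 4-node pool and bids A=(2 nodes, 10), B=(2, 8), C=(3, 11), the DP selects A and B (welfare 18), and each pays strictly less than its bid, which is what gives bidders the incentive to report truthfully.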
DOI: https://doi.org/10.1145/1413370.1413387
Cited by: 15
Performance prediction of large-scale parallel system and application using macro-level simulation
R. Susukita, H. Ando, M. Aoyagi, H. Honda, Y. Inadomi, Koji Inoue, Shigeru Ishizuki, Yasunori Kimura, Hidemi Komatsu, M. Kurokawa, K. Murakami, H. Shibamura, Shuji Yamamura, Yunqing Yu
Predicting application performance on an HPC system is an important technology for designing the computing system and developing applications. However, accurate prediction is a challenge, particularly for a future system with higher performance. In this paper, we present a new method for predicting application performance on HPC systems. This method combines modeling of sequential performance on a single processor with macro-level simulations of applications for parallel performance on the entire system. In the simulation, the execution flow is traced but kernel computations are omitted to reduce the execution time. Validation on a real terascale system showed that the predicted and measured performance agreed within 10% to 20%. We employed the method in designing a hypothetical petascale system of 32768 SIMD-extended processor cores. For predicting application performance on the petascale system, the macro-level simulation required several hours.
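The core trick, replaying the execution flow while charging each kernel a modeled cost instead of running it, can be sketched as below (kernel names and cost models are entirely hypothetical, not the paper's):

```python
# Toy macro-level simulation: each trace entry names a kernel and a
# problem size; the kernel is never executed, only costed by a model.
KERNEL_COST = {
    # hypothetical models: bytes moved / bandwidth, flops / peak rate
    "halo_exchange": lambda n, bw=5e9: 8 * n / bw,
    "stencil":       lambda n, flops=1e10: 30 * n / flops,
}

def simulate(trace):
    """trace: list of (kernel_name, problem_size); returns modeled seconds."""
    return sum(KERNEL_COST[kernel](size) for kernel, size in trace)
```

Because only the control flow is replayed, a run that would take days on the target machine can be "executed" in hours on a modest host, which is how the authors evaluate a machine that does not exist yet.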
DOI: https://doi.org/10.1145/1413370.1413391
Cited by: 60
Characterizing and predicting the I/O performance of HPC applications using a parameterized synthetic benchmark
H. Shan, K. Antypas, J. Shalf
The unprecedented parallelism of new supercomputing platforms poses tremendous challenges to achieving scalable performance for I/O intensive applications. Performance assessments using traditional I/O system and component benchmarks are difficult to relate back to application I/O requirements. However, the complexity of full applications motivates development of simpler synthetic I/O benchmarks as proxies to the full application. In this paper we examine the I/O requirements of a range of HPC applications and describe how the LLNL IOR synthetic benchmark was chosen as a suitable proxy for the diverse workload. We show a procedure for selecting IOR parameters to match the I/O patterns of the selected applications and show it can accurately predict the I/O performance of the full applications. We conclude that IOR is an effective replacement for full-application I/O benchmarks and can bridge the gap of understanding that typically exists between stand-alone benchmarks and the full applications they intend to model.
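A parameterized synthetic I/O benchmark in the spirit of IOR can be reduced to a tiny sketch (this toy varies only a block size and a transfer size; real IOR exposes many more knobs, such as the API, segment count, and file-per-process mode): the two parameters are exactly the kind tuned to match an application's observed write pattern.

```python
# Minimal synthetic write benchmark: write block_size bytes in
# transfer_size chunks and report bytes written and MB/s. Parameter
# names echo IOR's blockSize/transferSize but this is not IOR itself.
import os
import tempfile
import time

def synthetic_write(block_size, transfer_size):
    """Return (bytes_written, MB_per_second) for one sequential writer."""
    buf = b"x" * transfer_size
    with tempfile.NamedTemporaryFile(delete=False) as f:
        start = time.perf_counter()
        for _ in range(block_size // transfer_size):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())         # include the cost of reaching disk
        elapsed = time.perf_counter() - start
        size = os.path.getsize(f.name)
    os.unlink(f.name)
    return size, size / elapsed / 1e6
```

Sweeping `transfer_size` while holding total volume fixed reproduces the classic small-transfer penalty, which is the kind of pattern the paper matches against full applications.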
{"title":"Characterizing and predicting the I/O performance of HPC applications using a parameterized synthetic benchmark","authors":"H. Shan, K. Antypas, J. Shalf","doi":"10.1145/1413370.1413413","DOIUrl":"https://doi.org/10.1145/1413370.1413413","url":null,"abstract":"The unprecedented parallelism of new supercomputing platforms poses tremendous challenges to achieving scalable performance for I/O intensive applications. Performance assessments using traditional I/O system and component benchmarks are difficult to relate back to application I/O requirements. However, the complexity of full applications motivates development of simpler synthetic I/O benchmarks as proxies to the full application. In this paper we examine the I/O requirements of a range of HPC applications and describe how the LLNL IOR synthetic benchmark was chosen as suitable proxy for the diverse workload. We show a procedure for selecting IOR parameters to match the I/O patterns of the selected applications and show it can accurately predict the I/O performance of the full applications. We conclude that IOR is an effective replacement for full-application I/O benchmarks and can bridge the gap of understanding that typically exists between stand-alone benchmarks and the full applications they intend to model.","PeriodicalId":230761,"journal":{"name":"2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134354357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 151
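This paper's procedure selects IOR parameters so that the synthetic benchmark reproduces an application's I/O pattern. A hedged sketch of such a mapping is below: it derives IOR-style transfer size, block size, and segment count from an observed per-process write pattern. The field names and the derivation rules are illustrative assumptions, not the authors' exact procedure.

```python
from dataclasses import dataclass


@dataclass
class IOPattern:
    """Observed I/O behaviour of one application process."""
    bytes_per_proc: int  # total data written by one process
    write_size: int      # size of each write() call, in bytes
    n_phases: int        # number of separate I/O phases (e.g. checkpoints)


def ior_parameters(p: IOPattern) -> dict:
    """Map an observed pattern to IOR-like parameters.

    transferSize (IOR -t): bytes moved per I/O call
    blockSize    (IOR -b): contiguous bytes per segment per process
    segmentCount (IOR -s): number of segments, matched to I/O phases
    """
    return {
        "transferSize": p.write_size,
        "blockSize": p.bytes_per_proc // p.n_phases,
        "segmentCount": p.n_phases,
    }


# Example: 1 GiB per process, written in 1 MiB calls across 4 checkpoints.
params = ior_parameters(IOPattern(bytes_per_proc=1 << 30,
                                  write_size=1 << 20,
                                  n_phases=4))
print(params)
```

With parameters chosen this way, a single IOR run stands in for the full application's I/O phase, which is the gap-bridging role the abstract claims for the benchmark.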
Journal
2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis