2012 SC Companion: High Performance Computing, Networking Storage and Analysis: Latest Articles

End-User Driven Technology Benchmarks Based on Market-Risk Workloads
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.141
P. Lankford, L. Ericson, Andrey Nikolaev
Market risk management is a critical, resource-intensive task for financial trading firms. The industry relies heavily on innovation in technical infrastructure to increase the quality and quantity of risk management information and to reduce the cost of its production. However, until recently, the industry has lacked an independent standard for gauging the potential of new technologies to help. This changed when the STAC Benchmark™ Council developed STAC-A2™, a vendor-independent benchmark suite based on real-world market risk analysis workloads. It was specified by trading firms and made actionable by leading HPC vendors. Unlike vendor-developed benchmarks known to the authors, STAC-A2 satisfies all of the requirements important to end-user firms: relevance, neutrality, scalability, and completeness. Intel has demonstrated the utility of STAC-A2 for comparing successive generations of Intel® Xeon® processors.
Citations: 6
Towards Energy Efficient Data Intensive Computing Using IEEE 802.3az
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.112
Dimitar Pavlov, Joris Soeurt, P. Grosso, Zhiming Zhao, K. V. D. Veldt, Hao Zhu, C. D. Laat
Energy efficiency is an increasingly important requirement for computing and communication systems, especially as they become more pervasive. The IEEE 802.3az protocol reduces network energy consumption by switching active copper Ethernet links to a low-power mode when no traffic exists. However, the effect of 802.3az depends heavily on network traffic patterns, which makes system-level energy optimization challenging. In clusters, distributed data-intensive applications that generate heavy network traffic are common, and in turn the required network devices can consume large amounts of energy. In this research, we examined the 802.3az technology with the goal of applying it in clusters. We defined an energy budget calculator that takes energy-efficient Ethernet into account by including the energy models derived from tests of 802.3az-enabled devices. The calculator is an integral tool in a global strategy to optimize the energy usage of applications in a high-performance computing environment. We show a few practical examples of how real applications can better plan their execution by integrating this knowledge into their decision strategies.
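The idea of an energy budget that accounts for a low-power link state can be sketched roughly as follows. This is only an illustration under stated assumptions, not the calculator from the paper: the class `LinkPowerModel`, its fields, the lumped per-transition cost, and all numeric values are hypothetical, and real figures would have to come from measurements of 802.3az-enabled devices.

```python
from dataclasses import dataclass


@dataclass
class LinkPowerModel:
    """Hypothetical per-link power figures (not measured values)."""
    active_watts: float       # power while the link carries traffic
    low_power_watts: float    # power while the link is in its low-power state
    transition_joules: float  # assumed lumped energy cost of one sleep/wake cycle


def link_energy_joules(model, busy_seconds, idle_seconds, transitions):
    """Energy of one link that sleeps whenever it is idle."""
    return (model.active_watts * busy_seconds
            + model.low_power_watts * idle_seconds
            + model.transition_joules * transitions)


def always_on_energy_joules(model, busy_seconds, idle_seconds):
    """Baseline: the same link without a low-power state stays at active power."""
    return model.active_watts * (busy_seconds + idle_seconds)


if __name__ == "__main__":
    model = LinkPowerModel(active_watts=1.0, low_power_watts=0.25,
                           transition_joules=0.5)
    with_sleep = link_energy_joules(model, busy_seconds=10,
                                    idle_seconds=30, transitions=4)
    baseline = always_on_energy_joules(model, busy_seconds=10, idle_seconds=30)
    print(f"with low-power state: {with_sleep} J, always on: {baseline} J")
```

The sketch makes the traffic-pattern dependence visible: the same idle time split into many short gaps raises `transitions`, and the per-transition cost can erode the savings.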
Citations: 3
Abstract: Towards Highly Accurate Large-Scale Ab Initio Calculations Using Fragment Molecular Orbital Method in GAMESS
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.170
Maricris L. Mayes, G. Fletcher, M. Gordon
Summary form only given. One of the major challenges of modern quantum chemistry (QC) is to apply it to large systems with thousands of correlated electrons and basis functions. The availability of supercomputers and the development of novel methods are necessary to meet this challenge. In particular, we employ the linear-scaling Fragment Molecular Orbital (FMO) method, which decomposes a large system into smaller, localized fragments that can be treated with a high-level QC method such as MP2. FMO is inherently scalable since the individual fragment calculations can be carried out simultaneously on separate processor groups. It is implemented in GAMESS, a popular ab initio QC program. We present the scalability and performance of FMO on Intrepid (Blue Gene/P) and Blue Gene/Q systems at ALCF.
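For orientation, the fragment decomposition is commonly written as a many-body expansion. The abstract does not say which truncation level was used, so the two-body form below (usually called FMO2) is shown only as an assumed illustration. Here $E_I$ is the energy of fragment $I$ and $E_{IJ}$ the energy of the fragment pair $IJ$, each computed in the electrostatic field of the remaining fragments:

```latex
E^{\mathrm{FMO2}} = \sum_{I}^{N} E_I \;+\; \sum_{I>J}^{N} \left( E_{IJ} - E_I - E_J \right)
```

Each term in the sums is an independent small calculation, which is why the fragment and pair computations can be distributed over separate processor groups.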
Citations: 0
Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model Visualization
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.37
Chao Peng, Peng Mi, Yong Cao
Rendering massive 3D models has been recognized as a challenging task. Due to the limited size of GPU memory, a massive model with hundreds of millions of primitives cannot fit into most modern GPUs. By applying parallel Level-Of-Detail (LOD), as proposed in [1], transferring only a portion of the primitives rather than the whole set to the GPU is sufficient for generating a desired simplified version of the model. However, the low bandwidth of CPU-GPU communication makes data transfer a very time-consuming process that prevents users from achieving high-performance rendering of massive 3D models on a single-GPU system. This paper explores a device-level parallel design that distributes the workloads in a multi-GPU multi-display system. Our multi-GPU out-of-core approach uses a load-balancing method and integrates seamlessly with the parallel LOD algorithm. Our experiments show highly interactive frame rates for the “Boeing 777” airplane model, which consists of over 332 million triangles and over 223 million vertices.
Citations: 5
Poster: Portals 4 Network Programming Interface
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.264
Brian W. Barrett, R. Brightwell, K. Underwood, K. Hemmert
Portals 4 is an advanced network programming interface that allows for the development of a rich set of upper-layer protocols. Through careful selection of interfaces and strong progress guarantees, Portals 4 is able to support multiple protocols without significant overhead. Recent developments with Portals 4, including the development of MPI, SHMEM, and GASNet protocols, are discussed.
Citations: 3
Community Accessible Datastore of High-Throughput Calculations: Experiences from the Materials Project
Pub Date : 2012-11-10 DOI: 10.1109/SC.COMPANION.2012.150
D. Gunter, S. Cholia, Anubhav Jain, M. Kocher, K. Persson, L. Ramakrishnan, S. Ong, G. Ceder
Efforts such as the Human Genome Project provided a dramatic example of opening scientific datasets to the community. Making high-quality scientific data accessible through an online database allows scientists around the world to multiply the value of that data through scientific innovations. Similarly, the goal of the Materials Project is to calculate physical properties of all known inorganic materials and make this data freely available, with the goal of accelerating the invention of better materials. However, the complexity of scientific data, and the complexity of the simulations needed to generate and analyze it, pose challenges to the current software ecosystem. In this paper, we describe the approach we used in the Materials Project to overcome these challenges and to create and disseminate a high-quality database of materials properties computed by solving the basic laws of physics. Our infrastructure requires a novel combination of high-throughput approaches with broadly applicable and scalable approaches to data storage and dissemination.
Citations: 10
The PEPPHER Composition Tool: Performance-Aware Dynamic Composition of Applications for GPU-Based Systems
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.97
Usman Dastgeer, Lu Li, C. Kessler
The PEPPHER component model defines an environment for annotation of native C/C++ based components for homogeneous and heterogeneous multicore and manycore systems, including GPU and multi-GPU based systems. For the same computational functionality, captured as a component, different sequential and explicitly parallel implementation variants using various types of execution units might be provided, together with metadata such as explicitly exposed tunable parameters. The goal is to compose an application from its components and variants such that, depending on the run-time context, the most suitable implementation variant will be chosen automatically for each invocation. We describe and evaluate the PEPPHER composition tool, which explores the application's components and their implementation variants, generates the necessary low-level code that interacts with the runtime system, and coordinates the native compilation and linking of the various code units to compose the overall application code. With several applications, we demonstrate how the composition tool provides a high-level programming front-end while effectively utilizing the task-based PEPPHER runtime system (StarPU) underneath.
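The notion of choosing an implementation variant per invocation can be illustrated with a toy dispatcher. This is a minimal sketch under loud assumptions and does not reproduce the PEPPHER or StarPU interfaces: the `Variant` type, the `select_variant` function, and the linear cost models are all hypothetical, invented here only to show the selection idea.

```python
from typing import Callable, List, NamedTuple


class Variant(NamedTuple):
    """One implementation of a component (hypothetical structure)."""
    name: str
    requires_gpu: bool
    predicted_cost: Callable[[int], float]  # assumed cost model: problem size -> time
    run: Callable[[List[float]], float]


def select_variant(variants, problem_size, gpu_available):
    """Pick the usable variant with the lowest predicted cost for this call."""
    usable = [v for v in variants if gpu_available or not v.requires_gpu]
    if not usable:
        raise RuntimeError("no implementation variant fits this run-time context")
    return min(usable, key=lambda v: v.predicted_cost(problem_size))


# Two stand-in variants of the same functionality (a sum); the "GPU" one is
# simulated on the CPU and only differs in its made-up cost model.
VARIANTS = [
    Variant("cpu_sequential", False, lambda n: 1.0 * n, sum),
    Variant("gpu_offload", True, lambda n: 100.0 + 0.1 * n, sum),
]

if __name__ == "__main__":
    for n in (10, 1000):
        chosen = select_variant(VARIANTS, n, gpu_available=True)
        print(n, chosen.name, chosen.run([1.0] * n))
```

With these made-up cost models the fixed offload overhead dominates for small inputs, so the sequential variant is chosen there, while larger inputs favour the offloaded variant when a GPU is present.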
Citations: 20
Towards Improving the Communication Performance of CRESTA's Co-Design Application NEK5000
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.92
Michael Schliephake, E. Laure
In order to achieve exascale performance, all aspects of applications and system software need to be analysed and potentially improved. The EU FP7 project “Collaborative Research into Exascale Systemware, Tools & Applications” (CRESTA) uses co-design of advanced simulation applications and system software, as well as related development tools, as a key element in its approach towards exascale. In this paper we present first results of a co-design activity using the highly scalable application NEK5000. We have analysed the communication structure of NEK5000 and propose new, optimised collective communication operations that will improve the performance of NEK5000 and prepare it for use on the several million cores available in future HPC systems. The latency-optimised communication operations can also be beneficial in other contexts; for instance, we expect them to become an important building block for a runtime system providing dynamic load balancing, which is also under development within CRESTA.
Citations: 3
Poster: Automatically Adapting Programs for Mixed-Precision Floating-Point Computation
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.232
Michael O. Lam, B. Supinski, M. LeGendre, J. Hollingsworth
As scientific computation continues to scale, efficient use of floating-point arithmetic processors is critical. Lower precision allows streaming architectures to perform more operations per second and can reduce memory bandwidth pressure on all architectures. However, using a precision that is too low for a given algorithm and data set leads to inaccurate results. We present a framework that uses binary instrumentation and modification to build mixed-precision configurations of existing binaries that were originally developed to use only double precision. Initial results with the Algebraic MultiGrid kernel demonstrate a nearly 2× speedup.
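The accuracy risk of a precision that is too low can be shown with a small accumulation experiment. This sketch is unrelated to the binary-instrumentation framework in the abstract; it only emulates single-precision rounding in plain Python through the standard `struct` module, and the helper names `to_float32`, `sum_float32`, and `sum_float64` are made up for illustration.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_float32(values):
    """Accumulate with the running total rounded to single precision after each add."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total


def sum_float64(values):
    """Accumulate in ordinary double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    # One large term followed by many terms too small to register in single precision.
    values = [1.0] + [1e-8] * 1000
    print("single:", sum_float32(values))
    print("double:", sum_float64(values))
```

In single precision each small term falls below half the spacing between representable values near 1.0, so the running total never moves, whereas the double-precision sum retains the contribution of the small terms.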
Citations: 36
Abstract: Leveraging PEPPHER Technology for Performance Portable Supercomputing
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.212
C. Kessler, Usman Dastgeer, M. Majeed, N. Furmento, Samuel Thibault, R. Namyst, S. Benkner, Sabri Pllana, J. Träff, Martin Wimmer
PEPPHER is a 3-year EU FP7 project that develops a novel approach and framework to enhance performance portability and programmability of heterogeneous multi-core systems. Its primary target is single-node heterogeneous systems, where several CPU cores are supported by accelerators such as GPUs. This poster briefly surveys the PEPPHER framework for single-node systems and elaborates on the prospects for leveraging the PEPPHER approach to generate performance-portable code for heterogeneous multi-node systems.
Citations: 0