
Proceedings of the 2006 ACM/IEEE conference on Supercomputing: Latest Publications

Topologies for improved InfiniBand latency
Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188757
Stephen Fried
InfiniBand continues to become more and more important in the High Performance Computing world. This talk discusses the impact of DDR (i.e., 20 Gb/s) switches and HCAs on the creation of low-latency, high-bandwidth InfiniBand fabrics. When combined with low-latency HCAs, such as those from QLogic, the fabrics discussed can deliver as much as a 40% reduction in fabric latency, improving the performance of fine-grain parallel applications. They also make it possible to create 3-hop low-latency fabrics that provide excellent performance and can be used with clusters of as many as 1009 nodes. There are two approaches to using DDR fabrics to improve latency. The first combines one-, two- and three-hop fabrics in what we call a FasTree topology to build small clusters (32 to 96 nodes). FasTrees not only have lower latency, but require fewer switch components than fat trees. This does not mean that they have smaller bisection bandwidths, as their links run at twice the speed of an SDR fabric. One feature that distinguishes a FasTree from other fabrics is that it contains no spines. The second fabric, which we call a ThinTree, uses complex single-hop spines to link together different fabric sub-domains. Any node in a ThinTree is at most 3 hops away from any other node. There are, however, some compromises required to link together up to 1008 nodes without exceeding 3 hops. These compromises result in sub-domains whose intra-domain bandwidth is full CBB, while their inter-domain bandwidth typically runs around 40% of CBB. However, because of the 1.8 GB/sec bandwidth of the DDR fabrics, all the connections between any two nodes in a ThinTree fabric are adequate for virtually any HPC application, and most others as well. The characteristics of both these topologies are discussed in the talk.
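To make the quoted figures concrete, the sketch below works through a back-of-the-envelope estimate of how hop count and link speed enter fabric latency and bandwidth. It is not taken from the talk; the per-hop switch latency and HCA latency are assumed illustrative values, and only the 1.8 GB/s DDR rate, the 3-hop bound, and the 40%-of-CBB figure come from the abstract.

```c
/* Back-of-the-envelope fabric latency and bandwidth estimate.
 * Only the DDR rate (1.8 GB/s), the 3-hop bound, and the 40%-of-CBB figure
 * come from the abstract; every other constant is an illustrative assumption. */
#include <stdio.h>

int main(void)
{
    const double hop_latency_us = 0.2;  /* assumed per-switch-hop latency, microseconds */
    const double hca_latency_us = 1.3;  /* assumed end-to-end HCA contribution, microseconds */
    const double ddr_link_gbs   = 1.8;  /* DDR link bandwidth, GB/s (from the abstract) */
    const double sdr_link_gbs   = ddr_link_gbs / 2.0;  /* SDR runs at half the DDR speed */

    /* A conventional fat tree may need 5 switch hops end to end; the FasTree /
       ThinTree topologies keep any pair of nodes within 3 hops. */
    double five_hop  = hca_latency_us + 5 * hop_latency_us;
    double three_hop = hca_latency_us + 3 * hop_latency_us;

    printf("5-hop fabric latency  : %.2f us\n", five_hop);
    printf("3-hop fabric latency  : %.2f us\n", three_hop);
    printf("latency reduction     : %.0f%%\n", 100.0 * (five_hop - three_hop) / five_hop);

    /* Inter-domain ThinTree links run at roughly 40% of the full DDR CBB;
       compare with a full-rate SDR link. */
    printf("inter-domain bandwidth: %.2f GB/s\n", 0.40 * ddr_link_gbs);
    printf("full-rate SDR link    : %.2f GB/s\n", sdr_link_gbs);
    return 0;
}
```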
Citations: 0
The tera-10 system: implementing the number 1 supercomputer in Europe
Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188760
Jean-Louis Lahaie
A presentation of the Tera-10 system, the number 1 supercomputer in Europe (and number 5 in the world according to the TOP500® ranking of June 2006), designed and installed by Bull for CEA, France's Atomic Energy Authority. This presentation will cover the different technologies that are at the heart of Tera-10, the overall architecture of the system, as well as the issues that had to be addressed by the implementation team.
Citations: 0
Remote interface control within an access grid environment
Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188787
John W. Langkals
Under development
Citations: 0
Implementing algorithms on FPGAs using high-level languages and low-level libraries
Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188614
R. Bruce, Richard Chamberlain, M. Devlin, S. Marshall
Until relatively recently, users of FPGA-based computers have needed electronic-design skills to implement high-performance computing (HPC) algorithms. With the advent of high-level languages for FPGAs it is possible for non-experts in FPGA design to implement algorithms by describing them in a high-level syntax. A natural progression from developing high-level languages is to develop low-level libraries that support them. DIME-C is a high-level language that takes a subset of ANSI C as its input and outputs auto-generated hardware description language (HDL) and pre-synthesised netlists. Within DIME-C, the authors have implemented a math library composed of single-precision, floating-point, elementary functions such as the natural exponential and logarithm. Complex, fully-pipelined algorithms can be described in ANSI-compatible C and implemented on FPGAs, delivering orders of magnitude speed-up over microprocessor implementations. Work is ongoing, expanding the library. The poster will detail project motivations and direction, speedup and resource-use measurements, C-code examples, and multi-FPGA examples.
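For readers unfamiliar with this style of tool flow, the fragment below is a hypothetical kernel written in the restricted ANSI-C style that C-to-FPGA compilers of this kind typically accept (fixed-bound loops, single-precision arithmetic, no recursion or dynamic allocation). It is illustrative only and is not taken from DIME-C or its math library.

```c
/* Hypothetical kernel in a restricted ANSI-C style of the kind C-to-FPGA
 * flows typically accept; not actual DIME-C code or library calls. */
#include <math.h>

#define N 1024

/* Element-wise y[i] = exp(x[i]) * scale: a fixed-bound loop around a
 * single-precision elementary function, the shape that maps naturally
 * onto a fully pipelined datapath backed by a floating-point exp core. */
void scaled_exp(const float x[N], float y[N], float scale)
{
    int i;
    for (i = 0; i < N; i++) {
        y[i] = expf(x[i]) * scale;
    }
}
```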
Citations: 1
Toward a power efficient computer architecture for Barnes-Hut N-body simulations
Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188607
K. Malkowski, P. Raghavan, M. J. Irwin
Recent improvements in processor performance have been accompanied by increased chip complexity and power consumption, resulting in increased heat dissipation. This has resulted in higher cooling costs and lower reliability. In this paper, we focus on power-aware high performance scientific computing and in particular the Barnes-Hut (BH) code that is used for N-body problems. We show how low-power modes of the CPU and caches, together with hardware optimizations such as a load-miss predictor and data prefetchers, enable BH to operate at lower-power configurations without performance degradation. In simulations with SimpleScalar and Wattch, power on our optimized processor is reduced by 57% and energy by 58% with no performance penalty. Consequently, the energy efficiency of the processor increases by a factor of more than two when compared to the base architecture.
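As a quick sanity check (not part of the paper), the quoted 58% energy reduction at unchanged runtime directly implies the "factor of more than two" efficiency claim, since work per joule scales as the reciprocal of the remaining energy:

```c
/* Check that a 58% energy reduction at equal runtime implies an
 * energy-efficiency gain of more than 2x (work per joule ~ 1 / remaining energy). */
#include <stdio.h>

int main(void)
{
    const double energy_reduction = 0.58;                  /* from the abstract */
    double efficiency_factor = 1.0 / (1.0 - energy_reduction);
    printf("energy-efficiency improvement: %.2fx\n", efficiency_factor);  /* ~2.38x */
    return 0;
}
```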
Citations: 1
Dynamic data-driven applications systems
Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188458
F. Darema, M. Rotea
The Dynamic Data Driven Applications Systems (DDDAS) concept entails capabilities where application simulations can dynamically accept and respond to field data and measurements, and/or can control such measurements. This synergistic and symbiotic feedback control-loop between simulations and measurements goes beyond the traditional control systems approaches, and advances applications and measurement approaches, beneficially impacting science and engineering fields, as well as manufacturing, commerce, transportation, hazard prediction/management, medicine, etc. DDDAS environments extend the current computational grids. The multi-agency DDDAS Program Solicitation (www.cise.nsf.gov/dddas) systematically fosters the relevant research areas. NSF, NOAA and NIH, the NSF/OISE and SBIR Offices, and the EU-IST and e-Sciences Programs are cooperating sponsors. This session will consist of a panel of experts, including awardees of DDDAS projects and representatives from funding agencies, and will provide a forum to engage the broader community in open discussion for expanding the opportunities and impact of DDDAS.
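The sketch below, with entirely hypothetical placeholder functions, shows the kind of simulation/measurement feedback loop the DDDAS concept describes: the running model ingests new field data and, in turn, steers the measurement process. It is only a schematic reading of the abstract, not an API from any DDDAS project.

```c
/* Schematic DDDAS-style feedback loop; all functions are hypothetical placeholders. */
#include <stdio.h>

typedef struct { double state; } model_t;

static double read_measurement(void)       { return 1.0; }    /* stand-in for a field sensor */
static void   steer_sensors(double request) { (void)request; } /* stand-in for measurement control */

static void assimilate(model_t *m, double obs)
{
    /* Nudge the model state toward the observation (toy data assimilation). */
    m->state += 0.1 * (obs - m->state);
}

int main(void)
{
    model_t model = { 0.0 };
    for (int step = 0; step < 100; step++) {
        model.state += 0.01;              /* advance the simulation */
        double obs = read_measurement();  /* dynamically accept field data */
        assimilate(&model, obs);          /* respond to the measurement */
        steer_sensors(model.state);       /* control the measurement process */
    }
    printf("final model state: %f\n", model.state);
    return 0;
}
```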
Citations: 37
Understanding our cosmic origin through petascale computing
Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188510
A. Mezzacappa
Massive stars die in stellar explosions known as core collapse supernovae. Such supernovae are a dominant source of elements in the Universe and, thus, an important link in our chain of origin from the Big Bang to the present day. Understanding how they occur will require three-dimensional, general relativistic, radiation-magnetohydrodynamics simulations that model the stellar core's multifrequency and multiangle neutrino (radiation) transport, fluid instabilities and flow, rotation, magnetic field, and strong gravitational field. Such simulations will require petascale platforms, in turn requiring scalable solution algorithms for the underlying integro-partial differential equations, and a commensurate infrastructure for data management, networking, and visualization that will enable scientific discovery by a geographically distributed team. I will present the current state of the art and discuss near- and longer-term efforts. The ongoing rapid increase in supercomputer capability will allow us to address this Grand Challenge in earnest, in all of its complexity, for the first time.
Citations: 0
Monitoring trix
Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188488
Christopher D. Maestas
Monitoring tools have evolved greatly, but when misconfigured they can fail to deliver intelligent, non-intrusive, just-in-time alerts. Non-intrusive collection windows arise during bootup, during idle system time, before major workload events, and after these events finish. Batch schedulers allow health checking during these opportune times. Most just-in-time alerts arrive via system logs and out-of-band queries that can then trigger appropriate actions. However, abusive out-of-band queries may interrupt normal operational activities. Some vendor and open-source implementations have been heavyweight watchdogs that exact a brutal computational cost as systems scale to thousands of nodes. Configuring tools to query intelligently during these opportunities and running only the necessary daemons helps to meet monitoring goals. These tools and daemons can include HP's hpasm, Dell's OMSA, supermon, lm_sensors, Nagios, Ganglia, logsurfer/syslog-ng, and Torque health checks. Share your monitoring stories and learn how the triggers we implemented scale to 4000+ node systems.
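As one concrete example of the non-intrusive checks described here, the sketch below is a minimal node health check of the kind a batch scheduler such as Torque can run during idle or prologue windows. The load threshold and the failure convention (non-zero exit plus an ERROR line) are assumptions for illustration; a real deployment would follow its scheduler's documented health-check interface.

```c
/* Minimal node health check sketch; threshold and failure convention are
 * illustrative assumptions, not a documented scheduler interface. */
#include <stdio.h>

int main(void)
{
    const double max_load = 64.0;   /* assumed acceptable 1-minute load average */
    double load1 = 0.0;

    FILE *f = fopen("/proc/loadavg", "r");
    if (f == NULL || fscanf(f, "%lf", &load1) != 1) {
        printf("ERROR cannot read /proc/loadavg\n");
        return 1;
    }
    fclose(f);

    if (load1 > max_load) {
        printf("ERROR load average %.2f exceeds %.2f\n", load1, max_load);
        return 1;
    }
    return 0;   /* healthy: stay quiet so the check remains non-intrusive */
}
```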
Citations: 0
Program analysis tools for massively parallel applications: how to achieve highest performance
Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188687
A. Knüpfer, D. Kranzlmüller, B. Mohr, W. Nagel
Today's HPC environments are increasingly complex in order to achieve highest performance. Hardware platforms introduce features like out-of-order execution, multi-level caches, multi-core designs, non-uniform memory access, etc. Application software combines OpenMP, MPI, optimized libraries, and various types of compiler optimization to exploit potential performance. To reach a reasonable percentage of the theoretical peak performance, three fundamental steps need to be accomplished. First, correctness must be guaranteed, especially during the course of optimization. Second, the actual performance achieved needs to be determined. In particular, the contributions/limitations of all sub-systems involved (CPU, memory, network, I/O) have to be identified. Third, actual optimization can only be successful with the previously obtained knowledge. Those steps are by no means trivial. There are sophisticated tools beyond simple profiling to support the HPC user. The tutorial introduces a variety of such tools: it shows how they play together and how they scale with long-running massively parallel cases.
Citations: 2
All in a day's work: advancing data-intensive research with the data capacitor
Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188711
Stephen C. Simms, M. Davy, B. Hammond, Matthew R. Link, C. Stewart, R. Bramley, Beth Plale, Dennis Gannon, M. Baik, S. Teige, J. Huffman, Rick McMullen, Douglas A. Balog, Gregory G. Pike
Indiana University provides powerful compute, storage, and network resources to a diverse local and national research community every day. IU's facilities have been used to support data-intensive applications ranging from digital humanities to computational biology. For this year's bandwidth challenge, several IU researchers will conduct experiments from the exhibit floor utilizing the resources that University Information Technology Services currently provides. Using IU's newly constructed 535 TB Data Capacitor and an additional component installed on the exhibit floor, we will use Lustre across the wide area network to simultaneously facilitate dynamic weather modeling, protein analysis, instrument data capture, and the production, storage, and analysis of simulation data.
Citations: 6