
Latest publications from the ACM/IEEE SC 2000 Conference (SC'00)

Performance of Hybrid Message-Passing and Shared-Memory Parallelism for Discrete Element Modeling
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10005
D. Henty
The current trend in HPC hardware is towards clusters of shared-memory (SMP) compute nodes. For applications developers the major question is how best to program these SMP clusters. To address this we study an algorithm from Discrete Element Modeling, parallelised using both the message-passing and shared-memory models simultaneously ("hybrid" parallelisation). The natural load-balancing methods are different in the two parallel models, the shared-memory method being in principle more efficient for very load-imbalanced problems. It is therefore possible that hybrid parallelism will be beneficial on SMP clusters. We benchmark MPI and OpenMP implementations of the algorithm on MPP, SMP and cluster architectures, and evaluate the effectiveness of hybrid parallelism. Although we observe cases where OpenMP is more efficient than MPI on a single SMP node, we conclude that our current OpenMP implementation is not yet efficient enough for hybrid parallelism to outperform pure message-passing on an SMP cluster.
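The load-balancing contrast this abstract draws can be illustrated with a toy scheduling model (not the paper's DEM code): a static block decomposition, as is natural in the message-passing model, versus dynamic work-sharing, as is natural in the shared-memory model, applied to a deliberately imbalanced set of per-particle costs.

```python
# Illustrative sketch only: compare the makespan of a static block
# decomposition (natural MPI-style load balancing) with dynamic
# work-sharing (natural OpenMP-style scheduling) on imbalanced work.

def static_makespan(costs, workers):
    # Each worker gets one contiguous block of equal length.
    block = (len(costs) + workers - 1) // workers
    return max(sum(costs[i:i + block]) for i in range(0, len(costs), block))

def dynamic_makespan(costs, workers):
    # Greedy list scheduling: each task goes to the least-loaded worker,
    # mimicking an OpenMP dynamic schedule with chunk size 1.
    loads = [0] * workers
    for c in sorted(costs, reverse=True):
        loads[loads.index(min(loads))] += c
    return max(loads)

# A deliberately load-imbalanced cost distribution: a few expensive
# tasks clustered at the front, many cheap ones behind them.
costs = [10] * 4 + [1] * 60
print(static_makespan(costs, 4), dynamic_makespan(costs, 4))
```

With total work 100 over 4 workers, dynamic scheduling reaches the ideal makespan of 25 while the static blocks leave one worker with 52, which is the in-principle advantage of the shared-memory method for very load-imbalanced problems.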
Citations: 144
From Trace Generation to Visualization: A Performance Framework for Distributed Parallel Systems
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10050
C. Wu, Anthony Bolmarcich, M. Snir, David Wootton, F. Parpia, Anthony Chan, E. Lusk, W. Gropp
In this paper we describe a trace analysis framework, from trace generation to visualization. It includes a unified tracing facility on IBM SP systems, a self-defining interval file format, an API for framework extensions, utilities for merging and statistics generation, and a visualization tool with preview and multiple time-space diagrams. The trace environment is extremely scalable, and combines MPI events with system activities in the same set of trace files, one for each SMP node. Since the amount of trace data may be very large, utilities are developed to convert and merge individual trace files into a self-defining interval trace file with multiple frame directories. The interval format allows the development of multiple time-space diagrams, such as thread-activity view, processor-activity view, etc., from the same interval file. A visualization tool, Jumpshot, is modified to visualize these views. A statistics utility is developed using the API, along with its graphics viewer.
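The merge step described above, combining one sorted trace file per SMP node into a single global stream, can be sketched as an n-way merge (event tuples and names here are invented for illustration, not the paper's actual file format):

```python
# Hypothetical sketch of merging per-node trace files. Each node's file is
# already sorted by timestamp; heapq.merge combines them lazily, which
# matters when the total trace volume is too large to hold in memory.
import heapq

node0 = [(0.1, "node0", "MPI_Send"), (0.7, "node0", "compute")]
node1 = [(0.3, "node1", "MPI_Recv"), (0.5, "node1", "page_fault")]

merged = list(heapq.merge(node0, node1, key=lambda ev: ev[0]))
print([ev[2] for ev in merged])
```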
Citations: 93
92¢/MFlops/s, Ultra-Large-Scale Neural-Network Training on a PIII Cluster
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10031
Douglas Aberdeen, Jonathan Baxter, R. Edwards
Artificial neural networks with millions of adjustable parameters and a similar number of training examples are a potential solution for difficult, large-scale pattern recognition problems in areas such as speech and face recognition, classification of large volumes of web data, and finance. The bottleneck is that neural network training involves iterative gradient descent and is extremely computationally intensive. In this paper we present a technique for distributed training of Ultra Large Scale Neural Networks (ULSNN) on Bunyip, a Linux-based cluster of 196 Pentium III processors. To illustrate ULSNN training we describe an experiment in which a neural network with 1.73 million adjustable parameters was trained to recognize machine-printed Japanese characters from a database containing 9 million training patterns. The training runs with an average performance of 163.3 GFlops/s (single precision). With a machine cost of $150,913, this yields a price/performance ratio of 92.4¢/MFlops/s (single precision). For comparison purposes, training using double precision and the ATLAS DGEMM produces a sustained performance of 70 MFlops/s or $2.16/MFlop/s (double precision).
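The headline price/performance figure follows directly from the two numbers the abstract gives, the machine cost and the sustained single-precision rate:

```python
# Reproducing the abstract's price/performance arithmetic.
machine_cost_usd = 150_913
sustained_mflops = 163.3e3   # 163.3 GFlops/s single precision, in MFlops/s

cents_per_mflops = machine_cost_usd / sustained_mflops * 100
print(round(cents_per_mflops, 1))  # ≈ 92.4 cents per MFlops/s
```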
Citations: 19
Requirements for and Evaluation of RMI Protocols for Scientific Computing
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10060
M. Govindaraju, Aleksander Slominski, Venkatesh Choppella, R. Bramley, Dennis Gannon
Distributed software component architectures provide a promising approach to the problem of building large-scale, scientific Grid applications [18]. Communication in these component architectures is based on Remote Method Invocation (RMI) protocols that allow one software component to invoke the functionality of another. Examples include Java remote method invocation (Java RMI) [25] and the new Simple Object Access Protocol (SOAP) [15]. SOAP has the advantage that many programming languages and component frameworks can support it. This paper describes experiments showing that SOAP by itself is not efficient enough for large-scale scientific applications. However, when it is embedded in a multi-protocol RMI framework, SOAP can be effectively used as a universal control protocol that can be swapped out for faster, more specialized protocols when high data-transfer speeds are needed.
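The multi-protocol idea can be sketched as a dispatch rule: small control invocations travel over a verbose, self-describing wire format (standing in here for SOAP), while bulk numeric payloads are routed to a compact binary encoding. All names and thresholds below are invented for illustration, not the framework's actual API.

```python
# Hedged sketch of multi-protocol RMI dispatch (illustrative names only).
import json, struct

def soap_like_encode(method, args):
    # Text-based and self-describing: interoperable, but verbose.
    return json.dumps({"method": method, "args": args}).encode()

def binary_encode(values):
    # Fixed binary layout: 8 bytes per double, no markup overhead.
    return struct.pack(f"{len(values)}d", *values)

def invoke(method, args):
    payload = args if isinstance(args, list) else [args]
    # Route large all-float payloads to the fast path; everything else
    # (control calls, small arguments) stays on the universal protocol.
    bulk = all(isinstance(v, float) for v in payload) and len(payload) > 8
    if bulk:
        return ("binary", binary_encode(payload))
    return ("text", soap_like_encode(method, args))

print(invoke("setTolerance", 1e-6)[0], len(invoke("putArray", [0.0] * 100)[1]))
```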
Citations: 95
A Comparative Study of the NAS MG Benchmark across Parallel Languages and Architectures
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10006
B. Chamberlain, Steven J. Deitz, L. Snyder
Hierarchical algorithms such as multigrid applications form an important cornerstone for scientific computing. In this study, we take a first step toward evaluating parallel language support for hierarchical applications by comparing implementations of the NAS MG benchmark in several parallel programming languages: Co-Array Fortran, High Performance Fortran, Single Assignment C, and ZPL. We evaluate each language in terms of its portability, its performance, and its ability to express the algorithm clearly and concisely. Experimental platforms include the Cray T3E, IBM SP, SGI Origin, Sun Enterprise 5500, and a high-performance Linux cluster. Our findings indicate that while it is possible to achieve good portability, performance, and expressiveness, most languages currently fall short in at least one of these areas. We find a strong correlation between expressiveness and a language’s support for a global view of computation, and we identify key factors for achieving portable performance in multigrid applications.
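The grid-transfer operators at the heart of a multigrid code like NAS MG can be sketched in one dimension (NAS MG itself is 3-D; this is a minimal illustration of restriction and prolongation, not the benchmark's kernels):

```python
# 1-D sketch of multigrid grid transfers on a 9-point grid with fixed
# zero boundaries; coarse points sit at even fine-grid indices.

def restrict(fine):
    # Full weighting: each interior coarse point averages its fine
    # neighbours with weights (1/4, 1/2, 1/4).
    return [0.25 * fine[2*i - 1] + 0.5 * fine[2*i] + 0.25 * fine[2*i + 1]
            for i in range(1, len(fine) // 2)]

def prolong(coarse, n_fine):
    # Linear interpolation back to the fine grid.
    fine = [0.0] * n_fine
    for i, c in enumerate(coarse, start=1):
        fine[2*i] = c                       # inject at coincident points
    for j in range(1, n_fine - 1, 2):
        fine[j] = 0.5 * (fine[j - 1] + fine[j + 1])  # interpolate between
    return fine

v = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
print(restrict(v), prolong(restrict(v), 9))
```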
Citations: 60
The AppLeS Parameter Sweep Template: User-Level Middleware for the Grid
Pub Date : 2000-11-01 DOI: 10.1155/2000/319291
H. Casanova, Graziano Obertelli, F. Berman, R. Wolski
The Computational Grid is a promising platform for the efficient execution of parameter sweep applications over large parameter spaces. To achieve performance on the Grid, such applications must be scheduled so that shared data files are strategically placed to maximize reuse, and so that the application execution can adapt to the deliverable performance potential of target heterogeneous, distributed and shared resources. Parameter sweep applications are an important class of applications and would greatly benefit from the development of Grid middleware that embeds a scheduler for performance and targets Grid resources transparently. In this paper we describe a user-level Grid middleware project, the AppLeS Parameter Sweep Template (APST), that uses application-level scheduling techniques [1] and various Grid technologies to allow the efficient deployment of parameter sweep applications over the Grid. We discuss several possible scheduling algorithms and detail our software design. We then describe our current implementation of APST using systems like Globus [2], NetSolve [3] and the Network Weather Service [4], and present experimental results.
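One scheduling heuristic in the spirit described above, minimum completion time with a penalty for staging the shared input file to a host that does not yet hold it, can be sketched as follows (host names, speeds, and costs are invented for the example):

```python
# Illustrative parameter-sweep scheduling sketch: place each independent
# task on the host that finishes it earliest, accounting for data staging.

def schedule(tasks, hosts, has_file):
    # tasks: list of (name, work); hosts: {host: speed};
    # has_file: set of hosts that already hold the shared input file.
    finish = {h: 0.0 for h in hosts}
    placement = {}
    for name, work in tasks:
        def completion(h):
            stage = 0.0 if h in has_file else 5.0   # one-time staging cost
            return finish[h] + stage + work / hosts[h]
        best = min(hosts, key=completion)
        finish[best] = completion(best)
        placement[name] = best
        has_file.add(best)   # the file is now cached on that host
    return placement

tasks = [("t1", 12.0), ("t2", 10.0), ("t3", 10.0)]
print(schedule(tasks, {"fast": 2.0, "slow": 1.0}, {"slow"}))
```

Note how the first task pays the staging cost to reach the fast host, after which the cached file makes that host cheaper for later tasks, the "strategic placement to maximize reuse" the abstract refers to.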
Citations: 440
A Parallel Dynamic-Mesh Lagrangian Method for Simulation of Flows with Dynamic Interfaces
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10045
J. Antaki, G. Blelloch, O. Ghattas, Ivan Malcevic, G. Miller, N. Walkington
Many important phenomena in science and engineering, including our motivating problem of microstructural blood flow, can be modeled as flows with dynamic interfaces. The major challenge faced in simulating such flows is resolving the interfacial motion. Lagrangian methods are ideally suited for such problems, since interfaces are naturally represented and propagated. However, the material description of motion results in dynamic meshes, which become hopelessly distorted unless they are regularly regenerated. Lagrangian methods are particularly challenging on parallel computers, because scalable dynamic mesh methods remain elusive. Here, we present a parallel dynamic mesh Lagrangian method for flows with dynamic interfaces. We take an aggressive approach to dynamic meshing by triangulating the propagating grid points at every timestep using a scalable parallel Delaunay algorithm. Contrary to conventional wisdom, we show that the costs of the geometric components (triangulation, coarsening, refinement, and partitioning) can be made small relative to the flow solver.
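The "hopelessly distorted" criterion that triggers remeshing is typically an element-quality measure. A common sketch (not the paper's actual metric or thresholds) scores each triangle by its inradius-to-circumradius ratio, so an equilateral element scores 1 and a flattened one scores near 0:

```python
# Illustrative element-quality check for a Lagrangian mesh: if the worst
# triangle drops below a threshold, regenerate the mesh.
import math

def triangle_quality(a, b, c):
    # 2r/R: 1.0 for an equilateral triangle, -> 0.0 as it degenerates.
    sides = [math.dist(a, b), math.dist(b, c), math.dist(c, a)]
    s = sum(sides) / 2
    area = math.sqrt(max(s * (s - sides[0]) * (s - sides[1]) * (s - sides[2]), 0.0))
    if area == 0.0:
        return 0.0
    inradius = area / s
    circumradius = sides[0] * sides[1] * sides[2] / (4 * area)
    return 2 * inradius / circumradius

good = triangle_quality((0, 0), (1, 0), (0.5, math.sqrt(3) / 2))
flat = triangle_quality((0, 0), (1, 0), (0.5, 0.01))
print(round(good, 3), flat < 0.1)
```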
Citations: 28
A Framework for Sparse Matrix Code Synthesis from High-level Specifications
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10033
Nawaaz Ahmed, N. Mateev, K. Pingali, Paul V. Stodghill
We present compiler technology for synthesizing sparse matrix code from (i) dense matrix code, and (ii) a description of the index structure of a sparse matrix. Our approach is to embed statement instances into a Cartesian product of statement iteration and data spaces, and to produce efficient sparse code by identifying common enumerations for multiple references to sparse matrices. The approach works for imperfectly-nested codes with dependences, and produces sparse code competitive with hand-written library code for the Basic Linear Algebra Subroutines (BLAS).
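The end product of such a transformation can be illustrated with matrix-vector multiply: the dense loop nest is the specification, and the synthesized code enumerates only the stored nonzeros (CSR format here; the paper's framework handles arbitrary described index structures):

```python
# Dense specification vs. sparse (CSR) enumeration of the same computation.

def dense_matvec(A, x):
    # The "specification": a dense loop nest over every (i, j) pair.
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def csr_matvec(values, col_idx, row_ptr, x):
    # The synthesized form: enumerate only the stored entries of each row.
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

A = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0], [1.0, 0.0, 4.0]]
values, col_idx, row_ptr = [2.0, 3.0, 1.0, 4.0], [0, 2, 0, 2], [0, 1, 2, 4]
x = [1.0, 1.0, 1.0]
print(dense_matvec(A, x) == csr_matvec(values, col_idx, row_ptr, x))
```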
Citations: 30
PSockets: The Case for Application-level Network Striping for Data Intensive Applications using High Speed Wide Area Networks
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10040
H. Sivakumar, S. Bailey, R. Grossman
Transmission Control Protocol (TCP) is used by various applications to achieve reliable data transfer. TCP was originally designed for unreliable networks. With the emergence of high-speed wide area networks various improvements have been applied to TCP to reduce latency and achieve improved bandwidth. The improvement is achieved by having system administrators tune the network and can take a considerable amount of time. This paper introduces PSockets (Parallel Sockets), a library that achieves an equivalent performance without manual tuning. The basic idea behind PSockets is to exploit network striping. By network striping we mean striping partitioned data across several open sockets. We describe experimental studies using PSockets over the Abilene network. We show in particular that network striping using PSockets is effective for high performance data intensive computing applications using geographically distributed data.
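The striping concept can be sketched without real sockets: the sender partitions the buffer round-robin into fixed chunks across N lanes (each lane standing in for one open TCP connection), and the receiver reassembles in the same order (chunk size and lane count here are illustrative, not the PSockets defaults):

```python
# Concept sketch of application-level network striping and reassembly.

def stripe(data, n, chunk=4):
    # Split the buffer round-robin into n lanes, chunk bytes at a time.
    lanes = [bytearray() for _ in range(n)]
    for k in range(0, len(data), chunk):
        lanes[(k // chunk) % n] += data[k:k + chunk]
    return lanes

def unstripe(lanes, total, chunk=4):
    # Reassemble by reading lanes in the same round-robin order.
    out = bytearray()
    offsets = [0] * len(lanes)
    lane = 0
    while len(out) < total:
        out += lanes[lane][offsets[lane]:offsets[lane] + chunk]
        offsets[lane] += chunk
        lane = (lane + 1) % len(lanes)
    return bytes(out)

msg = b"0123456789abcdefghij"
print(unstripe(stripe(msg, 3), len(msg)) == msg)
```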
Citations: 293
High-Cost CFD on a Low-Cost Cluster
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10020
T. Hauser, T. Mattox, R. LeBeau, H. Dietz, P. Huang
Direct numerical simulation of the Navier-Stokes equations (DNS) is an important technique for the future of computational fluid dynamics (CFD) in engineering applications. However, DNS requires massive computing resources. This paper presents a new approach for implementing high-cost DNS CFD using low-cost cluster hardware. After describing the DNS CFD code DNSTool, the paper focuses on the techniques and tools that we have developed to customize the performance of a cluster implementation of this application. This tuning of system performance involves both recoding of the application and careful engineering of the cluster design. Using the cluster KLAT2 (Kentucky Linux Athlon Testbed 2), while DNSTool cannot match the $0.64 per MFLOPS that KLAT2 achieves on single precision ScaLAPACK, it is very efficient; DNSTool on KLAT2 achieves price/performance of $2.75 per MFLOPS double precision and $1.86 single precision. Further, the code and tools are all, or will soon be, made freely available as full source code.
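Since price per delivered MFLOPS on a fixed-cost cluster scales inversely with sustained throughput, the abstract's three figures can be compared directly:

```python
# Comparing the abstract's price/performance figures for KLAT2.
scalapack_single = 0.64   # $/MFLOPS, single-precision ScaLAPACK baseline
dnstool_single = 1.86     # $/MFLOPS, DNSTool single precision
dnstool_double = 2.75     # $/MFLOPS, DNSTool double precision

print(round(dnstool_single / scalapack_single, 2))  # ≈ 2.91x the baseline cost per flop
print(round(dnstool_double / dnstool_single, 2))    # ≈ 1.48x double vs. single precision
```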
Citations: 26
Journal: ACM/IEEE SC 2000 Conference (SC'00)
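The price/performance figures quoted in the abstract ($0.64, $1.86, and $2.75 per MFLOPS) are simply total cluster cost divided by sustained benchmark throughput. A minimal sketch of that calculation, using assumed round numbers for cluster cost and sustained MFLOPS (the paper itself does not state them here):

```python
# Illustrative sketch (not from the paper): the $/MFLOPS metric quoted
# in the abstract. Cost and throughput figures below are hypothetical
# round numbers chosen only to reproduce the cited single-precision
# ScaLAPACK figure for KLAT2.

def price_per_mflops(cluster_cost_usd: float, sustained_mflops: float) -> float:
    """Dollars per sustained MFLOPS: total hardware cost / benchmark rate."""
    return cluster_cost_usd / sustained_mflops

# e.g. a hypothetical $41,000 cluster sustaining ~64,000 single-precision
# MFLOPS gives roughly the $0.64/MFLOPS the abstract cites.
print(round(price_per_mflops(41_000, 64_000), 2))  # → 0.64
```

The same division with DNSTool's lower sustained rates yields the higher $1.86 (single) and $2.75 (double precision) figures reported for the application code.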