
Proceedings of the IEEE/ACM SC95 Conference: Latest Publications

Balancing Processor Loads and Exploiting Data Locality in N-Body Simulations
Pub Date : 1995-12-08 DOI: 10.1145/224170.224306
I. Banicescu, S. F. Hummel
Although N-body simulation algorithms are amenable to parallelization, performance gains from execution on parallel machines are difficult to obtain due to load imbalances caused by irregular distributions of bodies. In general, there is a tension between balancing processor loads and maintaining locality, as the dynamic re-assignment of work necessitates access to remote data. Fractiling is a dynamic scheduling scheme that simultaneously balances processor loads and maintains locality by exploiting the self-similarity properties of fractals. Fractiling is based on a probabilistic analysis and thus accommodates load imbalances caused by predictable phenomena, such as irregular data, and unpredictable phenomena, such as data-access latencies. In experiments on a KSR1, fractiling improved the performance of N-body simulation codes by as much as 53%. Performance improvements were obtained on both uniform and nonuniform distributions of bodies, underscoring the need for a scheduling scheme that accommodates system-induced variance. As the fractiling scheme is orthogonal to the N-body algorithm, we could use simple codes that discretize space into equal-size subrectangles (2-d) or subcubes (3-d) as the base algorithms.
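The abstract describes fractiling only at a high level, so the following is a minimal, single-process sketch of the general idea: each processor owns one equal-size tile of the 2-D domain, its work is queued as spatially contiguous sub-tiles of geometrically decreasing size, and idle processors steal only the small trailing sub-tiles. The tile sizes, the splitting rule, and the stealing policy here are illustrative assumptions, not the authors' KSR1 implementation.

```python
from collections import deque

def subtiles(x0, y0, w, h, min_side=4):
    """Decompose a w-by-h tile into sub-tiles of roughly halving area,
    largest first, covering the whole tile exactly once."""
    out = []
    while w > min_side and h > min_side:
        if w >= h:
            half = w // 2
            out.append((x0 + half, y0, w - half, h))  # peel off the right half
            w = half
        else:
            half = h // 2
            out.append((x0, y0 + half, w, h - half))  # peel off the top half
            h = half
    out.append((x0, y0, w, h))                        # small final remainder
    return out

def schedule(P=4, tile=64):
    """One work queue per 'processor'; owners consume their large local
    sub-tiles first, and idle processors steal the smallest remaining one."""
    # deliberately uneven tile heights so load balancing is actually exercised
    queues = [deque(subtiles(p * tile, 0, tile, tile * (p + 1))) for p in range(P)]
    executed = [[] for _ in range(P)]
    while any(queues):
        for p in range(P):
            if queues[p]:
                executed[p].append(queues[p].popleft())        # local, big chunk
            else:
                victim = max(range(P), key=lambda q: len(queues[q]))
                if queues[victim]:
                    executed[p].append(queues[victim].pop())   # stolen, small chunk
    return executed

if __name__ == "__main__":
    for p, chunks in enumerate(schedule()):
        print(f"processor {p}: executed {len(chunks)} sub-tiles")
```

As in factoring, the halving chunk sizes mean that the large early chunks amortize scheduling overhead and stay local, while the small trailing chunks smooth out whatever imbalance remains.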
Citations: 77
A Hybrid Execution Model for Fine-Grained Languages on Distributed Memory Multicomputers
Pub Date : 1995-12-08 DOI: 10.1145/224170.224302
John Plevyak, V. Karamcheti, Xingbin Zhang, A. Chien
While fine-grained concurrent languages can naturally capture concurrency in many irregular and dynamic problems, their flexibility has generally resulted in poor execution efficiency. In such languages the computation consists of many small threads which are created dynamically and synchronized implicitly. In order to minimize the overhead of these operations, we propose a hybrid execution model which dynamically adapts to runtime data layout, providing both sequential efficiency and low-overhead parallel execution. This model uses separately optimized sequential and parallel versions of code. Sequential efficiency is obtained by dynamically coalescing threads via stack-based execution, and parallel efficiency through latency hiding and cheap synchronization using heap-allocated activation frames. Novel aspects of the stack mechanism include handling return values for futures and executing forwarded messages (the responsibility to reply is passed along, like call/cc in Scheme) on the stack. In addition, the hybrid execution model is expressed entirely in C, and therefore is easily portable to many systems. Experiments with function-call intensive programs show that this model achieves sequential efficiency comparable to C programs. Experiments with regular and irregular application kernels on the CM5 and T3D demonstrate that it can yield 1.5 to 3 times better performance than code optimized for parallel execution alone.
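As a rough illustration of the hybrid model sketched above, the toy code below runs a fine-grained call directly on the stack when its operand is "local" and otherwise parks a heap-allocated activation record (here just a closure on a list) for a later scheduling pass. The `is_local` test, the frame list, and the squaring example are invented for illustration; the paper's actual model is expressed in C and uses continuation passing for forwarded messages.

```python
heap_frames = []                    # stands in for heap-allocated activation frames

def is_local(n):
    # assumption for the sketch: pretend even arguments have local data
    return n % 2 == 0

def hybrid_call(fn, arg, k):
    """Invoke fn(arg) and pass the result to continuation k.

    Local data -> ordinary stack call (the optimized sequential path);
    remote data -> defer the whole activation to the heap so the caller
    can keep running (the parallel path)."""
    if is_local(arg):
        k(fn(arg))
    else:
        heap_frames.append(lambda: k(fn(arg)))

def drain():
    """Stand-in for the scheduler that later executes deferred frames."""
    while heap_frames:
        heap_frames.pop()()

results = []
for i in range(6):
    hybrid_call(lambda x: x * x, i, results.append)
drain()
print(sorted(results))              # [0, 1, 4, 9, 16, 25]
```

The point of the split is that the common, local path pays only the cost of an ordinary call, while only the deferred path pays for frame allocation and synchronization, which is how the abstract's claim of sequential efficiency comparable to C can coexist with parallel execution.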
Citations: 44
Server-Directed Collective I/O in Panda
Pub Date : 1995-12-08 DOI: 10.1145/224170.224371
K. Seamons, Ying Chen, P. Jones, J. Jozwiak, M. Winslett
We present the architecture and implementation results for Panda 2.0, a library for input and output of multidimensional arrays on parallel and sequential platforms. Panda achieves remarkable performance levels on the IBM SP2, showing excellent scalability as data size increases and as the number of nodes increases, and provides throughputs close to the full capacity of the AIX file system on the SP2 we used. We argue that this good performance can be traced to Panda's use of server-directed I/O (a logical-level version of disk-directed I/O [Kotz94b]) to perform array I/O using sequential disk reads and writes, a very high-level interface for collective I/O requests, and built-in facilities for arbitrary rearrangements of arrays during I/O. Other advantages of Panda's approach are ease of use, easy application portability, and a reliance on commodity system software.
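To make the "server-directed" idea concrete, here is a small in-memory sketch (not the Panda 2.0 API): an I/O server owns a contiguous range of the output file, pulls from each compute client exactly the rows of the block-distributed array that fall in that range, and then issues one sequential write for the whole range. The array shape, the row-block distribution, and the number of servers are assumptions chosen for the example.

```python
import numpy as np

ROWS, COLS, CLIENTS = 8, 8, 4
full = np.arange(ROWS * COLS).reshape(ROWS, COLS)

# each compute client holds a contiguous block of rows (simple block distribution)
blocks = {c: full[c * (ROWS // CLIENTS):(c + 1) * (ROWS // CLIENTS)].copy()
          for c in range(CLIENTS)}

def server_write(file_buf, servers=2):
    """Each 'server' gathers its row range from the owning clients and writes
    it as one contiguous run, in row-major file order."""
    rows_per_client = ROWS // CLIENTS
    rows_per_server = ROWS // servers
    for s in range(servers):
        gathered = []
        for r in range(s * rows_per_server, (s + 1) * rows_per_server):
            owner, local = divmod(r, rows_per_client)
            gathered.append(blocks[owner][local])     # client -> server message
        start = s * rows_per_server * COLS
        # one sequential write per server covering its whole file range
        file_buf[start:start + rows_per_server * COLS] = np.concatenate(gathered)

out = np.empty(ROWS * COLS, dtype=full.dtype)
server_write(out)
assert np.array_equal(out.reshape(ROWS, COLS), full)
```

The design point illustrated is that the order of disk accesses is chosen by the server (sequential, large writes), not by whichever client happens to flush its data first.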
Citations: 247
Predicting Application Behavior in Large Scale Shared-Memory Multiprocessors
Pub Date : 1995-12-08 DOI: 10.1145/224170.224356
Karim Harzallah, K. Sevcik
In this paper we present an analytically based framework for parallel program performance prediction. The main thrust of this work is to provide a means for treating realistic applications within a single unified framework. Our approach is based upon the specification of a set of non-linear equations which describe the application, processor configuration, network and memory operations. These equations are solved iteratively since the application execution rate depends on the communication latencies. The iterative solution technique is found to be efficient as it typically requires only a few iterations to reach convergence. Our modeling methodology achieves a good balance between abstraction and accuracy. This is attained by accounting for both the time and space dimensions of memory references, while maintaining a simple description of the workload. We demonstrate both the practicality and the accuracy of our approach by comparing predicted results with measurements taken on a commercial multiprocessor system. We found the model to be faithful in reflecting changes in processor speed, and changes in the number and placement of allocated processors.
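The coupled equations can be illustrated with a toy fixed-point iteration: the predicted elapsed time determines the message rate, the message rate determines a congestion-dependent latency, and the latency feeds back into the elapsed time. The queueing-style latency formula and every constant below are illustrative assumptions, not the calibrated model from the paper; the point is only that such systems typically converge in a handful of iterations, as the authors report.

```python
def predict_time(t_comp=10.0, msgs=2000, base_lat=1e-4, svc_rate=5e4,
                 tol=1e-9, max_iter=100):
    """Fixed-point solve of elapsed time coupled with communication latency."""
    t = t_comp                        # initial guess: computation only
    for _ in range(max_iter):
        rate = msgs / t               # offered message rate at current estimate
        lat = base_lat / max(1e-12, 1.0 - rate / svc_rate)   # congested latency
        t_new = t_comp + msgs * lat   # computation + communication
        if abs(t_new - t) < tol:
            break
        t = t_new
    return t

print(f"predicted elapsed time: {predict_time():.4f} s")
```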
Citations: 5
Distributed Information Management in the National HPCC Software Exchange
Pub Date : 1995-12-08 DOI: 10.1145/224170.224211
S. Browne, J. Dongarra, G. Fox, K. Hawick, K. Kennedy, R. Stevens, R. Olson, T. Rowan
The National HPCC Software Exchange is a collaborative effort by member institutions of the Center for Research on Parallel Computation to provide network access to HPCC-related software, documents, and data. Challenges for the NHSE include identifying, organizing, filtering, and indexing the rapidly growing wealth of relevant information available on the Web. The large quantity of information necessitates performing these tasks using automatic techniques, many of which make use of parallel and distributed computation, but human intervention is needed for intelligent abstracting, analysis, and critical review tasks. Thus, major goals of NHSE research are to find the right mix of manual and automated techniques, and to leverage the results of manual efforts to the maximum extent possible. This paper describes our current information gathering and processing techniques, as well as our future plans for integrating the manual and automated approaches. The NHSE home page is accessible at http://www.netlib.org/nhse/.
Citations: 0
The Living Textbook and the K-12 Classroom of the Future
Pub Date : 1995-12-08 DOI: 10.1145/224170.224196
Kim Mills, Geoffrey C. Fox, P. Coddington, Barbara Mihalas, M. Podgorny, Barbara Shelly, Steven Bossert
The Living Textbook creates a unique learning environment enabling teachers and students to use educational resources on multimedia information servers, supercomputers, parallel databases, and network testbeds. We have three innovative educational software applications running in our laboratory and under test in the classroom. Our education-focused goal is to learn how new, learner-driven, explorative models of learning can be supported by these high-bandwidth, interactive applications and ultimately how they will impact the classroom of the future.
Citations: 15
Distributing a Chemical Process Optimization Application Over a Gigabit Network
Pub Date : 1995-12-08 DOI: 10.1145/224170.224310
R. Clay, P. Steenkiste
We evaluate the impact of a gigabit network on the implementation of a distributed chemical process optimization application. The optimization problem is formulated as a stochastic Linear Assignment Problem and is solved using the Thinking Machines CM-2 (SIMD) and the Cray C-90 (vector) computers at PSC, and the Intel iWarp (MIMD) system at CMU, connected by the Gigabit Nectar testbed. We report our experience distributing the application across this heterogeneous set of systems and present measurements that show how the communication requirements of the application depend on the structure of the application. We use detailed traces to build an application performance model that can be used to estimate the elapsed time of the application for different computer system and network combinations. Our results show that the application benefits from the high-speed network, and that the need for high network throughput is increasing as computer systems get faster. We also observed that supporting high burst rates is critical, although structuring the application so that communication is overlapped with computation relaxes the bandwidth requirements.
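For readers unfamiliar with the formulation, the sketch below shows a much smaller, purely illustrative stochastic linear assignment problem solved by averaging sampled cost matrices and running a standard Hungarian-style solver on the expected cost. The sampling model, problem size, and use of SciPy are assumptions for the example and are unrelated to the distributed CM-2/C-90/iWarp solution measured in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n, scenarios = 6, 100

# sample cost matrices (e.g. uncertain process costs), then take their mean
costs = rng.uniform(1.0, 10.0, size=(scenarios, n, n))
expected_cost = costs.mean(axis=0)

rows, cols = linear_sum_assignment(expected_cost)   # optimal assignment on E[cost]
print("assignment:", list(zip(rows.tolist(), cols.tolist())))
print("expected objective:", expected_cost[rows, cols].sum())
```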
Citations: 6
Surface Fitting Using GCV Smoothing Splines on Supercomputers
Pub Date : 1995-12-08 DOI: 10.1145/224170.224192
Alan Williams, K. Burrage
The task of fitting smoothing spline surfaces to meteorological data such as temperature or rainfall observations is computationally intensive. The Generalised Cross Validation (GCV) smoothing algorithm is O(n³) computationally, and memory requirements are O(n²). Fitting a spline to a moderately sized data set of, for example, 1080 observations and calculating an output surface grid of dimension 220 × 220 involves approximately 5 billion floating point operations, and takes approximately 19 minutes of execution time on a Sun SPARC2 workstation. Since fitting a surface to data collected from the whole of Australia could conceivably involve data sets with approximately 10000 points, and because it is desirable to be able to fit surfaces of at least 1000 data points in 1 to 5 seconds for use in interactive visualisations, it is crucial to be able to take advantage of supercomputing resources. This paper describes the adaptation of the surface fitting program to different supercomputing platforms, and the results achieved.
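The O(n³) time and O(n²) memory quoted above come from working with the n × n influence (hat) matrix of the smoother. The sketch below evaluates the standard GCV criterion, GCV(lambda) = n * ||(I - A(lambda))y||^2 / (tr(I - A(lambda)))^2, for a simple ridge-type smoother; the polynomial basis and synthetic data are stand-ins for the thin-plate spline surfaces fitted to meteorological observations in the paper.

```python
import numpy as np

def gcv_score(basis, y, lam):
    """GCV criterion for the linear smoother y_hat = A(lam) @ y, with
    A(lam) = B (B^T B + lam I)^-1 B^T (a ridge-type stand-in)."""
    n, p = basis.shape
    A = basis @ np.linalg.solve(basis.T @ basis + lam * np.eye(p), basis.T)
    resid = y - A @ y
    # with a full spline basis (p ~ n), forming A is the O(n^3) step and
    # storing it is the O(n^2) memory cost mentioned in the abstract
    return n * (resid @ resid) / (n - np.trace(A)) ** 2

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)
basis = np.vander(x, 12, increasing=True)      # toy 1-D polynomial basis

lams = np.logspace(-10, 2, 40)
best = min(lams, key=lambda lam: gcv_score(basis, y, lam))
print(f"GCV-selected smoothing parameter: {best:.2e}")
```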
Citations: 10
Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines
Pub Date : 1995-12-08 DOI: 10.1145/224170.224275
S. Lumetta, A. Krishnamurthy, D. Culler
We present and analyze a portable, high-performance algorithm for finding connected components on modern distributed memory multiprocessors. The algorithm is a hybrid of the classic DFS on the subgraph local to each processor and a variant of the Shiloach-Vishkin PRAM algorithm on the global collection of subgraphs. We implement the algorithm in Split-C and measure performance on the Cray T3D, the Meiko CS-2, and the Thinking Machines CM-5 using a class of graphs derived from cluster dynamics methods in computational physics. On a 256-processor Cray T3D, the implementation outperforms all previous solutions by an order of magnitude. A characterization of graph parameters allows us to select graphs that highlight key performance features. We study the effects of these parameters and machine characteristics on the balance of time between the local and global phases of the algorithm and find that edge density, surface-to-volume ratio, and relative communication cost dominate performance. By understanding the effect of machine characteristics on performance, the study sheds light on the impact of improvements in computational and/or communication performance on this challenging problem.
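A single-process analogue may help make the two phases concrete: each "processor" labels the components of its own subgraph with a plain DFS, and the cross-partition edges are then merged with a small union-find pass that stands in for the global Shiloach-Vishkin phase. The partitioning, data structures, and tiny example graph are assumptions for illustration only, not the Split-C implementation measured on the T3D, CS-2, and CM-5.

```python
def connected_components(n, edges, parts):
    """n vertices, undirected edges, parts[v] = owning partition of vertex v."""
    local_adj = [[] for _ in range(n)]
    cross = []
    for u, v in edges:
        if parts[u] == parts[v]:
            local_adj[u].append(v)
            local_adj[v].append(u)
        else:
            cross.append((u, v))                 # handled in the global phase

    # Phase 1: local DFS labelling within each partition.
    label = [-1] * n
    for s in range(n):
        if label[s] < 0:
            stack, label[s] = [s], s
            while stack:
                u = stack.pop()
                for w in local_adj[u]:
                    if label[w] < 0:
                        label[w] = s
                        stack.append(w)

    # Phase 2: merge local labels across partitions (union-find by label).
    parent = {l: l for l in set(label)}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]        # path halving
            x = parent[x]
        return x
    for u, v in cross:
        ru, rv = find(label[u]), find(label[v])
        if ru != rv:
            parent[max(ru, rv)] = min(ru, rv)
    return [find(label[v]) for v in range(n)]

# two partitions of 4 vertices each; one cross edge joins them
edges = [(0, 1), (2, 3), (4, 5), (6, 7), (3, 4)]
parts = [0, 0, 0, 0, 1, 1, 1, 1]
print(connected_components(8, edges, parts))     # [0, 0, 2, 2, 2, 2, 6, 6]
```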
Citations: 17
Large Eddy Simulation of a Spatially-Developing Boundary Layer
Pub Date : 1995-12-08 DOI: 10.1145/224170.224408
Xiaohua Wu, K. Squires, T. Lund
A method for generation of a three-dimensional, time-dependent turbulent inflow condition for simulation of spatially-developing boundary layers is described. Assuming self-preservation of the boundary layer, a quasi-homogeneous coordinate is defined along which streamwise inhomogeneity is minimized (Spalart 1988). Using this quasi-homogeneous coordinate and decomposition of the velocity into a mean and periodic part, the velocity field at a location near the exit boundary of the computational domain is re-introduced at the inflow boundary at each time step. The method was tested using large eddy simulations of a flat-plate boundary layer for momentum thickness Reynolds numbers ranging from 1470 to 1700. Subgrid scale stresses were modeled using the dynamic eddy viscosity model of Germano et al. (1991). Simulation results demonstrate that the essential features of spatially-developing turbulent boundary layers are reproduced using the present approach without the need for a prolonged and computationally expensive laminar-turbulent transition region. Boundary layer properties such as skin friction and shape factor as well as mean velocity profiles and turbulence intensities are in good agreement with experimental measurements and results from direct numerical simulation. Application of the method for calculation of spatially-developing complex turbulent boundary layers is also described.
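A heavily simplified schematic of the recycling step described above: a velocity plane sampled near the exit is decomposed into a spanwise-averaged mean profile and a fluctuating part, and the two parts are recombined (in the full method, after similarity-based rescaling) to form the inflow plane for the next time step. The array shapes, the plain spanwise average, and the scalar rescaling factors are illustrative assumptions.

```python
import numpy as np

def recycled_inflow(u_exit_plane, mean_scale=1.0, fluc_scale=1.0):
    """u_exit_plane[j, k]: streamwise velocity at wall-normal index j and
    spanwise index k, sampled near the exit of the domain.

    mean_scale and fluc_scale stand in for the similarity-based rescaling of
    the mean and periodic parts; with unit values the plane is simply recycled."""
    u_mean = u_exit_plane.mean(axis=1, keepdims=True)   # spanwise-averaged mean
    u_fluc = u_exit_plane - u_mean                      # fluctuating (periodic) part
    return mean_scale * u_mean + fluc_scale * u_fluc    # new inflow plane

rng = np.random.default_rng(2)
ny, nz = 64, 32                                         # wall-normal x spanwise points
plane = (np.tanh(np.linspace(0.0, 3.0, ny))[:, None]
         + 0.05 * rng.standard_normal((ny, nz)))
inflow = recycled_inflow(plane)
print(inflow.shape)                                     # (64, 32): one plane per step
```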
Citations: 11