
Latest publications: ACM/IEEE SC 2000 Conference (SC'00)

Parallel Phylogenetic Inference
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10062
Q. Snell, M. Whiting, M. Clement, David McLaughlin
Recent advances in DNA sequencing technology have created large data sets upon which phylogenetic inference can be performed. However, current research is limited by the prohibitive time necessary to perform tree search on even a reasonably sized data set. Some parallel algorithms have been developed, but the biological research community does not use them because they don't trust the results from newly developed parallel software. This paper presents a new phylogenetic algorithm that allows existing, trusted phylogenetic software packages to be executed in parallel using the DOGMA parallel processing system. The results presented here indicate that data sets that currently take as much as 11 months to search using current algorithms can be searched in as little as 2 hours using as few as 8 processors. This reduction in the time necessary to complete a phylogenetic search allows new research questions to be explored in many of the biological sciences.
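Since the paper's contribution is running an existing, trusted search package many times in parallel rather than a new search heuristic, a rough way to picture it is farming independent tree-search runs out to worker processes and keeping the best-scoring tree. The sketch below assumes a hypothetical `run_tree_search` stand-in for invoking such a package; it is not the DOGMA-based system described in the paper.

```python
# Sketch: distribute independent tree-search runs across processes and keep the
# best result. run_tree_search() is a hypothetical stand-in for calling an
# existing, trusted phylogenetics package on one random starting tree.
import random
from multiprocessing import Pool

def run_tree_search(seed):
    rng = random.Random(seed)
    # Pretend each run explores trees and returns (score, description);
    # a lower score is better in this toy model.
    score = rng.uniform(1000.0, 2000.0)
    return score, f"tree-from-seed-{seed}"

if __name__ == "__main__":
    seeds = range(8)                      # one independent search per worker
    with Pool(processes=8) as pool:
        results = pool.map(run_tree_search, seeds)
    best_score, best_tree = min(results)  # keep the best-scoring tree overall
    print(best_score, best_tree)
```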
Citations: 15
Using Hardware Performance Monitors to Isolate Memory Bottlenecks
Pub Date : 2000-11-01 DOI: 10.5555/370049.370420
B. Buck, J. Hollingsworth
In this paper, we present and evaluate two techniques that use different styles of hardware support to provide data-structure-specific processor cache information. In one approach, hardware performance counter overflow interrupts are used to sample cache misses. In the other, cache misses within regions of memory are counted to perform an n-way search for the areas in which the most misses are occurring. We present a simulation-based study and comparison of the two techniques. We find that both techniques can provide accurate information, and describe the relative advantages and disadvantages of each.
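As a rough illustration of the second technique, the sketch below takes a list of sampled miss addresses (standing in for hardware miss counts), repeatedly splits the current address range into n sub-regions, and descends into the sub-region with the most misses until the hot region is small enough to attribute to a data structure. This is a software toy, not the hardware mechanism evaluated in the paper.

```python
# Sketch: n-way refinement over an address range using counted cache misses.
# miss_addresses stands in for counts gathered by hardware monitors.
def find_hot_region(miss_addresses, lo, hi, n=4, min_size=4096):
    while hi - lo > min_size:
        width = (hi - lo) // n
        counts = [0] * n
        for addr in miss_addresses:
            if lo <= addr < lo + n * width:
                counts[(addr - lo) // width] += 1
        hottest = max(range(n), key=lambda i: counts[i])
        lo, hi = lo + hottest * width, lo + (hottest + 1) * width
    return lo, hi  # small region accounting for the most misses

if __name__ == "__main__":
    import random
    random.seed(0)
    # Synthetic misses: background noise plus a hot cluster around one "array".
    misses = [random.randrange(0x10000, 0x20000) for _ in range(200)]
    misses += [random.randrange(0x14000, 0x14800) for _ in range(2000)]
    print([hex(x) for x in find_hot_region(misses, 0x10000, 0x20000)])
```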
Citations: 41
A 1.349 Tflops simulation of black holes in a galactic center on GRAPE-6
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10042
J. Makino, T. Fukushige, Masaki Koga
As an entry for the 2000 Gordon Bell performance prize, we report the performance achieved on a prototype GRAPE-6 system. GRAPE-6 is a special-purpose computer for astrophysical N-body calculations. The present configuration has 96 custom pipeline chips, each containing six pipeline processors for the calculation of gravitational interactions between particles. Its theoretical peak performance is 2.889 Tflops. The complete GRAPE-6 system will consist of 3072 pipeline chips and will achieve a peak speed of 100 Tflops. The actual performance obtained on the present 96-chip system was 1.349 Tflops, for a simulation of massive black holes embedded in the core of a galaxy with 786,432 stars. For a short benchmark run with 1,400,000 particles, the average speed was 1.640 Tflops.
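The workload GRAPE-6 pipelines are built for is the O(N²) direct summation of softened pairwise gravitational accelerations. A plain numpy version of that kernel, in arbitrary units and with no relation to the GRAPE hardware pipelines themselves, looks roughly like this:

```python
# Sketch: direct-summation gravitational accelerations, the O(N^2) kernel that
# GRAPE-style pipelines accelerate. Units and softening are illustrative.
import numpy as np

def accelerations(pos, mass, eps=1e-3):
    # pos: (N, 3) positions, mass: (N,) masses, eps: softening length
    diff = pos[None, :, :] - pos[:, None, :]          # r_j - r_i, shape (N, N, 3)
    dist2 = (diff ** 2).sum(axis=-1) + eps ** 2       # softened squared distances
    inv_r3 = dist2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)                     # no self-interaction
    return (diff * (mass[None, :, None] * inv_r3[:, :, None])).sum(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 1024
    acc = accelerations(rng.normal(size=(n, 3)), np.full(n, 1.0 / n))
    print(acc.shape)          # (1024, 3); ~n*n pairwise interactions per step
```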
Citations: 38
Expressing and Enforcing Distributed Resource Sharing Agreements
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10054
Tao Zhao, V. Karamcheti
Advances in computing and networking technology, and an explosion in information sources, have resulted in a growing number of distributed systems being constructed out of resources contributed by multiple sources. Use of such resources is typically governed by sharing agreements between owning principals, which limit both who can access a resource and in what quantity. Despite their increasing importance, existing resource management infrastructures offer only limited support for the expression and enforcement of sharing agreements, typically restricting themselves to identifying compatible resources. In this paper, we present a novel approach building on the concepts of tickets and currencies to express resource sharing agreements in an abstract, dynamic, and uniform fashion. We also formulate the allocation problem of enforcing these agreements as a linear-programming model, automatically factoring in the transitive availability of resources via chained agreements. A case study modeling resource sharing among ISP-level web proxies shows the benefits of enforcing transitive agreements: worst-case waiting times of clients accessing these proxies improve by up to two orders of magnitude.
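To make the linear-programming idea concrete, here is a toy allocation problem, not the paper's exact formulation: grants x[o][c] from owner o to consumer c are bounded by per-pair agreement caps, owners' capacities and consumers' demands bound the row and column sums, and the total amount allocated is maximized.

```python
# Sketch: a toy linear program in the spirit of enforcing sharing agreements.
# Variables x[o][c] = amount of owner o's resource granted to consumer c.
# Per-pair caps model the agreements; we maximize the total amount allocated.
import numpy as np
from scipy.optimize import linprog

capacity = np.array([100.0, 80.0])            # what each owner has
demand = np.array([70.0, 90.0, 40.0])         # what each consumer wants
agreement = np.array([[50.0, 60.0, 20.0],     # cap on owner->consumer grants
                      [30.0, 40.0, 40.0]])

n_o, n_c = agreement.shape
c = -np.ones(n_o * n_c)                       # maximize sum(x) == minimize -sum(x)

A_ub, b_ub = [], []
for o in range(n_o):                          # owner capacity constraints
    row = np.zeros(n_o * n_c); row[o * n_c:(o + 1) * n_c] = 1.0
    A_ub.append(row); b_ub.append(capacity[o])
for k in range(n_c):                          # consumer demand constraints
    row = np.zeros(n_o * n_c); row[k::n_c] = 1.0
    A_ub.append(row); b_ub.append(demand[k])

bounds = [(0.0, agreement[o, k]) for o in range(n_o) for k in range(n_c)]
res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds)
print(res.x.reshape(n_o, n_c), -res.fun)      # grants and total allocated
```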
Citations: 8
Realizing Fault Resilience in Web-Server Cluster
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10012
Chu-Sing Yang, Mon-Yen Luo
Today, it is absolutely critical that a successful Internet service be up 100 percent of the time. Server clustering is the most promising approach to meeting this requirement. However, existing Web server-clustering solutions can merely provide high availability derived from their redundant nature, but offer no guarantee of fault resilience for the service. In this paper, we address this problem by implementing an innovative mechanism that enables a Web request to be smoothly migrated and recovered on another working node in the presence of server failure. We will show that request migration and recovery can be achieved efficiently and transparently to the user. This fault resilience is important and essential for a variety of critical services (e.g., e-commerce) that are increasingly widely used. Our approach takes an important step toward providing a highly reliable Web service.
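At the level of a single request, the failover idea can be sketched as a dispatcher that retries the request on another node when the chosen backend fails. The sketch below uses plain callables as stand-in backends; the paper's mechanism additionally migrates in-progress request state transparently inside the cluster.

```python
# Sketch: retry a request on another node when the chosen backend fails.
# Backends are plain callables here; in a real cluster they would be servers
# and the dispatcher would also carry over per-request state.
class BackendDown(Exception):
    pass

def dispatch(request, backends):
    last_error = None
    for name, handler in backends:
        try:
            return name, handler(request)     # first healthy node wins
        except BackendDown as err:
            last_error = err                  # migrate the request and retry
    raise RuntimeError(f"all backends failed: {last_error}")

def broken(_request):
    raise BackendDown("node crashed mid-request")

def healthy(request):
    return f"200 OK for {request}"

if __name__ == "__main__":
    print(dispatch("/cart/checkout", [("web1", broken), ("web2", healthy)]))
```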
Citations: 25
Landing CG on EARTH: A Case Study of Fine-Grained Multithreading on an Evolutionary Path
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10011
K. B. Theobald, G. Agrawal, Rishi Kumar, G. Heber, G. Gao, Paul V. Stodghill, K. Pingali
We report on our work in developing a fine-grained multithreaded solution for the communication-intensive Conjugate Gradient (CG) problem. In our recent work, we developed a simple yet efficient program for sparse matrix-vector multiply on a multi-threaded system. This paper presents an effective mechanism for the reduction-broadcast phase, which is integrated with the sparse MVM, resulting in a scalable implementation of the complete CG application. Three major observations from our experiments on the EARTH multithreaded testbed are: (1) The scalability of our CG implementation is impressive, e.g., absolute speedup is 90 on 120 processors for the NAS CG class B input. (2) Our dataflow-style reduction-broadcast network based on fine-grain multithreading is twice as fast as a serial reduction scheme on the same system. (3) By slowing down the network by a factor of 2, no notable degradation of overall CG performance was observed.
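For reference, the two phases the paper parallelizes, the sparse matrix-vector multiply and the dot-product reductions whose results must be broadcast, are visible in a textbook serial CG loop such as the following sketch (a generic CG, not the EARTH implementation):

```python
# Sketch: textbook conjugate gradient. The sparse matrix-vector product (A @ p)
# and the dot-product reductions are the communication-heavy phases the paper
# maps onto fine-grained threads; this serial version just shows where they occur.
import numpy as np
import scipy.sparse as sp

def cg(A, b, tol=1e-8, maxiter=2000):
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r                       # reduction
    for _ in range(maxiter):
        Ap = A @ p                       # sparse matrix-vector multiply
        alpha = rs_old / (p @ Ap)        # reduction
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r                   # reduction feeding the broadcast phase
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

if __name__ == "__main__":
    n = 200                              # small 1-D Poisson system as a test
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)
    x = cg(A, b)
    print(np.linalg.norm(A @ x - b))     # residual should be tiny
```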
Citations: 17
A Unified Algorithm for Load-balancing Adaptive Scientific Simulations
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10035
K. Schloegel, G. Karypis, Vipin Kumar
Adaptive scientific simulations require that periodic repartitioning occur dynamically throughout the course of the computation. The repartitionings should be computed so as to minimize both the inter-processor communications incurred during the iterative mesh-based computation and the data redistribution costs required to balance the load. Recently developed schemes for computing repartitionings provide the user with only limited control of the tradeoffs among these objectives. This paper describes a new Unified Repartitioning Algorithm that can trade off one objective against the other depending on a user-defined parameter describing the relative costs of these objectives. We show that the Unified Repartitioning Algorithm is able to reduce the precise overheads associated with repartitioning as well as or better than other repartitioning schemes for a variety of problems, regardless of the relative costs of performing inter-processor communication and data redistribution. Our experimental results show that this scheme is extremely fast and scales to large problems.
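A toy way to see the single tradeoff parameter is to score each candidate repartitioning by its edge-cut plus a user weight times the data it would move, and pick the cheapest. The evaluator below is only an illustration of that combined objective, not the paper's multilevel algorithm:

```python
# Sketch: score candidate repartitionings by edge-cut plus a user-weighted data
# redistribution term, mirroring the single tradeoff-parameter idea (toy
# evaluator only, not the paper's multilevel algorithm).
def edge_cut(edges, part):
    return sum(1 for u, v in edges if part[u] != part[v])

def migration_volume(old_part, new_part, weight):
    return sum(weight[v] for v in new_part if new_part[v] != old_part[v])

def best_repartitioning(edges, old_part, candidates, weight, alpha):
    # alpha encodes the relative cost of moving data vs. communicating per edge.
    def cost(p):
        return edge_cut(edges, p) + alpha * migration_volume(old_part, p, weight)
    return min(candidates, key=cost)

if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
    weight = {v: 1.0 for v in range(4)}
    old = {0: 0, 1: 0, 2: 1, 3: 1}
    cands = [{0: 0, 1: 0, 2: 1, 3: 1},      # keep the old partition
             {0: 0, 1: 1, 2: 1, 3: 0}]      # alternative that moves two vertices
    print(best_repartitioning(edges, old, cands, weight, alpha=0.5))
```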
Citations: 123
Single sided MPI implementations for SUN MPI
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10022
S. Booth, F. Mourão
This paper describes an implementation of generic MPI-2 single sided communications for SUN-MPI. Our implementation is layered on top of point-to-point MPI communications and therefore can be adapted to other MPI implementations. The code is designed to co-exist with other MPI-2 single sided implementations (for example, direct use of shared memory), providing a generic fall-back implementation for those communication paths where an optimised single-sided implementation is not available. MPI-2 single sided communications require the transfer of data-type information as well as user data. We describe a type packing and caching mechanism used to optimise the transfer of data-type information. The performance of this implementation is measured in comparison to equivalent point-to-point operations and the shared memory implementation provided by SUN.
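The type packing and caching idea can be illustrated with a small cache keyed by a derived datatype's description, so that repeated transfers reuse the flattened (offset, length) layout instead of re-deriving it. This is a generic sketch, not SUN MPI's internal code:

```python
# Sketch: cache the flattened (offset, length) byte layout of a derived
# datatype so repeated transfers reuse it instead of re-deriving it each time.
_type_cache = {}

def flatten_type(blocklengths, displacements, elem_size):
    # Turn an MPI_Type_indexed-style description into byte (offset, length) pairs.
    key = (tuple(blocklengths), tuple(displacements), elem_size)
    if key not in _type_cache:
        _type_cache[key] = [(d * elem_size, b * elem_size)
                            for b, d in zip(blocklengths, displacements)]
    return _type_cache[key]

def pack(buffer, layout):
    # Gather the described regions into one contiguous bytes object for sending.
    return b"".join(buffer[off:off + length] for off, length in layout)

if __name__ == "__main__":
    buf = bytes(range(64))
    layout = flatten_type([2, 3], [0, 10], elem_size=4)   # cached on first use
    print(layout, pack(buf, layout))
    flatten_type([2, 3], [0, 10], elem_size=4)            # second call hits cache
    print(len(_type_cache))                               # -> 1
```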
Citations: 27
Computing and Data Grids for Science and Engineering
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10007
W. Johnston, Dennis Gannon, B. Nitzberg, Leigh Ann Tanner, Bill Thigpen, Alex Woo
We use the term "Grid" to refer to a software system that provides uniform and location-independent access to geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. While Grids will, in general, provide the infrastructure to support a wide range of services in the scientific environment (e.g., collaboration and remote instrument control), in this paper we focus on services for high performance computing and data handling. We describe the services and architecture of NASA's Information Power Grid ("IPG") - an early example of a large-scale Grid - and some of the issues that have come up in its implementation.
Citations: 26
Integrating Parallel File I/O and Database Support for High-Performance Scientific Data Management
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10048
Jaechun No, R. Thakur, A. Choudhary
Many scientific applications have large I/O requirements, in terms of both the size of data and the number of files or data sets. Management, storage, efficient access, and analysis of this data present an extremely challenging task. Traditionally, two different solutions are used for this problem: file I/O or databases. File I/O can provide high performance but is tedious to use with large numbers of files and large and complex data sets. Databases can be convenient, flexible, and powerful but do not perform and scale well for parallel supercomputing applications. We have developed a software system, called Scientific Data Manager (SDM), that aims to combine the good features of both file I/O and databases. SDM provides a high-level API to the user and, internally, uses a parallel file system to store real data and a database to store application-related metadata. SDM takes advantage of various I/O optimizations available in MPI-IO, such as collective I/O and noncontiguous requests, in a manner that is transparent to the user. As a result, users can write and retrieve data with the performance of parallel file I/O, without having to bother with the details of actually performing file I/O. In this paper, we describe the design and implementation of SDM. With the help of two parallel application templates, ASTRO3D and an Euler solver, we illustrate how some of the design criteria affect performance.
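A toy version of the split SDM makes, real data in files and application metadata in a database, can be written in a few lines with numpy files and SQLite. The class and method names below are made up for illustration and are not the SDM API:

```python
# Sketch: store array data in plain files and application metadata in SQLite,
# illustrating the split between a (parallel) file system for real data and a
# database for metadata. Names are hypothetical, not the SDM API.
import os
import sqlite3
import numpy as np

class TinyDataManager:
    def __init__(self, root):
        os.makedirs(root, exist_ok=True)
        self.root = root
        self.db = sqlite3.connect(os.path.join(root, "metadata.db"))
        self.db.execute("CREATE TABLE IF NOT EXISTS datasets "
                        "(name TEXT PRIMARY KEY, path TEXT, shape TEXT, dtype TEXT)")

    def write(self, name, array):
        path = os.path.join(self.root, name + ".npy")
        np.save(path, array)                       # "file I/O" side
        self.db.execute("INSERT OR REPLACE INTO datasets VALUES (?, ?, ?, ?)",
                        (name, path, repr(array.shape), str(array.dtype)))
        self.db.commit()                           # "database metadata" side

    def read(self, name):
        row = self.db.execute("SELECT path FROM datasets WHERE name = ?",
                              (name,)).fetchone()
        return np.load(row[0])

if __name__ == "__main__":
    sdm = TinyDataManager("sdm_demo")
    sdm.write("pressure_step_0", np.arange(12.0).reshape(3, 4))
    print(sdm.read("pressure_step_0").shape)       # -> (3, 4)
```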
Citations: 30