
Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation: Latest Papers

Delphi: an integrated, language-directed performance prediction, measurement and analysis environment
D. Reed, D. Padua, Ian T Foster, Dennis Gannon, B. Miller
Despite construction of powerful parallel systems and networked computational grids, achieving a large fraction of peak performance for a range of applications has proven very difficult. In this paper, we describe the components of Delphi, an integrated performance measurement and prediction environment that places system design on a solid performance engineering basis.
DOI: https://doi.org/10.1109/FMPC.1999.750595 · Published: 1999-02-21
Citations: 8
Parallel algorithms on the rotation-exchange network-a trivalent variant of the star graph
C. Yeh, Emmanouel Varvarigos
We investigate a trivalent Cayley graph, which we call the rotation-exchange (RE) network, and present communication algorithms to perform one-to-one routing, single-node broadcasting, multinode broadcasting, and total exchange in it. The RE network can be viewed as a star-graph counterpart to the hypercubic shuffle-exchange network, with the important difference that the RE network is regular and symmetric. We show that RE networks can efficiently embed and emulate star graphs, meshes, hypercubes, cube-connected cycles (CCC), pancake graphs, bubble-sort graphs, complete transposition graphs, and the shuffle-exchange permutation graphs. We also show that the performance of RE networks can be significantly improved for a variety of applications if the transmission rate of on-chip links is considerably higher than that of off-chip links.
DOI: https://doi.org/10.1109/FMPC.1999.750613 · Published: 1999-02-21
Citations: 8
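The abstract above does not spell out the generators of the RE network. As a rough illustration only, the sketch below assumes the nodes are permutations of n symbols and that each node is linked by a left rotation, a right rotation, and an exchange of the first two symbols. The function names are hypothetical and are not taken from the paper. The sketch builds the graph for n = 4 and lets one check that it is trivalent and connected.

```python
from collections import deque
from itertools import permutations


def re_neighbors(node):
    """Neighbors under the ASSUMED generators (not confirmed by the abstract)."""
    left = node[1:] + node[:1]                 # rotate symbols one position left
    right = node[-1:] + node[:-1]              # rotate symbols one position right
    exchange = (node[1], node[0]) + node[2:]   # swap the first two symbols
    return [left, right, exchange]


def build_re_network(n):
    """Adjacency map over all permutations of 1..n."""
    return {p: re_neighbors(p) for p in permutations(range(1, n + 1))}


def reachable_count(graph, source):
    """Number of nodes reachable from source by breadth-first search."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen)


graph = build_re_network(4)
```

Under these assumptions every node has exactly three distinct neighbors for n of at least 3, which matches the "trivalent" and "regular" properties the abstract states.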
The Cactus computational collaboratory: enabling technologies for relativistic astrophysics, and a toolkit for solving PDE's by communities in science and engineering
Gabrielle Allen, T. Goodale, E. Seidel
We are developing a system for collaborative research and development for a distributed group of researchers at different institutions around the world. In a new paradigm for collaborative computational science, the computer code and supporting infrastructure itself becomes the collaborating instrument, just as an accelerator becomes the collaborating tool for large numbers of distributed researchers in particle physics. The design of this "Collaboratory" allows many users, with very different areas of expertise, to work coherently together on distributed computers around the world. Different supercomputers may be used separately, or for problems exceeding the capacity of any single system, multiple supercomputers may be networked together through high speed gigabit networks. Central to this Collaboratory is a new type of community simulation code, called "Cactus". The scientific driving force behind this project is the simulation of Einstein's equations for studying black holes, gravitational waves, and neutron stars, which has brought together researchers in very different fields from many groups around the world to make advances in the study of relativity and astrophysics. But the system is also being developed to provide scientists and engineers, without expert knowledge of parallel or distributed computing, mesh refinement, and so on, with a simple framework for solving any system of partial differential equations on many parallel computer systems, from traditional supercomputers to networks of workstations.
DOI: https://doi.org/10.1109/FMPC.1999.750582 · Published: 1999-02-21
Citations: 27
The priority broadcast scheme for dynamic broadcast in hypercubes and related networks
C. Yeh, Emmanouel Varvarigos, Hua Lee
Dynamic broadcast is a communication problem in which each node in a parallel computer generates packets to be broadcast to all the other nodes according to a certain random process. The lower bound on the average time required by any oblivious dynamic broadcast algorithm in an n-dimensional hypercube is Ω(n + 1/(1-ρ)) when packets are generated according to a Poisson process, where ρ is the load factor. The best previous algorithms, however, only achieve Ω(n/(1-ρ)) time, which is suboptimal by a factor of Θ(n). In this paper, we propose the priority broadcast scheme for designing dynamic broadcast algorithms that require optimal O(n + 1/(1-ρ)) time in an n-dimensional hypercube. We apply the routing scheme to other network topologies, including k-ary n-cubes, meshes, tori, star graphs, generalized hypercubes, as well as any symmetric network, for efficient dynamic broadcast. In particular, the algorithms for star graphs, generalized hypercubes, and k-ary n-cubes with k = O(1) are also asymptotically optimal. We also propose a method for assigning priority classes to packets, called optimal priority assignment, which achieves the best possible performance for dynamic multiple broadcast in any network topology.
DOI: https://doi.org/10.1109/FMPC.1999.750612 · Published: 1999-02-21
Citations: 4
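To get a feel for the gap between the two expressions in the abstract above, the sketch below evaluates n + 1/(1-ρ) and n/(1-ρ) directly. This is an illustration only: asymptotic bounds hide constant factors, so setting every constant to 1 is an assumption, and the function names are hypothetical.

```python
def optimal_bound(n, rho):
    """n + 1/(1 - rho), with the hidden constant ASSUMED to be 1."""
    return n + 1.0 / (1.0 - rho)


def previous_bound(n, rho):
    """n / (1 - rho), with the hidden constant ASSUMED to be 1."""
    return n / (1.0 - rho)


def gap_ratio(n, rho):
    """How many times larger the earlier expression is than the optimal one."""
    return previous_bound(n, rho) / optimal_bound(n, rho)


# Example: a 10-dimensional hypercube at load factor 0.5.
ratio_small = gap_ratio(10, 0.5)
# The same load factor in a 100-dimensional hypercube.
ratio_large = gap_ratio(100, 0.5)
```

At a fixed load factor the ratio grows with n, and at a fixed n it grows as ρ approaches 1. For a fixed n the ratio stays below n, which is consistent with the Θ(n) factor quoted in the abstract.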
Data sieving and collective I/O in ROMIO
R. Thakur, W. Gropp, E. Lusk
The I/O access patterns of parallel programs often consist of accesses to a large number of small, noncontiguous pieces of data. If an application's I/O needs are met by making many small, distinct I/O requests, however, the I/O performance degrades drastically. To avoid this problem, MPI-IO allows users to access a noncontiguous data set with a single I/O function call. This feature provides MPI-IO implementations an opportunity to optimize data access. We describe how our MPI-IO implementation, ROMIO, delivers high performance in the presence of noncontiguous requests. We explain in detail the two key optimizations ROMIO performs: data sieving for noncontiguous requests from one process and collective I/O for noncontiguous requests from multiple processes. We describe how one can implement these optimizations portably on multiple machines and file systems, control their memory requirements, and also achieve high performance. We demonstrate the performance and portability with performance results for three applications on five different parallel machines. The applications are an astrophysics-application template (DIST3D), the NAS BTIO benchmark, and an unstructured code (UNSTRUC); the machines are HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin2000.
DOI: https://doi.org/10.1109/FMPC.1999.750599 · Published: 1998-09-11
Citations: 542
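The read side of the data-sieving idea named in the abstract above can be sketched in a few lines: instead of issuing one small read per noncontiguous piece, issue a single contiguous read that spans all the pieces and pick the wanted bytes out of the buffer in memory. This is a minimal sketch and not ROMIO's implementation. The helper names and the in-memory `CountingFile` stand-in are hypothetical, and the sketch ignores the buffer-size limit that the abstract's remark about controlling memory requirements implies.

```python
import io


class CountingFile(io.BytesIO):
    """In-memory stand-in for a file that counts how many reads are issued."""

    def __init__(self, data):
        super().__init__(data)
        self.read_calls = 0

    def read(self, size=-1):
        self.read_calls += 1
        return super().read(size)


def naive_read(f, requests):
    """One seek and one read per (offset, length) request."""
    pieces = []
    for offset, length in requests:
        f.seek(offset)
        pieces.append(f.read(length))
    return pieces


def sieved_read(f, requests):
    """Serve all (offset, length) requests from a single contiguous read."""
    if not requests:
        return []
    start = min(offset for offset, _ in requests)
    end = max(offset + length for offset, length in requests)
    f.seek(start)
    buffer = f.read(end - start)  # one large read covering every piece
    # Extract each requested piece from the buffer, skipping the unwanted gaps.
    return [buffer[offset - start:offset - start + length]
            for offset, length in requests]


data = bytes(range(100))
requests = [(2, 3), (10, 2), (50, 4)]
```

The trade-off is visible in the sketch: the sieved version reads the unwanted bytes in the gaps too, so it pays off when many small requests are dense enough that one large transfer is cheaper than many small ones.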