Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation: Latest Articles

Design trade-offs of low-cost multicomputer network switches

Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation

Pub Date : 1999-02-21 DOI: 10.1109/FMPC.1999.750581

M. Herbordt, J. Ge, S. Sanikop, K. Olin, H. Le

A comparison is made among a large number of designs for the purpose of specifying low-cost yet cost-effective multicomputer network switches. Among the parameters varied are switching mode, number of lanes, buffer size, wraparound, and channel selection. We assume deterministic routing and small fixed-size packets. We obtain results using two methods: i) RTL cycle-driven simulations to determine latency and capacity with respect to load, communication pattern, and packet size, and ii) hardware synthesis to a current technology to find the operating frequency and chip area. These results are also combined to yield performance/area measures for all of the designs. One of the results is a deeper understanding of virtual cut-through in terms of deadlock properties and the capability of dynamic load balancing among buffers. We find that lanes are even more likely to improve the performance of virtual cut-through networks than that of wormhole networks, and that virtual cut-through routing is preferable to wormhole routing in more domains than may have been previously realized. Other results include the finding that, after factoring in operating frequency, having more than two lanes per physical channel is not likely to be useful, as well as various observations about the utility of varying buffer sizes.
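As a rough illustration of how simulation and synthesis results might be combined into a performance/area measure, the sketch below multiplies a per-cycle capacity by the operating frequency and divides by chip area. The abstract does not give the exact formula or units, so the class `SwitchDesign`, its field names, and the packets-per-cycle interpretation of capacity are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchDesign:
    """One candidate switch design (hypothetical structure, not from the paper)."""
    name: str
    capacity_per_cycle: float  # assumed: saturation throughput in packets per cycle
    frequency_mhz: float       # operating frequency, as reported by synthesis
    area: float                # chip area in arbitrary units


def throughput_per_second(design: SwitchDesign) -> float:
    """Convert per-cycle capacity into packets per second using the clock rate."""
    return design.capacity_per_cycle * design.frequency_mhz * 1e6


def performance_per_area(design: SwitchDesign) -> float:
    """Throughput normalized by area; higher means more cost-effective."""
    return throughput_per_second(design) / design.area
```

Under this reading, a design with more lanes helps only if its gain in per-cycle capacity outweighs any loss in clock frequency and growth in area, which is the kind of trade-off the abstract describes.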

Citations: 3

Parallel algorithms on the rotation-exchange network: a trivalent variant of the star graph

Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation

Pub Date : 1999-02-21 DOI: 10.1109/FMPC.1999.750613

C. Yeh, Emmanouel Varvarigos

We investigate a trivalent Cayley graph, which we call the rotation-exchange (RE) network, and present communication algorithms to perform one-to-one routing, single-node broadcasting, multinode broadcasting, and total exchange in it. The RE network can be viewed as a star-graph counterpart to the hypercubic shuffle-exchange network, with the important difference that the RE network is regular and symmetric. We show that RE networks can efficiently embed and emulate star graphs, meshes, hypercubes, cube-connected cycles (CCC), pancake graphs, bubble-sort graphs, complete transposition graphs, and shuffle-exchange permutation graphs. We also show that the performance of RE networks can be significantly improved for a variety of applications if the transmission rate of on-chip links is considerably higher than that of off-chip links.

Citations: 8

The Cactus computational collaboratory: enabling technologies for relativistic astrophysics, and a toolkit for solving PDE's by communities in science and engineering

Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation

Pub Date : 1999-02-21 DOI: 10.1109/FMPC.1999.750582

Gabrielle Allen, T. Goodale, E. Seidel

We are developing a system for collaborative research and development for a distributed group of researchers at different institutions around the world. In a new paradigm for collaborative computational science, the computer code and supporting infrastructure itself becomes the collaborating instrument, just as an accelerator becomes the collaborating tool for large numbers of distributed researchers in particle physics. The design of this "Collaboratory" allows many users, with very different areas of expertise, to work coherently together on distributed computers around the world. Different supercomputers may be used separately, or, for problems exceeding the capacity of any single system, multiple supercomputers may be networked together through high-speed gigabit networks. Central to this Collaboratory is a new type of community simulation code, called "Cactus". The scientific driving force behind this project is the simulation of Einstein's equations for studying black holes, gravitational waves, and neutron stars, which has brought together researchers in very different fields from many groups around the world to make advances in the study of relativity and astrophysics. But the system is also being developed to provide scientists and engineers, without expert knowledge of parallel or distributed computing, mesh refinement, and so on, with a simple framework for solving any system of partial differential equations on many parallel computer systems, from traditional supercomputers to networks of workstations.

Citations: 27

The priority broadcast scheme for dynamic broadcast in hypercubes and related networks

Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation

Pub Date : 1999-02-21 DOI: 10.1109/FMPC.1999.750612

C. Yeh, Emmanouel Varvarigos, Hua Lee

Dynamic broadcast is a communication problem where each node in a parallel computer generates packets to be broadcast to all the other nodes according to a certain random process. The lower bound on the average time required by any oblivious dynamic broadcast algorithm in an $n$-dimensional hypercube is $\Omega(n + 1/(1-\rho))$ when packets are generated according to a Poisson process, where $\rho$ is the load factor. The best previous algorithms, however, only achieve $\Omega(n/(1-\rho))$ time, which is suboptimal by a factor of $\Theta(n)$. In this paper we propose the priority broadcast scheme for designing dynamic broadcast algorithms that require optimal $O(n + 1/(1-\rho))$ time in an $n$-dimensional hypercube. We apply the routing scheme to other network topologies, including $k$-ary $n$-cubes, meshes, tori, star graphs, generalized hypercubes, as well as any symmetric network, for efficient dynamic broadcast. In particular, the algorithms for star graphs, generalized hypercubes, and $k$-ary $n$-cubes with $k = O(1)$ are also asymptotically optimal. We also propose a method for assigning priority classes to packets, called optimal priority assignment, which achieves the best possible performance for dynamic multiple broadcast in any network topology.
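To see where the factor-of-n gap between the two bounds comes from, the sketch below compares their growth terms numerically. It drops all constant factors hidden by the asymptotic notation, so the values are orders of growth rather than actual delays. The function names are hypothetical.

```python
def optimal_order_delay(n: int, rho: float) -> float:
    """Growth term n + 1/(1 - rho) of the optimal bound (constants omitted)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("load factor rho must satisfy 0 <= rho < 1")
    return n + 1.0 / (1.0 - rho)


def previous_order_delay(n: int, rho: float) -> float:
    """Growth term n / (1 - rho) achieved by the earlier algorithms (constants omitted)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("load factor rho must satisfy 0 <= rho < 1")
    return n / (1.0 - rho)


def gap(n: int, rho: float) -> float:
    """Ratio of the two growth terms; algebraically n / (n * (1 - rho) + 1)."""
    return previous_order_delay(n, rho) / optimal_order_delay(n, rho)
```

The ratio simplifies to n / (n(1 - rho) + 1), which approaches n as rho tends to 1. That matches the stated suboptimality factor: under heavy load the queueing term 1/(1 - rho) dominates, and the earlier algorithms multiply it by n instead of adding n to it.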

Citations: 4

Data sieving and collective I/O in ROMIO

Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation

Pub Date : 1998-09-11 DOI: 10.1109/FMPC.1999.750599

R. Thakur, W. Gropp, E. Lusk

The I/O access patterns of parallel programs often consist of accesses to a large number of small, noncontiguous pieces of data. If an application's I/O needs are met by making many small, distinct I/O requests, however, the I/O performance degrades drastically. To avoid this problem, MPI-IO allows users to access a noncontiguous data set with a single I/O function call. This feature provides MPI-IO implementations an opportunity to optimize data access. We describe how our MPI-IO implementation, ROMIO, delivers high performance in the presence of noncontiguous requests. We explain in detail the two key optimizations ROMIO performs: data sieving for noncontiguous requests from one process and collective I/O for noncontiguous requests from multiple processes. We describe how one can implement these optimizations portably on multiple machines and file systems, control their memory requirements, and also achieve high performance. We demonstrate the performance and portability with performance results for three applications: an astrophysics-application template (DIST3D), the NAS BTIO benchmark, and an unstructured code (UNSTRUC). The results cover five different parallel machines: HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin2000.
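The idea behind data sieving for reads can be sketched in a few lines: instead of issuing one small read per requested piece, read a single contiguous extent that covers all of them and pick the pieces out of the in-memory buffer. This is a minimal illustration in plain Python, not ROMIO's code. The function `sieved_read` is hypothetical, and it ignores the memory-limit control the abstract mentions by assuming the whole extent fits in one buffer.

```python
import io
from typing import BinaryIO, List, Tuple


def sieved_read(f: BinaryIO, requests: List[Tuple[int, int]]) -> List[bytes]:
    """Serve noncontiguous (offset, length) requests with one contiguous read.

    Returns the requested pieces in the same order as `requests`.
    """
    if not requests:
        return []
    # Smallest contiguous extent covering every requested piece.
    start = min(offset for offset, _ in requests)
    end = max(offset + length for offset, length in requests)
    # One large read replaces many small ones; unwanted bytes in the gaps are discarded.
    f.seek(start)
    buffer = f.read(end - start)
    # Extract each piece from the buffer, translating file offsets to buffer offsets.
    return [buffer[offset - start: offset - start + length] for offset, length in requests]
```

For example, `sieved_read(io.BytesIO(bytes(range(100))), [(10, 3), (50, 2)])` performs a single 42-byte read and returns the two requested pieces. The trade-off is extra data transferred for the gaps in exchange for far fewer I/O calls.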

Citations: 542
