Pub Date: 1997-03-19 | DOI: 10.1109/APDC.1997.574048
A scalable parallel workstation cluster system
C. Dong, Weimin Zheng, Dingxing Wang, M. Shen
In this paper, we argue that recent advances in network and CPU technologies have positioned workstation clusters to become the primary parallel computing infrastructure for science and engineering. After analyzing and comparing the communication performance of three popular networks (10 Mbps Ethernet, 100 Mbps Ethernet and 640 Mbps Myrinet) on an experimental workstation cluster, we identify two main factors that hinder the wider adoption of workstation clusters: the low efficiency of the communication system (both hardware and software) and the lack of a friendly parallel program development environment with supporting tools. To address these two problems, we implemented two workstation cluster systems targeting different performance/price requirements: one with 8 PowerPCs on a shared-media network, the other with 8 Sun SPARCstations on a switched network. Using a Reduced Communication Protocol (RCP), we dramatically improved the performance of the communication system; by extending the language support of PVM and adding several useful tools, we built IPCE, a visual integrated parallel program development environment. On our platform we also analyzed several large applications, including the GRI benchmark, an earthquake simulator, weather forecasting and some NAS benchmarks, and obtained very good results for these coarse- to medium-grain applications: speedup ranges from 5.83 to 7.98 and parallel efficiency reaches 72.88%-99.7%.
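For reference, the quoted efficiency figures follow from the usual definition E = S/p on the 8-node configurations described above; a minimal check in C (the node count and speedups are the ones stated in the abstract):

```c
/* Quick check of the quoted parallel efficiencies, using the standard
 * definition E = S / p on the 8-node clusters described above. */
#include <stdio.h>

int main(void)
{
    const int p = 8;                           /* nodes per cluster         */
    const double speedups[] = { 5.83, 7.98 };  /* reported speedup range    */

    for (int i = 0; i < (int)(sizeof speedups / sizeof speedups[0]); i++) {
        double efficiency = speedups[i] / p;   /* E = S / p                 */
        printf("speedup %.2f on %d nodes -> efficiency %.2f%%\n",
               speedups[i], p, 100.0 * efficiency);
    }
    return 0;
}
/* Prints 72.88% and 99.75%; the upper figure is quoted as 99.7% above. */
```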
{"title":"A scalable parallel workstation cluster system","authors":"C. Dong, Weimin Zheng, Dingxing Wang, M. Shen","doi":"10.1109/APDC.1997.574048","DOIUrl":"https://doi.org/10.1109/APDC.1997.574048","url":null,"abstract":"In this paper, we argue that because of recent advance of network & CPU technologies, workstation clusters are poised to become the primary parallel computing infrastructure for science and engineering computing. After analyzing and comparing the communication performance of three popular networks: 10 Mbps Ethernet, 100 Mbps Ethernet and 640 Mbps Myrinet on an experimental workstation cluster, we point out that two main factors hinder the wider application of workstation cluster: low efficiency of communication system (both hardware and software) and lack of friendly parallel program development environment with accessory tools. For these two problem, we implemented two workstation cluster systems for different performance/price rate requirements: one is 8 PowerPCs with shared media network, another is 8 Sun Sparcstations with switch network. By using Reduced Communication Protocol (RCP), we dramatically improved the performance of communication system; by expanding the language support of PVM and adding several useful tools, we build a visual integrated parallel program development environment IPCE. On our platform, we also analyzed several massive applications, such as GRI benchmark, earthquake simulator, weather forecasting and some NAS benchmarks, and we get very good results for these coarse-grain to middle-grain applications. The speedup ranges from 5.83 to 7.98 and parallel efficiency reaches to 72.88%-99.7%.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122477444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1997-03-19 | DOI: 10.1109/APDC.1997.574025
Parallel solver of generalized eigenproblem on Dawning-1000
Xue-bin Chi
In this paper, we consider the parallel solution of the generalized eigenproblem for Hermitian matrices on the Dawning-1000. The problem arises from the theoretical analysis of nonlinear optical crystal structures. We use Cholesky factorisation, Householder transformation, the bisection method and inverse iteration to complete the computation. The implementation is based on the BLAS library and the communication function library provided on the Dawning-1000. The numerical results show very good performance, and the application in physics is satisfactory.
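The abstract does not spell the reduction out, but the listed steps correspond to the standard two-stage approach: a Cholesky factorisation of the (Hermitian positive definite) B turns the generalized problem into a standard Hermitian one, which Householder tridiagonalization, bisection and inverse iteration then solve:

```latex
% Generalized Hermitian eigenproblem A x = \lambda B x with B Hermitian
% positive definite.  Cholesky factorization B = L L^{H} reduces it to a
% standard Hermitian eigenproblem in y = L^{H} x.
\begin{align}
  A x &= \lambda B x, \qquad B = L L^{H}, \\
  \bigl(L^{-1} A L^{-H}\bigr)\, y &= \lambda y, \qquad y = L^{H} x .
\end{align}
```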
{"title":"Parallel solver of generalized eigenproblem on Dawning-1000","authors":"Xue-bin Chi","doi":"10.1109/APDC.1997.574025","DOIUrl":"https://doi.org/10.1109/APDC.1997.574025","url":null,"abstract":"In this paper, we consider the parallel implementation of solving generalized eigenproblem of Hermitian type matrices on Dawning-1000. It arises from the theoretical analysis of nonlinear optical crystal structures. We use Cholesky factorisation, Househoulder transformation, bisection method and inverse iteration to complete the computation. The implementation is based on the BLAS library and communication function library provided on Dawning-1000. The numerical results show very good performance and the application in physics is satisfactory.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129609165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1997-03-19 | DOI: 10.1109/APDC.1997.574014
A simulation research on multiprocessor interconnection networks with wormhole routing
Yueming Hu
In designing a parallel computer system, selecting an appropriate interconnection network is an important issue. This paper presents simulation results on the performance of message-passing interconnection networks commonly used in multiprocessor systems. Comparisons are made among various interconnection networks with wormhole routing, including the crossbar, mesh, hypercube, tree and hypertree. The performance factors compared are network throughput and message delay. To obtain a more general model for tree-structured networks, the paper presents the definition of the m-fold n-ary tree, an extension of the hypertree network.
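As background for the delay comparison, the first-order latency model commonly used for wormhole routing (a textbook formula, not one taken from this paper) is:

```latex
% First-order latency of a wormhole-routed message: per-hop routing delay
% for the header flit plus pipelined transmission of the message body.
% D = number of hops, t_r = per-hop routing/switching time,
% L = message length in flits, t_f = time to forward one flit per channel.
\begin{equation}
  T_{\mathrm{wormhole}} \approx D\, t_r + L\, t_f
\end{equation}
```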
{"title":"A simulation research on multiprocessor interconnection networks with wormhole routing","authors":"Yueming Hu","doi":"10.1109/APDC.1997.574014","DOIUrl":"https://doi.org/10.1109/APDC.1997.574014","url":null,"abstract":"To design a parallel computer system, selecting an appropriate network is an important issue. This paper presents the simulation results on the performance of message passing interconnection networks used commonly in multiprocessor systems. Comparisons have been made on the performance of various interconnection networks like crossbar, mesh, hypercube, tree and hypertree with wormhole routing. The performance factors compared include the throughput of these networks and message delay. To make a more general model for tree structured network, this paper present the definition of m-fold n-ary tree, which is the extension of the hypertree network.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129176599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1997-03-19 | DOI: 10.1109/APDC.1997.574022
Parallel recursive algorithm for tridiagonal systems
Yuguang Huang
In this paper, a parallel algorithm based on recurrence for solving tridiagonal systems is presented. Compared with the parallel prefix (PP) method, which is also based on recurrence, the computation cost is reduced by a factor of two while the communication cost remains the same. The method can be viewed as a modified prefix method, or prefix with substructuring. The complexity of the algorithm is analysed using the BSP (Bulk Synchronous Parallel) model. Experimental results are obtained on a Sun workstation using the Oxford BSP Library.
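For reference, the standard BSP cost model used in such analyses charges each superstep on p processors as follows (g is the communication gap per word, l the barrier synchronization cost):

```latex
% Cost of one BSP superstep: the slowest local computation, plus g times
% the largest number h of words any processor sends or receives, plus the
% barrier synchronization cost l.
\begin{equation}
  T_{\mathrm{superstep}} = \max_{0 \le i < p} w_i \;+\; g\,h \;+\; l,
  \qquad h = \max_{0 \le i < p} \max\bigl( h_i^{\mathrm{out}}, h_i^{\mathrm{in}} \bigr)
\end{equation}
```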
{"title":"Parallel recursive algorithm for tridiagonal systems","authors":"Yuguang Huang","doi":"10.1109/APDC.1997.574022","DOIUrl":"https://doi.org/10.1109/APDC.1997.574022","url":null,"abstract":"In this paper, a parallel algorithm for solving tridiagonal equations based on recurrence is presented. Compared with the parallel prefix method (PP) which is also based on the recursive method, the computation cost is reduced by a factor of two while maintaining the same communication cost. The method can be viewed as a modified prefix method or prefix with substructuring. The complexity of the algorithm is analysed using the BSP model (Bulk Synchronous Parallel). Experimental results are obtained on a Sun workstation using the Oxford BSP Library.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"2012 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114625008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1997-03-19 | DOI: 10.1109/APDC.1997.574036
Study and design of scalable memory-shared multiprocessing system
Chengqing Ye, Zhonghai Wu, Changsheng Yang
This paper proposes the design of SMMP, a scalable memory-shared multiprocessing system that supports the client/server mode. The SMMP system is composed of a two-level interconnection network, a three-level memory subsystem and a three-level I/O subsystem. The design of SMMP has many advantages: it is scalable, easy to implement and operate, general purpose, and provides large I/O throughput. It can serve as an excellent server for high-speed communication networks.
{"title":"Study and design of scalable memory-shared multiprocessing system","authors":"Chengqing Ye, Zhonghai Wu, Changsheng Yang","doi":"10.1109/APDC.1997.574036","DOIUrl":"https://doi.org/10.1109/APDC.1997.574036","url":null,"abstract":"This paper proposes the design of a scalable memory-shared multiprocessing system SMMP which supports Client/Server mode. SMMP system is composed of two-level interconnection networks, three-level memory subsystem and three-level I/O subsystem. There are many advantages in the design of our SMMP, such as scalable, easy to implement and operate, general purpose and large I/O throughput. It can be an excellent server for high-speed communication network.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124473343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1997-03-19 | DOI: 10.1109/APDC.1997.574041
Implementation of efficient and reliable multicast servers
W. Jia, Chan-Hee Lee, X. Jia, Jiannong Cao
Reliable multicast services within a group of autonomous distributed processes/sites are desirable for maintaining a consistent state of shared information accessed by transactions in distributed systems. Many existing protocols are complicated and thus quite expensive, and are not efficient with respect to the availability of distributed systems. This paper discusses the design and implementation of a new logical-token-ring-based multicast communication service. It provides total ordering, atomicity of multicast messages, membership, and fault-tolerant services in the presence of site fail-stop and network partitioning. A unique feature of the protocol is that all members in the group know exactly who holds the token and are therefore able to determine the correct order of a multicast message, reducing synchronization overhead, preventing possible token-loss problems and minimizing control messages. The services are implemented using a finite state machine approach and are highly efficient compared with related services in the same network settings.
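A minimal sketch of the sequencing idea, assuming (as is typical for logical-token-ring protocols, and not detailed in the abstract) that the token carries the next global sequence number. Membership, fault tolerance and the real network code are omitted; all names are illustrative:

```c
/* Single-process sketch of token-based total ordering: the token carries
 * the next global sequence number, so every member delivers multicasts in
 * the same order.  This is only an illustration of the ordering idea. */
#include <stdio.h>

#define NPROC 4

struct token   { int holder; int next_seq; };
struct message { int seq; int sender; const char *payload; };

int main(void)
{
    struct token tok = { 0, 0 };
    const char *pending[NPROC] = { "m0", NULL, "m2", "m3" }; /* one multicast each */
    struct message delivery_log[NPROC];
    int logged = 0;

    /* Rotate the token around the logical ring once; only the current
     * holder may stamp and send, which is what yields the total order. */
    for (int round = 0; round < NPROC; round++) {
        int p = tok.holder;
        if (pending[p]) {
            delivery_log[logged].seq = tok.next_seq++;
            delivery_log[logged].sender = p;
            delivery_log[logged].payload = pending[p];
            logged++;
        }
        tok.holder = (tok.holder + 1) % NPROC;   /* pass the token */
    }

    /* Every member delivers in increasing sequence number -> same order. */
    for (int i = 0; i < logged; i++)
        printf("deliver seq=%d from P%d: %s\n",
               delivery_log[i].seq, delivery_log[i].sender,
               delivery_log[i].payload);
    return 0;
}
```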
{"title":"Implementation of efficient and reliable multicast servers","authors":"W. Jia, Chan-Hee Lee, X. Jia, Jiannong Cao","doi":"10.1109/APDC.1997.574041","DOIUrl":"https://doi.org/10.1109/APDC.1997.574041","url":null,"abstract":"Reliable multicast services in a group of autonomous distributed processes/sites are desirable to maintain the consistent state of shared information accessed by transactions in distributed systems. Many existing protocols are complicated and thus quite expensive and not efficient for availability of distributed systems. This paper discusses the design and implementations of a new logical token ring based multicast communications services. It provides total ordering, atomicity of multicast messages membership and fault-tolerant services in the presence of sites fail stop and network partitioning. An unique feature of the protocol is that all members, knowing exactly, in the group, who holds the token, are able to detect right order of a multicast message, thereby, reducing the synchronous overhead, preventing possible token loss problems and minimizing control messages. The services are implemented by using finite state machine approach and they are highly efficient comparing with related services in the same network settings.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115761794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1997-03-19 | DOI: 10.1109/APDC.1997.574024
Parallel matrix computations and their applications for biomagnetic fields
V. Zerbe, Harald Keller, G. Schorcht
In this paper we present the results of a parallel implementation of a heart field simulation algorithm. The analysis of biomagnetic fields offers a wide range of applications for parallel algorithms. Pathological changes in the human body, especially in the heart muscle, can be diagnosed and localised by means of biomagnetic field parameters. The benefit of this diagnostic method is the ability to fit an individual reference model of a patient's heart field. Based on differences between the reference model and the actually measured biomagnetic field parameters, the type and position of defects in the heart can be located. The most time-consuming components of the whole algorithm are the matrix computations, especially the matrix inversion, which can be implemented on a parallel distributed-memory system. In this paper we discuss the routing, the parallel matrix inversion, and the speedup for different network topologies as a function of the number of processors and the problem size.
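The abstract does not say which inversion scheme was parallelized; as an illustration only, a serial Gauss-Jordan inversion is sketched below, with comments marking the row-elimination loop that a distributed-memory version would typically partition across processors:

```c
/* Serial Gauss-Jordan inversion of an N-by-N matrix, augmented with the
 * identity.  Sketch only: no pivoting, illustrative problem size. */
#include <stdio.h>
#include <math.h>

#define N 3

int invert(double a[N][N], double inv[N][N])
{
    for (int i = 0; i < N; i++)                 /* inv starts as identity  */
        for (int j = 0; j < N; j++)
            inv[i][j] = (i == j) ? 1.0 : 0.0;

    for (int k = 0; k < N; k++) {
        double pivot = a[k][k];
        if (fabs(pivot) < 1e-12) return -1;     /* (near-)singular         */

        for (int j = 0; j < N; j++) {           /* scale pivot row         */
            a[k][j]   /= pivot;
            inv[k][j] /= pivot;
        }
        /* Eliminate column k from every other row.  In a distributed
         * version, each processor applies this step to its own block of
         * rows after the pivot row has been broadcast. */
        for (int i = 0; i < N; i++) {
            if (i == k) continue;
            double f = a[i][k];
            for (int j = 0; j < N; j++) {
                a[i][j]   -= f * a[k][j];
                inv[i][j] -= f * inv[k][j];
            }
        }
    }
    return 0;
}

int main(void)
{
    double a[N][N] = { {4, 7, 2}, {3, 6, 1}, {2, 5, 3} };
    double inv[N][N];
    if (invert(a, inv) == 0)
        for (int i = 0; i < N; i++)
            printf("%8.4f %8.4f %8.4f\n", inv[i][0], inv[i][1], inv[i][2]);
    return 0;
}
```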
{"title":"Parallel matrix computations and their applications for biomagnetic fields","authors":"V. Zerbe, Harald Keller, G. Schorcht","doi":"10.1109/APDC.1997.574024","DOIUrl":"https://doi.org/10.1109/APDC.1997.574024","url":null,"abstract":"In this paper we present the results of a parallel implementation of a heart field simulation algorithm. The application of biomagnetic fields offers a wide range for using parallel algorithms. Pathological changes in the human body, especially in the heart muscle, can be diagnosed and localised by means of biomagnetic field parameters. The benefit of this diagnosis method is to fit an individual reference model of the heart field of a patient. Based on differences between the reference model and the real measured biomagnetic field parameters, the type and the position of defects in the heart can be located. The most time consuming components of the whole algorithm are the matrix computations, especially the matrix inversion. The matrix inversion can be implemented on a parallel distributed memory system. In this paper we discuss the routing, the parallel matrix inversion, and the speed up for different network topologies that depends on the number of processors and different problem sizes.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134477873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1997-03-19 | DOI: 10.1109/APDC.1997.574047
GPR-Tree: a global parallel index structure for multiattribute declustering on cluster of workstations
Xiaodong Fu, Dingxing Wang, Weimin Zheng
The R-tree is a very popular dynamic access structure capable of storing multidimensional and spatial data. Given its merits of efficient global balance and dynamic reorganization, we use the R-tree to decluster multiattribute data in database systems or file systems. Since many previous multiattribute declustering mechanisms do not take the properties of a Cluster of Workstations (COW) into account, we present the Global Parallel R-tree (GPR-Tree) for the COW architecture. We first examine the efficiency issues of the R-tree and its variants, and enhance R-tree efficiency by using heuristic information in the reconstruction of the R-tree during node splitting and in the treatment of the orphan entries of underfilled nodes. We then parallelize the improved R-tree across the components of the system. The basic idea is to alleviate the bottleneck effect of the I/O subsystem by exploiting high-speed network communication and memory. The GPR-Tree is shared among the processing units (PUs) of the system. We use a mixed LRU algorithm to schedule pages so that frequently visited nodes stay in memory, and a write-update-like protocol to keep the multiple copies maintained in the system coherent. This mechanism is shown to be effective in improving the scalability and performance of the system.
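The abstract does not specify the "mixed LRU" policy or the write-update protocol; the sketch below shows only the plain-LRU part of keeping frequently visited nodes resident, with illustrative names and sizes:

```c
/* Tiny LRU buffer of R-tree node ids: a hit refreshes the timestamp, a
 * miss replaces the least-recently-used (or an empty) slot.  Sketch only. */
#include <stdio.h>

#define SLOTS 4

struct slot { int node_id; long last_use; int valid; };

static struct slot cache[SLOTS];
static long tick = 0;

static void touch(int node_id)
{
    tick++;
    for (int i = 0; i < SLOTS; i++) {
        if (cache[i].valid && cache[i].node_id == node_id) {
            cache[i].last_use = tick;                 /* hit: refresh      */
            printf("hit  node %d (slot %d)\n", node_id, i);
            return;
        }
    }
    int victim = 0;                                   /* miss: pick victim */
    for (int i = 1; i < SLOTS; i++) {
        if (!cache[victim].valid) break;              /* empty slot found  */
        if (!cache[i].valid || cache[i].last_use < cache[victim].last_use)
            victim = i;
    }
    printf("miss node %d -> slot %d (evicts node %d)\n",
           node_id, victim, cache[victim].valid ? cache[victim].node_id : -1);
    cache[victim].node_id = node_id;
    cache[victim].last_use = tick;
    cache[victim].valid = 1;
}

int main(void)
{
    int visits[] = { 1, 2, 3, 1, 4, 5, 1, 2 };        /* node 1 stays hot  */
    for (int i = 0; i < (int)(sizeof visits / sizeof visits[0]); i++)
        touch(visits[i]);
    return 0;
}
```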
{"title":"GPR-Tree: a global parallel index structure for multiattribute declustering on cluster of workstations","authors":"Xiaodong Fu, Dingxing Wang, Weimin Zheng","doi":"10.1109/APDC.1997.574047","DOIUrl":"https://doi.org/10.1109/APDC.1997.574047","url":null,"abstract":"R-tree is a very popular dynamic access structure cable of storing multidimensional and spatial data. Considering it's merit of the efficient global balance and dynamic reorganization, we try to use R-tree to decluster the multiattribute data in database system or file system. As many previous multiattribute declustering mechanisms do not take into account the properties of the Cluster of Workstations (COW), we present the Global Parallel R-tree (GPR-Tree) under the architecture of COW. Firstly we inspect the issues in efficiency of R-tree and it's variants, we try to enhance the R-Tree efficiency by using heuristics information in the reconstruction of R-Tree during the node splitting and the treatment of the orphan entries of the underfilled node. Then we parallelize the improved R-Tree among the components in the system. The basic thought is to alleviate the bottleneck effect of the I/O subsystem, making use of the high speed network communication and the memory. The GPR-Tree is shared among the processing units (PU) of the system. We use a mixed LRU algorithm to schedule pages in memory to maintain the nodes visited frequently in memory. A write-update-like protocol is used to keep the coherency among multiple copies maintained in the system. This mechanism is proved efficient to improve the salability and performance of the system.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126914746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1997-03-19 | DOI: 10.1109/APDC.1997.574008
Parallel VLSI neural system design for time-delay speech recognition computing
D. Zhang
Neural systems, as processors of time-sequence patterns, have been successfully applied to several speaker-dependent speech recognition tasks, and they can be implemented efficiently by a pipelined architecture. In this paper, parallel time-delay speech recognition computing for VLSI neural systems is presented. The design methodology emphasizes coordination between the computational model, the architectural description, and the VLSI systolic implementation. Examples of applying time-delay speech recognition to VLSI neural system design and performance analysis are given to illustrate the effectiveness of the parallel computation.
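As background, the basic time-delay unit assumed by such recognizers computes each output from a short window of delayed inputs (essentially a 1-D convolution followed by a nonlinearity); the paper's systolic mapping is not reproduced here, and the weights and sizes below are illustrative only:

```c
/* Sketch of a time-delay unit: y[t] = tanh( sum_d w[d] * x[t-d] + bias ).
 * In a systolic/pipelined realization, each tap is one stage. */
#include <stdio.h>
#include <math.h>

#define T 8         /* length of the input sequence     */
#define D 3         /* number of taps (delays) per unit */

int main(void)
{
    double x[T] = { 0.1, 0.5, 0.9, 0.4, -0.2, -0.6, -0.1, 0.3 };
    double w[D] = { 0.5, 0.3, 0.2 };      /* tap weights (illustrative)    */
    double bias = 0.0;

    for (int t = D - 1; t < T; t++) {     /* valid outputs start at t=D-1  */
        double s = bias;
        for (int d = 0; d < D; d++)
            s += w[d] * x[t - d];
        printf("y[%d] = %.4f\n", t, tanh(s));
    }
    return 0;
}
```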
{"title":"Parallel VLSI neural system design for time-delay speech recognition computing","authors":"D. Zhang","doi":"10.1109/APDC.1997.574008","DOIUrl":"https://doi.org/10.1109/APDC.1997.574008","url":null,"abstract":"Neural system, as processors of time-sequence patterns, have been successfully applied to several speaker-dependent speech recognition computing. They can be efficiently implemented by a pipelined architecture. In this paper, parallel time-delay speech recognition computing for VLSI neural systems is presented. The system design methodology is to emphasize coordination between computational model, architectural description, and VLSI systolic implementation. Examples of time-delay speech recognition applications to VLSI neural system design and performance analysis are given to illustrate effectiveness of the parallel computation.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130583006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1997-03-19 | DOI: 10.1109/APDC.1997.574062
Parallel processing on traditional serial programs by huge node data flow
Siwei Luo, Anfeng Huang, Yaping Huang
This paper introduces an algorithm that generates huge-node data flow by compiling existing programs. The purpose of the algorithm is to improve the speed of parallel processing while making use of the large body of existing program resources. In addition, the idea of the huge-node data flow algorithm can also be applied to distributed processing and multi-threaded processing.
{"title":"Parallel processing on traditional serial programs by huge node data flow","authors":"Siwei Luo, Anfeng Huang, Yaping Huang","doi":"10.1109/APDC.1997.574062","DOIUrl":"https://doi.org/10.1109/APDC.1997.574062","url":null,"abstract":"This paper introduces an algorithm that can generate huge node data flow by compiling existing programs. The purpose of this algorithm is to improve the speed of parallel processing and utilize the large amount of existing program resources. In addition, this idea of huge node data flow algorithm can also be used in distributed processing and multi-thread processing.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130214952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}