Matrix transpose on meshes: theory and practice
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580918
M. Kaufmann, U. Meyer, J. F. Sibeyn
Matrix transpose is a fundamental communication operation that is not handled optimally by general-purpose routing schemes. For two-dimensional meshes, the first optimal routing schedule is given. The strategy is simple enough to be implemented, but the details of the available hardware are not favorable. However, alternative algorithms, designed along the same lines, give an improvement on the Intel Paragon.
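As a concrete point of reference for the communication pattern discussed here (a minimal sketch of our own, not the paper's optimal schedule; the XY-routing baseline and function name are illustrative assumptions), the following code counts the hops generated by the transpose permutation on an n x n mesh when every packet simply travels row-first and then column-wise:

```python
# Illustrative sketch only -- not the paper's routing schedule. It shows the
# transpose pattern on an n x n mesh: the packet held by processor (i, j)
# must reach processor (j, i). As a naive baseline we count the hops taken
# when every packet follows dimension-order (XY) routing.

def transpose_hop_count(n):
    """Total hop count of the transpose permutation on an n x n mesh under
    row-first, column-second (XY) routing."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += 2 * abs(i - j)   # horizontal + vertical distance
    return total

if __name__ == "__main__":
    for n in (4, 8, 16):
        print(f"{n} x {n} mesh: {transpose_hop_count(n)} hops in total")
```

Any schedule must move each off-diagonal packet at least |i - j| hops in each dimension, which is why the transpose permutation stresses a mesh far more than nearest-neighbour traffic.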
{"title":"Matrix transpose on meshes: theory and practice","authors":"M. Kaufmann, U. Meyer, J. F. Sibeyn","doi":"10.1109/IPPS.1997.580918","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580918","url":null,"abstract":"Matrix transpose is a fundamental communication operation which is not dealt with optimally by general purpose routing schemes. For two dimensional meshes, the first optimal routing schedule is given. The strategy is simple enough to be implemented, but details of the available hardware are not favorable. However, alternative algorithms, designed along the same lines, give an improvement on the Intel Paragon.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121087019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gracefully degradable pipeline networks
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580847
R. Cypher, Ambrose Kofi Laing
A pipeline is a linear array of processors with an input node at one end and an output node at the other end. This paper presents k-gracefully-degradable graphs which, given any set of up to k faults, contain a pipeline that uses all the healthy processor nodes. Our constructions are designed to tolerate faulty input and output nodes, but they can be adapted to provide solutions when the input and output nodes are guaranteed to be healthy. All of our constructions are optimal in terms of the number of nodes and the maximum degree of the processor nodes.
{"title":"Gracefully degradable pipeline networks","authors":"R. Cypher, Ambrose Kofi Laing","doi":"10.1109/IPPS.1997.580847","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580847","url":null,"abstract":"A pipeline is a linear array of processors with an input node at one end and an output node at the other end. This paper presents k-gracefully-degradable graphs which, given any set of up to k faults, contain a pipeline that uses all the healthy processor nodes. Our constructions are designed to tolerate faulty input and output nodes, but they can be adapted to provide solutions when the input and output nodes are guaranteed to be healthy. All of our constructions are optimal in terms of the number of nodes and the maximum degree of the processor nodes.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"61 19","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134411649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High performance computational steering of physical simulations
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580866
J. Vetter, K. Schwan
Computational steering allows researchers to monitor and manage long-running, resource-intensive applications at runtime. Limited research has addressed high-performance computational steering, yet high performance is necessary for three reasons. First, a computational steering system must act intelligently at runtime in order to minimize its perturbation of the target application. Second, monitoring information extracted from the target must be analyzed and forwarded to the user in a timely fashion to allow fast decision making. Finally, steering actions must be executed with low latency to prevent undesirable feedback. The paper describes the use of language constructs, termed ACSL, within a system for computational steering. The steering system interprets ACSL statements and optimizes the requests for steering and monitoring. Specifically, the steering system, called Magellan, uses ACSL to intelligently control multithreaded, asynchronous steering servers that cooperatively steer applications. These results compare favorably to our previous Progress steering system.
{"title":"High performance computational steering of physical simulations","authors":"J. Vetter, K. Schwan","doi":"10.1109/IPPS.1997.580866","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580866","url":null,"abstract":"Computational steering allows researchers to monitor and manage long running, resource intensive applications at runtime. Limited research has addressed high performance computational steering. High performance in computational steering is necessary for three reasons. First, a computational steering system must act intelligently at runtime in order to minimize its perturbation of the target application. Second, monitoring information extracted from the target must be analyzed and forwarded to the user in a timely fashion to allow fast decision making. Finally, steering actions must be executed with low latency to prevent undesirable feedback. The paper describes the use of language constructs, coined ACSL, within a system for computational steering. The steering system interprets ACSL statements and optimizes the requests for steering and monitoring. Specifically, the steering system, called Magellan, utilizes ACSL to intelligently control multithreaded, asynchronous steering servers that cooperatively steer applications. These results compare favorably to our previous Progress steering system.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132858858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient parallel strategy for computing K-terminal reliability and finding most vital edge in 2-trees and partial 2-trees
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580963
Chin-Wen Ho, S. Hsieh, Gen-Huey Chen
The authors develop a parallel strategy to compute K-terminal reliability in 2-trees and partial 2-trees. They also solve the problem of finding the most vital edge with respect to K-terminal reliability in partial 2-trees. The algorithms take O(log n) time with C(m,n) processors on a CRCW PRAM, where C(m,n) is the number of processors required to find connected components of a graph with m edges and n vertices in logarithmic time.
{"title":"An efficient parallel strategy for computing K-terminal reliability and finding most vital edge in 2-trees and partial 2-trees","authors":"Chin-Wen Ho, S. Hsieh, Gen-Huey Chen","doi":"10.1109/IPPS.1997.580963","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580963","url":null,"abstract":"The authors develop a parallel strategy to compute K-terminal reliability in 2-trees and partial 2-trees. They also solve the problem of finding the most vital edge with respect to K-terminal reliability in partial 2-trees. The algorithms take O(log n) time with C(m,n) processors on a CRCW PRAM, where C(m,n) is the number of processors required to find connected components of a graph with m edges and n vertices in logarithmic time.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131102900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parallel inference on a linguistic knowledge base
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580892
S. Harabagiu, D. Moldovan
This paper presents a possible solution to the text inference problem: extracting information that is not stated explicitly in a text but is implied by it. The inference algorithm consists of a set of highly parallel search methods that, when applied to the knowledge base, find contexts of sentences that reveal information relevant to the text. Implementation, results, and an analysis of the parallelism are discussed.
{"title":"Parallel inference on a linguistic knowledge base","authors":"S. Harabagiu, D. Moldovan","doi":"10.1109/IPPS.1997.580892","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580892","url":null,"abstract":"This paper presents a possible solution for the text inference problem extracting information unstated in a text, but implied. The inference algorithm consists of a set of highly parallel search methods that when applied to the knowledge base find contexts of sentences that reveal information relevant to the text. Implementation, results and parallelism analysis are discussed.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133499764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nearly optimal one-to-many parallel routing in star networks
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580987
Chi-Chang Chen, Jianer Chen
Star networks were recently proposed as an attractive alternative to the well-known hypercube model for interconnection networks, and extensive research has shown that star networks are as versatile as hypercubes. This paper is an effort in the same direction. Building on well-known paradigms, we study the one-to-many parallel routing problem on star networks and develop an improved routing algorithm that finds n-1 node-disjoint paths between one node and a set of n-1 other nodes in the n-star network. These parallel paths are proven to be of minimum length to within a small additive constant, and the algorithm has optimal time complexity. This result significantly improves on previously known algorithms for the problem. Moreover, the algorithm illustrates an application of the orthogonal partition of star networks, which was observed by the original inventors of the star network but seems to have been generally overlooked in subsequent work. We also point out that similar problems have already been studied for hypercubes and have proven useful in designing efficient, fault-tolerant routing algorithms on hypercube networks.
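For readers unfamiliar with the topology, the sketch below (our own illustration of the standard star-graph definition, not the authors' routing algorithm) enumerates the structure the abstract assumes: the nodes of the n-star are the n! permutations of {1, ..., n}, and each node is adjacent to the n-1 permutations obtained by swapping its first symbol with the symbol in position i:

```python
from itertools import permutations

def star_neighbors(perm):
    """Neighbors of a node in the n-star graph: swap the first symbol with the
    symbol in position i, for i = 2, ..., n, giving degree n - 1."""
    neighbors = []
    for i in range(1, len(perm)):
        p = list(perm)
        p[0], p[i] = p[i], p[0]
        neighbors.append(tuple(p))
    return neighbors

if __name__ == "__main__":
    n = 4
    nodes = list(permutations(range(1, n + 1)))
    print(len(nodes), "nodes in the 4-star")     # n! = 24 nodes
    print(star_neighbors((1, 2, 3, 4)))          # its n - 1 = 3 neighbors
```

The n-1 node-disjoint paths constructed in the paper connect a source to n-1 target nodes without sharing any intermediate node, which is exactly the property that makes such routings useful for fault tolerance.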
{"title":"Nearly optimal one-to-many parallel routing in star networks","authors":"Chi-Chang Chen, Jianer Chen","doi":"10.1109/IPPS.1997.580987","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580987","url":null,"abstract":"Star networks were proposed recently as an attractive alternative to the well-known hypercube models for interconnection networks. Extensive research has been performed that shows that star networks are as versatile as hypercubes. This paper is an effort in the same direction. Based on the well-known paradigms, we study the one-to-many parallel routing problem on star networks and develop an improved routing algorithm that finds n-1 node-disjoint paths between one node and a set of other n-1 nodes in the n-star network. These parallel paths are proven of minimum length within a small additive constant, and our algorithm has an optimal time complexity. This result significantly improves the previous known algorithms for the problem. Moreover, the algorithm well illustrates an application of the orthogonal partition of star networks, which was observed by the original inventors of the star networks but seems generally overlooked in the subsequent study. We should also point out that similar problems are already studied for hypercubes and have proven useful in designing efficient and fault tolerant routing algorithms on hypercube networks.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132211428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Time-stamping algorithms for parallelization of loops at run-time
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580939
Chengzhong Xu, V. Chaudhary
In this paper we present two new run-time algorithms for the parallelization of loops that have indirect access patterns. The algorithms can handle any type of loop-carried dependencies. They follow the INSPECTOR/EXECUTOR scheme and improve upon previous algorithms with the same generality by allowing concurrent reads of the same location and by increasing the overlap of dependent iterations. The algorithms are based on time-stamping rules and implemented using multithreading tools. The experimental results on an SMP server with four processors show that our schemes are efficient and outperform their competitors consistently in all test cases. The difference between the two proposed algorithms is that one allows partially concurrent reads without causing extra overhead in its inspector while the other allows fully concurrent reads at a slight overhead in the dependence analysis. The algorithm allowing fully concurrent reads obtains up to an 80% improvement over its competitor.
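To make the INSPECTOR/EXECUTOR scheme concrete, here is a minimal shared-memory sketch under simplifying assumptions (one indirect read and one indirect write per iteration, wavefronts executed sequentially); it is not the authors' time-stamping algorithm, but it shows how an inspector can scan the index arrays once and place every iteration into a wavefront so that iterations within a wavefront are independent, while reads of the same location may still share a wavefront:

```python
# Simplified inspector/executor sketch (not the authors' algorithm).
# The inspector scans the index arrays of the loop
#     for i in range(n): A[w[i]] = g(A[r[i]])
# and assigns each iteration a wavefront number; the executor then runs the
# wavefronts in order, and the iterations inside one wavefront are independent.

def inspector(w, r):
    last_write = {}   # location -> wavefront of the most recent write
    last_read = {}    # location -> latest wavefront in which it was read
    wavefront = []
    for i in range(len(w)):
        stage = 0
        if r[i] in last_write:
            stage = max(stage, last_write[r[i]] + 1)   # read-after-write
        if w[i] in last_write:
            stage = max(stage, last_write[w[i]] + 1)   # write-after-write
        if w[i] in last_read:
            stage = max(stage, last_read[w[i]] + 1)    # write-after-read
        wavefront.append(stage)
        last_write[w[i]] = stage
        last_read[r[i]] = max(last_read.get(r[i], -1), stage)
    return wavefront

def executor(A, w, r, g, wavefront):
    # Iterations sharing a wavefront only ever read common locations, so they
    # could be handed to separate threads; here they run sequentially.
    for stage in range(max(wavefront) + 1):
        for i in (i for i, s in enumerate(wavefront) if s == stage):
            A[w[i]] = g(A[r[i]])
    return A

if __name__ == "__main__":
    A = [0, 1, 2, 3, 4]
    w = [2, 3, 4, 2]            # indirect write indices
    r = [0, 0, 0, 3]            # indirect read indices
    stages = inspector(w, r)    # [0, 0, 0, 1]: three reads of A[0] share wavefront 0
    print(stages, executor(A, w, r, lambda x: x + 10, stages))
```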
{"title":"Time-stamping algorithms for parallelization of loops at run-time","authors":"Chengzhong Xu, V. Chaudhary","doi":"10.1109/IPPS.1997.580939","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580939","url":null,"abstract":"In this paper we present two new run-time algorithms for the parallelization of loops that have indirect access patterns. The algorithms can handle any type of loop-carried dependencies. They follow the INSPECTOR/EXECUTOR scheme and improve upon previous algorithms with the same generality by allowing concurrent reads of the same location and by increasing the overlap of dependent iterations. The algorithms are based on time-stamping rules and implemented using multithreading tools. The experimental results on an SMP server with four processors show that our schemes are efficient and outperform their competitors consistently in all test cases. The difference between the two proposed algorithms is that one allows partially concurrent reads without causing extra overhead in its inspector while the other allows fully concurrent reads at a slight overhead in the dependence analysis. The algorithm allowing fully concurrent reads obtains up to an 80% improvement over its competitor.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114716711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Crossbar analysis for optimal deadlock recovery router architecture
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580960
Yungho Choi, T. Pinkston
We explore the design of optimal deadlock-recovery-based fully adaptive routers by evaluating promising internal router crossbar designs. Unified and decoupled crossbar designs aimed at exploiting the full capabilities of adaptive routing are evaluated by analyzing their effect on overall network performance. We show that an enhanced hierarchical crossbar design that supports routing locality within a virtual network class achieves the highest performance at relatively low cost.
{"title":"Crossbar analysis for optimal deadlock recovery router architecture","authors":"Yungho Choi, T. Pinkston","doi":"10.1109/IPPS.1997.580960","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580960","url":null,"abstract":"We explore the design of optimal deadlock recovery-based fully adaptive routers by evaluating promising internal router crossbar designs. Unified and decoupled crossbar designs aimed at exploiting the full capabilities of adaptive routing are evaluated by analyzing their effect on overall network performance. We show that an enhanced hierarchical crossbar design that supports routing locality in virtual network class achieves highest performance with relatively low cost.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114765305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Designing efficient distributed algorithms using sampling techniques
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580932
S. Rajasekaran, David S. L. Wei
We show the power of sampling techniques in designing efficient distributed algorithms. In particular, we show that, by using sampling techniques, selection can be done on some networks in such a way that the message complexity is independent of the cardinality of the set (file), provided the file size is polynomial in the network size. For example, given a file F of size n and an integer k (1 ≤ k ≤ n), on a p-processor de Bruijn network our deterministic selection algorithm can find the kth smallest key in F using O(p log^3 p) messages with a communication delay of O(log^3 p), and our randomized selection algorithm can finish the same task using only O(p) messages and a communication delay of O(log p) with high probability. Our randomized selection outperforms existing approaches in terms of both message complexity and communication delay. The fact that the number of messages and the communication delay are independent of the file size makes our distributed selection schemes extremely attractive in domains such as very large database systems. Using our selection algorithms to select pivot elements, we also develop a near-optimal quicksort-based sorting scheme and a nearly optimal enumeration sorting scheme for sorting large distributed files on hypercube and de Bruijn networks. Our algorithms are fully distributed, without any a priori central control.
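The paper's algorithms are distributed, but the core sampling idea can be sketched on a single machine (our own illustration; the constants sample_size and slack are arbitrary assumptions): a random sample brackets the rank-k key between two pivots, everything outside the bracket is discarded, and the process repeats until the candidate set is small. In the distributed setting only the sample and a few counts cross the network, which is why the message complexity can be made independent of the file size.

```python
import random

def sample_select(keys, k, sample_size=1024, slack=50):
    """Return the k-th smallest key (1-based): repeatedly bracket the answer
    between two pivots drawn from a random sample and discard everything else."""
    keys = list(keys)
    while len(keys) > sample_size:
        sample = sorted(random.sample(keys, sample_size))
        pos = (k - 1) * sample_size // len(keys)     # estimated rank inside the sample
        lo = sample[max(0, pos - slack)]
        hi = sample[min(sample_size - 1, pos + slack)]
        below = sum(1 for x in keys if x < lo)
        inside = [x for x in keys if lo <= x <= hi]
        if below < k <= below + len(inside):         # the bracket caught the answer
            if len(inside) == len(keys):             # too few distinct keys to shrink further
                break
            keys, k = inside, k - below
        # on the (unlikely) bracket miss, simply retry with a fresh sample
    return sorted(keys)[k - 1]

if __name__ == "__main__":
    data = [random.randrange(10**9) for _ in range(200_000)]
    k = 123_456
    assert sample_select(data, k) == sorted(data)[k - 1]
```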
{"title":"Designing efficient distributed algorithms using sampling techniques","authors":"S. Rajasekaran, David S. L. Wei","doi":"10.1109/IPPS.1997.580932","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580932","url":null,"abstract":"Shows the power of sampling techniques in designing efficient distributed algorithms. In particular, we show that, by using sampling techniques, selection can be done on some networks in such a way that the message complexity is independent of the cardinality of the set (file), provided the file size is polynomial in the network size. For example, given a file F of size n and an integer k (1/spl les/k/spl les/n), on a p-processor de Bruijn network our deterministic selection algorithm can find the kth smallest key from F using O(p log/sup 3/p) messages and with a communication delay of O(log/sup 3/p), and our randomized selection algorithm can finish the same task using only O(p) messages and a communication delay of O(log p) with high probability, provided the file size is polynomial in network size. Our randomized selection outperforms the existing approaches in terms of both message complexity and communication delay. The property that the number of messages needed and the communication delay are independent of the size of the file makes our distributed selection schemes extremely attractive in such domains as very large database systems. Making use of our selection algorithms to select pivot element(s), we also develop a near-optimal quicksort-based sorting scheme and a nearly-optimal enumeration sorting scheme for sorting large distributed files on the hypercube and de Bruijn networks. Our algorithms are fully distributed without any a priori central control.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122209449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An evaluation of a commercial CC-NUMA architecture: the CONVEX Exemplar SPP1200
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580831
R. Thekkath, A. Singh, J. Singh, S. John, J. Hennessy
Studies done with academic CC-NUMA machines and simulators indicate good potential for application performance. Our goal, therefore, is to investigate whether the CONVEX Exemplar, a commercial distributed shared memory machine, lives up to the expected potential of CC-NUMA machines and, if not, to understand what architectural or implementation decisions make it less efficient. Evaluating the delivered performance of the Exemplar, we find that, while a moderate-scale Exemplar machine works well for several applications, it does not for some important classes. Furthermore, performance was affected by several fundamental characteristics of the machine, all of which stem from basic implementation and design choices made on the Exemplar: the effect of processor clustering together with limited node-to-network bandwidth, the effect of tertiary caches, the limited user control over data placement, the sequential memory consistency model together with a cache-based cache coherence protocol, and, lastly, longer remote latencies.
{"title":"An evaluation of a commercial CC-NUMA architecture-the CONVEX Exemplar SPP1200","authors":"R. Thekkath, A. Singh, J. Singh, S. John, J. Hennessy","doi":"10.1109/IPPS.1997.580831","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580831","url":null,"abstract":"Studies done with academic CC-NUMA machines and simulators indicate a good potential for application performance. Our goal therefore, is to investigate whether the CONVEX Exemplar a commercial distributed shared memory machine, lives up to the expected potential of CC-NUMA machines. If not, we would like to understand what architectural or implementation decisions make it less efficient. On evaluating the delivered performance on the Exemplar we find that, while a moderate-scale Exemplar machine works well for several applications, it does not for some important classes. Further performance was affected by four fundamental characteristics of the machine, all of which are due to basic implementation and design choices made on the Exemplar. These are: the effect of processor clustering together with limited node-to-network bandwidth, the effect of tertiary caches, the limited user control over data placement, the sequential memory consistency model together with a cache-based cache coherence protocol, and lastly, longer remote latencies.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123541512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}