A fast sort using parallelism within memory
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242727
C. Leopold
The author models the internal structure of memory as a tree in which nodes represent memory modules (such as caches and disks) and edges represent the buses between them. The nearer a module is to the root, the smaller its access time, capacity, and block size. All buses may transmit blocks of data in parallel. The author gives a deterministic sorting algorithm based on greed-sort and shows its running time to be optimal up to a constant factor. The bound implies how many parallel modules are necessary at each hierarchy level to overcome the I/O bottleneck of sorting. The proposed algorithm also applies to the less general models UMH (uniform memory hierarchies) and P-UMH.
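To make the model concrete, here is a minimal Python sketch of such a tree of memory modules. The module names and the specific access times, capacities, and block sizes are invented for illustration; the model only requires that they shrink toward the root and that sibling modules can use their buses in parallel.

```python
# Illustrative sketch of a tree-structured memory hierarchy (all figures hypothetical).
from dataclasses import dataclass, field

@dataclass
class Module:
    name: str
    access_time: float      # time to move one block across the bus to the parent
    capacity: int           # number of data items the module can hold
    block_size: int         # items transferred per bus operation
    children: list = field(default_factory=list)

# Root = fastest/smallest module; leaves = slow, large disks.
root  = Module("registers", access_time=1,     capacity=2**6,  block_size=1)
cache = Module("cache",     access_time=4,     capacity=2**12, block_size=8)
ram   = Module("RAM",       access_time=32,    capacity=2**22, block_size=64)
disks = [Module(f"disk{i}", access_time=10**4, capacity=2**32, block_size=2**12)
         for i in range(4)]          # parallel modules at the same level

root.children  = [cache]
cache.children = [ram]
ram.children   = disks               # all buses may transfer blocks in parallel
```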
{"title":"A fast sort using parallelism within memory","authors":"C. Leopold","doi":"10.1109/SPDP.1992.242727","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242727","url":null,"abstract":"The author models the internal structure of memory by a tree, where nodes represent memory modules (like cache, disks), and edges represent buses between them. The modules have smaller access time, capacity, and block size the nearer they are to the root. All buses may transmit blocks of data in parallel. The author gives a deterministic sorting algorithm based on greed-sort. Its running time is shown to be optimal up to a constant factor. The bound implies the number of parallel modules necessary at each hierarchy level to overcome the I/O bottlenecks of sorting. The proposed algorithm also applies to the less general models UMH (uniform memory hierarchies) and P-UMH.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114925674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A general purpose distributed implementation of simulated annealing
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242758
Ralf Diekmann, Reinhard Lüling, J. Simon
The authors present a problem-independent, general-purpose parallel implementation of simulated annealing (SA) on distributed message-passing multiprocessor systems. The sequential algorithm is studied, and a classification of combinatorial optimization problems together with their neighborhood structures is given. Several parallelization approaches are examined with respect to their suitability for problems of the various classes, and for typical representatives of the different classes good parallel SA implementations are presented. A novel parallel SA algorithm is introduced that works on several Markov chains simultaneously and decreases the number of chains dynamically; combined with a parallel self-adapting cooling schedule, this method yields good results. All algorithms are implemented in OCCAM-2 on a freely configurable transputer system. Measurements on up to 128 transputers are presented.
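The multiple-Markov-chain idea can be illustrated with a short sketch (Python here, not the authors' OCCAM-2 code). The cooling factor, initial chain count, and the keep-the-better-half shrinking rule are illustrative assumptions, not the paper's schedule.

```python
# Minimal multi-chain simulated annealing with a dynamically shrinking chain pool.
import math, random

def anneal(cost, neighbor, start, n_chains=8, t0=10.0, alpha=0.95, steps=2000):
    chains = [start() for _ in range(n_chains)]   # one Markov chain per (virtual) processor
    temp = t0
    for step in range(steps):
        for i, s in enumerate(chains):
            cand = neighbor(s)
            delta = cost(cand) - cost(s)
            # Metropolis acceptance rule.
            if delta <= 0 or random.random() < math.exp(-delta / temp):
                chains[i] = cand
        temp *= alpha                              # geometric cooling (assumed schedule)
        # Dynamically reduce the number of chains: keep the better half periodically.
        if step % 500 == 499 and len(chains) > 1:
            chains = sorted(chains, key=cost)[:max(1, len(chains) // 2)]
    return min(chains, key=cost)

# Toy usage: minimize a one-dimensional quadratic.
best = anneal(cost=lambda x: (x - 3) ** 2,
              neighbor=lambda x: x + random.uniform(-1, 1),
              start=lambda: random.uniform(-10, 10))
```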
{"title":"A general purpose distributed implementation of simulated annealing","authors":"Ralf Diekmann, Reinhard Lüling, J. Simon","doi":"10.1109/SPDP.1992.242758","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242758","url":null,"abstract":"The authors present a problem-independent general-purpose parallel implementation of simulated annealing (SA) on distributed message-passing multiprocessor systems. The sequential algorithm is studied, and a classification of combinatorial optimization problems together with their neighborhood structures is given. Several parallelization approaches are examined, considering their suitability for problems of the various classes. For typical representatives of the different classes, good parallel SA implementations are presented. A novel parallel SA algorithm that works simultaneously on several Markov chains and decreases the number of chains dynamically is presented. This method yields good results with a parallel self-adapting cooling schedule. All algorithms are implemented in OCCAM-2 on a free configurable transputer system. Measurements on various numbers of processors up to 128 transputers are presented.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130433651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Residue number systems: a key to parallelism in public key cryptography
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242713
K. C. Posch, R. Posch
Public-key cryptography and parallel algorithms are considered, with special attention to algorithms using long-integer modular arithmetic. A modification of the well-known RSA algorithm is taken as a candidate. Previous implementations have been more or less sequential in the sense that a long integer is not partitioned among several processing elements. The proposed approach allows a dedicated processor to be used for each group of about 30 to 50 bits of a long integer. Efficiency is gained primarily when special-purpose processors are used. In this regard, the work is the basis of a VLSI approach to a multiprocessor-based cryptographic design involving 15 to 100 processors.
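The parallelism comes from residue number systems: a long integer is split into small residues modulo pairwise coprime moduli, and each residue is handled independently by its own processor, since addition and multiplication never carry across channels. A minimal sketch, with deliberately tiny (hypothetical) moduli:

```python
# Residue-number-system arithmetic in miniature; a real design would use
# word-sized pairwise-coprime moduli, one per processing element.
from math import prod

MODULI = [101, 103, 107, 109]        # small pairwise-coprime primes, illustration only
M = prod(MODULI)                     # dynamic range of the representation

def to_rns(x):
    """Split an integer into independent residues, one per (parallel) channel."""
    return [x % m for m in MODULI]

def rns_mul(a, b):
    """Channel-wise multiplication: no carries cross channels."""
    return [(ai * bi) % m for ai, bi, m in zip(a, b, MODULI)]

def from_rns(r):
    """Recombine residues with the Chinese Remainder Theorem."""
    x = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m
        x += ri * Mi * pow(Mi, -1, m)
    return x % M

a, b = 123456, 7890
assert from_rns(rns_mul(to_rns(a), to_rns(b))) == (a * b) % M
```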
{"title":"Residue number systems: a key to parallelism in public key cryptography","authors":"K. C. Posch, R. Posch","doi":"10.1109/SPDP.1992.242713","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242713","url":null,"abstract":"Public key cryptography and parallel algorithms are considered. Special attention is paid to algorithms using long integer modulo arithmetic. A modification of the commonly known RSA algorithm is taken as a candidate. So far all implementations have been more or less sequential in the sense that no partitions of a long integer among various processing elements have been performed. The proposed approach allows the use of a dedicated processor for each group of about 30 to 50 bits of a long integer. Efficiency is primarily gained when special-purpose processors are used. In this regard this work is the basis of a VLSI approach to a multiprocessor-based cryptographic design with 15 to 100 processors involved.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131656323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Scalable Tree Protocol: a cache coherence approach for large-scale multiprocessors
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242703
H. Nilsson, P. Stenström
The problem of cache coherence in large-scale shared-memory multiprocessors has been addressed using directory schemes. Two problems arise as the number of processors increases: network latency grows, and the implementation cost must be kept acceptable. The authors present a tree-based cache coherence protocol called the Scalable Tree Protocol (STP). They show that it can be implemented at a reasonable cost and that the write latency is logarithmic in the size of the sharing set. How to maintain an optimal tree structure and how to handle replacements efficiently are critical issues the authors address for this type of protocol. They compare the performance of the STP with that of the Scalable Coherent Interface (SCI, IEEE standard P1596) on a classical matrix-oriented algorithm targeted at large-scale parallel processing, and show that the STP reduces the execution time considerably by reducing the write latency.
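The qualitative difference between a tree-organized sharing set and a linear sharing list (as in SCI) can be seen from a back-of-the-envelope hop count; the perfectly balanced tree and unit hop cost below are simplifying assumptions, not figures from the paper.

```python
# Rough comparison of invalidation latency: balanced sharer tree vs. linear sharer list.
import math

def invalidation_hops_tree(sharers: int) -> int:
    # Invalidations fan out level by level, so latency grows with tree height.
    return math.ceil(math.log2(sharers)) if sharers > 1 else 1

def invalidation_hops_list(sharers: int) -> int:
    # A linear sharing list must be walked end to end.
    return sharers

for s in (2, 16, 128, 1024):
    print(f"{s:5d} sharers: tree {invalidation_hops_tree(s):3d} hops, "
          f"list {invalidation_hops_list(s):4d} hops")
```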
{"title":"The Scalable Tree Protocol-a cache coherence approach for large-scale multiprocessors","authors":"H. Nilsson, P. Stenström","doi":"10.1109/SPDP.1992.242703","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242703","url":null,"abstract":"The problem of cache coherence in large-scale shared-memory multiprocessors has been addressed using directory-schemes. Two problems arise when the number of processors increases: the network latency increases and the implementation cost must be kept acceptable. The authors present a tree-based cache coherence protocol called the scalable tree protocol (STP). They show that it can be implemented at a reasonable implementation cost and that the write latency is logarithmic to the size of the sharing set. How to maintain an optimal tree structure and how to handle replacements efficiently are critical issues the authors address for this type of protocol. They compare the performance of the STP with that of the scalable coherent interface (SCI) (IEEE standard P1596) by considering a classical matrix-oriented algorithm targeted for large-scale parallel processing. They show that the STP manages to reduce the execution time considerably by reducing the write latency.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134623294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Software caching on cache-coherent multiprocessors
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242700
R. Bianchini, T. LeBlanc
The authors explore the utility of software caching on a machine with coherent caches. In particular, they show that caching at the application level avoids the problem of false sharing on cache-coherent machines. They compare the performance of software caching with that of other techniques for alleviating false sharing and show that software caching outperforms the alternatives when the reference behavior of an application changes dynamically. They conclude that software caching, as well as other techniques developed for noncoherent shared-memory multiprocessors, can be profitably used on machines with hardware-coherent caches, and that programs based on these techniques are efficient across a variety of shared-memory machines.
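The restructuring that application-level caching implies can be sketched as follows: each thread updates a private copy (its software cache) and writes back once, so concurrently written items never sit in the same cache line. Python threads are used only to show the structure; the coherence traffic being avoided matters on real hardware.

```python
# Sketch of avoiding false sharing by caching a shared datum in thread-private state.
from threading import Thread

N_THREADS = 4
shared = [0] * N_THREADS          # adjacent elements: updating these in place on real
                                  # hardware would cause false sharing of a cache line

def worker(tid, iterations=100_000):
    local = 0                     # private "software-cached" copy, no coherence traffic
    for _ in range(iterations):
        local += 1
    shared[tid] = local           # single write-back at the end

threads = [Thread(target=worker, args=(t,)) for t in range(N_THREADS)]
for t in threads: t.start()
for t in threads: t.join()
```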
{"title":"Software caching on cache-coherent multiprocessors","authors":"R. Bianchini, T. LeBlanc","doi":"10.1109/SPDP.1992.242700","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242700","url":null,"abstract":"The authors explore the utility of software caching on a machine with coherent caches. In particular, they show that by caching at the application level one can avoid the problem of false sharing on cache-coherent machines. They compare the performance of software caching with that of other techniques for alleviating false sharing and show that software caching performs better than the alternatives when the reference behavior of an application changes dynamically. It is concluded that software caching, as well as other techniques developed for noncoherent shared-memory multiprocessors, can be profitably used on machines with hardware coherent caches and that programs based on these techniques are efficient across a variety of shared-memory machines.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123908765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A tight bound on the diameter of one dimensional PEC networks
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242722
Cho-Chin Lin, V. Prasanna
The diameter of a packed exponential connections (PEC) network on N nodes is shown to be Θ(√(log N) · 2^√(2 log N)), where log N denotes the logarithm to the base 2. The results can be extended to the case of two-dimensional PEC networks.
{"title":"A tight bound on the diameter of one dimensional PEC networks","authors":"Cho-Chin Lin, V. Prasanna","doi":"10.1109/SPDP.1992.242722","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242722","url":null,"abstract":"The diameter of a packed exponential connections (PEC) network on N nodes is shown to be theta ( square root log N*2 square root /sup (2log/ /sup N)/, where log N denotes log to the base 2. The present results can be extended to the case of two-dimensional PEC networks.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129198511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A methodology for generating data distributions to optimize communication
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242712
S. Gupta, S. Kaushik, Chua-Huang Huang, John R. Johnson, Rodney W. Johnson, P. Sadayappan
The authors present an algebraic theory, based on the tensor product, for describing the semantics of regular data distributions such as block, cyclic, and block-cyclic distributions. These distributions have been proposed in High Performance Fortran, an ongoing effort to develop a Fortran extension for massively parallel computing. This algebraic theory has been used for designing and implementing block recursive algorithms on shared-memory and vector multiprocessors. In the present work, the authors extend the theory to generate programs with explicit data distribution commands from tensor product formulas. A methodology for generating data distributions that optimize communication is described and is demonstrated by generating efficient programs, with data distribution, for the fast Fourier transform.
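For reference, the block-cyclic index mapping that these distributions denote can be written in a few lines; block (block size = array size / p) and cyclic (block size = 1) fall out as special cases. The function name and signature below are ours, not notation from the paper.

```python
# Block-cyclic ownership mapping in the HPF style.
def owner(i: int, block: int, nprocs: int) -> tuple[int, int, int]:
    """Map global index i to (processor, local block number, offset within block)."""
    blk = i // block
    return blk % nprocs, blk // nprocs, i % block

# 12 elements, block size 2, 3 processors:
# indices 0,1 -> P0; 2,3 -> P1; 4,5 -> P2; 6,7 -> P0; ...
assert [owner(i, 2, 3)[0] for i in range(12)] == [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]
```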
{"title":"A methodology for generating data distributions to optimize communication","authors":"S. Gupta, S. Kaushik, Chua-Huang Huang, John R. Johnson, Rodney W. Johnson, P. Sadayappan","doi":"10.1109/SPDP.1992.242712","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242712","url":null,"abstract":"The authors present an algebraic theory, based on the tensor product for describing the semantics of regular data distributions such as block, cyclic, and block-cyclic distributions. These distributions have been proposed in high performance Fortran, an ongoing effort for developing a Fortran extension for massively parallel computing. This algebraic theory has been used for designing and implementing block recursive algorithms on shared-memory and vector multiprocessors. In the present work, the authors extend this theory to generate programs with explicit data distribution commands from tensor product formulas. A methodology to generate data distributions that optimize communication is described. This methodology is demonstrated by generating efficient programs with data distribution for the fast Fourier transform.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125425544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using communication-to-computation ratio in parallel program design and performance prediction
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242738
M. Crovella, R. Bianchini, T. LeBlanc
The authors' goal is to predict the performance of a parallel program early in the development process; to that end, they require prediction methods that can be applied to incomplete programs. They describe how a single method based on the communication-to-computation (C/C) ratio can predict performance accurately yet fairly simply in some commonly encountered cases. They show how C/C-ratio-based methods are applied on both distributed-memory and coherent-memory multiprocessors, and that focusing on the C/C ratio simplifies the combination of theory, machine benchmarking, and application measurement needed for good parallel performance prediction. In addition, the methods are useful because they can be applied to program fragments or serially executed code.
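A deliberately simple way to turn a C/C ratio into a prediction, assuming communication is not overlapped with computation, is shown below. This is a generic illustration of the idea, not the authors' calibrated model.

```python
# If each unit of computation incurs r units of communication (r = C/C ratio)
# and the two do not overlap, per-processor time scales by (1 + r).
def predicted_efficiency(cc_ratio: float) -> float:
    return 1.0 / (1.0 + cc_ratio)

def predicted_speedup(nprocs: int, cc_ratio: float) -> float:
    return nprocs * predicted_efficiency(cc_ratio)

# A program fragment that communicates for 0.25 time units per unit of computation:
print(predicted_speedup(64, 0.25))   # ~51.2 out of an ideal 64
```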
{"title":"Using communication-to-computation ratio in parallel program design and performance prediction","authors":"M. Crovella, R. Bianchini, T. LeBlanc","doi":"10.1109/SPDP.1992.242738","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242738","url":null,"abstract":"The authors goal is to be able to predict the performance of a parallel program early in the program development process; to that end they require prediction methods that can be based on incomplete programs. They describe how a single method based on communication-to-computation (C/C) ratio can be used to predict performance accurately and yet fairly simply in some commonly encountered cases. They show how C/C-ratio-based methods are accomplished for both distributed-memory and coherent-memory multiprocessors. They show that focusing on C/C ratio simplifies the use of theory, machine benchmarking and application measurement necessary to provide good parallel performance prediction. In addition, the methods demonstrated are useful because they can be applied to program fragments, or serially executed code.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126392749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating reliability improvements of fault tolerant VLSI processor arrays
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242752
D. Tao
An important and meaningful criterion for evaluating a VLSI processor array that incorporates an ABFT (algorithm-based fault tolerance) technique is identified. A reliability model that can be used to accurately compute the reliability improvement of a fault-tolerant processor array is established. Examples show that, when an ABFT technique is incorporated, the reliability improvement depends on the size of the processor array, the nature of the failures, and the failure rate. Using the reliability model and methods discussed here, a system designer can therefore determine a priori whether it is beneficial to incorporate an ABFT technique. Moreover, if the reliability of an ABFT processor array cannot meet the specified requirement, the proposed method can also serve as a guide for partitioning the array into smaller ones so that the ABFT technique remains effective while a minimal amount of overhead is introduced.
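As an illustration of how such a reliability comparison works, assume independent exponential processor failures and an ABFT scheme that masks a single faulty processor. The k-of-N formulas below are textbook expressions used only to show how the improvement depends on array size and failure rate, not the paper's exact model.

```python
# Reliability with and without single-fault-masking ABFT, under independent
# exponential failures with rate lam per processor over mission time t.
import math

def rel_plain(n: int, lam: float, t: float) -> float:
    # All n processors must survive.
    return math.exp(-n * lam * t)

def rel_abft_1fault(n: int, lam: float, t: float) -> float:
    p = math.exp(-lam * t)                       # survival probability of one processor
    return p**n + n * p**(n - 1) * (1 - p)       # zero or exactly one failure tolerated

for n in (16, 64, 256):
    improvement = rel_abft_1fault(n, 1e-5, 1e4) / rel_plain(n, 1e-5, 1e4)
    print(f"{n:4d} processors: reliability improvement factor {improvement:.2f}")
```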
{"title":"Evaluating reliability improvements of fault tolerant VLSI processor arrays","authors":"D. Tao","doi":"10.1109/SPDP.1992.242752","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242752","url":null,"abstract":"An important and meaningful criterion for evaluating a VLSI processor array incorporating an ABFT (algorithm-based fault tolerance) technique is identified. A reliability model which can be used to accurately compute the reliability improvement of a fault-tolerant processor array is established. Examples showing that, when an ABFT technique is incorporated, the reliability improvement depends on the size of the processor array, the nature of the failure, and the failure rate are presented. Therefore, by using the reliability model and methods discussed here, a system designer will be able to determine whether it is beneficial to incorporate an ABFT technique a priori. Moreover, if the reliability of an ABFT processor array cannot meet the specified requirement, the proposed method can also be used as a guide to partition it into smaller ones so that this ABFT technique is still effective and a minimal amount of overhead is introduced.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116477876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deterministic routing on circular arrays
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242721
Michael Kaufmann, J. F. Sibeyn
The authors analyze the routing of k-permutations on circular processor arrays connected by bidirectional links. In contrast to linear processor arrays, it is hard to prove lower bounds on the routing time or to construct efficient algorithms for routing k-permutations on circular arrays (except for the case k = 1). The authors prove nontrivial lower bounds both for routing with global knowledge and for routing with local knowledge, and present deterministic algorithms that use local information only. The best of these algorithms requires only k·n/4 + εn routing steps for all k ≥ 4, which almost matches the k·n/4 lower bound. Special attention is given to the cases k = 2 and k = 3.
{"title":"Deterministic routing on circular arrays","authors":"Michael Kaufmann, J. F. Sibeyn","doi":"10.1109/SPDP.1992.242721","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242721","url":null,"abstract":"The authors analyze the routing of k-permutations on circular processor arrays connected by bidirectional links. In contrast to linear processor arrays, it is hard to prove lower bounds for the routing time or to construct efficient algorithms for routing k-permutations on circular arrays (except for the case k=1). The authors prove nontrivial lower bounds for routing with global knowledge and for routing with local knowledge. They present deterministic algorithms that use local information only. The best of these algorithms requires only k*n/4+emsn routing steps for all k>or=4. This almost matches the k*n/4 lower bound. Special attention is given to the cases k=2 and 3.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133808990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}