
[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing: latest papers

A fast sort using parallelism within memory
C. Leopold
The author models the internal structure of memory as a tree in which nodes represent memory modules (such as caches and disks) and edges represent the buses between them. The nearer a module is to the root, the smaller its access time, capacity, and block size. All buses may transmit blocks of data in parallel. The author gives a deterministic sorting algorithm based on greed-sort and shows that its running time is optimal up to a constant factor. The bound indicates how many parallel modules are needed at each hierarchy level to overcome the I/O bottleneck of sorting. The algorithm also applies to the less general UMH (uniform memory hierarchies) and P-UMH models.
DOI: https://doi.org/10.1109/SPDP.1992.242727 · Published: 1992-12-01
Citations: 2
A general purpose distributed implementation of simulated annealing
Ralf Diekmann, Reinhard Lüling, J. Simon
The authors present a problem-independent, general-purpose parallel implementation of simulated annealing (SA) on distributed message-passing multiprocessor systems. The sequential algorithm is studied, and a classification of combinatorial optimization problems together with their neighborhood structures is given. Several parallelization approaches are examined with regard to their suitability for problems of the various classes. For typical representatives of the different classes, good parallel SA implementations are presented. A novel parallel SA algorithm that works simultaneously on several Markov chains and dynamically decreases the number of chains is presented. This method yields good results with a parallel self-adapting cooling schedule. All algorithms are implemented in OCCAM-2 on a freely configurable transputer system. Measurements on various numbers of processors, up to 128 transputers, are presented.
DOI: https://doi.org/10.1109/SPDP.1992.242758 · Published: 1992-12-01
Citations: 17
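
The idea of working on several Markov chains at once and shrinking their number over time can be illustrated with a small sequential sketch. This is not the authors' OCCAM-2 algorithm: the halving rule, the geometric cooling schedule, and the names `anneal_multi_chain`, `toy_cost`, and `toy_neighbor` are all assumptions made for illustration only.

```python
import math
import random


def anneal_multi_chain(cost, neighbor, starts, t0=1.0, alpha=0.9,
                       steps_per_temp=50, min_temp=1e-3, seed=0):
    """Run several SA chains; after each temperature level keep the better half.

    Assumed scheme (not taken from the paper): geometric cooling and a
    fixed "halve the chains" rule.
    """
    rng = random.Random(seed)
    chains = list(starts)
    t = t0
    while t > min_temp:
        for i, state in enumerate(chains):
            for _ in range(steps_per_temp):
                candidate = neighbor(state, rng)
                delta = cost(candidate) - cost(state)
                # Metropolis rule: always accept improvements, accept a
                # worse state with probability exp(-delta / t).
                if delta <= 0 or rng.random() < math.exp(-delta / t):
                    state = candidate
            chains[i] = state
        # Reduce the number of chains, but never below one.
        chains.sort(key=cost)
        chains = chains[:max(1, len(chains) // 2)]
        t *= alpha
    return chains[0]


def toy_cost(x):
    """Toy objective with its minimum at x = 3."""
    return (x - 3) ** 2


def toy_neighbor(x, rng):
    """Toy neighborhood: move one step left or right."""
    return x + rng.choice((-1, 1))


if __name__ == "__main__":
    print(anneal_multi_chain(toy_cost, toy_neighbor, [-20, 0, 15, 40]))
```

In a distributed setting each chain would presumably run on its own processor between reduction points; the sketch only shows the control flow, not the message passing.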
Residue number systems: a key to parallelism in public key cryptography
K. C. Posch, R. Posch
Public key cryptography and parallel algorithms are considered, with special attention to algorithms that use long-integer modular arithmetic. A modification of the well-known RSA algorithm is taken as a candidate. So far, all implementations have been more or less sequential in the sense that a long integer has not been partitioned among several processing elements. The proposed approach allows a dedicated processor to be used for each group of about 30 to 50 bits of a long integer. Efficiency is gained primarily when special-purpose processors are used. In this regard, the work is the basis of a VLSI approach to a multiprocessor-based cryptographic design involving 15 to 100 processors.
DOI: https://doi.org/10.1109/SPDP.1992.242713 · Published: 1992-12-01
Citations: 18
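
To make the residue-number-system idea concrete, here is a minimal sketch of how a long integer can be split into independent residue channels, each small enough for one processing element. The moduli set {2^31 - 1, 2^31, 2^31 + 1} and the function names are assumptions chosen for illustration, not the paper's design; the modular reduction that RSA also needs is not covered. Python 3.8 or later is assumed for `math.prod` and `pow(x, -1, m)`.

```python
from math import gcd, prod

# Assumed example moduli: pairwise coprime, each about 31 to 32 bits wide.
MODULI = [2**31 - 1, 2**31, 2**31 + 1]


def check_pairwise_coprime(moduli):
    """Return True if every pair of moduli has gcd 1."""
    return all(gcd(a, b) == 1
               for i, a in enumerate(moduli) for b in moduli[i + 1:])


def to_rns(x, moduli=MODULI):
    """Split x into one residue per modulus."""
    return [x % m for m in moduli]


def rns_mul(a, b, moduli=MODULI):
    """Multiply channel by channel; no carries cross between channels,
    so each channel could run on its own processor."""
    return [(x * y) % m for x, y, m in zip(a, b, moduli)]


def from_rns(residues, moduli=MODULI):
    """Recombine residues with the Chinese remainder theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)
    return total % big_m


if __name__ == "__main__":
    x, y = 123456789012, 987654321098
    z = from_rns(rns_mul(to_rns(x), to_rns(y)))
    print(z == x * y)
```

The result is exact only while the true product stays below the product of the moduli (about 2^93 here); wider operands would need more channels.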
The Scalable Tree Protocol: a cache coherence approach for large-scale multiprocessors
H. Nilsson, P. Stenström
The problem of cache coherence in large-scale shared-memory multiprocessors has been addressed using directory schemes. Two problems arise as the number of processors increases: network latency grows, and the implementation cost must be kept acceptable. The authors present a tree-based cache coherence protocol called the scalable tree protocol (STP). They show that it can be implemented at reasonable cost and that the write latency is logarithmic in the size of the sharing set. How to maintain an optimal tree structure and how to handle replacements efficiently are critical issues that the authors address for this type of protocol. They compare the performance of the STP with that of the scalable coherent interface (SCI, IEEE standard P1596) using a classical matrix-oriented algorithm targeted at large-scale parallel processing. They show that the STP reduces the execution time considerably by reducing the write latency.
DOI: https://doi.org/10.1109/SPDP.1992.242703 · Published: 1992-12-01
Citations: 44
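
The claim that write latency is logarithmic in the size of the sharing set can be illustrated with an idealized step count. The sketch below is not the STP itself: it assumes a balanced tree in which every node forwards an invalidation to all its children in one step, ignores acknowledgements and network distance, and compares that with a sequential walk along a linear sharing list. Both function names are hypothetical.

```python
def list_invalidation_steps(sharers: int) -> int:
    """Linear sharing list: invalidations visit one sharer per step."""
    return sharers


def tree_invalidation_steps(sharers: int, fanout: int = 2) -> int:
    """Balanced tree: one step per level, all nodes of a level in parallel."""
    levels = 0
    covered = 0   # sharers reached so far
    width = 1     # number of nodes on the current level
    while covered < sharers:
        covered += width
        width *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    for s in (1, 7, 8, 1024):
        print(s, list_invalidation_steps(s), tree_invalidation_steps(s))
```

Under these assumptions, 1024 sharers take 1024 sequential steps on a list but only 11 levels in a binary tree, which is the kind of gap the abstract attributes to the tree organization.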
Software caching on cache-coherent multiprocessors
R. Bianchini, T. LeBlanc
The authors explore the utility of software caching on a machine with coherent caches. In particular, they show that caching at the application level avoids the problem of false sharing on cache-coherent machines. They compare the performance of software caching with that of other techniques for alleviating false sharing and show that software caching performs better than the alternatives when the reference behavior of an application changes dynamically. They conclude that software caching, as well as other techniques developed for noncoherent shared-memory multiprocessors, can be used profitably on machines with hardware-coherent caches, and that programs based on these techniques are efficient across a variety of shared-memory machines.
DOI: https://doi.org/10.1109/SPDP.1992.242700 · Published: 1992-12-01
Citations: 18
A tight bound on the diameter of one dimensional PEC networks
Cho-Chin Lin, V. Prasanna
The diameter of a packed exponential connections (PEC) network on $N$ nodes is shown to be $\Theta\left(\sqrt{\log N} \cdot 2^{\sqrt{2 \log N}}\right)$, where $\log N$ denotes the logarithm to the base 2. The results can be extended to the case of two-dimensional PEC networks.
DOI: https://doi.org/10.1109/SPDP.1992.242722 · Published: 1992-12-01
Citations: 3
A methodology for generating data distributions to optimize communication
S. Gupta, S. Kaushik, Chua-Huang Huang, John R. Johnson, Rodney W. Johnson, P. Sadayappan
The authors present an algebraic theory, based on the tensor product, for describing the semantics of regular data distributions such as block, cyclic, and block-cyclic distributions. These distributions have been proposed in High Performance Fortran, an ongoing effort to develop a Fortran extension for massively parallel computing. The algebraic theory has been used for designing and implementing block recursive algorithms on shared-memory and vector multiprocessors. In the present work, the authors extend the theory to generate programs with explicit data distribution commands from tensor product formulas. A methodology for generating data distributions that optimize communication is described and demonstrated by generating efficient programs with data distribution for the fast Fourier transform.
DOI: https://doi.org/10.1109/SPDP.1992.242712 · Published: 1992-12-01
Citations: 10
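
The three regular distributions named in the abstract can be written as simple index-to-processor maps. The sketch below uses the usual textbook definitions for a one-dimensional array of `n` elements on `p` processors; the function names are illustrative, and the tensor-product formulation the authors use is not reproduced here.

```python
def block_owner(i: int, n: int, p: int) -> int:
    """Block: contiguous chunks of ceil(n / p) elements per processor."""
    block = -(-n // p)  # ceiling division
    return i // block


def cyclic_owner(i: int, p: int) -> int:
    """Cyclic: elements are dealt out round-robin."""
    return i % p


def block_cyclic_owner(i: int, b: int, p: int) -> int:
    """Block-cyclic: blocks of b elements are dealt out round-robin."""
    return (i // b) % p


if __name__ == "__main__":
    n = 8
    print([block_owner(i, n, 4) for i in range(n)])
    print([cyclic_owner(i, 4) for i in range(n)])
    print([block_cyclic_owner(i, 2, 2) for i in range(n)])
```

Block-cyclic generalizes the other two: a block size of 1 gives the cyclic map, and a block size of ceil(n / p) gives the block map.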
Using communication-to-computation ratio in parallel program design and performance prediction
M. Crovella, R. Bianchini, T. LeBlanc
The authors' goal is to predict the performance of a parallel program early in the program development process; to that end, they require prediction methods that can be based on incomplete programs. They describe how a single method based on the communication-to-computation (C/C) ratio can be used to predict performance accurately, yet fairly simply, in some commonly encountered cases. They show how C/C-ratio-based methods are carried out for both distributed-memory and coherent-memory multiprocessors. They show that focusing on the C/C ratio simplifies the use of the theory, machine benchmarking, and application measurement needed for good parallel performance prediction. In addition, the methods demonstrated are useful because they can be applied to program fragments or to serially executed code.
DOI: https://doi.org/10.1109/SPDP.1992.242738 · Published: 1992-12-01
Citations: 32
Evaluating reliability improvements of fault tolerant VLSI processor arrays
D. Tao
An important and meaningful criterion for evaluating a VLSI processor array that incorporates an ABFT (algorithm-based fault tolerance) technique is identified. A reliability model that can be used to compute accurately the reliability improvement of a fault-tolerant processor array is established. Examples are presented showing that, when an ABFT technique is incorporated, the reliability improvement depends on the size of the processor array, the nature of the failure, and the failure rate. Using the reliability model and methods discussed here, a system designer can therefore determine a priori whether it is beneficial to incorporate an ABFT technique. Moreover, if the reliability of an ABFT processor array cannot meet the specified requirement, the proposed method can also be used as a guide for partitioning it into smaller arrays so that the ABFT technique remains effective and only a minimal amount of overhead is introduced.
DOI: https://doi.org/10.1109/SPDP.1992.242752 · Published: 1992-12-01
Citations: 1
Deterministic routing on circular arrays
Michael Kaufmann, J. F. Sibeyn
The authors analyze the routing of k-permutations on circular processor arrays connected by bidirectional links. In contrast to linear processor arrays, it is hard to prove lower bounds for the routing time or to construct efficient algorithms for routing k-permutations on circular arrays (except for the case k=1). The authors prove nontrivial lower bounds for routing with global knowledge and for routing with local knowledge. They present deterministic algorithms that use local information only. The best of these algorithms requires only k*n/4+emsn routing steps for all k ≥ 4, which almost matches the k*n/4 lower bound. Special attention is given to the cases k=2 and k=3.
DOI: https://doi.org/10.1109/SPDP.1992.242721 · Published: 1992-12-01
Citations: 9