
Latest publications from [1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing

The Parallel Asynchronous Recursion model
L. Higham, Eric Schenk
The authors introduce and evaluate a novel model of parallel computation, called the parallel asynchronous recursion (PAR) model. This model offers distinct advantages to the program designer and the parallel machine architect, while avoiding some of the parallel random-access machine's (PRAM's) shortcomings. The PAR model can be thought of as a procedural programming language augmented with a process control structure that can, in parallel, recursively fork independent processes and merge their results. The unique aspect of the PAR model lies in its memory semantics, which differ substantially from both global and distributed memory models. It provides a high level of abstraction that removes the tasks of explicit processor scheduling and synchronization. Efficient simulations of the PAR model on well-established models confirm that the PAR model's advantages can be obtained at a reasonable cost.
DOI: 10.1109/SPDP.1992.242729
Citations: 5
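The fork/merge control structure described in the abstract can be sketched in a few lines of Python (a hypothetical illustration of parallel recursive fork-and-merge on a stand-in computation, not the PAR model's actual memory semantics):

```python
import threading

# Hypothetical sketch of PAR-style control flow: recursively fork an
# independent subproblem into its own thread, compute the other half
# locally, then merge the two results at the join point.
def par_sum(data, depth=2):
    if depth == 0 or len(data) <= 1:
        return sum(data)                      # base case: run sequentially
    mid = len(data) // 2
    result = {}
    # fork: the left half shares no state with the right half
    left = threading.Thread(
        target=lambda: result.update(left=par_sum(data[:mid], depth - 1)))
    left.start()
    right = par_sum(data[mid:], depth - 1)    # compute right half locally
    left.join()                               # merge point: wait for the fork
    return result["left"] + right

print(par_sum(list(range(100))))  # 4950
```

The two forked branches are fully independent until the merge, which is the property that lets the PAR model hide processor scheduling and synchronization from the programmer.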
A new framework for designing parallel algorithms on series parallel graphs
Yuval Caspi, E. Dekel
The authors propose a novel framework for designing efficient parallel algorithms on series parallel graphs. Recently, a novel approach for recognizing series parallel graphs was presented by D. Eppstein. Eppstein explored characterizations of the ear decomposition of series parallel graphs, which can be identified efficiently, in parallel. The authors extend Eppstein's results and show in a unified manner how to solve problems on series parallel graphs efficiently, in parallel, by finding a special ear decomposition of the graph. They demonstrate the utility of their novel framework by presenting O(log n) concurrent read exclusive write (CREW) parallel random access machine (PRAM) algorithms for the construction of a depth-first spanning tree, st-numbering, and a breadth-first spanning tree on series parallel graphs.
DOI: 10.1109/SPDP.1992.242748
Citations: 0
SMALL: a scalable multithreaded architecture to exploit large locality
R. Govindarajan, S. Nemawarkar
The authors propose a multithreaded architecture that performs synchronization efficiently by following a layered approach, exploits larger locality by using large, resident activations, and reduces the number of load stalls with the help of a novel high-speed buffer organization. The performance of the proposed architecture is evaluated using deterministic discrete-event simulation. Initial simulation results indicate that the architecture can achieve high performance in terms of both speedup and processor utilization.
DOI: 10.1109/SPDP.1992.242766
Citations: 3
Cache coherent shared memory hypercube multiprocessors
J. Ding, L. Bhuyan
The authors examine the feasibility of building cache coherent shared memory multiprocessor systems on hypercubes. Various shared memory schemes are investigated and compared with each other. The schemes considered are based on memory coherence algorithms for distributed shared memory and on cache coherence protocols for other shared memory architectures. It is concluded that efficient cache coherent architectures can be built using hypercubes.
DOI: 10.1109/SPDP.1992.242701
Citations: 4
Memory architecture support for the SIMD construction of a Gaussian pyramid
Jong Won Park, D. Harper
A memory system is introduced for the efficient construction of a Gaussian pyramid. The memory system consists of an address calculating circuit, an address routing circuit, a data routing circuit, a memory module selection circuit, and 2^n + 1 memory modules. The memory system provides parallel access to 2^n image points whose patterns are a block, a row, or a column, where the interval of the block or column is one and the interval of the row is one or two. The performance of a generic SIMD (single-instruction multiple-data) processor using the proposed memory system is compared with that of one using an interleaved memory system for the recursive construction of a Gaussian pyramid.
DOI: 10.1109/SPDP.1992.242711
Citations: 4
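The recursive pyramid computation that the proposed memory system is meant to feed can be sketched with NumPy (this illustrates the standard REDUCE step, blur then subsample, not the paper's parallel memory architecture):

```python
import numpy as np

def reduce_level(img):
    """One pyramid REDUCE step: separable 5-tap binomial (Gaussian-like)
    blur, then 2x subsampling in each dimension."""
    k = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0
    padded = np.pad(img, 2, mode="reflect")  # keep full kernel support at edges
    # separable blur: filter along rows, then along columns
    blurred = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    blurred = np.apply_along_axis(np.convolve, 0, blurred, k, mode="valid")
    return blurred[::2, ::2]                 # subsample every other row/column

def gaussian_pyramid(img, levels):
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(reduce_level(pyramid[-1]))  # each level from the last
    return pyramid

shapes = [p.shape for p in gaussian_pyramid(np.ones((16, 16)), 3)]
print(shapes)  # [(16, 16), (8, 8), (4, 4)]
```

Each REDUCE step touches a 5-point row/column window per output pixel, which is exactly the block/row/column access pattern the paper's memory modules serve in parallel.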
Parallel image sequence coding on multiprocessor systems
S. Rampal, D. Agrawal
The authors introduce dictionary-based image sequence coding (DISC) as a new approach to the problem of compression of image sequence data. The DISC algorithm is an adaptation of textual data compression techniques for image sequence data. The algorithm is extremely well suited for parallel implementation on standard configurations such as the rectangular mesh and the hypercube. For N*N images, the authors present SIMD (single-instruction multiple-data) algorithms with time complexities of approximately Θ(DN) for the mesh and Θ(D log N + log² N) for the hypercube, where D is proportional to the dictionary size. The DISC approach has the additional advantage of involving essentially only simple data movement and lookup operations. Simulation results indicate that moderate to high compression ratios can be achieved along with good visual fidelity and quality of reconstruction.
DOI: 10.1109/SPDP.1992.242759
Citations: 0
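The textual dictionary-compression idea that DISC adapts can be illustrated with a plain LZ78 coder (a generic sketch of the borrowed technique, not the DISC algorithm itself):

```python
def lz78_encode(data):
    """Classic LZ78 dictionary coding: emit (index of longest known prefix,
    next symbol) pairs while growing the dictionary of seen phrases."""
    dictionary = {"": 0}
    phrase, out = "", []
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch                          # extend the current match
        else:
            out.append((dictionary[phrase], ch))  # emit (prefix index, symbol)
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                                    # flush a trailing match
        out.append((dictionary[phrase[:-1]], phrase[-1]))
    return out

def lz78_decode(pairs):
    """Rebuild the phrase table on the fly and concatenate the phrases."""
    entries, out = [""], []
    for idx, ch in pairs:
        entries.append(entries[idx] + ch)
        out.append(entries[-1])
    return "".join(out)

coded = lz78_encode("abababab")
print(coded, lz78_decode(coded))  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (0, 'b')] abababab
```

DISC applies this kind of phrase lookup to blocks of image-sequence data, which is why its parallel form needs essentially only data movement and dictionary lookups.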
On uniformization of affine dependence algorithms
Weijia Shang, E. Hodzic, Zhigang Chen
The authors consider the problem of transforming irregular data dependence structures of algorithms with nested loops into more regular ones. The algorithms under consideration are n-dimensional algorithms (algorithms with n nested loops) with affine dependences, where the dependences are linear functions of the index variables of the loop. Methods are proposed to transform these algorithms into uniform dependence algorithms, in which the dependences are independent of the index variables (constant). Some parallelism might be lost in making the dependences uniform. The parallelism preserved by uniformization is measured by (1) the total execution time under the optimal linear schedule, which assigns each computation in the algorithm an execution time given by a linear function of the computation's index, and (2) the size of the cone spanned by the dependence vectors after uniformization. The objective is to maximize the parallelism preserved by uniformization or to minimize the number of dependences after uniformization.
DOI: 10.1109/SPDP.1992.242753
Citations: 39
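A minimal numeric illustration of the affine-versus-uniform distinction (the matrix A and offset b below are made up for illustration and are not taken from the paper):

```python
import numpy as np

# Affine dependence: iteration i consumes the value produced at A @ i + b,
# so the dependence vector d(i) = i - (A @ i + b) is a linear function of i.
A = np.array([[1, 0],
              [1, 1]])   # illustrative coefficient matrix
b = np.array([0, -1])    # illustrative offset

def dep_vector(i):
    i = np.asarray(i)
    return i - (A @ i + b)

d1 = dep_vector([2, 3])  # varies with the index point...
d2 = dep_vector([4, 1])  # ...so the dependence structure is irregular
print(d1, d2)            # [ 0 -1] [ 0 -3]

# Uniformization: every d(i) here is a nonnegative integer multiple of the
# single constant vector u = (0, -1), so the affine dependence can be
# routed as a chain of uniform (index-independent) dependences.
u = np.array([0, -1])
print(np.array_equal(d1, 1 * u), np.array_equal(d2, 3 * u))  # True True
```

The cone spanned by the replacement vectors (here, just the ray through u) is what the paper's second measure of preserved parallelism quantifies.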
The Dharma scheduler-definitive scheduling in Aurora on multiprocessors architecture
Raéd Yousef Sindaha
In any Or-parallel system which implements the full Prolog language, such as Aurora, there is the problem of processing time being wasted in regions of the search tree which are later pruned away. The author proposes the Dharma scheduler, which introduces a new concept in scheduling for Aurora. Rather than performing scheduling based on the nodes in the search tree, the Dharma scheduler uses the branches of the tree. The author believes that scheduling at this higher level of abstraction has a number of advantages and will make it possible to tackle the problem of wasted speculative work. Early performance results suggest that the Dharma scheduler is faster than any other existing scheduler for Aurora in applications where only the first solution is required. The author presents the design of the Dharma scheduler and performance results.
DOI: 10.1109/SPDP.1992.242731
Citations: 13
Two system state calculation algorithms for optimal load balancing
A. Winckler
The author introduces and explains two algorithms, OFCup and OFCdown, that allow one to calculate the global state of a decentralized distributed system by interpreting easily obtained measurements, facilitating cooperative optimal load balancing without a central job dispatcher. The information required is exchanged using the communication protocol of a receiver-initiated load balancing policy and does not induce any additional message transmission overhead. The author presents and interprets measurements from simulation. These studies show that the performance of systems applying either of the OFCx algorithms is significantly better than a no-information policy called 'random routing', and induces only a little additional waiting time compared to the M/D/n model. This holds even for transmission times that are high relative to the mean time between system state changes. Both algorithms are shown to perform equally well under normal conditions, with better variance values for OFCdown, but the degradation of OFCdown is significantly worse than that of OFCup if the not-accept counter is not incremented at the expected time.
DOI: 10.1109/SPDP.1992.242735
Citations: 7
Mapping tree-structured computations onto mesh-connected arrays of processors
Jyh-Jong Tsay
The author shows how to parallelize tree-structured computations for d-dimensional (d >= 1) mesh-connected arrays of processors. A tree-structured computation T consists of n computational tasks whose dependencies form a task tree T of n constant-degree nodes. Each task can be executed in unit time and sends one value to its parent task after it has been executed. The author presents linear time algorithms for partitioning and mapping the task tree T onto a p^(1/d) * ... * p^(1/d) mesh-connected array of processors so that one can schedule the processors to perform computation T in O(n/p) time, for p <= min(n/h, n^(d/(d+1))). It is shown that one can schedule a p^(1/d) * ... * p^(1/d) mesh to evaluate an n-node expression tree of associative operators in O(n/p) optimal time, for p <= n^(d/(d+1)).
DOI: 10.1109/SPDP.1992.242760
Citations: 3
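The reason O(n/p) suffices for associative-operator trees can be sketched as a flat chunked reduction with p workers (an illustration of the work bound only, not the paper's mesh partitioning and mapping algorithms):

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

def chunked_reduce(values, op, p):
    """Evaluate a fold of one associative operator with p workers: each
    worker reduces a contiguous chunk of about n/p leaves, then the p
    partial results are combined. Associativity makes regrouping legal."""
    n = len(values)
    chunk = (n + p - 1) // p                    # ceil(n / p) leaves per worker
    chunks = [values[i:i + chunk] for i in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=p) as pool:
        partials = list(pool.map(lambda c: reduce(op, c), chunks))
    return reduce(op, partials)                 # combine the p partials

print(chunked_reduce(list(range(1, 9)), operator.mul, 3))  # 40320 (= 8!)
```

Each worker does O(n/p) operator applications; the paper's contribution is achieving the same per-processor bound for arbitrary constant-degree task trees while respecting mesh locality.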