Hypercube Dynamic Load Balancing

D. King, E. Wegman
{"title":"Hypercube Dynamic Load Balancing","authors":"D. King, E. Wegman","doi":"10.1109/DMCC.1990.556305","DOIUrl":null,"url":null,"abstract":"This paper reports on the results of a preliminary study in dynamic load balancing on an Intel Hypercube. The purpose of this research is to provide experimental data in how parallel algorithms should be constructed to obtain maximal utilization of a parallel architecture. This study is one aspect of an ongoing research project into the construction of an automated parallelization tool. This tool will take FORTRAN source as input, and construct a parallel algorithm that will produce the same results as the original serial input. The focus of this paper is on the load balancing aspect of that project. The basic idea is to reserve a certain percentage of the computation task, subdivide that percentage into arbitrarily fine tasks, and dole those small tasks out to nodes on request. Ij” the percentage is chosen correctly, then a minority of nodes should be involved in consuming the filler tasks, and the overall throughput of the job should increase as a result of the individual node efJciencies having increased. This paper will outline our approach to performing dynamic load balancing on an Intel iPSC/2. We take the view that the problem of load balancing is really a problem of dividing a “computational task” into smaller components, each of roughly equal complexity, and each an independent event. After this is done, the components of the task can be sent to a node for execution. The key to an optimally balanced load across all computational nodes is the ability to form a statistical profile of the individual components of each computational task. This statistical profile will determine an initial sequence of execution. Our experience indicates that a speedup on the order of 80% is achievable with the judicious use of profiled load balancing. During the process of execution, the initial profile will be altered according to the actual behavior exhibited by the nodes. The difference between the actual and expected performance will be used to determine how much additional time should be devoted to altering the current execution schedule. Currently, our work involves statically setting the load balancing parameters. Our load balancing system determines the execution schedule","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DMCC.1990.556305","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

This paper reports on the results of a preliminary study in dynamic load balancing on an Intel Hypercube. The purpose of this research is to provide experimental data in how parallel algorithms should be constructed to obtain maximal utilization of a parallel architecture. This study is one aspect of an ongoing research project into the construction of an automated parallelization tool. This tool will take FORTRAN source as input, and construct a parallel algorithm that will produce the same results as the original serial input. The focus of this paper is on the load balancing aspect of that project. The basic idea is to reserve a certain percentage of the computation task, subdivide that percentage into arbitrarily fine tasks, and dole those small tasks out to nodes on request. Ij” the percentage is chosen correctly, then a minority of nodes should be involved in consuming the filler tasks, and the overall throughput of the job should increase as a result of the individual node efJciencies having increased. This paper will outline our approach to performing dynamic load balancing on an Intel iPSC/2. We take the view that the problem of load balancing is really a problem of dividing a “computational task” into smaller components, each of roughly equal complexity, and each an independent event. After this is done, the components of the task can be sent to a node for execution. The key to an optimally balanced load across all computational nodes is the ability to form a statistical profile of the individual components of each computational task. This statistical profile will determine an initial sequence of execution. Our experience indicates that a speedup on the order of 80% is achievable with the judicious use of profiled load balancing. During the process of execution, the initial profile will be altered according to the actual behavior exhibited by the nodes. The difference between the actual and expected performance will be used to determine how much additional time should be devoted to altering the current execution schedule. Currently, our work involves statically setting the load balancing parameters. Our load balancing system determines the execution schedule
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Hypercube动态负载平衡
本文报告了在Intel Hypercube上进行动态负载平衡的初步研究结果。本研究的目的是为如何构建并行算法以最大限度地利用并行架构提供实验数据。这项研究是正在进行的自动化并行化工具构建研究项目的一个方面。该工具将采用FORTRAN源作为输入,并构造一个并行算法,该算法将产生与原始串行输入相同的结果。本文的重点是该项目的负载平衡方面。其基本思想是预留一定百分比的计算任务,将该百分比细分为任意精细的任务,并根据请求将这些小任务分发给节点。如果正确选择了百分比,则应该有少数节点参与使用填充任务,并且由于单个节点效率的提高,作业的总体吞吐量应该增加。本文将概述我们在英特尔iPSC/2上执行动态负载平衡的方法。我们认为,负载平衡问题实际上是将“计算任务”划分为更小的组件的问题,每个组件的复杂性大致相等,每个组件都是独立的事件。完成此操作后,可以将任务的组件发送到节点执行。在所有计算节点之间实现最佳均衡负载的关键是能够形成每个计算任务的单个组件的统计概要。此统计概要文件将确定初始执行顺序。我们的经验表明,通过明智地使用概要负载平衡,可以实现80%左右的加速。在执行过程中,初始配置文件将根据节点显示的实际行为进行更改。实际性能和预期性能之间的差异将用于确定应该投入多少额外时间来更改当前执行计划。目前,我们的工作涉及静态设置负载平衡参数。我们的负载平衡系统决定执行时间表
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Reducing Inner Product Computation in the Parallel One-Sided Jacobi Algorithm Experience with Concurrent Aggregates (CA): Implementation and Programming A Distributed Memory Implementation of SISAL Performance Results on the Intel Touchstone Gamma Prototype Quick Recovery of Embedded Structures in Hypercube Computers
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1