Performance Analysis of Task Replication in Large-Scale Parallel-Distributed Processing: An Extreme Value Theory Approach

Journal of the Operations Research Society of Japan (JCR Q4, Decision Sciences), Volume 59, pp. 174-194. Pub Date: 2016-04-01. DOI: 10.15807/JORSJ.59.174

T. Hirai, H. Masuyama, S. Kasahara, Yutaka Takahashi


Citation count: 1

Abstract


In cloud computing, large-scale parallel-distributed processing services split a huge task into many subtasks, which are processed independently on a cluster of machines referred to as workers. Workers that take unusually long to process their assigned subtasks delay the completion of the whole task (the straggler problem). An efficient way to address this issue is to have other workers execute the affected subtasks as backups (task replication). In this paper, we evaluate the efficiency of task replication from a theoretical point of view. The mean and standard deviation of the task-processing time are derived approximately using extreme value theory, while the mean total processing time is evaluated exactly, for cases in which the worker-processing time follows a hyper-exponential, Weibull, or Pareto distribution. The numerical results reveal that the efficiency of task replication depends significantly on the tail of the worker-processing time distribution. In addition, the optimal number of replications, i.e. the number that achieves the shortest task-processing time, depends mainly on the coefficient of variation of the worker-processing time. Furthermore, three replications are effective in guaranteeing a low variance of the task-processing time, regardless of the tail.
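The quantities named in the abstract can be explored numerically with a small Monte Carlo sketch. This is not the paper's extreme-value derivation; it is an illustration under an assumed replication model in which every subtask is started on `n_replicas` workers at once, a subtask finishes when its fastest replica finishes, and the task finishes when its slowest subtask finishes. All function names and all distribution parameters below are hypothetical choices made for the example.

```python
import random
import statistics


def simulate_task_time(sample_worker_time, n_subtasks, n_replicas, rng):
    """Task-processing time for one task under the assumed model:
    max over subtasks of (min over that subtask's replicas)."""
    return max(
        min(sample_worker_time(rng) for _ in range(n_replicas))
        for _ in range(n_subtasks)
    )


def estimate(sample_worker_time, n_subtasks, n_replicas, n_runs=500, seed=0):
    """Monte Carlo estimate of the mean and standard deviation
    of the task-processing time."""
    rng = random.Random(seed)
    times = [
        simulate_task_time(sample_worker_time, n_subtasks, n_replicas, rng)
        for _ in range(n_runs)
    ]
    return statistics.mean(times), statistics.stdev(times)


# Worker-processing time samplers (parameters are illustrative assumptions).
def exponential(rng):
    return rng.expovariate(1.0)


def hyper_exponential(rng):
    # Two-phase mixture: most workers are fast, a few are slow stragglers.
    rate = 2.0 if rng.random() < 0.9 else 0.2
    return rng.expovariate(rate)


def weibull(rng):
    # weibullvariate(scale, shape); shape < 1 gives a heavier tail.
    return rng.weibullvariate(1.0, 0.5)


def pareto(rng):
    # paretovariate(shape); shape 2.5 keeps mean and variance finite.
    return rng.paretovariate(2.5)


if __name__ == "__main__":
    samplers = {
        "hyper-exponential": hyper_exponential,
        "Weibull": weibull,
        "Pareto": pareto,
    }
    for name, sampler in samplers.items():
        for r in (1, 2, 3, 4):
            mean, sd = estimate(sampler, n_subtasks=50, n_replicas=r)
            print(f"{name:18s} replicas={r}  mean={mean:8.3f}  sd={sd:8.3f}")
```

For exponential worker times with rate 1 the sketch can be checked against a closed form: the minimum of r replicas is exponential with rate r, so the mean task time for n subtasks is H_n / r, where H_n is the n-th harmonic number. The sketch only measures the task-processing time; it does not account for the extra total processing time that replication consumes, which the paper evaluates separately.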


Source journal

Journal of the Operations Research Society of Japan (Management Science: Operations Research and Management Science)

CiteScore: 0.70

Self-citation rate: 0.00%

Review time: 12 months

Journal description: The journal publishes original work and quality reviews in the field of operations research and management science for OR practitioners and researchers, in two substantive categories: operations research methods; and applications and practices of operations research in industry, the public sector, and all areas of science and engineering.