Task Scheduling with Makespan Minimization for Distributed Machine Learning Ensembles

Jose Monteiro, Óscar Oliveira, Davide Carneiro
{"title":"分布式机器学习集成的最大时间跨度最小化任务调度","authors":"Jose Monteiro, Óscar Oliveira, Davide Carneiro","doi":"10.1109/ECICE55674.2022.10042894","DOIUrl":null,"url":null,"abstract":"Machine Learning problems are becoming increasingly complex, mostly due to the size of datasets. Data are also generated at increasing speed, which requires models to be updated regularly, at a significant computational cost. The project Continuously Evolving Distributed Ensembles proposes the creation of a distributed Machine Learning environment, in which datasets are divided into fixed-size blocks, and stored in a fault-tolerant distributed file system with replication. The base-models of the Ensembles, with a 1:1 relationship with data blocks, are then trained in a distributed manner, according to the principle of data locality. Specifically, the system is able to select which data blocks to use and in which nodes of the cluster, in order to minimize training time. A similar process takes place when making predictions: the best base-models are selected in real-time, according to their predictive performance and to the state of the nodes where they reside. This paper addresses the problem of assigning base model training tasks to cluster nodes, adhering to the principle of data locality. We present an instance generator and three datasets that will provide a means for comparison while studying other solution methods. For testing the system architecture, we solved the datasets with an exact method and the computational results validate, to comply to the project requirements, the need for a more stable and less demanding (in computational resource terms) solution method.","PeriodicalId":282635,"journal":{"name":"2022 IEEE 4th Eurasia Conference on IOT, Communication and Engineering (ECICE)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Task Scheduling with Makespan Minimization for Distributed Machine Learning Ensembles\",\"authors\":\"Jose Monteiro, Óscar Oliveira, Davide Carneiro\",\"doi\":\"10.1109/ECICE55674.2022.10042894\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine Learning problems are becoming increasingly complex, mostly due to the size of datasets. Data are also generated at increasing speed, which requires models to be updated regularly, at a significant computational cost. The project Continuously Evolving Distributed Ensembles proposes the creation of a distributed Machine Learning environment, in which datasets are divided into fixed-size blocks, and stored in a fault-tolerant distributed file system with replication. The base-models of the Ensembles, with a 1:1 relationship with data blocks, are then trained in a distributed manner, according to the principle of data locality. Specifically, the system is able to select which data blocks to use and in which nodes of the cluster, in order to minimize training time. A similar process takes place when making predictions: the best base-models are selected in real-time, according to their predictive performance and to the state of the nodes where they reside. This paper addresses the problem of assigning base model training tasks to cluster nodes, adhering to the principle of data locality. We present an instance generator and three datasets that will provide a means for comparison while studying other solution methods. 
For testing the system architecture, we solved the datasets with an exact method and the computational results validate, to comply to the project requirements, the need for a more stable and less demanding (in computational resource terms) solution method.\",\"PeriodicalId\":282635,\"journal\":{\"name\":\"2022 IEEE 4th Eurasia Conference on IOT, Communication and Engineering (ECICE)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 4th Eurasia Conference on IOT, Communication and Engineering (ECICE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ECICE55674.2022.10042894\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 4th Eurasia Conference on IOT, Communication and Engineering (ECICE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ECICE55674.2022.10042894","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Machine Learning problems are becoming increasingly complex, mostly due to the size of datasets. Data are also generated at increasing speed, which requires models to be updated regularly, at a significant computational cost. The project Continuously Evolving Distributed Ensembles proposes the creation of a distributed Machine Learning environment, in which datasets are divided into fixed-size blocks and stored in a fault-tolerant distributed file system with replication. The base-models of the Ensembles, which have a 1:1 relationship with data blocks, are then trained in a distributed manner, according to the principle of data locality. Specifically, the system is able to select which data blocks to use and on which nodes of the cluster, in order to minimize training time. A similar process takes place when making predictions: the best base-models are selected in real time, according to their predictive performance and to the state of the nodes where they reside. This paper addresses the problem of assigning base-model training tasks to cluster nodes while adhering to the principle of data locality. We present an instance generator and three datasets that provide a means of comparison for studying other solution methods. To test the system architecture, we solved the datasets with an exact method; the computational results show that meeting the project requirements calls for a more stable and computationally less demanding solution method.
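The abstract leaves the scheduling model implicit, but a natural reading is a parallel-machine makespan-minimization problem with machine-eligibility constraints imposed by data locality: each training task may only run on a node that holds a replica of its data block. The formulation below is a hedged sketch of one standard way to model this, not the paper's own; the symbols T (tasks, one per block), N (cluster nodes), R_t (nodes holding a replica of task t's block), p_tn (estimated training time of task t on node n), and x_tn (binary assignment variable) are our notation.

\[
\begin{aligned}
\min\; & C_{\max} \\
\text{s.t.}\; & \sum_{n \in R_t} x_{tn} = 1 && \forall t \in T \\
& \sum_{t \in T:\, n \in R_t} p_{tn}\, x_{tn} \le C_{\max} && \forall n \in N \\
& x_{tn} \in \{0,1\} && \forall t \in T,\; n \in R_t
\end{aligned}
\]

Since the authors report that the exact method is too demanding in computational terms, a lightweight heuristic is an obvious alternative direction. The Python sketch below implements a longest-processing-time-first greedy that respects replica placement; it is an illustration under our own assumptions, not the project's method, and all names (schedule, replicas, proc_time) are hypothetical.

```python
# Hedged sketch: a greedy LPT-style heuristic for makespan minimization
# under data-locality (replica eligibility) constraints. Illustrative only;
# the paper's actual solution method may differ.

def schedule(tasks, replicas, proc_time):
    """Assign each training task to one node holding a replica of its
    data block, trying to keep the maximum node load (makespan) small.

    tasks:     iterable of task ids (one per data block)
    replicas:  dict task -> set of eligible nodes (data locality)
    proc_time: dict (task, node) -> estimated training time
    """
    load = {}        # node -> accumulated training time
    assignment = {}  # task -> chosen node
    # Longest-processing-time first: place big tasks before small ones.
    for t in sorted(tasks,
                    key=lambda t: -min(proc_time[(t, n)] for n in replicas[t])):
        # Pick the eligible node whose load after adding t is smallest.
        best = min(replicas[t],
                   key=lambda n: load.get(n, 0.0) + proc_time[(t, n)])
        assignment[t] = best
        load[best] = load.get(best, 0.0) + proc_time[(t, best)]
    return assignment, max(load.values(), default=0.0)

# Toy usage: three blocks replicated across two nodes.
tasks = ["b1", "b2", "b3"]
replicas = {"b1": {"n1", "n2"}, "b2": {"n1"}, "b3": {"n2"}}
proc_time = {("b1", "n1"): 4.0, ("b1", "n2"): 5.0,
             ("b2", "n1"): 3.0, ("b3", "n2"): 2.0}
assignment, makespan = schedule(tasks, replicas, proc_time)
print(assignment, makespan)
```

Unlike the exact model, the greedy runs in near-linear time in the number of tasks and gives no optimality guarantee; it simply illustrates the trade-off the abstract points to between solution quality and computational demand.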