Energy–time modelling of distributed multi-population genetic algorithms with dynamic workload in HPC clusters

Impact factor: 6.2; Zone 2 (Computer Science); JCR Q1 (Computer Science, Theory & Methods); Future Generation Computer Systems-The International Journal of Escience; Pub Date: 2025-02-10; DOI: 10.1016/j.future.2025.107753

Juan José Escobar , Pablo Sánchez-Cuevas , Beatriz Prieto , Rukiye Savran Kızıltepe , Fernando Díaz-del-Río , Dragi Kimovski

Volume 167, Article 107753. Publication type: Journal Article. URL: https://www.sciencedirect.com/science/article/pii/S0167739X25000482

Citations: 0

Abstract

Time and energy efficiency is a key objective in high-performance computing systems, where executing tasks is costly. Among these tasks, evolutionary algorithms deserve particular attention because of their inherent parallel scalability and their usually expensive fitness evaluation functions. Several scheduling strategies for workload balancing in heterogeneous systems have been proposed in the literature, aiming to reduce runtime and energy consumption. Our hypothesis is that a dynamic workload distribution can be fitted more precisely using metaheuristics, such as genetic algorithms, than with linear regression. This paper therefore proposes a new mathematical model to predict the energy–time behaviour of applications based on multi-population genetic algorithms that dynamically distribute the evaluation of individuals among the CPU–GPU devices of heterogeneous clusters. An accurate predictor would save time and energy by selecting the best resource set before running such applications. The workload distributed to each device is estimated by simulation, while the model parameters are fitted in a two-phase run using another genetic algorithm, with the experimental energy–time values of the target application as input. Compared with a baseline model based on linear regression, the proposed model significantly improves prediction accuracy, with normalised prediction errors of 0.081 for runtime and 0.091 for energy consumption, against 0.213 and 0.256 for the baseline.
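To put the reported errors in perspective, the short sketch below computes how much lower the proposed model's normalised error is relative to the linear-regression baseline. Only the four error values come from the abstract; the names `reported` and `relative_reduction` are illustrative, and the relative-reduction measure is an assumption chosen here for comparison, not a metric stated by the authors.

```python
# Normalised prediction errors as reported in the abstract.
reported = {
    "runtime": {"proposed": 0.081, "baseline": 0.213},
    "energy": {"proposed": 0.091, "baseline": 0.256},
}


def relative_reduction(baseline: float, proposed: float) -> float:
    """Fraction by which the proposed error is lower than the baseline error.

    Hypothetical helper for illustration; not a metric defined in the paper.
    """
    return (baseline - proposed) / baseline


# One relative reduction per metric (runtime, energy).
reductions = {
    metric: relative_reduction(errors["baseline"], errors["proposed"])
    for metric, errors in reported.items()
}

for metric, reduction in reductions.items():
    print(f"{metric}: {reduction:.1%} lower error than the baseline")
```

Under this measure the error drops by roughly 62% for runtime and roughly 64% for energy consumption. How the normalised error itself is defined is not stated in the abstract, so the sketch treats the four values as given.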


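Source journal

Future Generation Computer Systems-The International Journal of Escience (Engineering & Technology; Computer Science: Theory & Methods)

CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months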

About the journal: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.