Pub Date: 2026-06-01 | Epub Date: 2026-01-13 | DOI: 10.1016/j.future.2026.108374
Claude Tadonki, Gabriele Mencagli, Leonel Sousa
High Performance Computing (HPC) has recently entered the exascale era, an important milestone in its history. High-end supercomputers and clusters with remarkable levels of performance are now commonly available for general and specific computational needs, thereby increasing the focus on HPC and related topics. Leveraging the potential of high-speed processing units is a skillful task that requires in-depth knowledge of both hardware and software. Indeed, the architecture of cutting-edge HPC processors is complex and involves several specialized features provided through specific units and mechanisms, whose constraints and overheads can turn out to be efficiency bottlenecks. Large-scale supercomputers present still greater challenges due to the significant overhead associated with interprocessor communication and synchronization. The evolution of HPC appears closely tied to the growing demand for speed from large-scale applications such as complex combinatorial problems, big data applications, the training of large-scale AI models, and high-precision simulations, to name a few. As a result, implementations of cutting-edge techniques should remain scalable on large-scale machines for the benefit of end-users.
"Leveraging cutting-edge high performance computing for large-scale applications." Future Generation Computer Systems, vol. 179, Article 108374.
Pub Date: 2026-06-01 | Epub Date: 2025-12-23 | DOI: 10.1016/j.future.2025.108335
Mehboob Hussain, Ying Xu, Zeeshan Abbas, Ali Kamran, Amir Rehman, Muhammad Yasir
Big data applications often use workflows represented as Directed Acyclic Graphs (DAGs), which link jobs and stages through ordering constraints. Regular Spark-based workflow scheduling assumes all executors are identical and expects jobs to run one after another. This approach fails in mixed cloud setups, where machines vary in speed and computing capability. Efficient workflow scheduling across heterogeneous Spark nodes is crucial for reducing makespan and achieving load balancing, yet many existing methods overlook the challenges created by DAG-constrained jobs, stage-level dependencies, and varied node performance. This paper addresses efficient workflow scheduling on a mixed Spark cluster, seeking to minimize workflow completion time while keeping node loads balanced. We propose a modified Spark framework that includes both a job scheduler and a stage scheduler tailored for mixed setups. Our method introduces multi-level node classification based on load status, a Speculative Stage Execution strategy for dynamic scheduling, and a Node Awareness Strategy for real-time task assignment. Compared to the Rainbow, SAF, and DSWTS algorithms, SWTS reduces makespan by up to 40%, improves load balancing by 55%, and increases resource utilization by 20%, demonstrating superior efficiency across all workflows.
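The abstract does not detail the SWTS scheduler itself, so the following is only a rough sketch of heterogeneity-aware DAG scheduling of the kind it describes: ready stages are greedily placed on whichever node finishes them earliest, respecting stage dependencies. The node names, work units, and greedy rule are illustrative assumptions, not the authors' algorithm.

```python
def schedule_stages(stages, deps, node_speeds):
    """Greedy list scheduling of DAG stages on heterogeneous nodes.

    stages:      {stage_id: work_units}
    deps:        {stage_id: [prerequisite stage_ids]}
    node_speeds: {node_id: work_units_per_second}
    Returns (makespan, {stage_id: node_id}).
    """
    finish = {}                                  # stage -> finish time
    node_free = {n: 0.0 for n in node_speeds}    # node -> time it frees up
    assignment = {}
    remaining = dict(stages)
    while remaining:
        # stages whose prerequisites have all completed
        ready = [s for s in remaining
                 if all(d in finish for d in deps.get(s, []))]
        ready.sort(key=lambda s: -remaining[s])  # heaviest stage first
        for s in ready:
            earliest = max((finish[d] for d in deps.get(s, [])), default=0.0)
            # node minimising this stage's finish time (speed + current load)
            best = min(node_speeds,
                       key=lambda n: max(node_free[n], earliest)
                                     + remaining[s] / node_speeds[n])
            start = max(node_free[best], earliest)
            finish[s] = start + remaining[s] / node_speeds[best]
            node_free[best] = finish[s]
            assignment[s] = best
            del remaining[s]
    return max(finish.values()), assignment
```

A faster node naturally absorbs more stages until queueing makes a slower node competitive, which is the load-balancing effect heterogeneous schedulers aim for.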
"A heuristic approach to Spark workflow task scheduling on heterogeneous nodes." Future Generation Computer Systems, vol. 179, Article 108335.
Pub Date: 2026-06-01 | Epub Date: 2025-12-21 | DOI: 10.1016/j.future.2025.108324
Amine Roukh, Saïd Mahmoudi
Agriculture faces escalating demands to increase food production amid shrinking arable land, resource depletion, and climate variability. Existing smart farming solutions often lack scalability, interoperability, real-time analytics, and region-specific adaptability. This paper presents WALLeSmart, a cloud-based smart farming platform designed to address these challenges through a scalable Lambda architecture and a modular plugin system. Hosted on a GDPR-compliant private cloud, WALLeSmart integrates diverse data sources (e.g., IoT sensors, satellite imagery, weather data) to deliver real-time insights and predictive analytics, achieving low-latency processing (e.g., 80 seconds for weather data streams). Key features include a one-stop shop for accessing agricultural platforms (e.g., Myawenet, MyCDL, Cerise), a consent management system for data control, a Walloon Agricultural DataHub for secure data exchange, and a personalized dashboard for farmers. The platform’s unique governance model, led by farmers, ensures autonomy and transparency. Real-world case studies in Wallonia, Belgium, demonstrate its ability to process over 3 million weather measurements and 61,130 dairy farm datasets, supporting applications like SALVE, W@llHerbe, and MyFieldBook. WALLeSmart’s generalizable design enables adaptation to diverse regions, addressing ethical concerns like algorithmic bias and data ownership through transparent AI and user-centric consent mechanisms, fostering efficiency, sustainability, and profitability.
"Leveraging big data and cloud technology for scalable and interoperable smart farming." Future Generation Computer Systems, vol. 179, Article 108324.
Pub Date: 2026-06-01 | Epub Date: 2026-01-05 | DOI: 10.1016/j.future.2025.108359
Yongxin Zhao, Hao Lin, Xiangtian Zheng, Yixuan Song, Jinze Du
Differential privacy (DP) is crucial for trajectory data publication, yet the utility of existing prefix-tree-based mechanisms heavily depends on intricate parameter tuning, often yielding suboptimal performance. To address this, we propose SP-IFDA-Traj, a novel framework for automatically and jointly optimizing critical parameters in personalized noisy prefix-tree trajectory publishing. SP-IFDA-Traj enhances the Flow Direction Algorithm (FDA) for optimizing parameters by: 1) employing chaotic mapping initialization for improved population diversity, 2) incorporating adaptive neighborhood generation to balance exploration and exploitation, and 3) leveraging Spark-based parallelized fitness evaluation for enhanced efficiency. Guided by a meticulously designed fitness function targeting maximal data utility, our framework optimizes Hilbert encoding order, privacy budget allocation, and pruning strategies. Extensive experiments on real-world large-scale trajectory datasets demonstrate that SP-IFDA-Traj substantially improves the privacy-utility trade-off. In convergence tests, the optimization engine improves the average fitness over FDA by 1.74% on BJ-Day3 and 103.98% on BJ-Day7, indicating consistently superior convergence across heterogeneous datasets. In terms of trajectory query accuracy, SP-IFDA-Traj reduces error to about 1% of that of baseline methods and approximately 10% of that of other existing optimization strategy models.
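Chaotic mapping initialization, the first of the three FDA enhancements listed above, is commonly realized with a logistic map whose iterates spread individuals over the search space more evenly than independent uniform draws. The sketch below shows one such initializer; the choice of map, seed x0, and parameter r are assumptions for illustration, not necessarily what the paper uses.

```python
def chaotic_population(pop_size, bounds, x0=0.7, r=4.0):
    """Initialise a population via the logistic map x_{k+1} = r * x * (1 - x).

    With r = 4 the map is chaotic on (0, 1), so successive iterates
    cover the interval densely and decorrelate quickly.
    bounds: list of (low, high) pairs, one per dimension.
    """
    pop = []
    x = x0
    for _ in range(pop_size):
        individual = []
        for low, high in bounds:
            x = r * x * (1.0 - x)                 # one logistic-map step
            individual.append(low + x * (high - low))  # rescale to bounds
        pop.append(individual)
    return pop
```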
"SP-IFDA-Traj: Optimizing differentially private trajectory publishing for enhanced utility." Future Generation Computer Systems, vol. 179, Article 108359.
Pub Date: 2026-06-01 | Epub Date: 2026-01-05 | DOI: 10.1016/j.future.2025.108362
Ning Liu, Yizhi Zhou, Yuchen Qin, Qinzheng Feng, Heng Qi
Federated Learning (FL) serves as a crucial enabler for digital supply chain transformation by supporting secure multi-party data collaboration while preserving privacy. However, its real-world implementation encounters two critical limitations: (1) conventional contribution assessment approaches (e.g., the Shapley value) often ignore individual rationality, resulting in biased incentives that undermine long-term engagement; and (2) the underlying blockchain storage infrastructure suffers from inherent performance constraints, including slow read/write operations that cannot keep pace with FL's need for real-time data exchange and frequent model updates. To overcome these challenges, we propose a framework that leverages Clique Games Solution theory to ensure fair and efficient contribution measurement. Our solution incorporates coalition screening and Pareto optimization to guarantee individual rationality with polynomial computational complexity. Additionally, we integrate COLE, a columnar storage module enhanced with learned indexes and LSM-tree merge mechanisms, to dramatically accelerate data access. Experimental results demonstrate that our approach achieves a 5.72e-4 reduction in core distance for contribution assessment stability, improves storage IOPS by 28.7%, and maintains model accuracy within 1.2% of baseline methods. It outperforms mainstream baselines in evaluation stability and storage performance while maintaining competitive model accuracy, offering a comprehensive and practical solution for privacy-aware distributed learning in sensitive environments.
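For context on the contribution-assessment baseline the paper improves upon, the exact Shapley value can be computed as below. Its cost grows exponentially in the number of participants, which is one reason polynomial-complexity alternatives such as clique-game solutions are attractive. The additive toy value function in the usage is purely illustrative.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley value of each player for a coalition value function.

    value: callable mapping a frozenset of players to a real payoff.
    Enumerates all coalitions, so this is exponential in len(players):
    fine for illustration, infeasible at federated-learning scale.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for coal in combinations(others, k):
                s = frozenset(coal)
                # probability that p joins exactly after coalition s
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi
```

For an additive game the Shapley value recovers each player's standalone worth, a quick sanity check for any implementation.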
"COZO: A secure and efficient blockchain-enhanced federated learning paradigm with optimized storage and equitable contribution valuation." Future Generation Computer Systems, vol. 179, Article 108362.
Pub Date: 2026-06-01 | Epub Date: 2025-12-18 | DOI: 10.1016/j.future.2025.108330
Antonio Emmanuele, Mario Barbareschi, Alberto Bosio
The deployment of machine learning models at the edge is crucial for enabling low-latency decision-making, optimizing resource utilization, and enhancing data confidentiality. Random Forest classifiers have proven to be highly accurate while offering computationally efficient inference, making them well-suited for resource-constrained edge devices. However, as the volume of training data grows, the complexity and size of these models also increase, limiting their deployment in edge computing scenarios. To address this challenge, we propose a novel approximation strategy for Random Forest classifiers that leverages the concept of modular redundancy. In particular, our approach imposes that each target class is determined by only a subset of trees, in a modular redundant fashion. This allows the leaves associated with no-longer-relevant classes to be pruned from each tree, significantly reducing the size of the model. To achieve an optimal balance between accuracy and resource savings with minimal computational time, we introduce a heuristic algorithm that determines the best subset of trees for each class. We evaluate our approach on multiple UCI machine learning datasets using a hardware accelerator for tree ensembles, demonstrating its effectiveness. The results show that, on average, a 2.5% reduction in accuracy yields savings of up to 50% in hardware overhead and energy consumption.
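The class-subset voting idea can be sketched as follows: each class is scored only by its designated subset of trees, so the remaining trees may drop that class's leaves entirely. The normalization rule and the toy prediction data are illustrative assumptions, not the paper's exact aggregation.

```python
def subset_vote(tree_predictions, class_trees):
    """Modular-redundancy-style prediction for a tree ensemble.

    tree_predictions: predicted class label per tree, for one sample.
    class_trees: {class_label: set of tree indices allowed to vote for it}.
    Each class is scored only by its designated tree subset; votes are
    normalised by subset size so classes with fewer voters are not penalised.
    """
    scores = {}
    for cls, trees in class_trees.items():
        votes = sum(1 for i in trees if tree_predictions[i] == cls)
        scores[cls] = votes / max(len(trees), 1)
    return max(scores, key=scores.get)
```

Because trees outside a class's subset never need that class's leaves, the corresponding branches can be pruned from their stored representation, which is where the hardware savings come from.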
"Exploiting Modular Redundancy for approximating Random Forest classifiers." Future Generation Computer Systems, vol. 179, Article 108330.
Pub Date: 2026-06-01 | Epub Date: 2025-12-15 | DOI: 10.1016/j.future.2025.108312
Andrea Manzi, Raul Bardaji, Ivan Rodero, Germán Moltó, Sandro Fiore, Isabel Campos, Donatello Elia, Francesco Sarandrea, A. Paul Millar, Daniele Spiga, Matteo Bunino, Gabriele Accarino, Lorenzo Asprea, Samuel Bernardo, Miguel Caballer, Charis Chatzikyriakou, Diego Ciangottini, Michele Claus, Andrea Cristofori, Davide Donno, Juraj Zvolensky
The EU project interTwin co-designed and implemented the prototype of an interdisciplinary Digital Twin Engine (DTE), an open-source platform that provides generic and domain-specific software components for modelling and simulation to integrate application-specific Digital Twins (DTs). The DTE is built upon a co-designed conceptual model, the DTE blueprint architecture, guided by open standards and interoperability principles. The ambition is to develop a unified approach to the implementation of DTs that is applicable across diverse scientific disciplines, fostering collaboration and facilitating development. Co-design involved DT use cases from high-energy physics, radio astronomy, astroparticle physics, climate research, and environmental monitoring. These use cases drove advancements in modelling and simulation by leveraging heterogeneous distributed digital infrastructures, enabling dynamic workflow composition, real-time data management and processing, quality and uncertainty tracing of models, and multi-source data fusion.
"interTwin: Advancing Scientific Digital Twins through AI, Federated Computing and Data." Future Generation Computer Systems, vol. 179, Article 108312.
In recent years, serverless computing has become a new paradigm for efficient and low-cost model inference due to its advantages in dynamic scaling and fine-grained resource allocation. Dynamic batching improves resource utilization by aggregating inference requests to balance latency and throughput, while adaptive model partitioning divides models into lightweight edge modules and compute-intensive cloud units, optimizing global resources through a collaborative architecture. However, existing model partitioning methods do not consider the resource utilization of low-cost local user devices, which may leave local resources underused and increase operational costs. This paper explores how to optimize batching decisions and model partitioning strategies in a device-cloud collaborative scenario under resource constraints, aiming to maximize local resource utilization, minimize costs, and improve inference efficiency. To characterize the mathematical relationship between batching decisions, model partitioning strategies, and delay-cost trade-offs, we formulate the problem as a mixed-integer nonlinear program and prove its NP-hardness. We introduce the Model Partitioning and Batch Scheduling algorithm (MPBS), a serverless-accelerated inference mechanism that leverages dynamic Deep Neural Network (DNN) model partitioning. The architecture generates resource-efficient partitioning schemes according to heterogeneous hardware characteristics and request priorities, ensuring efficient device-task alignment. Additionally, a dynamic parallel scheduling mechanism, driven by a scheduling engine, enables global resource optimization by leveraging the batch processing capabilities of cloud instances, collectively enhancing local resource utilization and accelerating cloud container performance.
Extensive simulation results demonstrate that, compared to state-of-the-art solutions, MPBS reduces resource overhead by 20.6% while satisfying the execution time specified by the Service Level Objective (SLO).
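The device-cloud split decision underlying model partitioning can be illustrated with a simple latency model for a sequential DNN: enumerate candidate split points and pick the one minimizing device compute plus activation transfer plus cloud compute. This toy cost model and its numbers are illustrative assumptions, not the paper's MINLP formulation.

```python
def best_partition(layer_flops, layer_out_bytes, input_bytes,
                   device_flops, cloud_flops, bandwidth_bps):
    """Latency-optimal split of a sequential DNN between device and cloud.

    Splitting after k layers runs layers [0, k) on the device, transfers
    that point's activation over the network, and runs layers [k, n) in
    the cloud. k = 0 means full offload of the raw input.
    Returns (k, total_latency_seconds).
    """
    n = len(layer_flops)
    best_k, best_t = 0, float("inf")
    for k in range(n + 1):
        device_t = sum(layer_flops[:k]) / device_flops
        payload = input_bytes if k == 0 else layer_out_bytes[k - 1]
        transfer_t = payload * 8 / bandwidth_bps     # bytes -> bits
        cloud_t = sum(layer_flops[k:]) / cloud_flops
        total = device_t + transfer_t + cloud_t
        if total < best_t:
            best_k, best_t = k, total
    return best_k, best_t
```

Split points right after layers with small activations are typically favoured, since they minimise the transfer term even on slow links.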
"Model partitioning and batch scheduling: Leveraging local resources for cost-efficient device-cloud collaborative serverless inference." Yong Pan, Chenchen Zhang, Yue Zeng, Haiyu Yue, Ziye Hou, Jiang Xiong. Future Generation Computer Systems, vol. 179, Article 108338. DOI: 10.1016/j.future.2025.108338.
Pub Date : 2026-06-01Epub Date: 2026-01-06DOI: 10.1016/j.future.2026.108368
Omer Chughtai , Muhammad Waqas Rehan , Muhammad Naeem , Ali Hamdan Alenezi , Sajjad Ali Haider
LoRa-enabled Flying Ad Hoc Networks (FANETs) offer long-range and energy-efficient connectivity for next-generation IoT and post-disaster communication infrastructures, yet their performance is fundamentally constrained by limited bandwidth, dynamic topologies, and uneven energy depletion across aerial nodes. This work develops a distributed consumer-centric multi-objective routing framework (Proposed-CCR) that jointly optimizes residual energy, link quality, and flow-level priority through a lightweight utility-driven forwarding mechanism. The design integrates composite cost modeling, two-hop neighbor awareness, adaptive path monitoring, and local repair to ensure scalable, resilient, and delay-aware multi-hop communication. Extensive simulations demonstrate that Proposed-CCR reduces per-packet energy consumption by 28–35%, extends network lifetime by over 35%, and decreases high-priority flow delay by nearly 40% relative to state-of-the-art schemes including MinHop, ACOR, GCCR, and BBCCR. These results confirm the effectiveness of a consumer-centric, LoRa-aware multi-objective heuristic for UAV-IoT integration and emergency communication scenarios, while highlighting practical opportunities for sustainable and resource-efficient airborne networking architectures.
{"title":"Distributed multi-objective consumer-centric routing for LoRa-based IoT-enabled FANET","authors":"Omer Chughtai , Muhammad Waqas Rehan , Muhammad Naeem , Ali Hamdan Alenezi , Sajjad Ali Haider","doi":"10.1016/j.future.2026.108368","DOIUrl":"10.1016/j.future.2026.108368","url":null,"abstract":"<div><div>LoRa-enabled Flying Ad Hoc Networks (FANETs) offer long-range and energy-efficient connectivity for next-generation IoT and post-disaster communication infrastructures, yet their performance is fundamentally constrained by limited bandwidth, dynamic topologies, and uneven energy depletion across aerial nodes. This work develops a distributed consumer-centric multi-objective routing framework (Proposed-CCR) that jointly optimizes residual energy, link quality, and flow-level priority through a lightweight utility-driven forwarding mechanism. The design integrates composite cost modeling, two-hop neighbor awareness, adaptive path monitoring, and local repair to ensure scalable, resilient, and delay-aware multi-hop communication. Extensive simulations demonstrate that Proposed-CCR reduces per-packet energy consumption by 28–35%, extends network lifetime by over 35%, and decreases high-priority flow delay by nearly 40% relative to state-of-the-art schemes including MinHop, ACOR, GCCR, and BBCCR. 
These results confirm the effectiveness of a consumer-centric, LoRa-aware multi-objective heuristic for UAV-IoT integration and emergency communication scenarios, while highlighting practical opportunities for sustainable and resource-efficient airborne networking architectures.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"179 ","pages":"Article 108368"},"PeriodicalIF":6.2,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145957076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
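The "lightweight utility-driven forwarding mechanism" that jointly weighs residual energy, link quality, and flow priority can be sketched as a weighted scoring function over candidate next hops. The weights and the tuple layout below are illustrative assumptions, not the composite cost model from the paper:

```python
def forwarding_utility(residual_energy: float, link_quality: float,
                       flow_priority: float,
                       w_e: float = 0.4, w_l: float = 0.4,
                       w_p: float = 0.2) -> float:
    """Composite forwarding score; all inputs normalized to [0, 1].

    Weights are illustrative — a deployed scheme would tune them (or
    adapt them at runtime) to balance lifetime against delay.
    """
    return w_e * residual_energy + w_l * link_quality + w_p * flow_priority

def select_next_hop(neighbors):
    """neighbors: list of (node_id, residual_energy, link_quality, priority).

    Returns the id of the neighbor with the highest composite utility.
    """
    return max(neighbors,
               key=lambda n: forwarding_utility(n[1], n[2], n[3]))[0]

# a well-charged neighbor with a good link wins over a depleted one
print(select_next_hop([("a", 0.9, 0.8, 0.5),
                       ("b", 0.2, 0.3, 0.9),
                       ("c", 0.5, 0.5, 0.5)]))
```

In the distributed setting each node evaluates only its own (two-hop-aware) neighbor table, which keeps the per-forwarding-decision cost constant regardless of network size.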
Pub Date : 2026-06-01Epub Date: 2025-12-28DOI: 10.1016/j.future.2025.108350
Genaro Sánchez-Gallegos, Cosmin Petre, Javier Garcia-Blas, Jesus Carretero
The increasing demand for data processing by new, data-intensive applications is placing significant strain on the performance and capacity of HPC storage systems. Advancements in storage technologies, such as NVMe and persistent memory, have been introduced to address these demands. However, relying exclusively on ultra-fast storage devices is not cost-effective, necessitating multi-tier storage hierarchies to manage data based on its usage. In response, ad-hoc file systems have been proposed as a solution. These systems use the storage resources available in compute nodes, including memory and persistent storage, to create temporary file systems that adapt to application behavior in the HPC environment. This work presents the design, implementation, and evaluation of HERCULES, a distributed ad-hoc in-memory storage system, with a focus on its new metadata and elasticity model. HERCULES takes advantage of the Unified Communication X (UCX) framework, leveraging RDMA transports such as InfiniBand and Omni-Path, along with shared-memory and zero-copy transfers, for data movement. It includes runtime elasticity features and fault-tolerance facilities. The elasticity features, together with flexible policies for data allocation, allow HERCULES to migrate data so that the available resources can be used efficiently. Our exhaustive evaluation demonstrates better performance than Lustre and BeeGFS, two parallel file systems heavily used in High-Performance Computing systems, and GekkoFS, a state-of-the-art ad-hoc solution.
{"title":"HERCULES: A scalable and elastic ad-hoc file system for large-scale computing systems","authors":"Genaro Sánchez-Gallegos, Cosmin Petre, Javier Garcia-Blas, Jesus Carretero","doi":"10.1016/j.future.2025.108350","DOIUrl":"10.1016/j.future.2025.108350","url":null,"abstract":"<div><div>The increasing demand for data processing by new, data-intensive applications is placing significant strain on the performance and capacity of HPC storage systems. Advancements in storage technologies, such as NVMe and persistent memory, have been introduced to address these demands. However, relying exclusively on ultra-fast storage devices is not cost-effective, necessitating multi-tier storage hierarchies to manage data based on its usage. In response, <em>ad-hoc</em> file systems have been proposed as a solution. These systems use the storage resources available in compute nodes, including memory and persistent storage, to create temporary file systems that adapt to application behavior in the HPC environment. This work presents the design, implementation, and evaluation of HERCULES, a distributed <em>ad-hoc</em> in-memory storage system, with a focus on its new metadata and elasticity model. HERCULES takes advantage of the Unified Communication X (UCX) framework, leveraging RDMA protocols such as Infiniband, Omnipath, shared-memory, and zero-copy transfers for data transfer. It includes elasticity features at runtime and fault-tolerant facilities. The elasticity features, together with flexible policies for data allocation, allow HERCULES to migrate data so that the available resources can be efficiently used. 
Our exhaustive evaluation results demonstrate a better performance than Lustre and BeeGFS, two parallel file systems heavily used in High-Performance Computing systems, and GekkoFS, an <em>ad-hoc</em> state-of-the-art solution.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"179 ","pages":"Article 108350"},"PeriodicalIF":6.2,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145845122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
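Elastic data migration of the kind HERCULES performs — remapping data blocks when storage servers join or leave — can be sketched with a simple hash-based placement function. This is an illustrative assumption, not the HERCULES data-distribution policy; with plain modulo hashing, resizing the server set forces many blocks to move, which is exactly the migration an elastic ad-hoc file system must plan for:

```python
import hashlib

def server_for(block_id: str, servers: list[str]) -> str:
    """Map a data block to one of the available servers by hashing.

    SHA-256 gives a stable, uniform placement; modulo over the server
    list means placement changes whenever the list is resized.
    """
    h = int(hashlib.sha256(block_id.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

def migration_plan(blocks, old_servers, new_servers):
    """Blocks whose owner changes after scaling the server set.

    Returns {block_id: (old_server, new_server)} for every block that
    must be moved; blocks whose placement is unchanged are omitted.
    """
    return {b: (server_for(b, old_servers), server_for(b, new_servers))
            for b in blocks
            if server_for(b, old_servers) != server_for(b, new_servers)}

# scaling from 2 to 3 servers: only the remapped blocks appear in the plan
plan = migration_plan([f"blk-{i}" for i in range(10)],
                      ["s0", "s1"], ["s0", "s1", "s2"])
print(len(plan), "blocks to migrate")
```

A production design would use consistent hashing or an explicit metadata directory instead, precisely to shrink this migration set during elastic scaling.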