
Latest publications in Future Generation Computer Systems-The International Journal of Escience

Let’s trace it: Fine-grained serverless benchmarking for synchronous and asynchronous applications
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-06-01 | Epub Date: 2025-12-28 | DOI: 10.1016/j.future.2025.108336
Joel Scheuner , Simon Eismann , Sacheendra Talluri , Erwin van Eyk , Cristina Abad , Philipp Leitner , Alexandru Iosup
Making serverless computing widely applicable requires detailed understanding of performance. Although benchmarking approaches exist, their insights are coarse-grained and typically insufficient for (root cause) analysis of realistic serverless applications, which often consist of asynchronously coordinated functions and services. Addressing this gap, we design and implement ServiTrace, an approach for fine-grained distributed trace analysis and an application-level benchmarking suite for diverse serverless-application architectures. ServiTrace (i) analyzes distributed serverless traces using a novel algorithm and heuristics for extracting a detailed latency breakdown, (ii) leverages a suite of serverless applications representative of production usage, including synchronous and asynchronous serverless applications with external service integrations, and (iii) automates comprehensive, end-to-end experiments to capture application-level performance. Using our ServiTrace reference implementation, we conduct a large-scale empirical performance study in the market-leading AWS environment, collecting over 7.5 million execution traces. We make four main observations enabled by our latency breakdown analysis of median latency, cold starts, and tail latency for different application types and invocation patterns. For example, the median end-to-end latency of serverless applications is often dominated not by function computation but by external service calls, orchestration, and trigger-based coordination; all of which could be hidden without ServiTrace-like benchmarking. We release empirical data under FAIR principles and ServiTrace as a tested, extensible, open-source tool at https://github.com/ServiTrace/ReplicationPackage.
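The core of such a latency breakdown is attributing each span's duration either to its own work or to nested calls. A minimal sketch of that recursion (hypothetical span schema, not ServiTrace's actual algorithm, and assuming sequential, non-overlapping child spans):

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """One segment of a distributed trace (hypothetical schema)."""
    name: str
    start: float                       # ms since trace start
    end: float
    children: list = field(default_factory=list)

def latency_breakdown(span):
    """Split a span's duration into its own self-time and the time
    spent in nested child spans (external calls, orchestration, ...),
    recursing so every component of the trace gets a bucket."""
    total = span.end - span.start
    child_time = sum(c.end - c.start for c in span.children)
    breakdown = {f"{span.name}/self": total - child_time}
    for child in span.children:
        breakdown.update(latency_breakdown(child))
    return breakdown

# A handler whose 120 ms end-to-end latency is dominated by a DynamoDB call.
trace = Span("handler", 0.0, 120.0,
             children=[Span("dynamodb", 10.0, 95.0)])
print(latency_breakdown(trace))
# {'handler/self': 35.0, 'dynamodb/self': 85.0}
```

With real traces, child spans may overlap or run concurrently (asynchronous invocations), which is exactly where the paper's heuristics come in.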
Citations: 0
Optimizing deep reinforcement learning through vectorized and parallel NeuroEvolution
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-06-01 | Epub Date: 2025-12-22 | DOI: 10.1016/j.future.2025.108334
Hesham Magdy, Amany M. Sarhan, Mohammad Ali Eita
Deep reinforcement learning (DRL) has achieved remarkable success in solving complex decision-making problems; however, training efficiency, scalability, and ease of implementation remain significant challenges. Advances in hardware accelerators such as GPUs and TPUs can reduce the training-time bottleneck that constrains progress in deep learning. We propose a framework for NeuroEvolution-based deep reinforcement learning that leverages JAX, a high-performance numerical computing library optimized for machine learning and automatic differentiation, for efficient parallel execution. Our framework enables seamless vectorized policy optimization, significantly reducing computational overhead while maintaining sample efficiency. We introduce a user-friendly interface designed for accessibility and flexibility, allowing researchers to easily experiment with diverse evolutionary strategies, leading to broader exploration and improved performance in certain environments. Additionally, we extend the application of NeuroEvolution-based DRL to environments not previously explored with such methods, further demonstrating the versatility of our approach. Finally, we incorporate recent evolutionary algorithms into the training process, achieving better results. Through extensive benchmarking, we show that our framework outperforms traditional evolutionary strategies and gradient-based DRL methods in both convergence speed and scalability, achieving a speedup of up to 34x compared to state-of-the-art approaches.
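The vectorized evolutionary step can be illustrated with an OpenAI-ES-style update in plain NumPy; the paper's framework builds on JAX, where `jit`/`vmap` would compile and accelerate the array-level vectorization shown here. All names and hyperparameters below are illustrative, not the paper's:

```python
import numpy as np

def es_step(theta, fitness_fn, rng, pop_size=64, sigma=0.1, lr=0.01):
    """One evolution-strategies update: perturb theta with Gaussian
    noise, score the whole population in a single vectorized call, and
    move theta along the fitness-weighted average noise direction."""
    noise = rng.standard_normal((pop_size, theta.size))
    rewards = fitness_fn(theta + sigma * noise)          # shape (pop_size,)
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return theta + lr / (pop_size * sigma) * noise.T @ rewards

# Toy fitness: negative squared distance to the target vector [1, -1],
# evaluated for all pop_size candidates at once (no Python loop).
target = np.array([1.0, -1.0])
fitness = lambda pop: -np.square(pop - target).sum(axis=1)

rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(300):
    theta = es_step(theta, fitness, rng)
# theta now lies close to the target [1, -1]
```

The single `fitness_fn(theta + sigma * noise)` call is the point: the entire population is evaluated as one batched array operation, which is what maps efficiently onto GPU/TPU execution.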
Citations: 0
Resource-efficient joint clustering and storage optimization for blockchain-based IoT systems
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-06-01 | Epub Date: 2025-12-30 | DOI: 10.1016/j.future.2025.108354
Kai Peng , Xueyan Hu , Jiaxing Hu , Zhiheng Yao , Tianping Deng , Menglan Hu , Chao Cai , Zehui Xiong
Blockchain technology is leveraged in the Internet of Things (IoT) systems to enhance data reliability and management efficiency, ensuring integrity, security, and auditability through a decentralized ledger architecture. However, resource-constrained IoT devices are unable to store the complete blockchain due to prohibitive resource consumption and performance degradation. While collaborative storage strategies have been proposed to mitigate these constraints, existing approaches often prioritize storage scalability without sufficiently addressing the selection of cooperative nodes for distributed ledger maintenance. This can lead to significant communication delays during block retrieval, undermining the real-time performance and overall efficiency of the blockchain-enabled IoT system. To address this challenge, this paper introduces a clustering-based collaborative storage scheme and proposes a novel joint optimization algorithm that iteratively refines both node clustering and block allocation strategies within the blockchain network. By structuring IoT devices into clustered peers, the algorithm reduces block query latency and facilitates efficient blockchain synchronization and update processes. Experimental evaluations confirm that the proposed method effectively alleviates storage limitations and lowers access costs in static blockchain-based IoT environments.
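To make the block-allocation half of the problem concrete, here is a greedy round-robin sketch that places block replicas on a cluster's peers under per-peer storage budgets. This is an illustration only; the paper's joint algorithm additionally refines the clustering itself and iterates between the two decisions:

```python
from itertools import cycle

def allocate_blocks(peers, capacity, num_blocks, replicas=2):
    """Round-robin placement of block replicas on the peers of one
    cluster, respecting each peer's storage budget (in blocks), so no
    single device must hold the complete chain."""
    store = {p: set() for p in peers}
    ring = cycle(peers)
    for b in range(num_blocks):
        placed, attempts = 0, 0
        while placed < replicas:
            p = next(ring)
            attempts += 1
            if attempts > len(peers) * replicas:
                raise RuntimeError(f"not enough capacity for block {b}")
            if len(store[p]) < capacity[p] and b not in store[p]:
                store[p].add(b)       # this peer holds one replica of b
                placed += 1
    return store

# Six blocks, two replicas each, spread over three peers with room for four.
store = allocate_blocks(["a", "b", "c"], {"a": 4, "b": 4, "c": 4},
                        num_blocks=6, replicas=2)
print({p: sorted(s) for p, s in store.items()})
# {'a': [0, 1, 3, 4], 'b': [0, 2, 3, 5], 'c': [1, 2, 4, 5]}
```

Every block stays retrievable from within the cluster while each peer stores only two thirds of the chain; the paper's optimization further chooses the clusters so that these intra-cluster retrievals are cheap.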
Citations: 0
Edge-based proactive and stable two-tier routing for IoV
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-06-01 | Epub Date: 2026-01-06 | DOI: 10.1016/j.future.2026.108367
Asif Mehmood , Muhammad Afaq , Faisal Mehmood , Wang-Cheol Song
The Internet of Vehicles (IoV) is an evolving domain fueled by advancements in vehicular communications and networking. To enhance vehicle coverage, integrating vehicle-to-everything (V2X) networks with cellular networks has become essential, though this integration places increased demand on cellular infrastructure. To address this, we propose a two-tier stable path routing algorithm designed to improve the stability and proactiveness of V2X networks. Our approach divides the coverage area into zones, further segmented into road segments based on road structure. The first tier manages routing within a road segment, while the second tier handles routing between vehicles in adjacent segments. This method improves road awareness, stabilizes topologies, adapts to dynamic changes, and reduces routing overhead. Additionally, the incorporation of a Kalman filter-based prediction model further strengthens proactive routing. To validate the proposed approach, we conduct synthetic evaluations across varying vehicular densities with different mobility and traffic scenarios. We compare the traditional centralized routing strategy with the proposed distributed two-tier mechanism to assess execution cost, end-to-end latency, network resource consumption, data rates, packet flow, and packet loss. Quantified results demonstrate that our two-tier approach reduces the average execution cost from 438.61 to 230.48, lowers average latency from 232.34 ms to 129.03 ms, and minimizes average network consumption from 231.26 MB to 129.39 MB. The proposed approach also significantly enhances data rates, reduces packet-flow processing, and decreases packet loss across the various routing strategies. Overall, the proposed solution enhances stability, responsiveness, and robustness of V2X communication, making it suitable for future large-scale IoV deployments.
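The prediction component can be illustrated with a minimal constant-velocity Kalman filter over noisy position fixes; the abstract does not spell out the paper's filter design, so the motion model and noise parameters below are assumptions:

```python
import numpy as np

def kalman_step(x, P, z, dt=1.0, q=0.5, r=4.0):
    """One predict/update cycle of a constant-velocity 1-D Kalman
    filter: x = [position, velocity], P its covariance, z a noisy
    position fix from the vehicle's beacon."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # motion model
    H = np.array([[1.0, 0.0]])              # we observe position only
    Q = q * np.eye(2)                       # process noise
    # Predict where the vehicle will be one beacon interval ahead.
    x = F @ x
    P = F @ P @ F.T + Q
    # Correct the prediction with the received measurement.
    S = H @ P @ H.T + r
    K = P @ H.T / S
    x = x + (K * (z - H @ x)).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Vehicle moving at ~10 m/s, noisy position beacons once per second.
x, P = np.array([0.0, 0.0]), np.eye(2) * 100.0
for z in [10.2, 19.8, 30.1, 40.3, 49.9]:
    x, P = kalman_step(x, P, z)
print(x)   # state estimate: position near 50 m, velocity near 10 m/s
```

The predicted position one step ahead (`F @ x`) is what a proactive router can use to pick next-hop vehicles that will still be in range when the packet arrives.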
Citations: 0
Probabilistic response time analysis for preemption threshold scheduling in real-time systems
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-06-01 | Epub Date: 2026-01-01 | DOI: 10.1016/j.future.2025.108363
Jin-Long Zhang, Yi-Wen Zhang
Probabilistic real-time systems employ probabilistic models to characterize the dynamic variations and uncertainties in task attributes. Such systems can tolerate occasional deadline misses, provided that their occurrence probability does not exceed a predefined threshold. However, existing scheduling approaches for probabilistic real-time systems often result in excessive preemption overhead or suffer from limited scheduling flexibility, leading to suboptimal system performance. To address this issue, this paper investigates the application of preemption threshold scheduling in probabilistic real-time systems, derives the corresponding probabilistic response time analysis, and proposes a priority allocation algorithm (MDMP-MTBPA) that aims to minimize the deadline miss probability while maximizing the tolerable blocking time. Experimental evaluations indicate that, compared to existing fixed-priority scheduling strategies, the proposed approach can improve schedulability, with results showing an average improvement of 23.7%.
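The flavor of a probabilistic response-time analysis can be sketched by convolving discrete execution-time distributions and reading off the tail mass beyond the deadline. This deliberately omits the paper's preemption-threshold and blocking terms; it is the basic building block only:

```python
from itertools import product

def convolve(d1, d2):
    """Distribution of the sum of two independent discrete random
    variables, each given as a {value: probability} map."""
    out = {}
    for (v1, p1), (v2, p2) in product(d1.items(), d2.items()):
        out[v1 + v2] = out.get(v1 + v2, 0.0) + p1 * p2
    return out

def deadline_miss_probability(exec_dists, deadline):
    """P(response time > deadline) when the response time is the sum
    of independent execution-time distributions (the job's own
    execution plus higher-priority interference)."""
    total = {0: 1.0}
    for d in exec_dists:
        total = convolve(total, d)
    return sum(p for t, p in total.items() if t > deadline)

# Task runs 2 ms (90%) or 5 ms (10%); interference adds 1 ms (80%) or 4 ms (20%).
dmp = deadline_miss_probability([{2: 0.9, 5: 0.1}, {1: 0.8, 4: 0.2}],
                                deadline=5)
print(round(dmp, 2))   # 0.28
```

A scheduler like the paper's can then check this tail probability against the system's tolerated miss threshold when assigning priorities and preemption thresholds.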
Citations: 0
Large-scale HPC approaches and applications on highly distributed platforms.
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-06-01 | Epub Date: 2026-01-07 | DOI: 10.1016/j.future.2025.108365
Alessia Antelmi , Emanuele Carlini
The ever-increasing complexity of scientific and industrial challenges due to the enormous amount of data available nowadays requires advanced high-performance computing (HPC) solutions capable of processing and analyzing data efficiently on highly distributed platforms. Traditional centralized HPC systems frequently fall short of the demands of contemporary large-scale applications (e.g., large language models), prompting a move towards more flexible and scalable distributed computing environments. Furthermore, the growing emphasis on the environmental impact of large-scale computing has highlighted the need for sustainable computing practices that minimize energy consumption and carbon footprint. This special issue targets contributions that investigate both the challenges and the opportunities arising from this evolution. The accepted articles highlight enhancements in five key areas: (i) HPC in the cloud continuum, (ii) heterogeneous HPC architectures, performance tools, and programming models, (iii) parallel and distributed algorithms and applications, (iv) data management and storage systems, and (v) sustainable and energy-efficient HPC systems. In total, 29 submissions were received, and 20 papers were selected after a rigorous peer-review process. Collectively, these contributions provide a representative snapshot of current research efforts towards resilient, efficient, and sustainable HPC approaches and applications on highly distributed platforms.
Citations: 0
LE OS: A lightweight edge operating system for industrial internet of things under resource constraints
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-06-01 | Epub Date: 2026-01-07 | DOI: 10.1016/j.future.2025.108360
Xianhui Liu , Yangyang Yang , Chenlin Zhu , Yihan Hu , Weidong Zhao
With the rise of Industry 4.0 and edge computing, intelligent manufacturing has undergone rapid development. However, existing research on operating systems for resource-constrained edge devices still exhibits significant limitations: mainstream operating systems require large hardware resources and lack adaptability for edge deployment; the industrial Internet lacks a unified and efficient scheduling and management framework for large-scale devices; and traditional monolithic systems suffer from tight component coupling, where a single component failure can cause system-wide crashes, threatening production stability. To address these challenges, this paper proposes LE OS, a lightweight edge operating system tailored for resource-constrained industrial Internet environments. LE OS leverages container technology to encapsulate system-level components into functional system containers and integrates them with the seL4 microkernel, forming a lightweight, containerized microkernel operating system. Experimental evaluation shows that LE OS improves CPU and I/O performance by 10%-40% and reduces system-level memory usage by over 70% compared with mainstream operating systems, while maintaining high resource efficiency and strong isolation. These results demonstrate that LE OS effectively overcomes the limitations of existing systems and provides a practical and scalable foundation for next-generation industrial Internet edge operating systems.
Citations: 0
An in-depth study of GPU frequency-scaling latency and its optimization on modern architectures
IF 6.2 · CAS Tier 2 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS) · Pub Date: 2026-06-01 · Epub Date: 2025-12-23 · DOI: 10.1016/j.future.2025.108331
Daniel Velicka, Ondrej Vysocky, Osman Yasal, Lubomir Riha
The move towards exascale systems in High-Performance Computing and the demand for Artificial Intelligence have brought together thousands of CPUs and even more GPU accelerators. This massive hardware consolidation has made energy optimization a critical challenge. The immense amount of energy consumption creates a cascade of secondary issues: it increases the carbon footprint, generates significant heat that demands advanced cooling, and causes dramatic power fluctuations that threaten the stability of the electrical grid. Although energy-saving techniques based on Dynamic Voltage and Frequency Scaling are well understood for CPUs, a critical knowledge gap exists for GPU accelerators, limiting the ability to apply similar optimizations.
This paper presents a method for measuring how long it takes the CPU to adjust the operating frequency of the GPU (switching latency), and how long the frequency change itself takes to complete (transition latency). The approach employs a minimal iterative workload that allows statistically distinguishing runtime differences between frequency pairs. It first measures execution times for each frequency and then determines the switching and transition latency of the change from an initial to a target frequency by tracking runtime changes and repeating measurements to ensure statistical robustness. Finally, the methodology filters out outliers from external factors such as driver management or system interruptions.
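To make the polling idea behind the measurement concrete, here is a minimal Python sketch: a tiny probe workload whose runtime reveals which frequency is currently applied, polled until the runtime matches the reference runtime at the target frequency. This is a toy simulation — `SimGPU`, its constants, and the 5% tolerance are illustrative assumptions, not the implementation of the paper's tool, which drives real NVIDIA GPUs.

```python
import time

class SimGPU:
    """Toy stand-in for a GPU: a requested frequency change takes effect
    only after a fixed transition delay; until then the old frequency
    still governs workload runtime. (Illustrative, not real NVML.)"""
    def __init__(self, transition_s=0.02):
        self.freq = 1000           # MHz currently applied
        self._pending = None       # (target MHz, time it becomes effective)
        self.transition_s = transition_s

    def set_freq(self, mhz):
        self._pending = (mhz, time.perf_counter() + self.transition_s)

    def run_workload(self, iters=10_000):
        # Apply a pending frequency change once its delay has elapsed.
        if self._pending and time.perf_counter() >= self._pending[1]:
            self.freq, self._pending = self._pending[0], None
        t = iters / (self.freq * 1e6)  # runtime ~ 1 / applied frequency
        time.sleep(t)
        return t

def measure_transition_latency(gpu, f_from, f_to, probe_iters=10_000):
    """Settle at f_from, request f_to, then poll with the probe workload
    until the observed runtime matches the reference runtime at f_to."""
    gpu.set_freq(f_from)
    time.sleep(2 * gpu.transition_s)        # let f_from take effect
    gpu.run_workload(probe_iters)
    ref = probe_iters / (f_to * 1e6)        # expected probe runtime at f_to
    start = time.perf_counter()
    gpu.set_freq(f_to)
    while True:
        t = gpu.run_workload(probe_iters)
        if abs(t - ref) / ref < 0.05:       # runtime now reflects f_to
            return time.perf_counter() - start
```

Calling `measure_transition_latency(SimGPU(), 1000, 1500)` returns roughly the simulated 20 ms delay; on real hardware, repeated measurements and the outlier filtering described above would replace this sleep-based simulation.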
The methodology is implemented in the open-source LATEST [1] tool with support for NVIDIA GPU accelerators. It is evaluated on three GPUs based on different generations of architecture: GH200, A100-SXM4, and RTX Quadro 6000. These results show that the transition latency ranges from hundreds of microseconds up to hundreds of milliseconds, while the vast majority of the time is spent in the GPU applying the frequency change. Of the analysed GPUs, the GH200 exhibited the widest range, with switching latencies spanning from 5.6 ms to 477 ms and transition latencies from 0.2 ms to 471 ms. Additionally, the transition latency measurement can be used to identify manufacturing variability of accelerators, revealing differences in frequency-scaling reactivity.
Our analysis identifies specific frequency pairs with high switching latencies, which creates a dilemma: the slow transitions discourage their use, yet the target frequencies themselves may be highly efficient in terms of energy consumption. To address this, we introduce an indirect switching method that leverages an intermediate frequency. This technique effectively circumvents the overhead, allowing the system to access these efficient frequency states without the high latency penalty of a direct transition. On the GH200, the indirect frequency-switching technique reduced the latency of a single frequency change by 250 to 431 ms.
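The indirect switching idea can be illustrated with a small planner over a latency table: if the direct transition between two frequencies is slow, route through an intermediate frequency whenever the two shorter hops are cheaper. The table values below are made up for illustration and are not measured GH200 data.

```python
def plan_switch(latency_ms, f_from, f_to):
    """Given latency_ms[(a, b)] = direct switch latency from a to b,
    return (path, total_ms) for the cheaper of the direct switch and
    the best two-hop switch through an intermediate frequency."""
    freqs = {f for pair in latency_ms for f in pair}
    best_path = [f_from, f_to]
    best_cost = latency_ms[(f_from, f_to)]
    for f_mid in freqs:
        if f_mid in (f_from, f_to):
            continue
        hop1, hop2 = (f_from, f_mid), (f_mid, f_to)
        if hop1 in latency_ms and hop2 in latency_ms:
            cost = latency_ms[hop1] + latency_ms[hop2]
            if cost < best_cost:
                best_path, best_cost = [f_from, f_mid, f_to], cost
    return best_path, best_cost

# Hypothetical table: the direct 900 -> 600 MHz switch is slow, but
# routing through 750 MHz avoids most of the penalty.
table = {(900, 600): 470.0, (900, 750): 15.0, (750, 600): 25.0}
```

Here `plan_switch(table, 900, 600)` picks the two-hop route at 40 ms total instead of the 470 ms direct switch.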
Future Generation Computer Systems, vol. 179, Article 108331.
Citations: 0
ProtoFedGAN: A novel federated learning framework for training generative adversarial networks via dynamic dual-prototype alignment
IF 6.2 · CAS Tier 2 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS) · Pub Date: 2026-06-01 · Epub Date: 2025-12-27 · DOI: 10.1016/j.future.2025.108353
Zhigang Wang, Yuzi Li, Qinghua Zhang, Junfeng Zhao
Generative Adversarial Networks (GANs) have demonstrated significant potential in data-generation tasks. However, traditional centralized training requires the sharing of raw data, which poses risks of sensitive information leakage. Federated learning offers a solution, leading to the development of Federated GANs. This approach mitigates the risk to some extent by enabling distributed training without exchanging raw data. Nevertheless, existing Federated GAN frameworks face challenges in real-world scenarios characterized by heterogeneous client data and heterogeneous client models, including degraded generation performance, mode collapse, and potential privacy leaks. To address these challenges, this paper proposes ProtoFedGAN, a Federated Generative Adversarial Network based on Dynamic Dual-Prototype Alignment. Specifically, ProtoFedGAN introduces a prototype learning-based federated knowledge-sharing paradigm, which abstracts local client features into lightweight class prototypes and dynamically aggregates them on the server. This approach facilitates knowledge sharing among heterogeneous client models, enhances privacy protection through feature abstraction, and reduces communication overhead. Furthermore, a latent space alignment mechanism is proposed to enforce consistency between client generators’ latent spaces and the global distribution, coupled with a dynamic prototype aggregator that mitigates feature shifts induced by non-independent and identically distributed (Non-IID) data through similarity-weighted parameter adjustment. Finally, a dual-prototype-driven generation enhancement strategy is proposed, where the Main Prototype ensures global distribution stability by anchoring consensus features across clients, while the subprototypes promote multi-modal feature expression, thereby jointly optimizing both realism and diversity in generated data. 
Experimental results across four benchmark datasets (MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100) demonstrate that ProtoFedGAN consistently achieves the lowest FID, KL, and MMD, and the highest IS under both IID and Non-IID settings, outperforming recent federated GANs such as CAP-GAN, IFL-GAN, PRIVATE FL-GAN, and PerFED-GAN, particularly in heterogeneous environments.
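As a rough sketch of similarity-weighted prototype aggregation of the kind the abstract describes, the server can down-weight client prototypes that sit far from the per-class mean. The softmax-over-cosine weighting below is our illustrative assumption, not necessarily ProtoFedGAN's exact rule.

```python
import math

def cosine(u, v):
    # Cosine similarity of two feature vectors (0.0 if either is zero).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def aggregate_prototypes(client_protos):
    """client_protos: {client_id: {class_id: feature vector (list)}}.
    For each class, weight client prototypes by a softmax over their
    cosine similarity to the unweighted mean, so outlying (Non-IID)
    prototypes pull the global prototype less."""
    classes = {c for protos in client_protos.values() for c in protos}
    global_protos = {}
    for c in classes:
        vecs = [p[c] for p in client_protos.values() if c in p]
        dim = len(vecs[0])
        mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        exps = [math.exp(cosine(v, mean)) for v in vecs]
        z = sum(exps)
        global_protos[c] = [
            sum(e / z * v[i] for e, v in zip(exps, vecs)) for i in range(dim)
        ]
    return global_protos
```

Two near-identical client prototypes for a class therefore aggregate to roughly their midpoint, while a badly shifted client would receive a smaller softmax weight.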
Future Generation Computer Systems, vol. 179, Article 108353.
Citations: 0
Semi-asynchronous energy-efficient federated prototype learning for end-edge-cloud architectures
IF 6.2 · CAS Tier 2 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS) · Pub Date: 2026-06-01 · Epub Date: 2025-12-31 · DOI: 10.1016/j.future.2025.108351
Wendian Luo, Tong Yu, Shengxin Dai, Bing Guo, Xuesen Lin, Yanglin Pu
As Industry 5.0 advances rapidly, the Industrial Internet of Things (IIoT) integrates Artificial Intelligence (AI) technology to significantly enhance the intelligence of production processes. However, this advancement results in faster data generation and a higher demand for data processing in industrial scenarios. This leads to the sustained high-load operation of edge devices and cloud servers, which increases carbon emissions and raises concerns about data security. Implementing Federated Learning (FL) in IIoT frameworks effectively distributes the computational burden between the client and server, resolving privacy issues and enhancing energy efficiency. However, achieving energy efficiency while improving model performance is challenging within an IIoT system marked by heterogeneous AI models and imbalanced data. We present a semi-asynchronous, energy-efficient, federated prototype learning approach tailored to tackle these challenges with end-edge-cloud architectures. This method uploads data distribution instead of the raw data for privacy protection, employing Dynamic Voltage and Frequency Scaling (DVFS) technology to manage power consumption during training, thus achieving optimal energy efficiency. To boost model performance, we confront data imbalance by collecting feature distribution data from clients, generating virtual samples on the cloud server, and training a global classifier to promote local client learning. Our experiments across various datasets, including industrial datasets and large-scale heterogeneous scenarios, demonstrate that the proposed method enhances model accuracy and significantly reduces energy consumption compared to competitive methods, thereby validating its applicability in real-world, diverse environments.
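One common way to realise the "semi-asynchronous" part is buffered aggregation: the server applies a round as soon as K client updates have arrived, discounting stale ones. The sketch below illustrates that scheduling idea only — the buffer size, the staleness discount, and the flat-vector model are our assumptions, and the DVFS energy control is out of scope here.

```python
class SemiAsyncServer:
    """Buffered semi-asynchronous FL aggregation (illustrative sketch):
    aggregate as soon as k client updates have arrived, weighting each
    update by 1 / (1 + alpha * staleness) so stale clients count less."""
    def __init__(self, model, k=2, alpha=0.5):
        self.model = list(model)   # global parameters as a flat vector
        self.round = 0
        self.k = k
        self.alpha = alpha
        self.buffer = []           # (params, round the client started from)

    def receive(self, params, base_round):
        self.buffer.append((list(params), base_round))
        if len(self.buffer) >= self.k:
            self._aggregate()

    def _aggregate(self):
        # Staleness = rounds elapsed since the client pulled the model.
        weights = [1.0 / (1.0 + self.alpha * (self.round - r))
                   for _, r in self.buffer]
        z = sum(weights)
        dim = len(self.model)
        self.model = [
            sum(w * p[i] for w, (p, _) in zip(weights, self.buffer)) / z
            for i in range(dim)
        ]
        self.buffer.clear()
        self.round += 1
```

With `k=2`, two fresh updates `[1, 1]` and `[3, 3]` submitted against round 0 average to `[2, 2]` and advance the server to round 1; an update based on an older round would be damped by the staleness weight instead.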
Future Generation Computer Systems, vol. 179, Article 108351.
Citations: 0