
Latest articles: Future Generation Computer Systems-The International Journal of Escience

Explainable AI-guided test-time adversarial defense for resilient YOLO detectors in Industrial Internet of Things
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-12-30 | DOI: 10.1016/j.future.2025.108356
Ruinan Ma, Zuobin Ying, Wenjuan Li, Dehua Zhu, Wanlei Zhou, Yu-An Tan, Hongyi Liu
With deep learning-based object detectors widely deployed as visual components in Industrial Internet of Things (IIoT) devices like cameras, their adversarial robustness has become paramount to the security and resilience of hyperconnected industrial systems. Existing adversarial defenses are often inadequate for the complexities of object detection, and securing already deployed detectors with a lightweight defense that avoids costly retraining remains a major challenge. In this paper, we propose XAIAD-YOLO: Explainable AI-Guided Adversarial Defense for YOLO detectors, a novel test-time defense to enable resilient YOLO detectors. XAIAD-YOLO introduces a synergistic two-stage purification framework grounded in distinct theoretical principles. Its initial stage, based on signal processing principles, filters high-frequency adversarial noise from genuine image structures. The second stage performs targeted feature destabilization; guided by our efficient XAI saliency map and grounded in the principle of differential feature stability, it precisely neutralizes fragile adversarial artifacts. Experiments show that our XAI method achieves 66.08 FPS (1.56x faster than Grad-CAM++), and our defense method significantly improves adversarial robustness, making anchor-based, anchor-free, lightweight, and non-lightweight YOLO detectors more resilient in both white-box and black-box scenarios. By uniquely integrating explainability into the defense mechanism, XAIAD-YOLO provides a practical and effective solution for enhancing the resilience and trustworthiness of AI in critical industrial applications. Our source code and datasets are available at https://anonymous.4open.science/r/XAIAD-YOLO-B0A3/here.
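The two-stage purification pipeline summarized in this abstract is easy to picture in code. The sketch below is a minimal, hedged illustration assuming a normalized HxWx3 image and a precomputed HxW saliency map; all names, filter choices, and parameters are our own assumptions, not the authors' implementation.

```python
# Hedged sketch of a two-stage test-time purification in the spirit of the
# abstract: (1) FFT low-pass filtering of high-frequency noise, (2) small
# noise injected only where a saliency map flags fragile regions.
# Everything here is illustrative, not the paper's code.
import numpy as np

def lowpass_purify(image: np.ndarray, keep_ratio: float = 0.25) -> np.ndarray:
    """Stage 1: suppress high-frequency components with a circular FFT mask."""
    f = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    h, w = image.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    radius = keep_ratio * min(h, w) / 2
    mask = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2) <= radius ** 2
    f *= mask[..., None]                       # assumes an H x W x C image
    return np.real(np.fft.ifft2(np.fft.ifftshift(f, axes=(0, 1)), axes=(0, 1)))

def saliency_guided_destabilize(image, saliency, noise_std=0.05, seed=0):
    """Stage 2: perturb only highly salient (presumed fragile) regions."""
    weight = saliency / (saliency.max() + 1e-8)            # normalize to [0, 1]
    noise = np.random.default_rng(seed).normal(0.0, noise_std, size=image.shape)
    return np.clip(image + weight[..., None] * noise, 0.0, 1.0)

# Usage: purified = saliency_guided_destabilize(lowpass_purify(x), sal_map)
```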
Citations: 0
Automatic tuning based on hardware performance counters and machine learning
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-12-30 | DOI: 10.1016/j.future.2025.108358
Suren Harutyunyan Gevorgyan, Eduardo César, Anna Sikora, Jiří Filipovič, Jordi Alcaraz
This paper presents a Machine Learning (ML) methodology for automatically tuning parallel applications in heterogeneous High Performance Computing (HPC) environments using Hardware Performance Counters (HwPCs). The methodology addresses three critical challenges: counter quantity versus accessibility tradeoff, data interpretation complexity, and dynamic optimization needs. The introduced ensemble-based methodology automatically identifies minimal yet informative HwPC sets for code region identification and tuning parameter optimization. Experimental validation demonstrates high accuracy in predicting optimal thread allocation (>0.90 K-fold accuracy) and thread affinity (>0.95 accuracy) while requiring only 4–6 HwPCs. Compared to search-based methods like OpenTuner, the methodology achieves competitive performance with dramatically reduced optimization time. The architecture-agnostic design enables consistent performance across CPU and GPU platforms. These results establish a foundation for efficient, portable, automatic, and scalable tuning of parallel applications.
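The counter-selection step has a natural off-the-shelf analogue: rank counters by an ensemble's feature importances and keep only the most informative few. The sketch below runs on synthetic data; the labels, model, and the choice of six counters are assumptions for illustration, not the paper's methodology.

```python
# Illustrative counter selection with an ensemble (synthetic data, not the
# paper's): keep the top-ranked HwPCs, then check k-fold accuracy on them.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_runs, n_counters = 500, 40                 # profiled runs x available HwPCs
X = rng.normal(size=(n_runs, n_counters))    # stand-in for counter readings
y = (X[:, 3] + 0.5 * X[:, 17] > 0).astype(int)  # toy "best thread count" label

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top = np.argsort(forest.feature_importances_)[-6:]    # keep only ~4-6 HwPCs
print("selected counters:", sorted(top.tolist()))

score = cross_val_score(RandomForestClassifier(random_state=0),
                        X[:, top], y, cv=5).mean()
print(f"k-fold accuracy on the reduced counter set: {score:.2f}")
```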
Citations: 0
Resource-efficient joint clustering and storage optimization for blockchain-based IoT systems
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-12-30 | DOI: 10.1016/j.future.2025.108354
Kai Peng, Xueyan Hu, Jiaxing Hu, Zhiheng Yao, Tianping Deng, Menglan Hu, Chao Cai, Zehui Xiong
Blockchain technology is leveraged in Internet of Things (IoT) systems to enhance data reliability and management efficiency, ensuring integrity, security, and auditability through a decentralized ledger architecture. However, resource-constrained IoT devices are unable to store the complete blockchain due to prohibitive resource consumption and performance degradation. While collaborative storage strategies have been proposed to mitigate these constraints, existing approaches often prioritize storage scalability without sufficiently addressing the selection of cooperative nodes for distributed ledger maintenance. This can lead to significant communication delays during block retrieval, undermining the real-time performance and overall efficiency of the blockchain-enabled IoT system. To address this challenge, this paper introduces a clustering-based collaborative storage scheme and proposes a novel joint optimization algorithm that iteratively refines both node clustering and block allocation strategies within the blockchain network. By structuring IoT devices into clustered peers, the algorithm reduces block query latency and facilitates efficient blockchain synchronization and update processes. Experimental evaluations confirm that the proposed method effectively alleviates storage limitations and lowers access costs in static blockchain-based IoT environments.
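To make the clustering-plus-allocation idea concrete, here is a toy sketch: peers are grouped by proximity in a latency space, and blocks are spread round-robin inside each cluster so every block is one intra-cluster hop away from any peer. The clustering method, coordinates, and placement rule are our assumptions, not the paper's algorithm.

```python
# Toy clustering + block allocation for a collaboratively stored chain.
import numpy as np

def cluster_peers(coords: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Plain k-means over latency-space coordinates; one cluster id per peer."""
    centers = coords[np.random.default_rng(0).choice(len(coords), k, replace=False)]
    labels = np.zeros(len(coords), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((coords[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = coords[labels == c].mean(axis=0)
    return labels

def allocate_blocks(n_blocks: int, labels: np.ndarray) -> dict:
    """Round-robin within each cluster: every block lives on exactly one
    peer per cluster, so any peer can fetch it from a nearby neighbor."""
    placement = {}
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        for i in range(n_blocks):
            placement.setdefault(int(members[i % len(members)]), []).append(i)
    return placement

labels = cluster_peers(np.random.default_rng(1).normal(size=(30, 2)), k=3)
print({peer: len(blocks) for peer, blocks in allocate_blocks(1000, labels).items()})
```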
Citations: 0
TPQA: Efficient attention architecture with task-aware pattern-guided quantization
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-12-29 | DOI: 10.1016/j.future.2025.108352
Sijia Wang, Shengbing Zhang, Lun Zhang, Yichao Yuan, Yawen Zhao, Xinyu Zhang, Meng Zhang
Attention mechanisms have become a cornerstone of modern deep learning models, yet their computational intensity poses significant deployment challenges for resource-limited devices. While quantization offers a potential solution, current approaches typically employ uniform precision assignment schemes across all attention heads, neglecting critical variations in head-specific contributions across different tasks. This oversight results in substantial computational redundancy for those attention heads with fewer contributions, impacting overall performance. Through systematic analysis of head pattern characteristics in transformer models, we reveal two key insights: different attention heads exhibit distinct task-aware patterns, and their varying contributions to model performance directly dictate differentiated quantization demands across heads. Building on these findings, we propose TPQA, a novel algorithm and accelerator co-design architecture for efficient deployment of transformer models. TPQA strategically assigns adaptive precision levels to each head based on pre-identified patterns, thereby reducing computational overhead while preserving model accuracy. Furthermore, TPQA employs a data reordering strategy to transform irregular workloads into structured formats and introduces a dedicated accelerator with an attention-weights-stationary dataflow to efficiently process these structured workloads. Comprehensive evaluations demonstrate TPQA’s superior performance, achieving up to 2.1× speedup and 3.4× energy efficiency improvement over state-of-the-art accelerators while maintaining <1% accuracy loss on various tasks.
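The per-head precision assignment can be sketched in a few lines: score each head's contribution, then fake-quantize low-scoring heads more aggressively. The contribution proxy and the 4/8-bit split below are illustrative assumptions; TPQA's pattern detection and accelerator dataflow are not reproduced here.

```python
# Hedged sketch of task-aware per-head precision assignment.
import numpy as np

def fake_quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric fake-quantization to the given bit width."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(x / scale) * scale

def assign_head_bits(head_scores: np.ndarray, budgets=(4, 8)) -> list:
    """Low-contribution heads get the small width, the rest the large one."""
    cutoff = np.median(head_scores)
    return [budgets[1] if s >= cutoff else budgets[0] for s in head_scores]

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16, 64))            # heads x tokens x head_dim
k = rng.normal(size=(8, 16, 64))
attn_logits = np.einsum("htd,hsd->hts", q, k)
scores = np.linalg.norm(attn_logits, axis=(1, 2))   # head-contribution proxy
for h, bits in enumerate(assign_head_bits(scores)):
    q[h], k[h] = fake_quantize(q[h], bits), fake_quantize(k[h], bits)
print("bit widths per head:", assign_head_bits(scores))
```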
Citations: 0
Software aging issues and rejuvenation strategies for a container orchestration system
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-12-28 | DOI: 10.1016/j.future.2025.108274
Marcelo Santos, Rubens Matos, Marco Vieira, Jean Araujo
Software Aging and Rejuvenation (SAR) has been extensively studied due to its critical role in ensuring the reliable operation of systems. Although container orchestration is essential for efficiently managing and scaling cloud resources, the impact of SAR is not yet fully understood. This paper presents experiments conducted on two versions of Ubuntu Linux, simulating the operational scenarios of a private cloud. Each cluster includes one Main node and three Worker nodes, utilizing Containerd as the container runtime and Kubernetes as the orchestrator, across four distinct scenarios. The primary experimental conditions were maintained across all scenarios, including configurations, workloads, and test duration. Throughout each experiment, metrics such as CPU utilization, memory usage and disk utilization were monitored, considering system-wide values and observations for the Containerd and Kubelet services. The experiments also included measuring the response time of a web server for external HTTP requests submitted to the clusters. The initial scenario focused on investigating the effects of software aging, while subsequent scenarios explored the adoption of different rejuvenation strategies. Effects of software aging were observed across all scenarios, with resource leaks identified, particularly in memory usage, even when the cluster was under no load. The issues observed can lead to performance degradation and compromise reliability and availability if the system crashes due to memory exhaustion.
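On the rejuvenation side, the kind of monitoring loop the experiments describe can be reduced to watching a resource trend and flagging the service once the slope crosses a threshold. The sketch below is our own minimal example with synthetic samples; the window size, threshold, and restart action are assumptions, not the paper's setup.

```python
# Minimal aging detector: fit a line to recent memory readings and flag the
# service (e.g., for a rolling restart) once the leak slope passes a threshold.
from collections import deque
import numpy as np

class AgingMonitor:
    def __init__(self, window: int = 60, slope_mb_per_sample: float = 5.0):
        self.samples = deque(maxlen=window)   # e.g., one RSS reading per minute
        self.threshold = slope_mb_per_sample

    def observe(self, rss_mb: float) -> bool:
        """Record a sample; return True when rejuvenation should be scheduled."""
        self.samples.append(rss_mb)
        if len(self.samples) < self.samples.maxlen:
            return False                      # not enough history yet
        t = np.arange(len(self.samples))
        slope = np.polyfit(t, np.array(self.samples), deg=1)[0]
        return slope > self.threshold

monitor = AgingMonitor(window=10, slope_mb_per_sample=2.0)
for minute, rss in enumerate(300 + 3.5 * np.arange(12)):  # synthetic slow leak
    if monitor.observe(float(rss)):
        print(f"minute {minute}: leak trend detected, schedule rejuvenation")
```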
Citations: 0
HERCULES: A scalable and elastic ad-hoc file system for large-scale computing systems
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-12-28 | DOI: 10.1016/j.future.2025.108350
Genaro Sánchez-Gallegos, Cosmin Petre, Javier Garcia-Blas, Jesus Carretero
The increasing demand for data processing by new, data-intensive applications is placing significant strain on the performance and capacity of HPC storage systems. Advancements in storage technologies, such as NVMe and persistent memory, have been introduced to address these demands. However, relying exclusively on ultra-fast storage devices is not cost-effective, necessitating multi-tier storage hierarchies to manage data based on its usage. In response, ad-hoc file systems have been proposed as a solution. These systems use the storage resources available in compute nodes, including memory and persistent storage, to create temporary file systems that adapt to application behavior in the HPC environment. This work presents the design, implementation, and evaluation of HERCULES, a distributed ad-hoc in-memory storage system, with a focus on its new metadata and elasticity model. HERCULES takes advantage of the Unified Communication X (UCX) framework, leveraging RDMA protocols such as Infiniband, Omnipath, shared-memory, and zero-copy transfers for data transfer. It includes elasticity features at runtime and fault-tolerant facilities. The elasticity features, together with flexible policies for data allocation, allow HERCULES to migrate data so that the available resources can be efficiently used. Our exhaustive evaluation results demonstrate a better performance than Lustre and BeeGFS, two parallel file systems heavily used in High-Performance Computing systems, and GekkoFS, an ad-hoc state-of-the-art solution.
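One way to picture the elasticity requirement is through placement that stays mostly stable when nodes join or leave. The consistent-hashing sketch below is our illustration of that property only; HERCULES' actual allocation policies, metadata model, and UCX transport are not shown.

```python
# Consistent-hash placement: adding a node migrates only the blocks that now
# hash to it (~1/N of the data), which is what elastic scale-out wants.
import bisect
import hashlib

class Ring:
    def __init__(self, nodes, vnodes: int = 64):
        self._ring = sorted((self._h(f"{n}#{v}"), n)
                            for n in nodes for v in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _h(s: str) -> int:
        return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")

    def owner(self, block_id: str) -> str:
        i = bisect.bisect(self._keys, self._h(block_id)) % len(self._ring)
        return self._ring[i][1]

before = Ring(["node0", "node1", "node2"])
after = Ring(["node0", "node1", "node2", "node3"])     # elastic scale-out
moved = sum(before.owner(f"blk{i}") != after.owner(f"blk{i}")
            for i in range(10_000))
print(f"blocks migrated after adding one node: {moved / 10_000:.1%}")  # ~25%
```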
Citations: 0
Let’s trace it: Fine-grained serverless benchmarking for synchronous and asynchronous applications
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-12-28 | DOI: 10.1016/j.future.2025.108336
Joel Scheuner, Simon Eismann, Sacheendra Talluri, Erwin van Eyk, Cristina Abad, Philipp Leitner, Alexandru Iosup
Making serverless computing widely applicable requires detailed understanding of performance. Although benchmarking approaches exist, their insights are coarse-grained and typically insufficient for (root cause) analysis of realistic serverless applications, which often consist of asynchronously coordinated functions and services. Addressing this gap, we design and implement ServiTrace, an approach for fine-grained distributed trace analysis and an application-level benchmarking suite for diverse serverless-application architectures. ServiTrace (i) analyzes distributed serverless traces using a novel algorithm and heuristics for extracting a detailed latency breakdown, (ii) leverages a suite of serverless applications representative of production usage, including synchronous and asynchronous serverless applications with external service integrations, and (iii) automates comprehensive, end-to-end experiments to capture application-level performance. Using our ServiTrace reference implementation, we conduct a large-scale empirical performance study in the market-leading AWS environment, collecting over 7.5 million execution traces. We make four main observations enabled by our latency breakdown analysis of median latency, cold starts, and tail latency for different application types and invocation patterns. For example, the median end-to-end latency of serverless applications is often dominated not by function computation but by external service calls, orchestration, and trigger-based coordination; all of which could be hidden without ServiTrace-like benchmarking. We release empirical data under FAIR principles and ServiTrace as a tested, extensible, open-source tool at https://github.com/ServiTrace/ReplicationPackage.
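The latency-breakdown idea can be shown on a miniature trace: subtract child spans from each parent to get self-time, then sum self-times per category. This toy ignores asynchronous overlap, which is exactly what ServiTrace's heuristics handle; the span names and categories below are invented for illustration.

```python
# Toy self-time breakdown over one synchronous trace.
from collections import defaultdict

spans = [  # (name, category, start_ms, end_ms, parent)
    ("handler",      "function", 0,   120, None),
    ("dynamodb.put", "external", 10,  55,  "handler"),
    ("sqs.publish",  "trigger",  60,  100, "handler"),
]

def breakdown(spans):
    children = defaultdict(list)
    for name, _, start, end, parent in spans:
        if parent is not None:
            children[parent].append((start, end))
    totals = defaultdict(float)
    for name, category, start, end, _ in spans:
        child_time = sum(e - s for s, e in children[name])  # assumes no overlap
        totals[category] += (end - start) - child_time      # self-time only
    return dict(totals)

print(breakdown(spans))  # {'function': 35.0, 'external': 45.0, 'trigger': 40.0}
```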
Citations: 0
Interference modeling and scheduling for compute-intensive batch applications
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-12-27 | DOI: 10.1016/j.future.2025.108355
Chennian Xiong, Weiwei Lin, Huikang Huang, Jianpeng Lin, Keqin Li
Cloud computing and virtualization technologies have significantly improved resource utilization in data centers. However, performance interference caused by resource contention remains a major challenge, particularly for compute-intensive batch applications, which are vital for large-scale data processing and task scheduling. Addressing performance interference in the modeling and scheduling of such applications still requires improvement. Existing interference models often rely on stereotypical metrics and average values, ignoring the impact of temporal fluctuations, while conventional scheduling algorithms overlook interference dynamics, leading to suboptimal scheduling results. To overcome these limitations, this article investigates the key factors influencing the performance of compute-intensive workloads and introduces a novel performance interference model that incorporates temporal fluctuations. Furthermore, we propose a historical-data-driven scheduling method that accounts for both temporal dynamics and batch application interference characteristics. Experimental results demonstrate that the proposed performance interference model achieves higher accuracy and robustness against overfitting compared to existing models that neglect temporal variations. Additionally, our interference-aware scheduling algorithm significantly outperforms traditional methods in throughput, scheduling efficiency, and server load balancing, providing an effective solution to mitigate performance interference in cloud environments.
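A minimal version of interference-aware placement, under assumptions of our own (hourly CPU-demand profiles, unit server capacity), looks like this: estimate how much of the incoming job's demand would exceed capacity when overlaid on each server's residents, and pick the least contended server.

```python
# Greedy interference-aware placement over temporal CPU profiles (toy model).
import numpy as np

def predicted_slowdown(new_profile: np.ndarray, resident_profiles) -> float:
    """Fraction of the job's demand that would exceed unit capacity."""
    combined = sum(resident_profiles, np.zeros_like(new_profile))
    total = new_profile + combined
    lost = np.maximum(total - 1.0, 0.0).sum()   # demand over the capacity cap
    return lost / max(new_profile.sum(), 1e-9)

def place(job_profile: np.ndarray, servers: dict) -> str:
    """Pick the server with the least predicted interference."""
    return min(servers, key=lambda s: predicted_slowdown(job_profile, servers[s]))

rng = np.random.default_rng(0)
job = (rng.random(24) > 0.5).astype(float) * 0.6          # hourly CPU demand
servers = {f"s{i}": [rng.random(24) * 0.5 for _ in range(2)] for i in range(4)}
print("chosen server:", place(job, servers))
```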
Citations: 0
ProtoFedGAN: A novel federated learning framework for training generative adversarial networks via dynamic dual-prototype alignment
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-12-27 | DOI: 10.1016/j.future.2025.108353
Zhigang Wang, Yuzi Li, Qinghua Zhang, Junfeng Zhao
Generative Adversarial Networks (GANs) have demonstrated significant potential in data-generation tasks. However, traditional centralized training requires the sharing of raw data, which poses risks of sensitive information leakage. Federated learning offers a solution, leading to the development of Federated GANs. This approach mitigates the risk to some extent by enabling distributed training without exchanging raw data. Nevertheless, existing Federated GAN frameworks face challenges in real-world scenarios characterized by heterogeneous client data and heterogeneous client models, including degraded generation performance, mode collapse, and potential privacy leaks. To address these challenges, this paper proposes ProtoFedGAN, a Federated Generative Adversarial Network based on Dynamic Dual-Prototype Alignment. Specifically, ProtoFedGAN introduces a prototype learning-based federated knowledge-sharing paradigm, which abstracts local client features into lightweight class prototypes and dynamically aggregates them on the server. This approach facilitates knowledge sharing among heterogeneous client models, enhances privacy protection through feature abstraction, and reduces communication overhead. Furthermore, a latent space alignment mechanism is proposed to enforce consistency between client generators’ latent spaces and the global distribution, coupled with a dynamic prototype aggregator that mitigates feature shifts induced by non-independent and identically distributed (Non-IID) data through similarity-weighted parameter adjustment. Finally, a dual-prototype-driven generation enhancement strategy is proposed, where the Main Prototype ensures global distribution stability by anchoring consensus features across clients, while the subprototypes promote multi-modal feature expression, thereby jointly optimizing both realism and diversity in generated data. Experimental results across four benchmark datasets (MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100) demonstrate that ProtoFedGAN consistently achieves the lowest FID, KL, and MMD, and the highest IS under both IID and Non-IID settings, outperforming recent federated GANs such as CAP-GAN, IFL-GAN, PRIVATE FL-GAN, and PerFED-GAN, particularly in heterogeneous environments.
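The dynamic, similarity-weighted aggregation can be illustrated server-side in a few lines: weight each client's class prototype by its cosine similarity to the current global prototype, so Non-IID outliers pull the average less. The softmax weighting and temperature below are our assumptions; the dual-prototype generation losses are not sketched here.

```python
# Hedged sketch of similarity-weighted prototype aggregation (one class).
import numpy as np

def aggregate_prototypes(client_protos, global_proto, temperature=0.5):
    """Softmax-weight client prototypes by cosine similarity to the global one."""
    P = np.stack(client_protos)                             # clients x dim
    g = global_proto / (np.linalg.norm(global_proto) + 1e-8)
    sims = (P @ g) / (np.linalg.norm(P, axis=1) + 1e-8)     # cosine per client
    w = np.exp(sims / temperature)
    w /= w.sum()
    return (w[:, None] * P).sum(axis=0)

rng = np.random.default_rng(0)
global_proto = rng.normal(size=64)
clients = [global_proto + rng.normal(scale=s, size=64) for s in (0.1, 0.1, 2.0)]
weighted = aggregate_prototypes(clients, global_proto)
naive = np.mean(clients, axis=0)
# The outlier (scale 2.0) shifts the weighted average less than the plain mean:
print(np.linalg.norm(weighted - global_proto) < np.linalg.norm(naive - global_proto))
```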
Citations: 0
HP2C-DT: High-Precision High-Performance Computer-enabled Digital Twin
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-12-26 | DOI: 10.1016/j.future.2025.108333
E. Iraola, M. García-Lorenzo, F. Lordan-Gomis, F. Rossi, E. Prieto-Araujo, R.M. Badia
Digital twins are transforming the way we monitor, analyze, and control physical systems, but designing architectures that balance real-time responsiveness with heavy computational demands remains a challenge. Cloud-based solutions often struggle with latency and resource constraints, while edge-based approaches lack the processing power for complex simulations and data-driven optimizations.
To address this problem, we propose the High-Precision High-Performance Computer-enabled Digital Twin (HP2C-DT) reference architecture, which integrates High-Performance Computing (HPC) into the computing continuum. Unlike traditional setups that use HPC only for offline simulations, HP2C-DT makes it an active part of digital twin workflows, dynamically assigning tasks to edge, cloud, or HPC resources based on urgency and computational needs.
Furthermore, to bridge the gap between theory and practice, we introduce the HP2C-DT framework, a working implementation that uses COMPSs for seamless workload distribution across diverse infrastructures. We test it in a power grid use case, showing how it reduces communication bandwidth by an order of magnitude through edge-side data aggregation, improves response times by up to 2x via dynamic offloading, and maintains near-ideal strong scaling for compute-intensive workflows across a practical range of resources. These results demonstrate how an HPC-driven approach can push digital twins beyond their current limitations, making them smarter, faster, and more capable of handling real-world complexity.
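The task-routing rule at the heart of the architecture (urgent, light work at the edge; heavy, deferrable work on HPC; the rest in the cloud) reduces to a small dispatcher. The thresholds and task fields below are invented for illustration; in the framework itself this decision is delegated to COMPSs.

```python
# Toy urgency/weight dispatcher across the computing continuum.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline_s: float    # how soon a result is needed
    core_hours: float    # rough computational weight

def route(task: Task) -> str:
    if task.deadline_s < 1.0 and task.core_hours < 0.01:
        return "edge"    # real-time control loop territory
    if task.core_hours > 10.0:
        return "hpc"     # heavy simulation or optimization
    return "cloud"       # everything in between

for t in (Task("breaker-trip", 0.1, 0.001),
          Task("state-estimation", 5.0, 0.5),
          Task("contingency-sim", 3600.0, 500.0)):
    print(f"{t.name} -> {route(t)}")
```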
Citations: 0