
Latest articles in Future Generation Computer Systems-The International Journal of Escience

Adaptive CPU sharing for co-located latency-critical JVM applications and batch jobs under dynamic workloads
IF 6.2 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2026-07-01 Epub Date: 2026-01-23 DOI: 10.1016/j.future.2026.108387
Dishi Xu , Fagui Liu , Bin Wang , Xuhao Tang , Qingbo Wu
Latency-critical (LC) long-running applications operating on Java Virtual Machines (JLRAs) often rely on substantial CPU over-provisioning to meet Service-Level Objectives (SLOs) under dynamic workloads, leading to significant resource underutilization. Additionally, JLRAs exhibit inferior cold-start performance, and frequent deletion and creation of application instances to adjust resource allocation results in performance degradation. Furthermore, harvesting redundant resources by deploying best-effort (BE) batch jobs alongside JLRAs encounters serious challenges due to contention for shared CPU resources. Therefore, we present ChaosRM, a bi-level resource management framework for JVM workload co-location that enhances resource utilization efficiency while eliminating resource contention. In contrast to the conventional approach of isolating JLRAs and batch jobs on non-overlapping CPU sets, ChaosRM proposes a tri-zone CPU isolation mechanism, utilizing two CPU zones to isolate JLRAs and batch jobs, and a shared zone for concurrently executing their threads. An application-wide, learning-based Application Manager adjusts the instance states of JLRAs based on the global workload and adaptively learns the shared-zone allocation strategy and the performance target represented by thread queuing time; the Node Manager on each server heuristically binds CPU sets to JLRAs and dynamically schedules batch jobs among CPU zones according to this performance target and the JLRA instance states. Experimental results show that, while guaranteeing the SLOs of JLRAs, ChaosRM reduces the completion time of batch jobs by up to 14.10% over the best-performing baseline and up to 54.29% over all baselines.
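The tri-zone idea can be sketched with Linux CPU affinity masks. This is a minimal illustration, not the paper's implementation: the core counts, zone boundaries, and the `zone_cpuset` helper are all assumptions made for the example.

```python
# Hypothetical tri-zone layout on a 16-core server, loosely modeled on the
# abstract: one zone reserved for latency-critical JVM apps (JLRAs), one for
# best-effort batch jobs, and a shared zone both classes may enter.
JLRA_ZONE = set(range(0, 6))      # cores 0-5: JLRA threads only
SHARED_ZONE = set(range(6, 10))   # cores 6-9: contended by both classes
BATCH_ZONE = set(range(10, 16))   # cores 10-15: batch jobs only

def zone_cpuset(is_latency_critical: bool, use_shared: bool) -> set:
    """Return the CPU set a process should be bound to under this layout."""
    base = JLRA_ZONE if is_latency_critical else BATCH_ZONE
    return (base | SHARED_ZONE) if use_shared else base

# Applying the mask on Linux would be:
#   os.sched_setaffinity(pid, zone_cpuset(is_lc, use_shared))
print(sorted(zone_cpuset(is_latency_critical=False, use_shared=True)))
```

A node manager in this style would flip `use_shared` per process as the learned queuing-time target is met or violated, which is cheaper than destroying and recreating JVM instances.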
Citations: 0
IRL-D3QN: An intelligent multi-agent learning framework for dynamic spectrum management in vehicular networks
IF 6.2 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2026-07-01 Epub Date: 2026-01-10 DOI: 10.1016/j.future.2026.108371
Jing Wang , Wenshi Dan , Ke Yang , Xing Tang , Lingyu Yan
The proliferation of vehicular networks within intelligent transportation systems (ITS) has significantly increased the demand for efficient and adaptive spectrum resource allocation. Spectrum coordination is challenging due to high vehicle traffic, intensive communication environments, and diversified service requirements. These challenges are particularly significant in Vehicle-to-Everything (V2X) communications, where rapidly changing conditions call for powerful solutions. Multi-agent reinforcement learning (MARL) techniques are promising and have been applied to dynamic spectrum access management, but overestimated value functions, unstable policy convergence, and dependence on manually designed rewards limit their practical application. This paper presents IRL-D3QN, a new spectrum management framework that combines Inverse Reinforcement Learning (IRL) with a Dueling Double Deep Q-Network (D3QN). The algorithm uses a reward prediction network to derive intrinsic motivation from the agent's interaction with the environment, eliminating the need to design rewards manually and enhancing generalization across scenarios. The dueling network design makes learning more stable because it separates the state value from the action advantages, while double Q-learning minimizes overestimation bias. Simulations demonstrate that IRL-D3QN supports a 7.94% higher Vehicle-to-Infrastructure (V2I) transmission rate and exhibits significantly less performance degradation under heavy communication loads than state-of-the-art RL algorithms. It therefore offers a scalable, self-sufficient solution for dynamic spectrum allocation in next-generation vehicular communication systems.
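The two standard building blocks the abstract names, dueling aggregation and the double Q-learning target, can be written down concretely. The array values below are illustrative; this is a generic sketch of the textbook formulas, not the paper's networks.

```python
import numpy as np

def dueling_q(value, advantage):
    """Dueling aggregation: Q(s,a) = V(s) + (A(s,a) - mean_a A(s,a)).
    Subtracting the mean advantage keeps V and A identifiable."""
    return value + advantage - advantage.mean(axis=-1, keepdims=True)

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double Q-learning target: the online network selects the next action,
    the target network evaluates it, which curbs overestimation bias."""
    a_star = np.argmax(q_online_next, axis=-1)
    bootstrap = q_target_next[np.arange(len(a_star)), a_star]
    return reward + gamma * bootstrap * (1.0 - done)

q = dueling_q(np.array([[1.0]]), np.array([[1.0, 2.0, 3.0]]))
t = double_dqn_target(np.array([1.0]), 0.9,
                      np.array([[1.0, 3.0, 2.0]]),
                      np.array([[0.5, 0.2, 0.9]]),
                      np.array([0.0]))
```

Here the online network prefers action 1, so the target bootstraps from the target network's value 0.2 rather than its own maximum 0.9.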
Citations: 0
A comparative performance and efficiency analysis of Apple’s M architectures: A GEMM case study
IF 6.2 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2026-07-01 Epub Date: 2026-01-24 DOI: 10.1016/j.future.2026.108393
Sandra Catalán , Rafael Rodríguez-Sánchez , Carlos García Sánchez , Luis Piñuel Moreno
This paper evaluates the performance and energy efficiency of Apple processors across multiple ARM-based M-series generations and models (standard and Pro). The study is motivated by the increasing heterogeneity of Apple's SoC architectures, which integrate multiple computing engines, raising the scientific question of which hardware components are best suited for executing general-purpose and domain-specific computations such as the GEneral Matrix Multiply (GEMM). The analysis focuses on four key components: the Central Processing Unit (CPU), the Graphics Processing Unit (GPU), the matrix calculation accelerator (AMX), and the Apple Neural Engine (ANE).
The assessments use the GEMM as a benchmark to characterize the performance of the CPU and GPU, alongside tests on AMX, which is specialized in handling large-scale mathematical operations, and tests on the ANE, which is specifically designed for Deep Learning purposes. Additionally, energy consumption data has been collected to analyze the energy efficiency of the aforementioned resources. Results highlight notable improvements in computational capacity and energy efficiency over successive generations. On one hand, the AMX stands out as the most efficient component for FP32 and FP64 workloads, significantly boosting overall system performance. In the M4 Pro, which integrates two matrix accelerators, it achieves up to 68% of the GPU’s FP32 performance while consuming only 42% of its power. On the other hand, the ANE, although limited to FP16 precision, excels in energy efficiency for low-precision tasks, surpassing other accelerators with over 700 GFLOPs/Watt under batched workloads.
This analysis offers a clear understanding of how Apple's custom ARM designs optimize both performance and energy use, particularly in the context of multi-core processing and specialized acceleration units. In addition, a significant contribution of this study is the comprehensive comparative analysis of Apple’s accelerators, which have previously been poorly documented and scarcely studied. The analysis spans different generations and compares the accelerators against both CPU and GPU performance.
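The core of any GEMM benchmark like the one described here is timing the multiply and dividing the operation count (2·n³ floating-point operations for an n×n GEMM) by the elapsed time. A minimal NumPy sketch, not the paper's benchmark harness:

```python
import time
import numpy as np

def gemm_gflops(n: int, dtype=np.float32, repeats: int = 3) -> float:
    """Time C = A @ B for n x n matrices and report GFLOP/s.
    An n x n GEMM performs 2*n^3 flops (n^3 multiplies + n^3 adds);
    the best of several repeats reduces timer and warm-up noise."""
    a = np.random.rand(n, n).astype(dtype)
    b = np.random.rand(n, n).astype(dtype)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        _ = a @ b
        best = min(best, time.perf_counter() - t0)
    return (2.0 * n ** 3) / best / 1e9

print(f"{gemm_gflops(512):.1f} GFLOP/s")
```

Comparing this figure across devices (and dividing by measured power draw) yields the GFLOPs/Watt efficiency metric used in the study.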
Citations: 0
AFMIS: An approximate floating-point multiplier based on input segmentation
IF 6.2 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2026-07-01 Epub Date: 2026-01-18 DOI: 10.1016/j.future.2026.108377
Asma Naseri Rad , Shaghayegh Vahdat , Ali Afzali-Kusha , Massoud Pedram
This paper proposes an approximate floating-point (FP) multiplier, called AFMIS, which is based on input segmentation. The AFMIS multiplier statically divides the input mantissas into several segments and performs exact multiplication on the selected segments. This approach eliminates the need for a costly leading-one detector (LOD) circuit. The static segmentation and limited segment count in the proposed design reduce the number of required post-multiplication shift values. With only a few possible shifts, a simple multiplexer can replace a full shifter. This substitution improves speed compared with that of dynamic segmentation approaches. The proposed structure allows for adjustable accuracy levels by modifying the number of bits in each segment, making it suitable for a wide range of applications. To evaluate the efficiency of the AFMIS multiplier, its hardware parameters are compared to those of an exact FP multiplier and several other approximate FP multipliers. The comparison is performed using Synopsys Design Compiler in a 7 nm technology. The results show that the proposed multiplier achieves a mean relative error distance (MRED) of 0.27% to 18.6% while improving delay, area, and power consumption by up to 81.7%, 98%, and 99%, respectively, compared to the exact FP multiplier. Furthermore, the AFMIS multiplier outperforms other approximate FP multipliers in terms of speed, area, and energy consumption at similar accuracy levels. The utility of the AFMIS multiplier is demonstrated by its application in regression and classification tasks using neural networks (NNs) and JPEG compression. The results indicate that, in most cases, the output differences between the AFMIS multiplier and the exact multiplier are negligible.
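The error/cost trade-off of static mantissa segmentation can be illustrated in software. The sketch below is a deliberately simplified model (keep only the top `seg_bits` of each significand before multiplying), not the AFMIS hardware design, and it assumes normal, nonzero inputs.

```python
import struct

def _decompose(x: float):
    """Unpack an IEEE-754 double into sign, unbiased exponent, and the 53-bit
    significand (implicit leading 1 restored). Assumes x is normal and nonzero."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    exp = ((bits >> 52) & 0x7FF) - 1023
    mant = (bits & ((1 << 52) - 1)) | (1 << 52)
    return sign, exp, mant

def approx_mul(x: float, y: float, seg_bits: int = 12) -> float:
    """Toy segment-based approximate multiply: statically truncate each
    significand to its top seg_bits, then multiply the kept segments exactly.
    More kept bits -> smaller error, larger (hypothetical) hardware cost."""
    sx, ex, mx = _decompose(x)
    sy, ey, my = _decompose(y)
    drop = 53 - seg_bits
    mx >>= drop
    my >>= drop                       # low segments discarded statically
    prod = (mx * my) * 2.0 ** (ex + ey - 2 * seg_bits + 2)
    return (-1.0 if sx ^ sy else 1.0) * prod

exact = 3.14 * 2.71
approx = approx_mul(3.14, 2.71, seg_bits=12)
print(abs(approx - exact) / exact)   # relative error, well under 1%
```

Truncating each significand to k bits bounds the per-operand relative error by roughly 2⁻ᵏ⁺¹, which is why a 12-bit segment already lands comfortably inside the sub-percent MRED range the paper reports for its configurations.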
Citations: 0
BiD-Accel: Accelerated bidimensional input-aware SDC vulnerability assessment for GPU static instructions
IF 6.2 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2026-07-01 Epub Date: 2026-01-08 DOI: 10.1016/j.future.2026.108372
Zhenyu Qian , Lianguo Wang , Pengfei Zhang , Jianing Rao
Graphics Processing Units (GPUs) are increasingly used in safety-critical systems where Silent Data Corruptions (SDCs) pose severe risks. Selective Instruction Duplication (SID) can mitigate these risks but relies on accurate static-instruction vulnerability assessment, which is complicated by variations in input values and sizes. This paper presents a comprehensive study of how input characteristics shape instruction-level SDC vulnerability, which we quantify using the Static Instruction Error Probability (SIEP) and the SDC Occurrence rate (SDCO). We extend gpuFI-4 to enable fault injection mapping at the static-instruction level. Across 14 benchmarks and more than ten million single-, double-, and triple-bit injections, we find that SIEP is largely value-insensitive, whereas SDCO is highly value-sensitive. For register instructions, SDCO remains stable for random and structured-sparse inputs but differs markedly for all-zero, NaN, or denormal inputs. Moreover, when SIEP is size-sensitive, SDCO also tends to exhibit size sensitivity (Pearson r = 0.609, p = 1.85×10⁻⁵). We further observe that invalid-injection rates decrease with input size and that shared-memory instructions, though few, can contribute disproportionately to SDCs. Leveraging these insights, we propose BiD-Accel, a bi-dimensional, input-aware framework for accelerated static-instruction SDC vulnerability assessment. Its SIEP-driven Descending Order Sort (DOS) method achieves stable SDCO rankings with injections on only 70.4% of instructions on average, compared with 86.2% for the Random Ordering (RO) method, thereby meaningfully reducing assessment cost while preserving ranking fidelity and providing actionable guidance for robust SID under input-varying GPU workloads.
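The bookkeeping behind the two metrics and the DOS ordering is straightforward to sketch. The instruction names and tallies below are hypothetical, and the per-injection metric definitions are our reading of the abstract, not figures from the paper.

```python
# Hypothetical fault-injection tallies per static instruction.
campaign = {
    "FADD.R1": {"injections": 1000, "errors": 300, "sdc": 120},
    "LD.S2":   {"injections": 1000, "errors": 500, "sdc": 90},
    "MUL.R3":  {"injections": 1000, "errors": 450, "sdc": 200},
}

def siep(rec):
    """Static Instruction Error Probability: fraction of injections
    producing any erroneous outcome (assumed definition)."""
    return rec["errors"] / rec["injections"]

def sdco(rec):
    """SDC Occurrence rate: fraction of injections that end in a
    silent data corruption (assumed definition)."""
    return rec["sdc"] / rec["injections"]

def dos_order(campaign):
    """SIEP-driven Descending Order Sort: assess instructions from highest
    SIEP downward, so the injection budget is spent where errors are likely."""
    return sorted(campaign, key=lambda k: siep(campaign[k]), reverse=True)

print(dos_order(campaign))  # → ['LD.S2', 'MUL.R3', 'FADD.R1']
```

Ranking by SIEP first is what lets the framework stop injecting early: once the SDCO ranking of the remaining low-SIEP instructions stabilizes, the tail of the campaign can be skipped.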
Citations: 0
Reliability analysis of hardware accelerators for decision tree-based classifier systems
IF 6.2 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2026-07-01 Epub Date: 2026-01-20 DOI: 10.1016/j.future.2026.108378
Mario Barbareschi , Salvatore Barone , Alberto Bosio , Antonio Emmanuele
The increasing adoption of AI models has driven applications toward the use of hardware accelerators to meet high computational demands and strict performance requirements. Beyond consideration of performance and energy efficiency, explainability and reliability have emerged as pivotal requirements, particularly for critical applications such as automotive, medical, and aerospace systems. Among the various AI models, Decision Tree Ensembles (DTEs) are particularly notable for their high accuracy and explainability. Moreover, they are particularly well-suited for hardware implementations, enabling high-performance and improved energy efficiency. However, a frequently overlooked aspect of DTEs is their reliability in the presence of hardware malfunctions. While DTEs are generally regarded as robust by design, due to their redundancy and voting mechanisms, hardware faults can still have catastrophic consequences. To address this gap, we present an in-depth reliability analysis of two types of DTE hardware accelerators: classical and approximate implementations. Specifically, we conduct a comprehensive fault injection campaign, varying the number of trees involved in the classification task, the approximation technique used, and the tolerated accuracy loss, while evaluating several benchmark datasets. The results of this study demonstrate that approximation techniques have to be carefully designed, as they can significantly impact resilience. However, techniques that target the representation of features and thresholds appear to be better suited for fault tolerance.
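A single-bit fault in a stored comparison threshold, the kind of corruption such a campaign injects, can be modeled by flipping one bit of its IEEE-754 encoding. The toy three-tree ensemble below is illustrative only; it shows how majority voting sometimes fails to mask even one corrupted threshold.

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Flip one bit in the IEEE-754 double encoding of x
    (a single-bit-upset fault model)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))[0]

# A toy 3-tree ensemble voting on a 1-D input via thresholds; majority
# voting is the redundancy mechanism the abstract credits for DTE robustness.
thresholds = [0.4, 0.5, 0.6]

def ensemble_predict(x, thresholds):
    votes = sum(1 for t in thresholds if x > t)
    return int(votes > len(thresholds) / 2)

# Inject a fault into one tree's threshold: flipping a high exponent bit
# turns 0.4 into an astronomically large value, so that tree never fires.
faulty = list(thresholds)
faulty[0] = flip_bit(faulty[0], 62)

masked = ensemble_predict(0.55, thresholds) == ensemble_predict(0.55, faulty)
print(masked)  # → False: one corrupted threshold flips the majority vote
```

For the input 0.55 the healthy ensemble votes 2-to-1 for class 1, but losing one vote drops it below the majority, which is exactly why the paper argues that fault tolerance of DTE accelerators cannot be taken for granted.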
Citations: 0
Online 3D trajectory and resource optimization for dynamic UAV-assisted MEC systems
IF 6.2 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2026-07-01 Epub Date : 2026-01-24 DOI: 10.1016/j.future.2026.108389
Zhao Tong , Shiyan Zhang , Jing Mei , Can Wang , Keqin Li
The integration and development of unmanned aerial vehicles (UAVs) and mobile edge computing (MEC) technology provide users with more flexible, reliable, and high-quality computing services. However, most UAV-assisted MEC model designs focus on static environments, which do not match the practical scenarios considered in this work. In this paper, we consider a UAV-assisted MEC platform that can provide continuous services for multiple mobile ground users with random movements and task arrivals. Moreover, we investigate the long-term system utility maximization problem in UAV-assisted MEC systems, considering continuous task offloading, users’ mobility, the UAV’s 3D trajectory control, and resource allocation. To address the challenges of limited system information, high-dimensional continuous actions, and state space approximation, we propose an Online decision-making algorithm for Dynamic environments based on Exploration-enhanced Greedy DDPG (ODEGD). Additionally, to evaluate the algorithm’s performance more accurately, we introduce real-world road data into the experiments. Experimental results show that the proposed algorithm reduces response delay by 26.98% and energy consumption by 22.61% compared to other algorithms, while achieving the highest system utility. These results validate the applicability of the ODEGD algorithm under dynamic conditions, demonstrating its good robustness and scalability.
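The "exploration-enhanced greedy" ingredient can be sketched as a standard DDPG-style action-selection wrapper: with some probability take a random exploratory action, otherwise perturb the greedy actor output with Gaussian noise and clip it to the valid range. The actor, state, and all hyperparameters below are hypothetical; the paper's ODEGD schedule and networks are not reproduced.

```python
import random

def select_action(actor, state, epsilon, sigma, lo=-1.0, hi=1.0, rng=random):
    """Exploration-enhanced greedy selection, DDPG-style: with probability
    epsilon pick a uniformly random action; otherwise add Gaussian noise to
    the greedy actor output and clip it to the action bounds."""
    if rng.random() < epsilon:
        return rng.uniform(lo, hi)                # explore
    a = actor(state) + rng.gauss(0.0, sigma)      # exploit with noise
    return min(max(a, lo), hi)                    # clip to valid range

# Hypothetical deterministic "actor": e.g., a normalized UAV altitude step.
actor = lambda s: 0.5 * s[0]
a = select_action(actor, [0.8], epsilon=0.1, sigma=0.05, rng=random.Random(0))
assert -1.0 <= a <= 1.0
```

In a full agent, epsilon and sigma would typically decay over training so behavior shifts from exploration toward the learned greedy policy.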
{"title":"Online 3D trajectory and resource optimization for dynamic UAV-assisted MEC systems","authors":"Zhao Tong ,&nbsp;Shiyan Zhang ,&nbsp;Jing Mei ,&nbsp;Can Wang ,&nbsp;Keqin Li","doi":"10.1016/j.future.2026.108389","DOIUrl":"10.1016/j.future.2026.108389","url":null,"abstract":"<div><div>The integration and development of unmanned aerial vehicles (UAVs) and mobile edge computing (MEC) technology provide users with more flexible, reliable, and high-quality computing services. However, most UAV-assisted MEC model designs mainly focus on static environments, which do not apply to the practical scenarios considered in this work. In this paper, we consider a UAV-assisted MEC platform, which can provide continuous services for multiple mobile ground users with random movements and task arrivals. Moreover, we investigate the long-term system utility maximization problem in UAV-assisted MEC systems, considering continuous task offloading, users’ mobility, UAV’s 3D trajectory control, and resource allocation. To address the challenges of limited system information, high-dimensional continuous actions, and state space approximation, we propose an <u>O</u>nline decision-making algorithm for <u>D</u>ynamic environments based on <u>E</u>xploration-enhanced <u>G</u>reedy <u>D</u>DPG (ODEGD). Additionally, to more accurately evaluate the algorithm’s performance, we introduced real-world roads into the experiment. Experimental results show that the proposed algorithm reduces response delay by 26.98% and energy consumption by 22.61% compared to other algorithms, while achieving the highest system utility. 
These results validate the applicability of the ODEGD algorithm under dynamic conditions, demonstrating its good robustness and scalability.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"180 ","pages":"Article 108389"},"PeriodicalIF":6.2,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146047991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A task-based data-flow methodology for programming heterogeneous systems with multiple accelerator APIs
IF 6.2 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2026-07-01 Epub Date : 2026-01-24 DOI: 10.1016/j.future.2026.108383
Aleix Boné , Alejandro Aguirre , David Álvarez , Pedro J. Martinez-Ferrer , Vicenç Beltran
Heterogeneous nodes that combine multi-core CPUs with diverse accelerators are rapidly becoming the norm in both high-performance computing (HPC) and AI infrastructures. Exploiting these platforms, however, requires orchestrating several low-level accelerator APIs such as CUDA, SYCL, and Triton. On some occasions they can be combined with optimized vendor math libraries: e.g., cuBLAS and oneAPI. Each API or library introduces its own abstractions, execution semantics, and synchronization mechanisms. Combining them within a single application is therefore error-prone and labor-intensive. We propose reusing a task-based data-flow methodology together with Task-Aware APIs (TA-libs) to overcome these limitations and facilitate the seamless integration of multiple accelerator programming models, while still leveraging the best-in-class kernels offered by each API.
Applications are expressed as a directed acyclic graph (DAG) of host tasks and device kernels managed by an OpenMP/OmpSs-2 runtime. We introduce Task-Aware SYCL (TASYCL) and leverage Task-Aware CUDA (TACUDA), which elevate individual accelerator invocations to first-class tasks. When multiple native runtimes coexist on the same multi-core CPU, they contend for threads, leading to oversubscription and performance variability. To address this, we unify their thread management under the nOS-V tasking and threading library, to which we contribute a new port of the PoCL (Portable OpenCL) runtime.
The methodology is evaluated on a multi-core server and a GPU-accelerated node using two contrasting workloads: the GPT-2 pre-training phase, representative of modern AI pipelines, and the HPCCG conjugate-gradient benchmark, representative of traditional HPC. From a performance standpoint, monolithic-kernel and fork-join executions are comparable —in both execution time and memory footprint— to a coarse-grained task-based formulation on both GPU-accelerated and multi-core systems. On the latter, unifying all runtimes through nOS-V mitigates interference and delivers performance on par with using a single runtime in isolation.
These results demonstrate that task-aware libraries, coupled with the nOS-V library, enable a single application to harness multiple accelerator programming models transparently and efficiently. The proposed methodology is immediately applicable to current heterogeneous nodes and is readily extensible to future systems that integrate even richer combinations of CPUs, GPUs, FPGAs, and AI accelerators.
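The core idea of a task-based data-flow model, as in the abstract above, is that tasks declare what data they read and write, and the runtime derives the execution DAG from those annotations. OpenMP/OmpSs-2 express this with in/out dependence clauses on pragmas; the tiny dict-based "runtime" below is purely illustrative, not the paper's implementation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def run_dataflow(tasks):
    """tasks maps name -> (reads, writes, fn). Task-to-task edges are
    derived from read-after-write dependencies in declaration order,
    mimicking how a data-flow runtime turns in/out annotations into a DAG."""
    last_writer = {}                        # datum -> task that last wrote it
    deps = {name: set() for name in tasks}
    for name, (reads, writes, _) in tasks.items():
        for d in reads:
            if d in last_writer:
                deps[name].add(last_writer[d])
        for d in writes:
            last_writer[d] = name
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name][2]()                    # "launch" the task
    return order

# Hypothetical three-stage pipeline: init writes A, kernel reads A / writes B,
# reduce reads B. The derived order respects the data flow.
log = []
tasks = {
    "init":   ((),     ("A",), lambda: log.append("init")),
    "kernel": (("A",), ("B",), lambda: log.append("kernel")),
    "reduce": (("B",), (),     lambda: log.append("reduce")),
}
run_dataflow(tasks)
assert log == ["init", "kernel", "reduce"]
```

A real runtime would run independent tasks concurrently and handle write-after-read/write hazards too; the sketch only shows how dependence annotations become an execution order.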
{"title":"A task-based data-flow methodology for programming heterogeneous systems with multiple accelerator APIs","authors":"Aleix Boné ,&nbsp;Alejandro Aguirre ,&nbsp;David Álvarez ,&nbsp;Pedro J. Martinez-Ferrer ,&nbsp;Vicenç Beltran","doi":"10.1016/j.future.2026.108383","DOIUrl":"10.1016/j.future.2026.108383","url":null,"abstract":"<div><div>Heterogeneous nodes that combine multi-core CPUs with diverse accelerators are rapidly becoming the norm in both high-performance computing (HPC) and AI infrastructures. Exploiting these platforms, however, requires orchestrating several low-level accelerator APIs such as CUDA, SYCL, and Triton. In some occasions they can be combined with optimized vendor math libraries: e.g., cuBLAS and oneAPI. Each API or library introduces its own abstractions, execution semantics, and synchronization mechanisms. Combining them within a single application is therefore error-prone and labor-intensive. We propose reusing a task-based data-flow methodology together with Task-Aware APIs (TA-libs) to overcome these limitations and facilitate the seamless integration of multiple accelerator programming models, while still leveraging the best-in-class kernels offered by each API.</div><div>Applications are expressed as a directed acyclic graph (DAG) of host tasks and device kernels managed by an OpenMP/OmpSs-2 runtime. We introduce Task-Aware SYCL (TASYCL) and leverage Task-Aware CUDA (TACUDA), which elevate individual accelerator invocations to first-class tasks. When multiple native runtimes coexist on the same multi-core CPU, they contend for threads, leading to oversubscription and performance variability. 
To address this, we unify their thread management under the nOS-V tasking and threading library, to which we contribute a new port of the PoCL (Portable OpenCL) runtime.</div><div>The methodology is evaluated on a multi-core server and a GPU-accelerated node using two contrasting workloads: the GPT-2 pre-training phase, representative of modern AI pipelines, and the HPCCG conjugate-gradient benchmark, representative of traditional HPC. From a performance standpoint, monolithic-kernel and fork-join executions are comparable —in both execution time and memory footprint— to a coarse-grained task-based formulation on both GPU-accelerated and multi-core systems. On the latter, unifying all runtimes through nOS-V mitigates interference and delivers performance on par with using a single runtime in isolation.</div><div>These results demonstrate that task-aware libraries, coupled with the nOS-V library, enable a single application to harness multiple accelerator programming models transparently and efficiently. The proposed methodology is immediately applicable to current heterogeneous nodes and is readily extensible to future systems that integrate even richer combinations of CPUs, GPUs, FPGAs, and AI accelerators.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"180 ","pages":"Article 108383"},"PeriodicalIF":6.2,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146048039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
HP2C-DT: High-Precision High-Performance Computer-enabled Digital Twin
IF 6.2 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2026-07-01 Epub Date : 2025-12-26 DOI: 10.1016/j.future.2025.108333
E. Iraola , M. García-Lorenzo , F. Lordan-Gomis , F. Rossi , E. Prieto-Araujo , R.M. Badia
Digital twins are transforming the way we monitor, analyze, and control physical systems, but designing architectures that balance real-time responsiveness with heavy computational demands remains a challenge. Cloud-based solutions often struggle with latency and resource constraints, while edge-based approaches lack the processing power for complex simulations and data-driven optimizations.
To address this problem, we propose the High-Precision High-Performance Computer-enabled Digital Twin (HP2C-DT) reference architecture, which integrates High-Performance Computing (HPC) into the computing continuum. Unlike traditional setups that use HPC only for offline simulations, HP2C-DT makes it an active part of digital twin workflows, dynamically assigning tasks to edge, cloud, or HPC resources based on urgency and computational needs.
Furthermore, to bridge the gap between theory and practice, we introduce the HP2C-DT framework, a working implementation that uses COMPSs for seamless workload distribution across diverse infrastructures. We test it in a power grid use case, showing how it reduces communication bandwidth by an order of magnitude through edge-side data aggregation, improves response times by up to 2x via dynamic offloading, and maintains near-ideal strong scaling for compute-intensive workflows across a practical range of resources. These results demonstrate how an HPC-driven approach can push digital twins beyond their current limitations, making them smarter, faster, and more capable of handling real-world complexity.
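The dynamic assignment described above (edge, cloud, or HPC "based on urgency and computational needs") can be sketched as a simple rule-of-thumb dispatcher. The tiers, field names, and thresholds below are hypothetical, chosen only to illustrate the routing idea, not HP2C-DT's actual policy.

```python
def dispatch(task):
    """Route a digital-twin task to a compute tier. task is a dict with
    'deadline_s' (time budget in seconds) and 'flops' (estimated work)."""
    if task["deadline_s"] < 0.1:   # hard real-time control -> stay at the edge
        return "edge"
    if task["flops"] > 1e12:       # heavy simulation -> HPC batch resources
        return "hpc"
    return "cloud"                 # everything else -> elastic cloud

# A latency-critical actuation command, a large grid simulation, and a
# routine analytics job land on different tiers.
assert dispatch({"deadline_s": 0.01, "flops": 1e15}) == "edge"
assert dispatch({"deadline_s": 5.0,  "flops": 1e15}) == "hpc"
assert dispatch({"deadline_s": 5.0,  "flops": 1e9})  == "cloud"
```

A production scheduler would also weigh queue depth, data locality, and transfer cost; the point is only that urgency dominates placement before raw compute demand does.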
{"title":"HP2C-DT: High-Precision High-Performance Computer-enabled Digital Twin","authors":"E. Iraola ,&nbsp;M. García-Lorenzo ,&nbsp;F. Lordan-Gomis ,&nbsp;F. Rossi ,&nbsp;E. Prieto-Araujo ,&nbsp;R.M. Badia","doi":"10.1016/j.future.2025.108333","DOIUrl":"10.1016/j.future.2025.108333","url":null,"abstract":"<div><div>Digital twins are transforming the way we monitor, analyze, and control physical systems, but designing architectures that balance real-time responsiveness with heavy computational demands remains a challenge. Cloud-based solutions often struggle with latency and resource constraints, while edge-based approaches lack the processing power for complex simulations and data-driven optimizations.</div><div>To address this problem, we propose the <em>High-Precision High-Performance Computer-enabled Digital Twin</em> (HP2C-DT) reference architecture, which integrates High-Performance Computing (HPC) into the computing continuum. Unlike traditional setups that use HPC only for offline simulations, HP2C-DT makes it an active part of digital twin workflows, dynamically assigning tasks to edge, cloud, or HPC resources based on urgency and computational needs.</div><div>Furthermore, to bridge the gap between theory and practice, we introduce the HP2C-DT framework, a working implementation that uses COMPSs for seamless workload distribution across diverse infrastructures. We test it in a power grid use case, showing how it reduces communication bandwidth by an order of magnitude through edge-side data aggregation, improves response times by up to 2x via dynamic offloading, and maintains near-ideal strong scaling for compute-intensive workflows across a practical range of resources. 
These results demonstrate how an HPC-driven approach can push digital twins beyond their current limitations, making them smarter, faster, and more capable of handling real-world complexity.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"180 ","pages":"Article 108333"},"PeriodicalIF":6.2,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145845127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Digital twins in public bus transport: A systematic literature review of architectures, intelligence, and interaction
IF 6.2 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2026-07-01 Epub Date : 2026-01-19 DOI: 10.1016/j.future.2026.108379
Manuel Andruccioli, Giovanni Delnevo, Roberto Girau, Paola Salomoni
The adoption of Digital Twin (DT) technologies in public transport systems, particularly bus networks, is gaining momentum as cities seek smarter, more responsive, and efficient mobility solutions. Enabled by advances in IoT, AI, and Big Data Analytics, DTs offer real-time monitoring, simulation, and optimization of transit operations. However, despite their potential, the application of DTs in bus-based public transport remains relatively underexplored and fragmented across the literature. This study presents a Systematic Literature Review (SLR) aimed at synthesizing current research on DT technologies in this domain. Specifically, it investigates architectural models, technological frameworks, and platform designs; examines how AI and machine learning models are integrated to support operational tasks; and analyzes the role of Human-Computer Interaction (HCI) in the design and usability of such systems. By identifying key trends, challenges, and research gaps, this work provides a structured overview of the current landscape. Furthermore, it outlines directions for future research in DT-enabled public transportation systems.
{"title":"Digital twins in public bus transport: A systematic literature review of architectures, intelligence, and interaction","authors":"Manuel Andruccioli,&nbsp;Giovanni Delnevo,&nbsp;Roberto Girau,&nbsp;Paola Salomoni","doi":"10.1016/j.future.2026.108379","DOIUrl":"10.1016/j.future.2026.108379","url":null,"abstract":"<div><div>The adoption of Digital Twin (DT) technologies in public transport systems, particularly bus networks, is gaining momentum as cities seek smarter, more responsive, and efficient mobility solutions. Enabled by advances in IoT, AI, and Big Data Analytics, DTs offer real-time monitoring, simulation, and optimization of transit operations. However, despite their potential, the application of DTs in bus-based public transport remains relatively underexplored and fragmented across the literature. This study presents a Systematic Literature Review (SLR) aimed at synthesizing current research on DT technologies in this domain. Specifically, it investigates architectural models, technological frameworks, and platform designs; examines how AI and machine learning models are integrated to support operational tasks; and analyzes the role of Human-Computer Interaction (HCI) in the design and usability of such systems. By identifying key trends, challenges, and research gaps, this work provides a structured overview of the current landscape. 
Furthermore, it outlines directions for future research in DT-enabled public transportation systems.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"180 ","pages":"Article 108379"},"PeriodicalIF":6.2,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146000904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0