
Latest articles in Future Generation Computer Systems - The International Journal of eScience

Online 3D trajectory and resource optimization for dynamic UAV-assisted MEC systems
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-24 | DOI: 10.1016/j.future.2026.108389
Zhao Tong , Shiyan Zhang , Jing Mei , Can Wang , Keqin Li
The integration and development of unmanned aerial vehicle (UAV) and mobile edge computing (MEC) technology provide users with more flexible, reliable, and high-quality computing services. However, most UAV-assisted MEC model designs focus mainly on static environments, which do not apply to the practical scenarios considered in this work. In this paper, we consider a UAV-assisted MEC platform that can provide continuous services for multiple mobile ground users with random movements and task arrivals. Moreover, we investigate the long-term system utility maximization problem in UAV-assisted MEC systems, considering continuous task offloading, users’ mobility, the UAV’s 3D trajectory control, and resource allocation. To address the challenges of limited system information, high-dimensional continuous actions, and state space approximation, we propose an Online decision-making algorithm for Dynamic environments based on Exploration-enhanced Greedy DDPG (ODEGD). Additionally, to evaluate the algorithm’s performance more accurately, we introduce real-world road data into the experiments. Experimental results show that the proposed algorithm reduces response delay by 26.98% and energy consumption by 22.61% compared to other algorithms, while achieving the highest system utility. These results validate the applicability of the ODEGD algorithm under dynamic conditions, demonstrating its robustness and scalability.
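The abstract does not detail ODEGD's internals, but the exploration-enhanced greedy action selection that a DDPG-style agent uses for continuous control can be sketched as below. The `toy_actor`, action bounds, and noise parameters are illustrative assumptions, not the paper's actual design:

```python
import random

def select_action(actor, state, epsilon=0.1, noise_std=0.05, bounds=(-1.0, 1.0)):
    """Exploration-enhanced greedy action selection (generic DDPG-style sketch).

    With probability epsilon, explore with a uniform random action;
    otherwise take the actor's greedy action perturbed by Gaussian noise,
    clipped back into the valid continuous action range.
    """
    lo, hi = bounds
    greedy = actor(state)
    if random.random() < epsilon:
        # Exploration branch: uniform random action of the same dimension.
        return [random.uniform(lo, hi) for _ in greedy]
    # Greedy branch: deterministic policy output plus small Gaussian noise.
    return [min(hi, max(lo, a + random.gauss(0.0, noise_std))) for a in greedy]

# Toy deterministic "actor": maps a state vector to a 3-D action
# (e.g., a UAV velocity in x, y, z). Purely illustrative.
toy_actor = lambda s: [0.5 * x for x in s]
action = select_action(toy_actor, [0.2, -0.4, 0.1], epsilon=0.0, noise_std=0.0)
```

With exploration disabled (`epsilon=0`, `noise_std=0`) the call reduces to the clipped greedy policy output.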
Future Generation Computer Systems, vol. 180, Article 108389.
Citations: 0
A task-based data-flow methodology for programming heterogeneous systems with multiple accelerator APIs
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-24 | DOI: 10.1016/j.future.2026.108383
Aleix Boné , Alejandro Aguirre , David Álvarez , Pedro J. Martinez-Ferrer , Vicenç Beltran
Heterogeneous nodes that combine multi-core CPUs with diverse accelerators are rapidly becoming the norm in both high-performance computing (HPC) and AI infrastructures. Exploiting these platforms, however, requires orchestrating several low-level accelerator APIs such as CUDA, SYCL, and Triton. On some occasions, these can be combined with optimized vendor math libraries such as cuBLAS and oneAPI. Each API or library introduces its own abstractions, execution semantics, and synchronization mechanisms, so combining them within a single application is error-prone and labor-intensive. We propose reusing a task-based data-flow methodology together with Task-Aware APIs (TA-libs) to overcome these limitations and facilitate the seamless integration of multiple accelerator programming models, while still leveraging the best-in-class kernels offered by each API.
Applications are expressed as a directed acyclic graph (DAG) of host tasks and device kernels managed by an OpenMP/OmpSs-2 runtime. We introduce Task-Aware SYCL (TASYCL) and leverage Task-Aware CUDA (TACUDA), which elevate individual accelerator invocations to first-class tasks. When multiple native runtimes coexist on the same multi-core CPU, they contend for threads, leading to oversubscription and performance variability. To address this, we unify their thread management under the nOS-V tasking and threading library, to which we contribute a new port of the PoCL (Portable OpenCL) runtime.
The methodology is evaluated on a multi-core server and a GPU-accelerated node using two contrasting workloads: the GPT-2 pre-training phase, representative of modern AI pipelines, and the HPCCG conjugate-gradient benchmark, representative of traditional HPC. From a performance standpoint, monolithic-kernel and fork-join executions are comparable, in both execution time and memory footprint, to a coarse-grained task-based formulation on both GPU-accelerated and multi-core systems. On the latter, unifying all runtimes through nOS-V mitigates interference and delivers performance on par with using a single runtime in isolation.
These results demonstrate that task-aware libraries, coupled with the nOS-V library, enable a single application to harness multiple accelerator programming models transparently and efficiently. The proposed methodology is immediately applicable to current heterogeneous nodes and is readily extensible to future systems that integrate even richer combinations of CPUs, GPUs, FPGAs, and AI accelerators.
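As a rough illustration of expressing an application as a directed acyclic graph of tasks: the paper's runtime is OpenMP/OmpSs-2 with device kernels, while this stdlib-Python sketch with hypothetical task names only mirrors the dependency-ordered dispatch idea:

```python
from collections import deque

def run_task_dag(tasks, deps):
    """Execute a DAG of named tasks in dependency order (Kahn's algorithm).

    tasks: dict name -> zero-arg callable (a host task or a kernel launch).
    deps:  dict name -> list of task names that must complete first.
    Returns the order in which tasks ran.
    """
    indeg = {t: len(deps.get(t, [])) for t in tasks}
    succ = {t: [] for t in tasks}
    for t, ds in deps.items():
        for d in ds:
            succ[d].append(t)
    ready = deque(t for t, n in indeg.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()          # a real runtime would dispatch to a CPU or device
        order.append(t)
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    if len(order) != len(tasks):
        raise RuntimeError("cycle in task graph")
    return order

# Hypothetical four-task pipeline: one load, two independent kernels, a reduce.
log = []
tasks = {n: (lambda n=n: log.append(n))
         for n in ["load", "kernel_a", "kernel_b", "reduce"]}
deps = {"kernel_a": ["load"], "kernel_b": ["load"],
        "reduce": ["kernel_a", "kernel_b"]}
order = run_task_dag(tasks, deps)
```

A task-aware runtime additionally overlaps independent tasks; the sketch keeps only the ordering constraint that the data-flow dependencies impose.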
Future Generation Computer Systems, vol. 180, Article 108383.
Citations: 0
MoFormer: A centrality-aware multi-task graph transformer with multi-gate mixture-of-experts for link-level network performance modeling
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-24 | DOI: 10.1016/j.future.2026.108406
Hanlin Liu , Aliya Bao , Mingyue Li , Yintan Ai , Hua Li
Link-level network performance modeling (NPM) facilitates efficient traffic control, precise fault localization, and reliable resource management in emerging network paradigms such as Software-Defined Networking and Intent-Based Networking. A variety of models, such as Long Short-Term Memory and Graph Neural Networks (GNNs), have been utilized to enhance the effectiveness of NPM. However, a practical NPM requires the generalization ability to adapt to diverse network topologies and prediction tasks without retraining. To meet this requirement, graph Transformer models offer a breakthrough: by encoding nodes and their structural features into tokens, they break free from the dependence on fixed graph structures typical of traditional GNNs. Nevertheless, they mostly focus on node-centric representations, which are insufficient to capture the fine-grained interactions and dependencies between links, thus limiting their applicability in link-level NPM. In this paper, we propose a centrality-aware multi-task graph Transformer with multi-gate mixture-of-experts (MMoE), named MoFormer, for link-level NPM. Specifically, a link-centric tokenized graph representation method is proposed to transform each link and its neighborhood information into a sequence of tokens guided by the routing protocol. A routing-aware betweenness centrality encoding mechanism is further developed to enhance the ability to characterize the tokens according to the relative importance of each link. MoFormer takes advantage of MMoE combined with the Transformer to enable joint learning of multiple prediction tasks. Experimental results on both simulated and real-world datasets demonstrate the significant improvements of MoFormer over existing state-of-the-art baselines while maintaining superior generalization ability.
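The MMoE idea the abstract builds on, per-task softmax gates mixing a shared pool of experts, can be sketched with scalar toy experts; the expert and gate functions below are assumptions for illustration, not MoFormer's architecture:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def mmoe(x, experts, task_gates):
    """Multi-gate mixture-of-experts forward pass (scalar sketch).

    Each task has its own gate producing softmax weights over the shared
    experts; a task's representation is the weighted sum of expert outputs.
    """
    expert_out = [f(x) for f in experts]
    outputs = {}
    for task, gate in task_gates.items():
        w = softmax(gate(x))
        outputs[task] = sum(wi * ei for wi, ei in zip(w, expert_out))
    return outputs

# Two toy experts and two hypothetical prediction tasks ("delay", "jitter").
experts = [lambda x: x + 1.0, lambda x: 2.0 * x]
gates = {"delay":  lambda x: [0.0, 0.0],       # uniform gate weights
         "jitter": lambda x: [100.0, -100.0]}  # nearly all weight on expert 0
out = mmoe(3.0, experts, gates)
```

The point of the multi-gate design is visible even at this scale: both tasks share the experts, but each gate blends them differently.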
Future Generation Computer Systems, vol. 180, Article 108406.
Citations: 0
Adaptive CPU sharing for co-located latency-critical JVM applications and batch jobs under dynamic workloads
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-23 | DOI: 10.1016/j.future.2026.108387
Dishi Xu , Fagui Liu , Bin Wang , Xuhao Tang , Qingbo Wu
Latency-critical (LC) long-running applications operating on Java Virtual Machines (JLRAs) often rely on substantial CPU over-provisioning to meet Service-Level Objectives (SLOs) under dynamic workloads, leading to significant resource underutilization. Additionally, JLRAs exhibit inferior cold-start performance, and frequently deleting and creating application instances to adjust resource allocation degrades performance. Furthermore, harvesting redundant resources by deploying best-effort (BE) batch jobs alongside JLRAs encounters serious challenges due to contention for shared CPU resources. Therefore, we present ChaosRM, a bi-level resource management framework for JVM workload co-location that enhances resource utilization efficiency while eliminating resource contention. In contrast to the conventional approach of isolating JLRAs and batch jobs on non-overlapping CPU sets, ChaosRM proposes a tri-zone CPU isolation mechanism, utilizing two CPU zones to isolate JLRAs and batch jobs and a shared region for concurrently executing their threads. An application-wide, learning-based Application Manager adjusts the instance states of JLRAs based on the global workload and adaptively learns the shared zone allocation strategy and the performance target represented by thread queuing time; the Node Manager on each server heuristically binds CPU sets to JLRAs and dynamically schedules batch jobs among CPU zones according to this performance target and the JLRA instance states. Experimental results show that, while guaranteeing the SLOs of JLRAs, ChaosRM reduces the completion time of batch jobs by up to 14.10% over the best-performing baseline and up to 54.29% over all baselines.
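A minimal sketch of what a tri-zone CPU split could look like, assuming a simple contiguous partition of CPU ids; ChaosRM's actual heuristic binding is not specified at this level of detail:

```python
def tri_zone_partition(n_cpus, shared, lc_demand):
    """Illustrative tri-zone CPU split (not ChaosRM's actual heuristic).

    Gives the latency-critical (LC) zone `lc_demand` exclusive CPUs,
    reserves `shared` CPUs as the region both JLRA and batch threads may
    use, and leaves the rest to the best-effort (BE) batch zone.
    """
    if shared + lc_demand > n_cpus:
        raise ValueError("demand exceeds available CPUs")
    cpus = list(range(n_cpus))
    lc = set(cpus[:lc_demand])
    shared_zone = set(cpus[lc_demand:lc_demand + shared])
    be = set(cpus[lc_demand + shared:])
    return lc, shared_zone, be

# Hypothetical 16-core node: 6 exclusive LC cores, 4 shared, 6 batch.
lc, sh, be = tri_zone_partition(16, shared=4, lc_demand=6)
```

The shared zone is what distinguishes this from plain non-overlapping isolation: threads of both workload classes may run there concurrently.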
Future Generation Computer Systems, vol. 180, Article 108387.
Citations: 0
High performance graph-parallel accelerator design
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-23 | DOI: 10.1016/j.future.2026.108385
Cemil Kaan Akyol , Muhammet Mustafa Ozdal , Ozcan Ozturk
Graph applications are becoming increasingly important given their widespread usage and the amounts of data they deal with. Biological and social web graphs are well-known examples that show the importance of efficiently processing graph analytics applications and problems. Because resources are limited, efficiency and performance are even more critical in embedded systems. We propose an efficient source-to-source methodology for graph applications that frees the programmer from the low-level details of parallelization and distribution by translating any vertex-centric C++ graph application into a pipelined SystemC model. High-Level Synthesis (HLS) tools can then synthesize the generated SystemC model to obtain the hardware design. To support different types of graph applications, we have implemented features such as non-standard application support, active set functionality, asynchronous execution support, conditional pipeline support, non-neighbor data access support, multiple pipeline support, and user-defined data type functionality. Our accelerator development flow can generate better-performing accelerators than OpenCL, and it dramatically reduces design time compared to using HLS tools directly. Therefore, the proposed methodology can generate fast accelerators with minimal effort from a high-level language description provided by the user.
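The vertex-centric model that such a source-to-source flow takes as input can be illustrated in a few lines: a Pregel-style fixed-point iteration in which every vertex repeatedly updates its value from its neighbors. The paper targets C++; this stdlib-Python sketch and its connected-components example are our own illustration:

```python
def vertex_centric(graph, init, update, max_iters=100):
    """Minimal vertex-centric iteration (Pregel-style sketch).

    graph:  dict vertex -> list of neighbor vertices.
    init:   f(vertex) -> initial value.
    update: f(vertex, own_value, neighbor_values) -> new value.
    Iterates synchronously until no value changes (fixed point).
    """
    values = {v: init(v) for v in graph}
    for _ in range(max_iters):
        new = {v: update(v, values[v], [values[u] for u in graph[v]])
               for v in graph}
        if new == values:
            break
        values = new
    return values

# Connected-components labeling: each vertex takes the minimum label
# seen among itself and its neighbors until labels stabilize.
g = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
labels = vertex_centric(g, init=lambda v: v,
                        update=lambda v, val, nbrs: min([val] + nbrs))
```

Because every vertex applies the same local `update`, such programs map naturally onto the pipelined, per-vertex hardware the methodology generates.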
Future Generation Computer Systems, vol. 180, Article 108385.
Citations: 0
Vertical auto-scaling mechanism for elastic memory management of containerized applications in Kubernetes
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-22 | DOI: 10.1016/j.future.2026.108407
Taeshin Kang, Minwoo Kang, Heonchang Yu
Cloud service providers typically offer containers with fixed resource sizes. However, cloud users often overprovision container resources to prevent service interruptions caused by resource shortages. This practice leads to low utilization of system resources in the cloud. To address this issue, cloud service providers offer container auto-scaling. They primarily support horizontal auto-scaling, which provides horizontal elasticity. However, this approach has limitations in responding promptly to unexpected spikes in resource usage and in optimizing resource utilization. Vertical auto-scaling can help overcome these limitations. Its importance is increasing, particularly for stateful and real-time applications that require immediate resource elasticity. Nevertheless, vertical elasticity remains difficult to achieve and has not been actively researched or widely implemented. This study proposes a vertical auto-scaling mechanism for elastic memory management in container-based applications running in Kubernetes, which is widely recognized as the standard platform for container orchestration. In the proposed approach, high-priority tasks are given priority for scaling up, while tasks that cannot undergo scale-up are suspended using the cgroup freeze feature to prevent further memory allocation. If memory pressure persists and task termination becomes unavoidable, tasks are terminated in ascending order of priority, starting with the lowest. Once memory pressure is relieved, stateful applications are restarted from the point at which they were suspended. Compared to the default Kubernetes environment without vertical elasticity, EVMMv2 reduced the total execution time of stateful applications by up to 40% and improved the request success rate of stateless applications by 37%.
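The freeze-then-terminate policy described above can be sketched as a planning function. The task records and memory threshold are hypothetical, and a real implementation would act through cgroup controls (e.g. the cgroup freeze feature) rather than return a plan:

```python
def plan_memory_relief(tasks, needed_mb):
    """Sketch of the described elastic-memory policy (not EVMMv2's code).

    tasks: list of dicts {'name', 'priority', 'mem_mb'}; higher priority
    means more important. Under pressure, tasks that cannot scale up are
    frozen (a cgroup freeze halts new allocations but reclaims nothing);
    if `needed_mb` must still be reclaimed, tasks are terminated in
    ascending priority order, starting with the lowest.
    """
    by_prio = sorted(tasks, key=lambda t: t["priority"])
    terminated, reclaimed = [], 0
    for t in by_prio:
        if reclaimed >= needed_mb:
            break
        terminated.append(t["name"])
        reclaimed += t["mem_mb"]
    if reclaimed < needed_mb:
        raise MemoryError("cannot reclaim enough memory")
    # Survivors stay frozen until memory pressure is relieved, after which
    # stateful applications resume from where they were suspended.
    frozen = [t["name"] for t in by_prio if t["name"] not in terminated]
    return frozen, terminated

# Hypothetical co-located tasks: a batch job, a cache, and an API server.
tasks = [{"name": "batch", "priority": 0, "mem_mb": 512},
         {"name": "cache", "priority": 1, "mem_mb": 1024},
         {"name": "api",   "priority": 2, "mem_mb": 2048}]
frozen, killed = plan_memory_relief(tasks, needed_mb=1000)
```

Terminating in ascending priority order is what protects the high-priority, latency-sensitive containers that vertical scaling is meant to serve.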
Future Generation Computer Systems, vol. 180, Article 108407.
Citations: 0
RoWD: Automated rogue workload detector for HPC security
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-22 | DOI: 10.1016/j.future.2026.108392
Francesco Antici , Jens Domke , Andrea Bartolini , Zeynep Kiziltan , Satoshi Matsuoka
The increasing reliance on High-Performance Computing (HPC) systems to execute complex scientific and industrial workloads raises significant security concerns related to the misuse of HPC resources for unauthorized or malicious activities. Rogue job executions can threaten the integrity, confidentiality, and availability of HPC infrastructures. Given the scale and heterogeneity of HPC job submissions, manual or ad hoc monitoring is inadequate to effectively detect such misuse. Therefore, automated solutions capable of systematically analyzing job submissions are essential to detect rogue workloads. To address this challenge, we present RoWD (Rogue Workload Detector), the first framework for automated and systematic security screening of the HPC job-submission pipeline. RoWD is composed of modular plug-ins that classify different types of workloads and enable the detection of rogue jobs through the analysis of job scripts and associated metadata. We deploy RoWD on the Supercomputer Fugaku to classify AI workloads and release SCRIPT-AI, the first dataset of annotated job scripts labeled with workload characteristics. We evaluate RoWD on approximately 50K previously unseen jobs executed on Fugaku between 2021 and 2025. Our results show that RoWD accurately classifies AI jobs (achieving an F1 score of 95%), is robust against adversarial behavior, and incurs low runtime overhead, making it suitable for strengthening the security of HPC environments and for real-time deployment in production systems.
{"title":"RoWD: Automated rogue workload detector for HPC security","authors":"Francesco Antici ,&nbsp;Jens Domke ,&nbsp;Andrea Bartolini ,&nbsp;Zeynep Kiziltan ,&nbsp;Satoshi Matsuoka","doi":"10.1016/j.future.2026.108392","DOIUrl":"10.1016/j.future.2026.108392","url":null,"abstract":"<div><div>The increasing reliance on High-Performance Computing (HPC) systems to execute complex scientific and industrial workloads raises significant security concerns related to the misuse of HPC resources for unauthorized or malicious activities. Rogue job executions can threaten the integrity, confidentiality, and availability of HPC infrastructures. Given the scale and heterogeneity of HPC job submissions, manual or ad hoc monitoring is inadequate to effectively detect such misuse. Therefore, automated solutions capable of systematically analyzing job submissions are essential to detect rogue workloads. To address this challenge, we present RoWD (Rogue Workload Detector), the first framework for automated and systematic security screening of the HPC job-submission pipeline. RoWD is composed of modular plug-ins that classify different types of workloads and enable the detection of rogue jobs through the analysis of job scripts and associated metadata. We deploy RoWD on the Supercomputer Fugaku to classify AI workloads and release SCRIPT-AI, the first dataset of annotated job scripts labeled with workload characteristics. We evaluate RoWD on approximately 50K previously unseen jobs executed on Fugaku between 2021 and 2025. 
Our results show that RoWD accurately classifies AI jobs (achieving an F1 score of 95%), is robust against adversarial behavior, and incurs low runtime overhead, making it suitable for strengthening the security of HPC environments and for real-time deployment in production systems.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"180 ","pages":"Article 108392"},"PeriodicalIF":6.2,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
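As a toy illustration of the kind of plug-in RoWD composes, a minimal classifier might scan a submitted job script for AI-framework markers. The patterns and decision rule below are invented for exposition; RoWD's actual plug-ins analyze job scripts and associated metadata with far more sophistication (reaching a 95% F1 score on Fugaku jobs).

```python
import re

# Hypothetical AI-workload markers -- illustrative only, not RoWD's rules.
AI_PATTERNS = [r"\btorch\b", r"\btensorflow\b", r"\bhorovod\b",
               r"\bkeras\b", r"--gpus?\b"]

def classify_job_script(script: str) -> str:
    """Label a batch-job script as 'AI' if any marker pattern appears."""
    if any(re.search(p, script) for p in AI_PATTERNS):
        return "AI"
    return "non-AI"
```

A real detector would also inspect scheduler metadata (requested nodes, walltime, modules loaded) rather than script text alone, and would need to be hardened against adversarial obfuscation, as the paper's robustness evaluation emphasizes.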
Quantum-resistant blockchain architecture for secure vehicular networks: A ML-KEM-enabled approach with PoA and PoP consensus
IF 6.2, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2026-01-22. DOI: 10.1016/j.future.2026.108391
Muhammad Asim , Wu Junsheng , Li Weigang , Lin Zhijun , Zhang Peng , He Hao , Wei Dong , Ghulam Mohi-ud-Din
The increasing interconnectivity within modern transportation ecosystems, a cornerstone of Intelligent Transportation Systems (ITS), creates critical vulnerabilities, demanding stronger security measures to prevent unauthorized access to vehicles and private data. Existing blockchain implementations for Vehicular Ad Hoc Networks (VANETs) are fundamentally flawed, exhibiting inefficiency with traditional consensus mechanisms, vulnerability to quantum attacks, or often both. To overcome these critical limitations, this study introduces a novel Quantum-Resistant Blockchain Architecture. The core objectives are to achieve highly efficient vehicular data storage, ensure robust confidentiality through post-quantum cryptography, and automate secure transactions. The proposed methodology employs a dual-blockchain structure: a Registration Blockchain (RBC) using Proof-of-Authority (PoA) for secure identity management, and a Message Blockchain (MBC) using Proof-of-Position (PoP) for low-latency message dissemination. A key innovation is the integration of smart contracts with the NIST-approved Module Lattice-Based Key Encapsulation Mechanism (ML-KEM) to automate and secure all processes. The framework is rigorously evaluated using a realistic 5G-VANET Multi-access Edge Computing (MEC) dataset, which includes key parameters such as vehicle ID, speed, and location. The results are compelling: an average block processing time of 0.0326 s and a transactional throughput of 30.64 TPS, significantly outperforming RSA and AES benchmarks. This research’s primary contribution is a comprehensive framework that substantially improves data security and scalability while future-proofing VANETs against the imminent and evolving threat of quantum computing.
{"title":"Quantum-resistant blockchain architecture for secure vehicular networks: A ML-KEM-enabled approach with PoA and PoP consensus","authors":"Muhammad Asim ,&nbsp;Wu Junsheng ,&nbsp;Li Weigang ,&nbsp;Lin Zhijun ,&nbsp;Zhang Peng ,&nbsp;He Hao ,&nbsp;Wei Dong ,&nbsp;Ghulam Mohi-ud-Din","doi":"10.1016/j.future.2026.108391","DOIUrl":"10.1016/j.future.2026.108391","url":null,"abstract":"<div><div>The increasing interconnectivity within modern transportation ecosystems, a cornerstone of Intelligent Transportation Systems (ITS), creates critical vulnerabilities, demanding stronger security measures to prevent unauthorized access to vehicles and private data. Existing blockchain implementations for Vehicular Ad Hoc Networks (VANETs) are fundamentally flawed, exhibiting inefficiency with traditional consensus mechanisms, vulnerability to quantum attacks, or often both. To overcome these critical limitations, this study introduces a novel Quantum-Resistant Blockchain Architecture. The core objectives are to achieve highly efficient vehicular data storage, ensure robust confidentiality through post-quantum cryptography, and automate secure transactions. The proposed methodology employs a dual-blockchain structure: a Registration Blockchain (RBC) using Proof-of-Authority (PoA) for secure identity management, and a Message Blockchain (MBC) using Proof-of-Position (PoP) for low-latency message dissemination. A key innovation is the integration of smart contracts with the NIST-approved Module Lattice-Based Key Encapsulation Mechanism (ML-KEM) to automate and secure all processes. The framework is rigorously evaluated using a realistic 5G-VANET Multi-access Edge Computing(MEC) dataset, which includes key parameters like vehicle ID, speed, and location. The results are compelling, demonstrating an Average Block Processing Time of 0.0326 s and a Transactional Throughput of 30.64 TPS, significantly outperforming RSA and AES benchmarks. 
This research’s primary contribution is a comprehensive framework that substantially improves data security and scalability while future-proofing VANETs against the imminent and evolving threat of quantum computing.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"180 ","pages":"Article 108391"},"PeriodicalIF":6.2,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
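The ML-KEM primitive this architecture builds on exposes the standard KEM interface defined in FIPS 203: key generation, encapsulation, and decapsulation. The placeholder below demonstrates only that interface shape and the resulting shared-secret agreement between a vehicle and a roadside unit. It is a deliberately insecure hash-based stand-in, not ML-KEM; the class and method names are assumptions, and a real deployment would substitute an actual ML-KEM implementation.

```python
import hashlib
import os

class ToyKEM:
    """Interface-shape stand-in for ML-KEM (FIPS 203): keygen/encaps/decaps.
    Deliberately INSECURE (hash-based; the ciphertext exposes the seed).
    Shown only to illustrate how both parties derive the same secret."""

    @staticmethod
    def keygen():
        sk = os.urandom(32)                           # decapsulation key
        pk = hashlib.sha256(b"pk" + sk).digest()      # encapsulation key
        return pk, sk

    @staticmethod
    def encaps(pk):
        seed = os.urandom(32)
        shared = hashlib.sha256(pk + seed).digest()   # 32-byte shared secret
        return seed, shared                           # (ciphertext, secret)

    @staticmethod
    def decaps(sk, ct):
        pk = hashlib.sha256(b"pk" + sk).digest()
        return hashlib.sha256(pk + ct).digest()

# A vehicle encapsulates against a roadside unit's public key; both sides
# then hold the same secret to protect messages posted to the MBC.
rsu_pk, rsu_sk = ToyKEM.keygen()
ct, vehicle_secret = ToyKEM.encaps(rsu_pk)
rsu_secret = ToyKEM.decaps(rsu_sk, ct)
```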
A message-driven system for processing highly skewed graphs
IF 6.2, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2026-01-22. DOI: 10.1016/j.future.2026.108394
Bibrak Qamar Chandio, Maciej Brodowicz, Thomas Sterling
The paper provides a unified co-design of: 1) a non-Von Neumann architecture for fine-grain irregular memory computations, 2) a programming and execution model that allows spawning tasks from within the graph vertex data at runtime, 3) language constructs for actions that send work to where the data resides, combining the parallel expressiveness of local control objects (LCOs) to implement asynchronous graph processing primitives, and 4) an innovative vertex-centric data structure, based on the concept of Rhizomes, that parallelizes both the out-degree and in-degree load of vertex objects across many cores while providing a single programming abstraction for the vertex objects. The data structure parallelizes the out-degree load of vertices hierarchically and the in-degree load laterally. The rhizomes communicate internally and remain consistent, using event-driven synchronization mechanisms, to provide a unified and correct view of the vertex.
Simulated experimental results show performance gains for BFS, SSSP, and PageRank on large chip sizes for the tested input graph datasets, which contain highly skewed degree distributions. The improvements come from the ability to express and create fine-grain dynamic computing tasks in the form of actions, from language constructs that help the compiler generate code the runtime system uses to schedule tasks optimally, and from the data structure that shares both in-degree and out-degree compute workload among memory-processing elements.
{"title":"A message-driven system for processing highly skewed graphs","authors":"Bibrak Qamar Chandio,&nbsp;Maciej Brodowicz,&nbsp;Thomas Sterling","doi":"10.1016/j.future.2026.108394","DOIUrl":"10.1016/j.future.2026.108394","url":null,"abstract":"<div><div>The paper provides a unified co-design of: 1) a non-Von Neumann architecture for fine-grain irregular memory computations, 2) a programming and execution model that allows spawning tasks from within the graph vertex data at runtime, 3) language constructs for <em>actions</em> that send work to where the data resides, combining parallel expressiveness of local control objects (LCOs) to implement asynchronous graph processing primitives, 4) and an innovative vertex-centric data-structure, using the concept of Rhizomes, that parallelizes both the out and in-degree load of vertex objects across many cores and yet provides a single programming abstraction to the vertex objects. The data structure hierarchically parallelizes the out-degree load of vertices and the in-degree load laterally. The rhizomes internally communicate and remain consistent, using event-driven synchronization mechanisms, to provide a unified and correct view of the vertex.</div><div>Simulated experimental results show performance gains for BFS, SSSP, and Page Rank on large chip sizes for the tested input graph datasets containing highly skewed degree distributions. 
The improvements come from the ability to express and create fine-grain dynamic computing task in the form of <em>actions</em>, language constructs that aid the compiler to generate code that the runtime system uses to optimally schedule tasks, and the data structure that shares both in and out-degree compute workload among memory-processing elements.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"180 ","pages":"Article 108394"},"PeriodicalIF":6.2,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
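The "send work to where the data resides" pattern can be sketched as a queue of pending actions, each carrying a candidate BFS level to a target vertex; the vertex updates its own state and spawns new actions along its out-edges. This is a sequential Python toy under stated assumptions: no rhizome partitioning, no LCO synchronization, and illustrative names throughout.

```python
from collections import deque

def bfs_levels(adj, root):
    """Message-driven BFS sketch: each queued 'action' (v, lvl) proposes a
    level to vertex v; the vertex keeps the smaller value and diffuses new
    actions along its out-edges. Sequential toy -- no rhizomes, no LCOs."""
    level = {v: None for v in adj}
    pending = deque([(root, 0)])        # pending actions targeting vertices
    while pending:
        v, lvl = pending.popleft()
        if level[v] is None or lvl < level[v]:
            level[v] = lvl              # vertex updates its own state
            for w in adj[v]:            # send work to where the data lives
                pending.append((w, lvl + 1))
    return level
```

In the actual system these actions would execute concurrently on the cores holding each vertex's (possibly rhizome-partitioned) data, with event-driven mechanisms keeping the replicas consistent; the monotone "keep the smaller level" update is what makes out-of-order delivery safe.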
Reliability analysis of hardware accelerators for decision tree-based classifier systems
IF 6.2, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2026-01-20. DOI: 10.1016/j.future.2026.108378
Mario Barbareschi , Salvatore Barone , Alberto Bosio , Antonio Emmanuele
The increasing adoption of AI models has driven applications toward the use of hardware accelerators to meet high computational demands and strict performance requirements. Beyond consideration of performance and energy efficiency, explainability and reliability have emerged as pivotal requirements, particularly for critical applications such as automotive, medical, and aerospace systems. Among the various AI models, Decision Tree Ensembles (DTEs) are particularly notable for their high accuracy and explainability. Moreover, they are particularly well-suited for hardware implementations, enabling high-performance and improved energy efficiency. However, a frequently overlooked aspect of DTEs is their reliability in the presence of hardware malfunctions. While DTEs are generally regarded as robust by design, due to their redundancy and voting mechanisms, hardware faults can still have catastrophic consequences. To address this gap, we present an in-depth reliability analysis of two types of DTE hardware accelerators: classical and approximate implementations. Specifically, we conduct a comprehensive fault injection campaign, varying the number of trees involved in the classification task, the approximation technique used, and the tolerated accuracy loss, while evaluating several benchmark datasets. The results of this study demonstrate that approximation techniques have to be carefully designed, as they can significantly impact resilience. However, techniques that target the representation of features and thresholds appear to be better suited for fault tolerance.
{"title":"Reliability analysis of hardware accelerators for decision tree-based classifier systems","authors":"Mario Barbareschi ,&nbsp;Salvatore Barone ,&nbsp;Alberto Bosio ,&nbsp;Antonio Emmanuele","doi":"10.1016/j.future.2026.108378","DOIUrl":"10.1016/j.future.2026.108378","url":null,"abstract":"<div><div>The increasing adoption of AI models has driven applications toward the use of hardware accelerators to meet high computational demands and strict performance requirements. Beyond consideration of performance and energy efficiency, explainability and reliability have emerged as pivotal requirements, particularly for critical applications such as automotive, medical, and aerospace systems. Among the various AI models, Decision Tree Ensembles (DTEs) are particularly notable for their high accuracy and explainability. Moreover, they are particularly well-suited for hardware implementations, enabling high-performance and improved energy efficiency. However, a frequently overlooked aspect of DTEs is their reliability in the presence of hardware malfunctions. While DTEs are generally regarded as robust by design, due to their redundancy and voting mechanisms, hardware faults can still have catastrophic consequences. To address this gap, we present an in-depth reliability analysis of two types of DTE hardware accelerators: classical and approximate implementations. Specifically, we conduct a comprehensive fault injection campaign, varying the number of trees involved in the classification task, the approximation technique used, and the tolerated accuracy loss, while evaluating several benchmark datasets. The results of this study demonstrate that approximation techniques have to be carefully designed, as they can significantly impact resilience. 
However, techniques that target the representation of features and thresholds appear to be better suited for fault tolerance.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"180 ","pages":"Article 108378"},"PeriodicalIF":6.2,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146014882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
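A minimal fault-injection experiment on a stump ensemble illustrates both the voting-based masking the abstract mentions and why corrupted thresholds matter: flipping a high exponent bit of one tree's threshold is outvoted, but the same fault in two of three trees flips the prediction. The ensemble and fault model here are toy assumptions for exposition, not the paper's injection campaign or its accelerator designs.

```python
import struct
from collections import Counter

def flip_bit(x, bit):
    """Inject a single-bit fault into a float64's IEEE-754 encoding."""
    (i,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", i ^ (1 << bit)))
    return y

def predict(trees, x):
    """Majority vote over decision stumps (feature, threshold, left, right)."""
    votes = [l if x[f] <= t else r for (f, t, l, r) in trees]
    return Counter(votes).most_common(1)[0][0]

healthy = [(0, 0.5, "A", "B")] * 3
bad_t = flip_bit(0.5, 62)                  # exponent-bit flip: 0.5 -> ~9e307
one_fault = [(0, bad_t, "A", "B")] + healthy[1:]
two_faults = [(0, bad_t, "A", "B")] * 2 + healthy[2:]
```

The exponent-bit flip sends the threshold to roughly 9e307, so the faulty stump always takes the left branch regardless of the input: a single such fault is masked by the two healthy voters, while two faults dominate the vote.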