
Latest publications in Future Generation Computer Systems: The International Journal of eScience

AWTO: A latency-optimized task offloading scheme for LLM-driven agentic workflows on heterogeneous edge
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-02-02 | DOI: 10.1016/j.future.2026.108415
Peng Yu , Bo Liu , Shaomin Tang , Dongdong Li , Weiwei Lin
Agentic workflows, driven by Large Language Models (LLMs), present new opportunities for realizing advanced edge intelligence in data-sensitive domains such as finance and healthcare. However, deploying these workflows in private, resource-constrained edge environments poses unique challenges. Unlike public cloud services, these scenarios require computations to be performed locally on dedicated edge clusters to meet strict data compliance and privacy regulations. This restriction, coupled with the limited memory capacity of edge devices relative to the massive size of LLMs, makes dynamic memory management and model loading critical factors. Furthermore, the autoregressive nature of LLMs introduces high dynamic uncertainty in inference latency and memory footprint, fundamentally contradicting the static information assumptions of traditional scheduling methods. To address these challenges, we propose AWTO, a Deep Reinforcement Learning (DRL) offloading scheme designed to minimize the makespan of agentic workflows in isolated edge environments. The core of AWTO is a task-by-task dynamic decision-making mechanism that explicitly handles on-demand model loading and memory contention. We formulate this problem as a Markov Decision Process (MDP) and employ a Proximal Policy Optimization (PPO)-based algorithm. A novel three-module LSTM encoder is designed to capture task dependencies, device heterogeneity, and real-time memory states. Experimental results in heterogeneous environments demonstrate that AWTO reduces the average makespan by 16.99% to 36.36% compared to heuristic baselines. Furthermore, it achieves a 14.00% gain over DRL methods, validating its adaptability to dynamic memory constraints and cache-aware scheduling.
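The cache-aware, memory-contention-sensitive dispatching idea can be illustrated with a toy greedy scheduler (a hypothetical sketch, not the paper's PPO policy; the device structure, load-time model, and keep-models-resident behavior are invented for illustration):

```python
def pick_device(task_model, exec_time, devices):
    """devices: name -> {"free_at": float, "cached": set, "load_time": float}.
    Choose the device with the lowest estimated finish time, charging a
    model-load penalty only when the model is not already resident."""
    best, best_finish = None, float("inf")
    for name, dev in devices.items():
        load = 0.0 if task_model in dev["cached"] else dev["load_time"]
        finish = dev["free_at"] + load + exec_time[name]
        if finish < best_finish:
            best, best_finish = name, finish
    return best, best_finish

def schedule(tasks, devices):
    """tasks: list of (model_name, per-device exec-time dict); returns makespan."""
    for model, exec_time in tasks:
        name, finish = pick_device(model, exec_time, devices)
        devices[name]["free_at"] = finish
        devices[name]["cached"].add(model)  # model stays resident after loading
    return max(d["free_at"] for d in devices.values())
```

A DRL policy like AWTO's would replace the greedy argmin with a learned action, but the state it must reason about (queue times, cache contents, load penalties) is the same.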
Citations: 0
Block-FDT: Blockchain-Enhanced Federated Learning Approach to Secure DT-Assisted IIoT Networks
IF 7.5 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-02-01 | DOI: 10.1016/j.future.2026.108410
Sekione Reward Jeremiah, ByungHyun Jo, Kim-Kwang Raymond Choo, Jong Hyuk Park
Citations: 0
Multi-view pedestrian detection via residual mask fusion and cosine similarity-based passive sampler for video surveillance systems
IF 7.5 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-25 | DOI: 10.1016/j.future.2026.108384
He Li, Jiajia Gui, Weihang Kong, Xingchen Zhang
Multi-view pedestrian detection aims to generate a bird’s-eye view occupancy map of pedestrians from multiple calibrated camera views. Multi-view methods offer advantages over single-view approaches: they can mitigate occlusions, expand scene coverage, and improve robustness. However, existing multi-view detection methods still face two critical challenges: mixing heterogeneous cross-view information in the fused representation and feature misalignment in the world coordinate system caused by various scales across views. To solve these issues, we develop a novel multi-view pedestrian detection framework that includes a residual mask fusion module and a cosine similarity-based passive sampler. Specifically, the residual mask fusion module enables adaptive feature selection and compensation across views, yielding an optimal fusion under geometric redundancy. Moreover, the cosine similarity-based passive sampler computes dynamic coordinate offsets by evaluating feature consistency. This reduces the impact of unavoidable biases introduced during projection. Experimental results on Wildtrack, MultiviewX and CityStreet demonstrate the effectiveness and reliability of the developed framework for multi-view pedestrian detection. Our code is available at https://github.com/guixiaojia/improve-shot.
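The cosine-similarity weighting at the heart of the passive sampler can be sketched as follows (a minimal illustration; the function names and the offset-weighting rule are assumptions, not the paper's exact formulation):

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def offset_weight(anchor_feat, sampled_feat, scale=1.0):
    """Down-weight sampled coordinate offsets whose features disagree with
    the anchor view; negative similarity contributes nothing."""
    return scale * max(cosine(anchor_feat, sampled_feat), 0.0)
```

The intuition is that projection bias shows up as feature inconsistency between views, so consistent features earn larger dynamic offsets.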
Citations: 0
A comparative performance and efficiency analysis of Apple's M architectures: A GEMM case study
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-24 | DOI: 10.1016/j.future.2026.108393
Sandra Catalán , Rafael Rodríguez-Sánchez , Carlos García Sánchez , Luis Piñuel Moreno
This paper evaluates the performance and energy efficiency of Apple processors across multiple ARM-based M-series generations and models (standard and Pro). The study is motivated by the increasing heterogeneity of Apple's SoC architectures, which integrate multiple computing engines, raising the question of which hardware components are best suited for executing general-purpose and domain-specific computations such as the GEneral Matrix Multiply (GEMM). The analysis focuses on four key components: the Central Processing Unit (CPU), the Graphics Processing Unit (GPU), the matrix calculation accelerator (AMX), and the Apple Neural Engine (ANE).
The assessments use GEMM as a benchmark to characterize the performance of the CPU and GPU, alongside tests on the AMX, which is specialized in handling large-scale mathematical operations, and on the ANE, which is specifically designed for deep learning workloads. Additionally, energy-consumption data was collected to analyze the energy efficiency of the aforementioned resources. Results highlight notable improvements in computational capacity and energy efficiency over successive generations. On one hand, the AMX stands out as the most efficient component for FP32 and FP64 workloads, significantly boosting overall system performance. In the M4 Pro, which integrates two matrix accelerators, it achieves up to 68% of the GPU's FP32 performance while consuming only 42% of its power. On the other hand, the ANE, although limited to FP16 precision, excels in energy efficiency for low-precision tasks, surpassing other accelerators with over 700 GFLOPs/Watt under batched workloads.
This analysis offers a clear understanding of how Apple's custom ARM designs optimize both performance and energy use, particularly in the context of multi-core processing and specialized acceleration units. In addition, a significant contribution of this study is the comprehensive comparative analysis of Apple's accelerators, which have previously been poorly documented and scarcely studied. The analysis spans different generations and compares the accelerators against both CPU and GPU performance.
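As a rough illustration of the accounting behind such GEMM benchmarks: an m-by-k times k-by-n GEMM performs 2·m·n·k floating-point operations (one multiply and one add per term). A naive reference kernel and the GFLOPS formula might look like this (a pure-Python sketch, obviously orders of magnitude slower than the tuned CPU/GPU/AMX kernels measured in the paper):

```python
def gemm(A, B):
    """Naive C = A @ B for lists of lists (reference kernel, not tuned)."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            a = A[i][p]
            row_b, row_c = B[p], C[i]
            for j in range(n):
                row_c[j] += a * row_b[j]
    return C

def gflops(m, n, k, seconds):
    """GEMM throughput: 2*m*n*k flops divided by elapsed time, in GFLOPS."""
    return 2.0 * m * n * k / seconds / 1e9
```

Timing this kernel (e.g. with `time.perf_counter`) and dividing by the flop count is exactly how GFLOPS/Watt figures like those above are derived, once power draw is sampled alongside.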
Citations: 0
Online 3D trajectory and resource optimization for dynamic UAV-assisted MEC systems
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-24 | DOI: 10.1016/j.future.2026.108389
Zhao Tong , Shiyan Zhang , Jing Mei , Can Wang , Keqin Li
The integration and development of unmanned aerial vehicles (UAVs) and mobile edge computing (MEC) technology provide users with more flexible, reliable, and high-quality computing services. However, most UAV-assisted MEC model designs mainly focus on static environments, which do not apply to the practical scenarios considered in this work. In this paper, we consider a UAV-assisted MEC platform, which can provide continuous services for multiple mobile ground users with random movements and task arrivals. Moreover, we investigate the long-term system utility maximization problem in UAV-assisted MEC systems, considering continuous task offloading, users’ mobility, UAV’s 3D trajectory control, and resource allocation. To address the challenges of limited system information, high-dimensional continuous actions, and state space approximation, we propose an Online decision-making algorithm for Dynamic environments based on Exploration-enhanced Greedy DDPG (ODEGD). Additionally, to more accurately evaluate the algorithm’s performance, we introduced real-world roads into the experiment. Experimental results show that the proposed algorithm reduces response delay by 26.98% and energy consumption by 22.61% compared to other algorithms, while achieving the highest system utility. These results validate the applicability of the ODEGD algorithm under dynamic conditions, demonstrating its good robustness and scalability.
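The "exploration-enhanced greedy" action selection can be caricatured as an epsilon-greedy wrapper around a deterministic DDPG-style action (a toy sketch; ODEGD's actual exploration mechanism is more involved, and the names here are invented):

```python
import random

def exploration_enhanced_act(policy_action, action_space, eps):
    """With probability eps, explore a random action from the action space;
    otherwise exploit the deterministic action proposed by the policy."""
    if random.random() < eps:
        return random.choice(action_space)
    return policy_action
```

In continuous-control settings like 3D trajectory planning, the random choice would typically be replaced by noise added to the policy's continuous action vector.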
Citations: 0
A task-based data-flow methodology for programming heterogeneous systems with multiple accelerator APIs
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-24 | DOI: 10.1016/j.future.2026.108383
Aleix Boné , Alejandro Aguirre , David Álvarez , Pedro J. Martinez-Ferrer , Vicenç Beltran
Heterogeneous nodes that combine multi-core CPUs with diverse accelerators are rapidly becoming the norm in both high-performance computing (HPC) and AI infrastructures. Exploiting these platforms, however, requires orchestrating several low-level accelerator APIs such as CUDA, SYCL, and Triton. In some cases they can be combined with optimized vendor math libraries, e.g., cuBLAS and oneAPI. Each API or library introduces its own abstractions, execution semantics, and synchronization mechanisms. Combining them within a single application is therefore error-prone and labor-intensive. We propose reusing a task-based data-flow methodology together with Task-Aware APIs (TA-libs) to overcome these limitations and facilitate the seamless integration of multiple accelerator programming models, while still leveraging the best-in-class kernels offered by each API.
Applications are expressed as a directed acyclic graph (DAG) of host tasks and device kernels managed by an OpenMP/OmpSs-2 runtime. We introduce Task-Aware SYCL (TASYCL) and leverage Task-Aware CUDA (TACUDA), which elevate individual accelerator invocations to first-class tasks. When multiple native runtimes coexist on the same multi-core CPU, they contend for threads, leading to oversubscription and performance variability. To address this, we unify their thread management under the nOS-V tasking and threading library, to which we contribute a new port of the PoCL (Portable OpenCL) runtime.
The methodology is evaluated on a multi-core server and a GPU-accelerated node using two contrasting workloads: the GPT-2 pre-training phase, representative of modern AI pipelines, and the HPCCG conjugate-gradient benchmark, representative of traditional HPC. From a performance standpoint, monolithic-kernel and fork-join executions are comparable —in both execution time and memory footprint— to a coarse-grained task-based formulation on both GPU-accelerated and multi-core systems. On the latter, unifying all runtimes through nOS-V mitigates interference and delivers performance on par with using a single runtime in isolation.
These results demonstrate that task-aware libraries, coupled with the nOS-V library, enable a single application to harness multiple accelerator programming models transparently and efficiently. The proposed methodology is immediately applicable to current heterogeneous nodes and is readily extensible to future systems that integrate even richer combinations of CPUs, GPUs, FPGAs, and AI accelerators.
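The execution model described above amounts to running a DAG of tasks in dependency order; a minimal host-side sketch (invented names, plain callables standing in for host tasks and device kernels, no actual accelerator offload) is:

```python
from collections import deque

def run_dag(tasks, deps):
    """tasks: name -> callable; deps: name -> set of prerequisite names.
    Runs every task once all its prerequisites have run (Kahn's algorithm)."""
    indeg = {t: len(deps.get(t, ())) for t in tasks}
    rdeps = {t: [] for t in tasks}
    for t, ds in deps.items():
        for d in ds:
            rdeps[d].append(t)  # d must finish before t may start
    ready = deque(t for t, n in indeg.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()
        order.append(t)
        for s in rdeps[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    if len(order) != len(tasks):
        raise ValueError("cycle in task graph")
    return order
```

A real runtime like OmpSs-2 additionally infers these dependencies from data annotations and overlaps independent tasks across threads and devices; this sketch only shows the ordering constraint.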
Citations: 0
MoFormer: A centrality-aware multi-task graph transformer with multi-gate mixture-of-experts for link-level network performance modeling
IF 6.2 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-24 | DOI: 10.1016/j.future.2026.108406
Hanlin Liu , Aliya Bao , Mingyue Li , Yintan Ai , Hua Li
Link-level network performance modeling (NPM) facilitates efficient traffic control, precise fault localization, and reliable resource management in emerging network paradigms such as Software-Defined Networking and Intent-Based Networking. A variety of models, such as Long Short-Term Memory and Graph Neural Networks (GNNs), are utilized to enhance the effectiveness of NPM. However, a practical NPM requires the generalization ability to adapt to diverse network topologies and prediction tasks without retraining. To meet this requirement, graph Transformer models represent a breakthrough: they encode nodes and their structural features into tokens, breaking free from the dependence on the fixed graph structures typical of traditional GNNs. Nevertheless, they mostly focus on node-centric representations, which are insufficient to capture the fine-grained interactions and dependencies between links, thus limiting their applicability in link-level NPM. In this paper, we propose a centrality-aware multi-task graph Transformer with multi-gate mixture-of-experts (MMoE), named MoFormer, for link-level NPM. Specifically, a link-centric tokenized graph representation method is proposed to transform each link and its neighborhood information into a sequence of tokens guided by the routing protocol. A routing-aware betweenness-centrality encoding mechanism is further developed to enhance the ability to characterize the tokens, considering the relative importance of each link. MoFormer takes advantage of MMoE combined with the Transformer to enable joint learning of multiple prediction tasks. Experimental results on both simulated and real-world datasets demonstrate significant improvements of MoFormer over existing state-of-the-art baselines while maintaining superior generalization ability.
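Link-centric tokenization can be pictured as turning each link plus its adjacent links into a token sequence (a simplified sketch: the real method orders tokens by the routing protocol and adds centrality encodings, both omitted here):

```python
def link_tokens(edges):
    """For each link (u, v), build a token sequence consisting of the link
    itself followed by its neighboring links (links sharing an endpoint)."""
    tokens = {}
    for e in edges:
        u, v = e
        neigh = [f for f in edges if f != e and (u in f or v in f)]
        tokens[e] = [e] + sorted(neigh)
    return tokens
```

Each sequence would then be embedded and fed to the Transformer, so that attention operates over a link's local routing context rather than over node features alone.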
Citations: 0
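The multi-gate mixture-of-experts structure behind MoFormer can be illustrated with a small, self-contained sketch in plain NumPy. The dimensions, weight initialization, and class layout below are illustrative assumptions, not the paper's implementation; the point is only the MMoE wiring: all prediction tasks share one pool of expert networks, while each task owns a softmax gate that mixes the experts' outputs before a task-specific head.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MMoE:
    """Minimal multi-gate mixture-of-experts: tasks share the expert
    networks but each task has its own gate and output head."""
    def __init__(self, d_in, d_hidden, n_experts, n_tasks):
        self.experts = [rng.standard_normal((d_in, d_hidden)) * 0.1
                        for _ in range(n_experts)]
        self.gates = [rng.standard_normal((d_in, n_experts)) * 0.1
                      for _ in range(n_tasks)]
        self.heads = [rng.standard_normal((d_hidden, 1)) * 0.1
                      for _ in range(n_tasks)]

    def forward(self, x):
        # expert_out: (n_experts, batch, d_hidden)
        expert_out = np.stack([np.tanh(x @ W) for W in self.experts])
        outputs = []
        for gate, head in zip(self.gates, self.heads):
            w = softmax(x @ gate)               # (batch, n_experts)
            mixed = np.einsum("be,ebh->bh", w, expert_out)
            outputs.append(mixed @ head)        # one (batch, 1) output per task
        return outputs

x = rng.standard_normal((4, 8))                 # a batch of 4 hypothetical link tokens
model = MMoE(d_in=8, d_hidden=16, n_experts=3, n_tasks=2)
ys = model.forward(x)
print([y.shape for y in ys])                    # [(4, 1), (4, 1)]
```

Because the gates are per-task, each task can weight the shared experts differently, which is what lets one backbone serve several link-level prediction targets jointly.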
Adaptive CPU sharing for co-located latency-critical JVM applications and batch jobs under dynamic workloads
IF 6.2 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2026-01-23 DOI: 10.1016/j.future.2026.108387
Dishi Xu , Fagui Liu , Bin Wang , Xuhao Tang , Qingbo Wu
Latency-critical (LC) long-running applications operating on Java Virtual Machines (JLRAs) often rely on substantial CPU over-provisioning to meet Service-Level Objectives (SLOs) under dynamic workloads, leading to significant resource underutilization. Additionally, JLRAs exhibit inferior cold-start performance, and frequent deletion and creation of application instances to adjust resource allocation results in performance degradation. Furthermore, harvesting redundant resources by deploying best-effort (BE) batch jobs alongside JLRAs encounters serious challenges due to contention for shared CPU resources. Therefore, we present ChaosRM, a bi-level resource management framework for JVM workload co-location to enhance resource utilization efficiency while eliminating resource contention. In contrast to the conventional approach of isolating JLRAs and batch jobs on non-overlapping CPU sets, ChaosRM proposes a tri-zone CPU isolation mechanism, utilizing two CPU zones to isolate JLRAs and batch jobs, and a shared region for concurrently executing their threads. An application-wide, learning-based Application Manager adjusts the instance states of JLRAs based on the global workload and adaptively learns the shared zone allocation strategy and the performance target represented by thread queuing time; the Node Manager on each server heuristically binds CPU sets to JLRAs and dynamically schedules batch jobs among CPU zones according to this performance target and the JLRA instance states. Experimental results show that, while guaranteeing the SLOs of JLRAs, ChaosRM reduces the completion time of batch jobs by up to 14.10% over the best-performing baseline and up to 54.29% over all baselines.
{"title":"Adaptive CPU sharing for co-located latency-critical JVM applications and batch jobs under dynamic workloads","authors":"Dishi Xu ,&nbsp;Fagui Liu ,&nbsp;Bin Wang ,&nbsp;Xuhao Tang ,&nbsp;Qingbo Wu","doi":"10.1016/j.future.2026.108387","DOIUrl":"10.1016/j.future.2026.108387","url":null,"abstract":"<div><div>Latency-critical (LC) long-running applications operating on Java Virtual Machines (JLRAs) often rely on substantial CPU over-provisioning to meet Service-Level Objectives (SLOs) under dynamic workloads, leading to significant resource underutilization. Additionally, JLRAs exhibit inferior cold-start performance, and frequent deletion and creation of application instances to adjust resource allocation results in performance degradation. Furthermore, harvesting redundant resources by deploying best-effort (BE) batch jobs alongside JLRAs encounters serious challenges due to contention for shared CPU resources. Therefore, we present ChaosRM, a bi-level resource management framework for JVM workload co-location to enhance resource utilization efficiency while eliminating resource contention. In contrast to the conventional approach of isolating JLRAs and batch jobs on non-overlapping CPU sets, ChaosRM proposes a tri-zone CPU isolation mechanism, utilizing two CPU zones to isolate JLRAs and batch jobs, and an shared region for concurrently executing their threads. An application-wide, learning-based Application Manager adjusts the instance states of JLRAs based on the global workload and adaptively learns the shared zone allocation strategy and the performance target represented by thread queuing time; the Node Manager on each server heuristically binds CPU sets to JLRAs and dynamically schedules batch jobs among CPU zones according to this performance target and the JLRA instance states. 
Experimental results show that, while guaranteeing the SLOs of JLRAs, ChaosRM reduces the completion time of batch jobs by up to 14.10% over the best-performing baseline and up to 54.29% over all baselines.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"180 ","pages":"Article 108387"},"PeriodicalIF":6.2,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
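The tri-zone idea — a dedicated zone for latency-critical JVM applications, a dedicated zone for batch jobs, and a shared zone both may use — can be sketched as a small affinity-selection helper. The partitioning ratios, function names, and the pressure signal below are illustrative assumptions, not ChaosRM's actual algorithm (which learns the shared-zone allocation); on Linux the returned CPU sets could be applied with `os.sched_setaffinity`.

```python
def partition_zones(n_cpus, lc_frac=0.5, shared_frac=0.25):
    """Split CPUs [0, n_cpus) into an LC zone, a shared zone, and a BE
    zone (fixed illustrative ratios; the paper learns the shared size)."""
    n_lc = max(1, int(n_cpus * lc_frac))
    n_shared = max(1, int(n_cpus * shared_frac))
    lc = set(range(n_lc))
    shared = set(range(n_lc, n_lc + n_shared))
    be = set(range(n_lc + n_shared, n_cpus))
    return lc, shared, be

def affinity_for(kind, lc, shared, be, lc_under_pressure):
    """LC threads may always run on LC+shared CPUs; batch jobs retreat
    to the BE zone alone while the LC side is under pressure, and
    otherwise harvest the shared zone too."""
    if kind == "lc":
        return lc | shared
    return set(be) if lc_under_pressure else be | shared

lc, shared, be = partition_zones(8)
print(sorted(affinity_for("batch", lc, shared, be, False)))  # [4, 5, 6, 7]
print(sorted(affinity_for("batch", lc, shared, be, True)))   # [6, 7]
```

The design point this illustrates: the shared zone gives batch jobs elasticity without ever letting them onto the CPUs reserved exclusively for the latency-critical side.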
High performance graph-parallel accelerator design
IF 6.2 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2026-01-23 DOI: 10.1016/j.future.2026.108385
Cemil Kaan Akyol , Muhammet Mustafa Ozdal , Ozcan Ozturk
Graph applications are becoming increasingly important with their widespread usage and the amounts of data they deal with. Biological and social web graphs are well-known examples that show the importance of efficiently processing graph analytic applications and problems. Due to limited resources, efficiency and performance are much more critical in embedded systems. We propose an efficient source-to-source methodology for graph applications that frees the developer from the low-level details of parallelization and distribution by translating any vertex-centric C++ graph application into a pipelined SystemC model. High-Level Synthesis (HLS) tools can synthesize the generated SystemC model to obtain the design of the hardware. To support different types of graph applications, we have implemented features like non-standard application support, active set functionality, asynchronous execution support, conditional pipeline support, non-neighbor data access support, multiple pipeline support, and user-defined data type functionality. Our accelerator development flow can generate better-performing accelerators than OpenCL. Furthermore, it dramatically reduces the design time compared to using HLS tools. Therefore, the proposed methodology can generate fast accelerators with minimal effort using a high-level language description from the user.
{"title":"High performance graph-parallel accelerator design","authors":"Cemil Kaan Akyol ,&nbsp;Muhammet Mustafa Ozdal ,&nbsp;Ozcan Ozturk","doi":"10.1016/j.future.2026.108385","DOIUrl":"10.1016/j.future.2026.108385","url":null,"abstract":"<div><div>Graph applications are becoming increasingly important with their widespread usage and the amounts of data they deal with. Biological and social web graphs are well-known examples that show the importance of efficiently processing graph analytic applications and problems. Due to limited resources, efficiency and performance are much more critical in embedded systems. We propose an efficient source-to-source-based methodology for graph applications that gives the freedom of not knowing the low-level details of parallelization and distribution by translating any vertex-centric C++ graph application into a pipelined SystemC model. High-Level Synthesis (HLS) tools can synthesize the generated SystemC model to obtain the design of the hardware. To support different types of graph applications, we have implemented features like non-standard application support, active set functionality, asynchronous execution support, conditional pipeline support, non-neighbor data access support, multiple pipeline support, and user-defined data type functionality. Our accelerator development flow can generate better-performing accelerators than OpenCL. Furthermore, it dramatically reduces the design time compared to using HLS tools. 
Therefore, the proposed methodology can generate fast accelerators with minimal effort using a high-level language description from the user.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"180 ","pages":"Article 108385"},"PeriodicalIF":6.2,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
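A vertex-centric program of the kind such a source-to-source flow would translate — including the "active set" feature the abstract lists — looks roughly like this. The toy single-source shortest-path kernel below is a hypothetical illustration in Python, not the paper's C++-to-SystemC output: each active vertex pushes updates along its out-edges, and only vertices whose value changed remain active for the next round.

```python
def vertex_centric_sssp(adj, source):
    """Fixpoint loop of a vertex-centric program: adj maps each vertex
    to a list of (neighbor, edge_weight) pairs. The active set shrinks
    to vertices whose tentative distance changed last iteration."""
    INF = float("inf")
    dist = {v: INF for v in adj}
    dist[source] = 0
    active = {source}
    while active:
        nxt = set()
        for u in active:                 # only active vertices compute
            for v, w in adj[u]:          # "scatter" along out-edges
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    nxt.add(v)           # changed vertices stay active
        active = nxt
    return dist

graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(vertex_centric_sssp(graph, "a"))   # {'a': 0, 'b': 1, 'c': 3}
```

The active-set discipline is what makes the model amenable to pipelined hardware: each iteration is a bounded stream of vertex/edge work items rather than a full-graph sweep.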
Vertical auto-scaling mechanism for elastic memory management of containerized applications in Kubernetes
IF 6.2 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2026-01-22 DOI: 10.1016/j.future.2026.108407
Taeshin Kang, Minwoo Kang, Heonchang Yu
Cloud service providers typically offer containers with fixed resource sizes. However, cloud users often overprovision container resources to prevent service interruptions caused by resource shortages. This practice leads to low utilization of system resources in the cloud. To address this issue, cloud service providers offer container auto-scaling. They primarily support horizontal auto-scaling, which provides horizontal elasticity. However, this approach has limitations in responding promptly to unexpected spikes in resource usage and in optimizing resource utilization. Vertical auto-scaling can help overcome these limitations. Its importance is increasing, particularly for stateful and real-time applications that require immediate resource elasticity. Nevertheless, vertical elasticity remains difficult to achieve and has not been actively researched or widely implemented. This study proposes a vertical auto-scaling mechanism for elastic memory management in container-based applications running in Kubernetes, which is widely recognized as the standard platform for container orchestration. In the proposed approach, high-priority tasks are given priority for scaling up, while tasks that cannot undergo scale-up are suspended using the cgroup freeze feature to prevent further memory allocation. If memory pressure persists and task termination becomes unavoidable, tasks are terminated in ascending order of priority, starting with the lowest. Once memory pressure is relieved, stateful applications are restarted from the point at which they were suspended. Compared to the default Kubernetes environment without vertical elasticity, EVMMv2 reduced the total execution time of stateful applications by up to 40% and improved the request success rate of stateless applications by 37%.
{"title":"Vertical auto-scaling mechanism for elastic memory management of containerized applications in Kubernetes","authors":"Taeshin Kang,&nbsp;Minwoo Kang,&nbsp;Heonchang Yu","doi":"10.1016/j.future.2026.108407","DOIUrl":"10.1016/j.future.2026.108407","url":null,"abstract":"<div><div>Cloud service providers typically offer containers with fixed resource sizes. However, cloud users often overprovision container resources to prevent service interruptions caused by resource shortages. This practice leads to low utilization of system resources in the cloud. To address this issue, cloud service providers offer container auto-scaling. They primarily support horizontal auto-scaling, which provides horizontal elasticity. However, this approach has limitations in responding promptly to unexpected spikes in resource usage and in optimizing resource utilization. Vertical auto-scaling can help overcome these limitations. Its importance is increasing, particularly for stateful and real-time applications that require immediate resource elasticity. Nevertheless, vertical elasticity remains difficult to achieve and has not been actively researched or widely implemented. This study proposes a vertical auto-scaling mechanism for elastic memory management in container-based applications running in Kubernetes, which is widely recognized as the standard platform for container orchestration. In the proposed approach, high-priority tasks are given priority for scaling up, while tasks that cannot undergo scale-up are suspended using the <em>cgroup freeze</em> feature to prevent further memory allocation. If memory pressure persists and task termination becomes unavoidable, tasks are terminated in ascending order of priority, starting with the lowest. Once memory pressure is relieved, stateful applications are restarted from the point at which they were suspended. 
Compared to the default Kubernetes environment without vertical elasticity, EVMMv2 reduced the total execution time of stateful applications by up to 40% and improved the request success rate of stateless applications by 37%.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"180 ","pages":"Article 108407"},"PeriodicalIF":6.2,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
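The two memory-pressure mechanisms the abstract describes — suspending tasks via the cgroup freeze feature so they allocate no further memory, and terminating tasks in ascending priority order when pressure persists — can be sketched as below. The helper names and the task dictionaries are illustrative assumptions, not the paper's code; `set_frozen` assumes a cgroup v2 hierarchy, where writing "1" to a group's `cgroup.freeze` file suspends every task in that group.

```python
import os

def set_frozen(cgroup_dir, frozen):
    # cgroup v2 freezer: frozen tasks keep their memory but stop
    # executing, so they cannot allocate further pages; writing "0"
    # thaws them and they resume where they stopped.
    with open(os.path.join(cgroup_dir, "cgroup.freeze"), "w") as f:
        f.write("1" if frozen else "0")

def pick_victims(tasks, needed_mb):
    # When termination is unavoidable, kill lowest-priority tasks
    # first, stopping as soon as the reclaimed memory covers the
    # shortfall.
    victims, freed = [], 0
    for t in sorted(tasks, key=lambda t: t["prio"]):
        if freed >= needed_mb:
            break
        victims.append(t["name"])
        freed += t["mem_mb"]
    return victims

tasks = [
    {"name": "lc-app", "prio": 2, "mem_mb": 400},   # highest priority, spared
    {"name": "batch-1", "prio": 0, "mem_mb": 150},  # lowest priority, dies first
    {"name": "batch-2", "prio": 1, "mem_mb": 250},
]
print(pick_victims(tasks, 300))  # ['batch-1', 'batch-2']
```

Freezing before killing is the key ordering: a frozen task costs nothing extra while pressure is evaluated, and if pressure subsides it can simply be thawed instead of restarted from scratch.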