
Journal of Systems Architecture: Latest Publications

MECSim: A comprehensive simulation platform for multi-access edge computing
IF 4.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-15 | DOI: 10.1016/j.sysarc.2026.103706
Akhirul Islam, Manojit Ghose
The rapid growth of CPU-intensive and latency-sensitive applications has intensified the need for efficient resource management within edge computing environments. While existing simulators such as iFogSim, EdgeCloudSim, and PureEdgeSim have contributed significantly to edge computing research, they lack comprehensive support for modeling modern hardware heterogeneity, energy-aware mechanisms, service providers’ economic models, dependent task modeling, and reliability-driven task management. This paper presents MECSim (multi-access edge computing simulator), an enhanced simulation framework that extends PureEdgeSim to enable realistic modeling of heterogeneous, cooperative, and fault-tolerant edge computing ecosystems. MECSim supports multi-data-center clusters along with dynamic voltage and frequency scaling (DVFS) capable user devices for energy-efficient operation. The framework further integrates dependent-task modeling, cost and profit evaluation for service providers, and reliability mechanisms via transient-failure simulation, caching, and task replication. We have implemented five state-of-the-art approaches, demonstrating the effectiveness of our simulation platform and building confidence in its practical utility to handle diverse system architectures. With its extensible architecture and comprehensive modeling capabilities, MECSim provides a promising platform for future research on energy-efficient, profit-driven, and fault-tolerant task offloading and scheduling in heterogeneous MEC environments. The results also demonstrate that MECSim achieves a 44.13% (on average) reduction in simulation time compared to EdgeCloudSim. In addition, we have conducted experiments using dispersion-aware metrics to quantify variability and stability across 50 independent runs, thereby enabling a more robust and reliable performance evaluation.
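The reliability mechanisms the abstract mentions (transient-failure simulation combined with task replication) follow a standard pattern that can be illustrated independently of MECSim's actual API. The sketch below is a toy Monte-Carlo model, not MECSim code; all function names are hypothetical.

```python
import random

def task_succeeds(fail_prob: float, replicas: int, rng: random.Random) -> bool:
    # The task completes if at least one replica avoids a transient failure.
    return any(rng.random() >= fail_prob for _ in range(replicas))

def estimate_reliability(fail_prob: float, replicas: int,
                         trials: int = 20000, seed: int = 42) -> float:
    # Monte-Carlo estimate of end-to-end task reliability under replication.
    rng = random.Random(seed)
    ok = sum(task_succeeds(fail_prob, replicas, rng) for _ in range(trials))
    return ok / trials
```

With a 30% transient-failure rate, a single copy succeeds about 70% of the time, while three replicas push the estimate toward 1 − 0.3³ ≈ 0.973, which is the kind of trade-off a simulator like this lets researchers quantify.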
Citations: 0
Precision boundary modeling for area-efficient Block Floating Point accumulation
IF 4.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-13 | DOI: 10.1016/j.sysarc.2026.103704
Jun He, Jing Feng, Xin Ju, Yasong Cao, Zhongdi Luo, Jianchao Yang, Jingkui Yang, Gang Li, Jian Cheng, Dong Chen, Mei Wen
Block Floating Point (BFP) is extensively employed for the low-precision quantization of deep network weights and activations to attain advantages in both hardware efficiency and performance. Nevertheless, when the precision of weights and activations is diminished to below 8 bits, the required high-precision floating-point accumulation becomes a dominant hardware bottleneck in the BFP processing element (PE). To address this challenge, we introduce a framework based on the Frobenius norm Retention Ratio (FnRR) to explore the precision boundaries for BFP accumulation, and extend it to a hierarchical chunk-based accumulation scheme. Comprehensive experiments across representative CNN and LLM models demonstrate that our predicted precision boundaries maintain performance closely matching FP32 baselines, while further precision reduction leads to substantial accuracy degradation, validating the effectiveness of our boundary determination. Guided by this analysis, we present a corresponding hardware design for BFP computation. This design achieves 13.7%–25.2% improvements in area and power efficiency compared with FP32 accumulation under identical quantization settings, and delivers up to 10.3× area and 11.0× power reductions relative to conventional BFP implementations.
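The Frobenius-norm Retention Ratio can be made concrete with a small block-floating-point model: a block shares the exponent of its largest element and keeps m mantissa bits. The sketch below is a generic BFP toy under those assumptions, not the paper's FnRR framework; function names are hypothetical.

```python
import math

def bfp_quantize(block, mantissa_bits):
    # Shared exponent comes from the block's largest magnitude;
    # every element is then rounded onto that fixed-point grid.
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return list(block)
    shared_exp = math.floor(math.log2(max_abs))
    step = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [round(v / step) * step for v in block]

def fnrr(block, mantissa_bits):
    # Frobenius-norm retention ratio: ||Q(X)||_F / ||X||_F.
    q = bfp_quantize(block, mantissa_bits)
    den = math.sqrt(sum(v * v for v in block))
    num = math.sqrt(sum(v * v for v in q))
    return num / den if den else 1.0
```

A ratio near 1 means the quantized block retains the energy of the original; tracking where it departs from 1 as mantissa bits shrink is one way to locate a precision boundary.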
Citations: 0
Peak-memory-aware partitioning and scheduling for multi-tenant DNN model inference
IF 4.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-09 | DOI: 10.1016/j.sysarc.2026.103696
Jaeho Lee, Ju Min Lee, Haeeun Jeong, Hyunho Kwon, Youngsok Kim, Yongjun Park, Hanjun Kim
As Deep Neural Networks (DNNs) are widely used in various applications, multiple DNN inference models are increasingly run on a single GPU. The simultaneous execution of multiple DNN models can overwhelm the GPU memory with increasing model size, leading to unexpected out-of-memory (OOM) errors. To avoid the OOM errors, existing systems attempt to schedule models at either model-level or layer-level granularity. However, the model-level scheduling schemes inefficiently utilize memory spaces because they preallocate memory based on the model’s peak memory demand, and the layer-level scheduling schemes suffer from high scheduling overhead due to their overly fine-grained scheduling units. This work proposes a new peak-memory-aware DNN model partitioning compiler and scheduler, called Quilt. The Quilt compiler partitions a DNN model into multiple tasks based on their peak memory usage, and the Quilt scheduler orchestrates the tasks of multiple models without the OOM errors. Additionally, the compiler generates a memory pool for tensors shared between partitioned tasks, reducing CPU–GPU communication overhead when consecutively executing the tasks. Compared to the model-level and layer-level scheduling schemes, Quilt reduces overall latency by 25.4% and 37.7%, respectively, while preventing the OOM errors. Moreover, Quilt achieves up to 10.8% faster overall inference latency than the state-of-the-art Triton inference server for 6 DNN models.
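The compiler side of the idea, grouping layers into tasks under a memory budget, can be sketched with a greedy first-fit pass. This is a toy under the simplifying assumption that a task's footprint is the sum of its members' tensor sizes; it is not Quilt's actual partitioning model, and all names are hypothetical.

```python
def partition_layers(layer_mems, budget):
    """Group consecutive layers into tasks whose combined footprint
    stays within the per-task memory budget (greedy first-fit)."""
    tasks, cur, cur_mem = [], [], 0
    for i, mem in enumerate(layer_mems):
        if mem > budget:
            raise ValueError(f"layer {i} alone exceeds the budget")
        if cur_mem + mem > budget:     # close the current task, start a new one
            tasks.append(cur)
            cur, cur_mem = [], 0
        cur.append(i)
        cur_mem += mem
    if cur:
        tasks.append(cur)
    return tasks
```

For example, `partition_layers([4, 3, 5, 2, 6], 8)` groups the five layers as `[[0, 1], [2, 3], [4]]`, so no task ever demands more than 8 units at once.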
Citations: 0
A PCM-based hybrid online learning architecture with adaptive threshold sign-based backpropagation and pulse-aware conductance control
IF 4.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-08 | DOI: 10.1016/j.sysarc.2026.103683
Zhenhao Jiao, Xiaogang Chen, Tao Hong, Shunfen Li, Xi Li, Weibang Dai, Chengcai Tu, Zhitang Song
The increasing demand for low-latency, energy-efficient online learning in edge devices has driven the exploration of neuromorphic computing and hybrid analog–digital architectures. In this work, we propose a phase-change memory (PCM)-based hybrid architecture for in-situ online learning, which integrates parallel analog matrix–vector multiplication with adaptive digital control. The system features two key innovations: (1) an adaptive threshold sign-based backpropagation (ATSBP) algorithm that dynamically adjusts quantization thresholds for activations and error signals based on real-time feedback from mini-batch statistics, and (2) a pulse-aware conductance control scheme that enables precise conductance tuning of PCM devices using experimentally calibrated pulse-conductance mappings. These mechanisms jointly reduce unnecessary write operations and enhance robustness against device nonidealities such as nonlinearity and drift. Through systematic validation, we demonstrate that our hybrid architecture significantly improves convergence stability and energy efficiency in online learning scenarios, without sacrificing classification accuracy. The proposed system highlights a promising pathway toward scalable, hardware-friendly neuromorphic learning on edge platforms.
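The first innovation, threshold-adapted sign-based updates, can be prototyped in a few lines. The threshold here is derived from the mean absolute gradient of the mini-batch, which is one plausible reading of "real-time feedback from mini-batch statistics" rather than the paper's exact statistic; names are hypothetical.

```python
def atsbp_step(weights, grads, lr=0.1, alpha=1.0):
    """Sign-based update: weights move by a fixed step in the direction
    of the gradient sign, but only when |g| clears an adaptive threshold.
    Skipped updates translate into saved PCM write pulses."""
    mean_abs = sum(abs(g) for g in grads) / len(grads)
    threshold = alpha * mean_abs            # adapted from batch statistics
    new_w, writes = [], 0
    for w, g in zip(weights, grads):
        if abs(g) >= threshold:
            new_w.append(w - lr * (1.0 if g > 0 else -1.0))
            writes += 1
        else:
            new_w.append(w)                 # below threshold: no device write
    return new_w, writes
```

Counting `writes` makes the energy argument visible: small gradients never touch the device, so write traffic drops without changing the sign-based update rule for large gradients.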
Citations: 0
FP8ApproxLib: An FPGA-based approximate multiplier library for 8-bit floating point
IF 4.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-08 | DOI: 10.1016/j.sysarc.2026.103686
Ruiqi Chen, Yangxintong Lyu, Han Bao, Shidi Tang, Jindong Li, Yanxiang Zhu, Ming Ling, Bruno da Silva
The 8-bit floating-point (FP8) data format has been increasingly adopted in neural network (NN) computations due to its superior dynamic range compared to traditional 8-bit integer (INT8). Nevertheless, the heavy reliance on multiplication in neural network workloads leads to considerable energy consumption, even with FP8, particularly in the context of FPGA-based deployments. To this end, this paper presents FP8ApproxLib, an FPGA-based approximate multiplier library for FP8. Firstly, we conduct a bit-level analysis of the prior approximation method and introduce improvements to reduce the resulting computational error. Based on these improvements, we implement a fine-grained optimized design on mainstream FPGAs (Altera and AMD) using primitives and templates combined with physical layout constraints. Moreover, an automated tool is developed to support user configuration and generate HDL code. We then evaluate the accuracy and hardware efficiency of the FP8 approximate multipliers. The results show that our proposed method achieves an average error reduction of 53.15% (36.74%–72.82%) compared to the prior FP8 approximation method. Moreover, compared to prior 8-bit approximate multipliers, our FP8 designs exhibit the lowest resource utilization. Finally, we integrate the design into the inference phase of three representative NN models (CNN, LLM, and Diffusion), demonstrating its excellent power efficiency. This is the first FP8 approximate multiplier design with architecture-aware fine-grained optimization and deployment for modern FPGA platforms, which can serve as a benchmark for future designs and comparisons of FPGA-based low-precision floating-point approximate multipliers. The code of this work is available in our GitLab.
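A classic way to cheapen floating-point multiplication, and the family such approximate multipliers build on, is logarithm-based mantissa approximation. The sketch below is Mitchell's approximation (log2(1+f) ≈ f) applied to Python floats for illustration only; it is not FP8ApproxLib's actual design and ignores FP8-specific encoding details.

```python
import math

def mitchell_mul(a: float, b: float) -> float:
    """Approximate multiply via Mitchell's log approximation:
    (1+fa)(1+fb) is replaced by (1+fa+fb), dropping the fa*fb term."""
    if a == 0.0 or b == 0.0:
        return 0.0
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    ma, ea = math.frexp(abs(a))        # abs(a) = ma * 2**ea, ma in [0.5, 1)
    mb, eb = math.frexp(abs(b))
    fa, fb = 2 * ma - 1, 2 * mb - 1    # mantissa fractions in [0, 1)
    s, e = fa + fb, ea + eb - 2        # a*b = (1+fa)(1+fb) * 2**e exactly
    if s < 1.0:
        return sign * (1.0 + s) * 2.0 ** e
    return sign * s * 2.0 ** (e + 1)   # fraction overflow: bump the exponent
```

The approximation is exact on powers of two and has a worst-case relative error of about 11%, which is why hardware designs in this family refine the mantissa term rather than use plain Mitchell.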
Citations: 0
Enhancing deterministic transmission in Time-Sensitive Networking: A Joint Guard Band Compression and Non-Disruptive Frame Preemption model
IF 4.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-06 | DOI: 10.1016/j.sysarc.2026.103697
Ling Zheng, Jingzhuo Liu, Keyao Zhang, Qianxi Men, Li Zhen, Weitao Pan
In the Industrial Internet of Things (IIoT), the real-time interaction of critical control and sensing data imposes stringent requirements on network determinism and reliability. Time-Sensitive Networking (TSN) is widely utilized to ensure low latency and high-precision synchronization; however, efficiently coordinating the nanosecond-level determinism of Time-Triggered (TT) traffic with the high throughput of Event-Triggered (ET) traffic remains a core challenge. Existing Guard Band mechanisms lead to significant bandwidth waste due to fixed reserved windows, while traditional frame preemption mechanisms, though improving efficiency, still suffer from unpredictable TT frame delays. To address these issues, this paper proposes and validates two novel frame preemption strategies. First, a Guard Band Compression and Non-Disruptive Frame Preemption model (GCNFP) is introduced, which compresses the guard band to 76 bytes and employs a padding mechanism to ensure zero-offset handling of ET frames within the guard window, thus eliminating transmission deviations caused by ET blocking and inter-frame gaps (IFG). Second, an enhanced model with judgment mechanism (JM-GCNFP) is developed, which dynamically monitors ET frame transmission within the guard window and delays preemption until the optimal moment. This adaptive strategy maximizes the use of available time slots while reducing unnecessary preemption and padding overhead. Simulation results show that both strategies achieve zero-offset scheduling and significantly improve system performance. Specifically, JM-GCNFP reduces the maximum ET frame delay by more than 40% and improves bandwidth utilization by nearly 20% under high-load scenarios, demonstrating the advantages in achieving enhanced network determinism and transmission efficiency.
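The bandwidth argument behind guard-band compression is easy to quantify: a guard band reserves link time proportional to its byte length at line rate. A back-of-the-envelope helper (assuming a 1 Gbps link and ignoring preamble and IFG overheads, so only an illustration of the scale, not the paper's exact accounting) shows what shrinking the band from a full maximum-size frame (1522 bytes) to 76 bytes buys:

```python
def guard_time_ns(guard_bytes: int, link_rate_gbps: float = 1.0) -> float:
    # Link time reserved by the guard band: at 1 Gbps, 1 bit takes 1 ns.
    return guard_bytes * 8 / link_rate_gbps

full_frame_guard = guard_time_ns(1522)   # classic worst-case guard window
compressed_guard = guard_time_ns(76)     # compressed guard band
saving = 1 - compressed_guard / full_frame_guard
```

Under these assumptions the reserved window shrinks from 12176 ns to 608 ns per gate event, roughly a 95% reduction in time the link sits idle purely as protection.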
Citations: 0
KVC-Q: A high-fidelity and dynamic KV Cache quantization framework for long-context large language models
IF 4.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-06 | DOI: 10.1016/j.sysarc.2026.103699
Yusen Wu, Ruiqin Lin, Jiarong Que, Qixiang Zeng, Hongsen Zhang
Large Language Models (LLMs) are increasingly powerful but face a significant memory bottleneck during long-context autoregressive inference due to the growing KV Cache. Existing uniform quantization methods for the KV Cache often neglect its dynamic and heterogeneous nature, leading to substantial performance degradation at low bit-widths. In this paper, we introduce KVC-Q, a novel framework that implements a dynamic, fine-grained quantization strategy. KVC-Q is built upon three core mechanisms: (1) Recency Priority, which preserves high precision for recent tokens; (2) Importance Preservation, which dynamically identifies and retains crucial long-term tokens in high fidelity; and (3) Head-Aware allocation, which assigns precision based on the sensitivity of different attention heads. Our experiments on the LongBench benchmark show that KVC-Q can reduce KV Cache memory footprint by approximately 70% (effectively competing with 4x compression methods) while retaining over 94% of the baseline performance, enabling models to process over 4x longer contexts on a single consumer-grade GPU. This work presents an effective and practical solution to mitigate the memory constraints of LLMs in long-context applications.
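Mechanism (1), recency priority, is straightforward to prototype: token vectors inside a recent window stay full precision while older entries are quantized per token. The sketch below is a plain-Python toy (symmetric round-to-nearest int4 by default), not KVC-Q's implementation; names are hypothetical.

```python
def compress_kv(cache, recent_window, bits=4):
    """Recency-priority KV compression: the last `recent_window` token
    vectors are kept as-is; older ones are symmetrically quantized
    per token and returned in dequantized form."""
    qmax = 2 ** (bits - 1) - 1
    cutoff = max(0, len(cache) - recent_window)
    out = []
    for i, vec in enumerate(cache):
        peak = max(abs(v) for v in vec)
        if i >= cutoff or peak == 0.0:
            out.append(list(vec))            # recent (or all-zero): untouched
            continue
        scale = peak / qmax                  # per-token scale factor
        out.append([round(v / scale) * scale for v in vec])
    return out
```

A real system would keep the int codes plus one scale per token (hence the memory saving) instead of the dequantized floats returned here for readability, and would layer importance- and head-aware precision on top of this recency rule.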
Citations: 0
Orchestrating optimization passes of machine learning compiler for reducing memory footprints of computation graphs
IF 4.1 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2026-01-06 DOI: 10.1016/j.sysarc.2026.103694
Qianwei Yu, Pengbo Nie, Zihan Wang, Chengcheng Wan, Ziyi Lin, He Jiang, Jianjun Zhao, Lei Qiao, Le Chen, Yuting Chen
With the rise of edge computing, there is a growing demand for training and running inference with deep learning (DL) models on memory-constrained devices. However, many DL models, represented as computation graphs, have complex structures and large numbers of parameters, incurring heavy memory consumption at runtime. Reducing their runtime memory footprints is therefore challenging but necessary.

This paper proposes OPass, a novel approach that performs hierarchical memory-constrained operator scheduling of machine learning models and orchestrates optimization passes of Apache TVM (a machine learning compilation framework) to lower the memory footprints of computation graphs, ultimately allowing the graphs to run on memory-constrained devices. First, given a computation graph G, OPass optimizes the graph heuristically and iteratively: OPass learns the effects of passes on the graph; it then optimizes G iteratively, where each iteration selects a pass based on both the reduction it yields in G's memory footprint and its implicit effects that enable further optimizations, and applies that pass. The second core component of OPass is its memory computation technique, named OPassMem, which hierarchically schedules G's operators. It constructs a hierarchical computation graph and employs an iterative scheduling algorithm to progressively reduce memory footprints.

We evaluate OPass on ReBench (a suite of computation graphs) and two real-world models (Transformer and ResNet). The results show the strength of OPass: it reduces graphs' memory footprints by up to 90.83%, outperforming TVM's default pipeline by 2.34×. Specifically, pass orchestration and graph scheduling reduce memory footprints by up to 54.34% and 81%, respectively.
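The iterative pass-selection loop described above can be caricatured as a greedy search. The sketch below is a minimal assumption: the graph representation, the pass set, and the cost model are all hypothetical placeholders, and OPass's actual scoring (which also weighs a pass's implicit enabling effects, not just its immediate footprint reduction) is simplified to pure greedy improvement.

```python
def orchestrate(graph, passes, memory_cost, max_iters=10):
    """Greedy pass orchestration sketch: at each step, apply the pass
    that most reduces the graph's estimated memory footprint.

    graph:       opaque graph object
    passes:      list of functions graph -> graph
    memory_cost: function graph -> float (estimated peak memory)
    Returns the optimized graph and the names of the applied passes.
    """
    history = []
    for _ in range(max_iters):
        base = memory_cost(graph)
        # Try every candidate pass and keep the best improvement.
        best_pass, best_graph, best_cost = None, None, base
        for p in passes:
            candidate = p(graph)
            cost = memory_cost(candidate)
            if cost < best_cost:
                best_pass, best_graph, best_cost = p, candidate, cost
        if best_pass is None:   # no pass reduces memory any further: stop
            break
        graph = best_graph
        history.append(best_pass.__name__)
    return graph, history
```

With a toy "graph" modeled as a list of tensor sizes and `sum` as the cost model, a deduplicating pass is applied once and the loop then terminates because no pass yields further reduction.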
{"title":"Orchestrating optimization passes of machine learning compiler for reducing memory footprints of computation graphs","authors":"Qianwei Yu ,&nbsp;Pengbo Nie ,&nbsp;Zihan Wang ,&nbsp;Chengcheng Wan ,&nbsp;Ziyi Lin ,&nbsp;He Jiang ,&nbsp;Jianjun Zhao ,&nbsp;Lei Qiao ,&nbsp;Le Chen ,&nbsp;Yuting Chen","doi":"10.1016/j.sysarc.2026.103694","DOIUrl":"10.1016/j.sysarc.2026.103694","url":null,"abstract":"<div><div>With the emergence of the needs of edge computing, there arises a demand for training and inferring deep learning (DL) models on memory-constrained devices. However, many DL models, namely computation graphs, have complex structure and plenty of parameters, incurring heavy memory consumption at runtime. Hence it is challenging but necessary to reduce their memory footprints at runtime.</div><div>This paper proposes <span>OPass</span>, a novel approach to perform hierarchical memory-constrained operator scheduling of machine learning models, and orchestrate optimization passes of Apache’s TVM (a machine learning compilation framework) for lowering memory footprints of computation graphs, finally allowing the graphs to run on memory-constrained devices. Firstly, given a computation graph <span><math><mi>G</mi></math></span>, <span>OPass</span> optimizes the graph heuristically and iteratively: <span>OPass</span> learns the effects of passes on the graph; it then optimizes <span><math><mi>G</mi></math></span> iteratively — each iteration picks up a pass by the reduction of the memory footprint of <span><math><mi>G</mi></math></span> and as well the implicit effects of the pass for further optimizations, letting the pass be applied. The second core component of <span>OPass</span> is its memory computation technique, named <span>OPass</span>Mem, which hierarchically schedules <span><math><mi>G</mi></math></span>’s operators. 
It constructs a hierarchical computation graph and employs an iterative scheduling algorithm to progressively reduce memory footprints.</div><div>We evaluate <span>OPass</span> on <span>ReBench</span> (a suite of computation graphs) and two real-world models (Transformer and ResNet). The results show the strength of <span>OPass</span>: it reduces up to 90.83% of graph’s memory footprints, outperforming TVM’s default by 2.34<span><math><mo>×</mo></math></span>. Specifically, pass orchestration and graph scheduling reduce memory footprints by up to 54.34% and 81%, respectively.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"173 ","pages":"Article 103694"},"PeriodicalIF":4.1,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145941383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Traceable Customized and Privacy-Preserving Data Sharing for IoT-Enabled Smart Society
IF 4.1 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2026-01-06 DOI: 10.1016/j.sysarc.2025.103672
Fuyuan Song, Hongjun Ye, Yu Liu, Qin Jiang, Zheng Qin, Zhangjie Fu
IoT-enabled smart society paradigm is rapidly advancing, generating massive volumes of data that require efficient and secure cloud-based data sharing. While outsourcing IoT data alleviates the limitations of resource-constrained devices, it exposes users to critical security risks, including privacy leakage, unauthorized access, and secret key compromise. To address these challenges, we propose a Traceable Customized and Privacy-Preserving Data Sharing (TCPS) scheme for IoT-enabled smart society. TCPS enables selective data sharing through customized secret keys derived from user attributes and identity information. Specifically, we incorporate a key sanity check mechanism that allows a trusted authority to trace malicious users who leak secret keys, ensuring accountability. Furthermore, we employ an identity-based proxy re-encryption mechanism to enable flexible and secure data sharing without incurring extra computational burden. Security analysis confirms that TCPS achieves semantic security under the CPA model, and experimental results demonstrate its efficiency and practicality in real-world IoT–cloud environments.
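To make the proxy re-encryption idea concrete: a proxy holding a re-encryption key can transform a ciphertext for Alice into one for Bob without ever seeing the plaintext. The toy below follows the classic BBS98 ElGamal-style construction over a tiny (deliberately insecure) group; it is an illustrative stand-in, since TCPS's actual scheme is identity-based and differs in its details.

```python
# Toy BBS98-style proxy re-encryption over a small prime-order subgroup.
# Parameters are far too small for real use; illustration only.
P = 23          # safe prime, P = 2*Q + 1
Q = 11          # prime order of the subgroup
G = 2           # generator of the order-Q subgroup mod 23

def keygen(sk):
    """Return (secret key, public key g^sk)."""
    return sk % Q, pow(G, sk, P)

def encrypt(pk_a, m, r):
    """Ciphertext under Alice's public key: (m * g^r, pk_a^r) = (m*g^r, g^(a*r))."""
    return (m * pow(G, r, P) % P, pow(pk_a, r, P))

def rekey(sk_a, sk_b):
    """Re-encryption key a -> b: b / a mod Q (held by the proxy)."""
    return sk_b * pow(sk_a, -1, Q) % Q

def reencrypt(ct, rk):
    """Proxy step: turns g^(a*r) into g^(b*r) without touching m."""
    c1, c2 = ct
    return (c1, pow(c2, rk, P))

def decrypt(sk, ct):
    """Recover m: peel off g^r using the key holder's secret exponent."""
    c1, c2 = ct
    g_r = pow(c2, pow(sk, -1, Q), P)      # (g^(sk*r))^(1/sk) = g^r
    return c1 * pow(g_r, -1, P) % P
```

Note the accountability angle that motivates TCPS's key sanity check: the re-encryption key `b/a` alone reveals neither `a` nor `b`, yet a leaked secret key is directly attributable to its owner.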
{"title":"Traceable Customized and Privacy-Preserving Data Sharing for IoT-Enabled Smart Society","authors":"Fuyuan Song ,&nbsp;Hongjun Ye ,&nbsp;Yu Liu ,&nbsp;Qin Jiang ,&nbsp;Zheng Qin ,&nbsp;Zhangjie Fu","doi":"10.1016/j.sysarc.2025.103672","DOIUrl":"10.1016/j.sysarc.2025.103672","url":null,"abstract":"<div><div>IoT-enabled smart society paradigm is rapidly advancing, generating massive volumes of data that require efficient and secure cloud-based data sharing. While outsourcing IoT data alleviates the limitations of resource-constrained devices, it exposes users to critical security risks, including privacy leakage, unauthorized access, and secret key compromise. To address these challenges, we propose a Traceable Customized and Privacy-Preserving Data Sharing (TCPS) scheme for IoT-enabled smart society. TCPS enables selective data sharing through customized secret keys derived from user attributes and identity information. Specifically, we incorporate a key sanity check mechanism that allows a trusted authority to trace malicious users who leak secret keys, ensuring accountability. Furthermore, we employ an identity-based proxy re-encryption mechanism to enable flexible and secure data sharing without incurring extra computational burden. 
Security analysis confirms that TCPS achieves semantic security under the CPA model, and experimental results demonstrate its efficiency and practicality in real-world IoT–cloud environments.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"172 ","pages":"Article 103672"},"PeriodicalIF":4.1,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145927099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
VM-PHRs: Efficient and verifiable multi-delegated PHRs search scheme for cloud–edge collaborative services
IF 4.1 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2026-01-05 DOI: 10.1016/j.sysarc.2026.103689
Shiwen Zhang, Wenrui Zhu, Wei Liang, Arthur Sandor Voundi Koe, Neal N. Xiong
With the proliferation of smart healthcare services, many hospitals delegate PHRs processing to cloud-based resources. Despite its effectiveness for bounded search and selective record sharing over encrypted data, key-aggregate searchable encryption still suffers from significant drawbacks in current constructions. First, the existing trapdoor matching algorithms fail to achieve accurate matching and exhibit poor robustness against guessing attacks. Second, current works lack efficient mechanisms to enable fine-grained verification of search results. Third, there is currently no efficient mechanism to delegate user privileges. In this paper, we design an efficient and verifiable multi-delegated PHRs search scheme for cloud–edge collaborative services (VM-PHRs). To enable exact trapdoor matching and resist guessing attacks, we develop a new algorithm, EDAsearch. To achieve fine-grained verification of data integrity and correctness, we design a novel distributed protocol that operates over a network of edge servers. To accommodate real-world emergency scenarios, we develop a novel threshold mechanism that supports privilege delegation based on user attributes and hash commitments. Extensive security analysis and performance evaluation of VM-PHRs demonstrate that it is scalable, secure, and practical.
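The abstract does not specify how the edge servers' distributed protocol verifies search results, so the sketch below shows one standard building block that such fine-grained verification schemes commonly rest on, a Merkle tree over the record set: a server returns a record plus a logarithmic-size proof, and the client checks it against a trusted root digest. This is an assumed illustration, not the VM-PHRs protocol itself.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _next_level(level):
    """Hash adjacent pairs; an unpaired last node is carried up unchanged."""
    return [h(level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
            for i in range(0, len(level), 2)]

def merkle_root(leaves):
    """Root digest of a Merkle tree over the raw records."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]

def merkle_proof(leaves, idx):
    """Sibling hashes (with left/right flags) needed to recompute the root."""
    level, proof = [h(x) for x in leaves], []
    while len(level) > 1:
        sib = idx ^ 1
        if sib < len(level):
            proof.append((level[sib], sib < idx))   # (hash, sibling_is_left)
        level, idx = _next_level(level), idx // 2
    return proof

def verify(leaf, proof, root):
    """Recompute the path from one record up to the root and compare."""
    acc = h(leaf)
    for sib, sib_is_left in proof:
        acc = h(sib + acc) if sib_is_left else h(acc + sib)
    return acc == root
```

In this style of scheme, tampering with any returned record, or silently dropping it, makes the recomputed root mismatch, which is what gives per-record (fine-grained) rather than all-or-nothing verification.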
{"title":"VM-PHRs: Efficient and verifiable multi-delegated PHRs search scheme for cloud–edge collaborative services","authors":"Shiwen Zhang ,&nbsp;Wenrui Zhu ,&nbsp;Wei Liang ,&nbsp;Arthur Sandor Voundi Koe ,&nbsp;Neal N. Xiong","doi":"10.1016/j.sysarc.2026.103689","DOIUrl":"10.1016/j.sysarc.2026.103689","url":null,"abstract":"<div><div>With the proliferation of smart healthcare services, many hospitals delegate PHRs processing to cloud-based resources. Despite its effectiveness for bounded search and selective record sharing over encrypted data, key-aggregate searchable encryption still suffers from significant drawbacks in current constructions. First, the existing trapdoor matching algorithms fail to achieve accurate matching and exhibit poor robustness against guessing attacks. Second, current works lack efficient mechanisms to enable fine-grained verification of search results. Third, there is currently no efficient mechanism to delegate user privileges. In this paper, we design an efficient and verifiable multi-delegated PHRs search scheme for cloud–edge collaborative services (VM-PHRs). To enable exact trapdoor matching and resist guessing attacks, we develop a new algorithm, EDAsearch. To achieve fine-grained verification of data integrity and correctness, we design a novel distributed protocol that operates over a network of edge servers. To accommodate real-world emergency scenarios, we develop a novel threshold mechanism that supports privilege delegation based on user attributes and hash commitments. 
Extensive security analysis and performance evaluation of VM-PHRs demonstrate that it is scalable, secure, and practical.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"172 ","pages":"Article 103689"},"PeriodicalIF":4.1,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145927094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0