
Latest Publications in IEEE Transactions on Computers

Information Sharing in Multi-Tenant Metaverse via Intent-Driven Multicasting
IF 3.8 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-08-29 · DOI: 10.1109/TC.2025.3603720 · Vol. 74(11), pp. 3763-3777
Yu Qiu;Min Chen;Weifa Liang;Lejun Ai;Dusit Niyato
A multi-tenant metaverse enables multiple users in a common virtual world to interact with each other online. Information sharing occurs when interactions between a user and the environment are multicast to other users by an interactive metaverse (IM) service. However, ineffective information-sharing strategies intensify competition among users for limited network resources and fail to interpret optimization intent prompts conveyed in high-level natural language, ultimately diminishing user immersion. In this paper, we explore reliable information sharing in a multi-tenant metaverse with time-varying resource capacities and costs, where IM services are unreliable and alter the volumes of data they process, while the service provider dynamically adjusts its global intent to minimize multicast delays and costs. To this end, we first formulate the information-sharing problem as a Markov decision process and show its NP-hardness. We then propose a learning-based system, GTP, which combines proximal policy optimization (PPO) reinforcement learning with feature extraction networks, including a graph attention network and a gated recurrent unit, and a Transformer encoder for multi-feature comparison, to process a sequence of incoming multicast requests without knowledge of future arrivals. GTP operates through three modules: a deployer that allocates primary and backup IM services across the network to minimize a weighted objective of server computation costs and user-to-service communication distances, an intent extractor that dynamically infers provider intent conveyed in natural language, and a router that constructs on-demand multicast routing trees adhering to user, provider, and network constraints. We finally conduct theoretical and empirical analyses of the proposed algorithms. Experimental results show that the proposed algorithms are promising and outperform their baseline counterparts.
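As background, the core of the PPO component named in the abstract is the standard clipped surrogate objective; the minimal numpy sketch below shows that objective only (the function and variable names, and eps=0.2, are illustrative assumptions, not details from the paper).

```python
import numpy as np

def ppo_clipped_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate loss (to be minimized by gradient descent).

    logp_new / logp_old: log-probs of the taken actions under the current
    and behavior policies; advantages: estimated advantages of those actions.
    """
    ratio = np.exp(logp_new - logp_old)  # importance-sampling ratio
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Toy numbers: one action became more likely, one less.
loss = ppo_clipped_loss(np.log([0.5, 0.3]), np.log([0.4, 0.35]),
                        np.array([1.0, -0.5]))
print(round(float(loss), 4))
```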
Citations: 0
A Computation-Quantized Training Framework to Generate Accuracy Lossless QNNs for One-Shot Deployment in Embedded Systems
IF 3.8 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-08-29 · DOI: 10.1109/TC.2025.3603732 · Vol. 74(11), pp. 3818-3831
Xingzhi Zhou;Wei Jiang;Jinyu Zhan;Lingxin Jin;Lin Zuo
Quantized Neural Networks (QNNs) have received increasing attention, since they can enrich intelligent applications deployed on embedded devices with limited resources, such as mobile devices and AIoT systems. Unfortunately, the numerical and computational discrepancies between training systems (i.e., servers) and deployment systems (e.g., embedded ends) may lead to a large accuracy drop for QNNs in real deployments. We propose a Computation-Quantized Training Framework (CQTF), which simulates deployment-time fixed-point computation during training to enable one-shot, lossless deployment. The training procedure of CQTF is built upon a well-formulated quantization-specific numerical representation that quantifies both the numerical and the computational discrepancies between training and deployment. Leveraging this representation, forward propagation executes all computations in quantization mode to simulate deployment-time inference, while backward propagation identifies and mitigates gradient vanishing through an efficient floating-point gradient update scheme. Benchmark-based experiments demonstrate the efficiency of our approach, which achieves no accuracy loss from training to deployment. Compared with five existing frameworks, CQTF improves deployed accuracy by up to 18.41%.
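To make the idea of simulating deployment-time fixed-point computation concrete, here is a minimal fake-quantization sketch, assuming symmetric int8 quantization with per-tensor scales; it illustrates the general technique, not CQTF's own numerical representation.

```python
import numpy as np

def quantize(x, scale):
    """Symmetric int8 quantization: map floats to integers in [-128, 127]."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int32)

def quantized_matmul(x, w, sx, sw):
    """Simulate deployment-time integer matmul: int8 operands, int32 accumulator."""
    acc = quantize(x, sx) @ quantize(w, sw)  # integer MAC, as on the device
    return acc * (sx * sw)                   # dequantize for the next layer

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
w = rng.standard_normal((8, 3)).astype(np.float32)
out = quantized_matmul(x, w, sx=np.abs(x).max() / 127, sw=np.abs(w).max() / 127)
print(np.max(np.abs(out - x @ w)))  # discrepancy vs. the float reference
```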
Citations: 0
Dynamic Bin Packing With Heterogeneous Dependent Bins for Regionless in Geo-Distributed Clouds
IF 3.8 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-08-25 · DOI: 10.1109/TC.2025.3602297 · Vol. 74(11), pp. 3596-3608
Yinuo Li;Jin-Kao Hao;Liwei Song
Cloud service providers use geo-distributed datacenters to provide resources and services to clients located in different regions. However, uneven population density leads to unbalanced development of geo-distributed datacenters, and cloud service providers face a shortage of land resources for further developing datacenters in densely populated regions. Thus, it is a real challenge for cloud service providers to meet the increasing demand from clients in affluent regions with saturated resources and to better utilize underutilized datacenters in other regions. To address this challenge, we study an online resource allocation problem in geo-distributed clouds, whose goal is to assign each user request upon arrival to an appropriate geographic cloud region so as to minimize the resulting peak utilization of resource pools with different cost coefficients. To this end, we formulate the problem as a dynamic bin packing problem with heterogeneous dependent bins, where user requests correspond to items to be packed and heterogeneous cloud resources are the bins. To solve this online problem under high uncertainty, we propose a simulation-based memetic algorithm that generates robust offline proactive policies from historical data, enabling fast decision making for online packing. Our experiments based on realistic data show that the proposed approach reduces total costs by up to 15% compared to current practice, while making decisions much faster than a popular online method.
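For intuition about the online decision the offline policies must support, a naive greedy baseline is sketched below: each arriving request goes to the region that minimizes the resulting cost-weighted peak utilization. The region records and cost coefficients are hypothetical, and this is not the authors' memetic algorithm.

```python
def assign_request(demand, regions):
    """Greedily pick the region minimizing cost-weighted peak utilization.

    regions: dicts with current 'used' load, 'capacity', and cost coefficient 'c'.
    """
    best, best_peak = None, float('inf')
    for r in regions:
        if r['used'] + demand > r['capacity']:
            continue  # this region cannot host the request
        peak = r['c'] * (r['used'] + demand) / r['capacity']
        if peak < best_peak:
            best, best_peak = r, peak
    if best is not None:
        best['used'] += demand  # commit the assignment
    return best

regions = [{'name': 'dense', 'used': 70, 'capacity': 100, 'c': 2.0},
           {'name': 'sparse', 'used': 10, 'capacity': 100, 'c': 1.0}]
print(assign_request(15, regions)['name'])  # -> 'sparse'
```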
Citations: 0
A Novel Indirect Methodology Based on Execution Traces for Grading Functional Test Programs
IF 3.8 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-08-19 · DOI: 10.1109/TC.2025.3600005 · Vol. 74(11), pp. 3582-3595
Francesco Angione;Paolo Bernardi;Andrea Calabrese;Lorenzo Cardone;Stefano Quer;Claudia Bertani;Vincenzo Tancorre
Developing functional test programs for hardware testing is time-consuming and requires substantial expertise. A functional test program’s quality is usually assessed only through expensive fault simulation campaigns during early development. This paper presents indirect quality measurements of the fault detection capabilities of functional test programs to reduce the total cost of fault simulation in the early development stages. We present a methodology that analyzes the instruction trace generated by running functional test programs on-chip and builds its control- and dataflow graph. We use the graph to identify potential flaws that affect the program’s fault detection capabilities, and we present different graph-based techniques to measure a program’s quality indirectly. By exploiting standard debugging formats, we pinpoint the instructions in the source code that affect the graph-based measurements. We perform experiments on an automotive device manufactured by STMicroelectronics, running functional test programs of different natures. Our results show that our metric allows test engineers to develop better functional test programs without basing their development solely on fault simulation campaigns.
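A minimal sketch of the trace-to-graph step, under the assumption that the on-chip trace reduces to a sequence of program-counter values: consecutive PCs become weighted edges of a control-flow graph that can then be mined for flaws. The trace values are hypothetical.

```python
from collections import defaultdict

def build_cfg(trace):
    """Build a control-flow graph from a sequence of program counters.

    Returns edge counts: cfg[src][dst] = times the transition src -> dst ran.
    """
    cfg = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(trace, trace[1:]):
        cfg[src][dst] += 1
    return cfg

# Hypothetical trace: a loop body at 0x100-0x108 executed twice, then exit.
trace = [0x100, 0x104, 0x108, 0x100, 0x104, 0x108, 0x10c]
for src, dsts in build_cfg(trace).items():
    for dst, count in dsts.items():
        print(f"{src:#x} -> {dst:#x} (x{count})")
```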
Citations: 0
Cost-Efficient Delay-Bounded Dependent Task Offloading With Service Caching at Edges
IF 3.8 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-08-18 · DOI: 10.1109/TC.2025.3598749 · Vol. 74(11), pp. 3568-3581
Yu Liang;Sheng Zhang;Jie Wu
We are now embracing an era of edge computing and artificial intelligence, and the combination of the two has spawned a new field of research called edge intelligence. Massive amounts of data are generated at the edge of the network and rely on artificial intelligence to realize their potential; meanwhile, artificial intelligence flourishes when processing diverse edge data. However, the computation and storage resources of edge servers are not unlimited. For some large-scale intelligent applications, it is difficult to meet service quality requirements by directly offloading the entire application to a nearby server for processing. Due to the heterogeneity of server resources in edge environments, balancing the workload among edge servers to provide better services also becomes complicated. The goal of this paper is to minimize the total cost of offloading large-scale applications consisting of many dependent tasks in an edge system. We formulate the Dependent task Offloading with Service Caching (DOSC) problem, which is proved to be NP-hard. A dynamic programming-based algorithm is introduced to solve fixed-DOSC, in which some services are pre-configured on the edge servers and other services cannot be downloaded from the remote cloud. We also present a theoretical analysis of the performance guarantee of the dynamic programming-based algorithm. Then, we propose a near-optimal algorithm using Gibbs sampling to solve the general DOSC problem. Testbed experiments and trace-driven simulations are conducted to verify the performance of our algorithm. Compared with baseline algorithms, ours is the most cost-effective, as it considers both service caching and task dependencies when offloading tasks.
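To illustrate how Gibbs sampling can drive a placement search of this kind (a generic sketch with a placeholder cost model and temperature, not the paper's formulation), each task's server is resampled in turn from a Boltzmann distribution over the total cost:

```python
import math, random

def gibbs_placement(tasks, servers, cost, iters=100, temp=1.0):
    """Resample each task's server from a Boltzmann distribution over cost.

    cost(placement) must return the total system cost of a full placement;
    lower-cost placements get exponentially higher sampling weight.
    """
    placement = {t: random.choice(servers) for t in tasks}
    for _ in range(iters):
        for t in tasks:
            weights = []
            for s in servers:
                placement[t] = s  # tentatively place t on s
                weights.append(math.exp(-cost(placement) / temp))
            placement[t] = random.choices(servers, weights=weights)[0]
    return placement

# Toy cost: colocating the two dependent tasks is expensive.
tasks, servers = ['t1', 't2'], ['a', 'b']
cost = lambda p: 1.0 if p['t1'] == p['t2'] else 0.2
print(gibbs_placement(tasks, servers, cost))
```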
Citations: 0
TeeRollup: Efficient Rollup Design Using Heterogeneous TEE
IF 3.8 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-08-07 · DOI: 10.1109/TC.2025.3596698 · Vol. 74(10), pp. 3546-3558
Xiaoqing Wen;Quanbi Feng;Hanzheng Lyu;Jianyu Niu;Yinqian Zhang;Chen Feng
Rollups have emerged as a promising approach to improving blockchains’ scalability by offloading transaction execution off-chain. Existing rollup solutions either leverage complex zero-knowledge proofs or optimistically assume execution correctness unless challenged. However, these solutions suffer from high gas costs and significant withdrawal delays, hindering their adoption in decentralized applications. This paper introduces TeeRollup, an efficient rollup protocol that leverages Trusted Execution Environments (TEEs) to achieve both low gas costs and short withdrawal delays. Sequencers (i.e., system participants) execute transactions within TEEs and upload signed execution results to the blockchain using the TEEs’ confidential keys. Unlike most TEE-assisted blockchain designs, TeeRollup adopts a practical threat model in which the integrity and availability of TEEs may be compromised. To address these issues, we first introduce a distributed system of sequencers with heterogeneous TEEs, ensuring system security even if a certain proportion of TEEs are compromised. Second, we propose a challenge mechanism to solve the redeemability issue caused by TEE unavailability. Furthermore, TeeRollup incorporates Data Availability Providers (DAPs) to reduce on-chain storage overhead and uses a laziness penalty mechanism to regulate DAP behavior. We implement a prototype of TeeRollup in Golang on the Ethereum test network Sepolia. Our experimental results indicate that TeeRollup outperforms zero-knowledge rollups (ZK-rollups), reducing on-chain verification costs by approximately 86% and shortening withdrawal delays to a few minutes.
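The fragment below illustrates only the heterogeneous-TEE acceptance rule in spirit: a state root is accepted once sequencers from enough distinct TEE vendors have signed it. HMAC stands in for the TEEs' attestation/signature scheme, and all identifiers are hypothetical.

```python
import hmac, hashlib

def sign(key, state_root):
    """Stand-in for a TEE-held signing key attesting an execution result."""
    return hmac.new(key, state_root, hashlib.sha256).hexdigest()

def accept(state_root, signatures, keys, vendor_of, threshold):
    """Accept a root once enough *distinct TEE vendors* signed it validly."""
    vendors = set()
    for seq_id, sig in signatures.items():
        if hmac.compare_digest(sig, sign(keys[seq_id], state_root)):
            vendors.add(vendor_of[seq_id])
    return len(vendors) >= threshold

keys = {'s1': b'k1', 's2': b'k2', 's3': b'k3'}          # per-sequencer keys
vendor_of = {'s1': 'sgx', 's2': 'sev', 's3': 'sgx'}     # heterogeneous TEEs
root = b'state-root-after-batch'
sigs = {sid: sign(k, root) for sid, k in keys.items()}
print(accept(root, sigs, keys, vendor_of, threshold=2))  # True: sgx + sev
```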
Citations: 0
Parallelization Strategies for DeepMD-Kit Using OpenMP: Enhancing Efficiency in Machine Learning-Based Molecular Simulations
IF 3.8 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-08-04 · DOI: 10.1109/TC.2025.3595078 · Vol. 74(10), pp. 3534-3545
Qi Du;Feng Wang;Chengkun Wu
DeepMD-kit enables deep learning-based molecular dynamics (MD) simulations that require efficient parallelization to leverage modern HPC architectures. In this work, we optimize DeepMD-kit using advanced OpenMP strategies to improve scalability and computational efficiency on an ARMv8 processor-based server. Our optimizations include data parallelism for neural network inference, force calculation acceleration, NUMA-aware memory management, and synchronization reductions, leading to up to 4.1× speedup and 82% higher memory bandwidth efficiency compared to the baseline implementation. Strong scaling analysis demonstrates superlinear speedup at mid-range core counts, with improved workload balancing and vectorized computations. However, challenges remain at ultra-large scales due to increasing synchronization overhead.
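For reference, the strong-scaling quantities this evaluation refers to are the usual speedup and parallel efficiency (generic definitions, not the paper's notation):

```latex
% T(1): single-core runtime, T(p): runtime on p cores.
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
% Superlinear speedup means S(p) > p, i.e., E(p) > 1; at mid-range core
% counts this is typically due to the working set fitting in aggregate cache.
```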
Citations: 0
EC2P: Cost-Effective Cross-Chain Payments via Hubs Resisting the Abort Attack
IF 3.8 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-07-22 · DOI: 10.1109/TC.2025.3590960 · Vol. 74(10), pp. 3504-3518
Danlei Xiao;Shaobo Xu;Chuan Zhang;Licheng Wang;Xiulong Liu;Liehuang Zhu
Cross-chain technology facilitates interoperability among isolated blockchains, allowing users to transfer and exchange coins. However, the heterogeneity between Turing-complete (TC) blockchains like Ethereum and non-Turing-complete (NTC) blockchains like Bitcoin presents a significant challenge for cross-chain transactions. Payment Channel Hubs (PCHs) offer a promising solution for enabling TC-NTC cross-chain payments with high throughput and low confirmation delays. However, existing schemes still face two key challenges: (i) significant computation and communication overhead for variable-amount payments, and (ii) limited unlinkability, i.e., vulnerability to the abort attack. This paper proposes EC2P, the first TC-NTC cross-chain PCH that achieves variable-amount payment unlinkability while resisting the abort attack and minimizing reliance on non-interactive zero-knowledge (NIZK) proofs. EC2P introduces two protocols: the NTC-to-TC and TC-to-NTC payment protocols. The NTC-to-TC payment protocol replaces the traditional puzzle-promise and puzzle-solve paradigm with a semi-blind approach, where only one side is blinded and the blinded side’s interactions are eliminated; this achieves unlinkability and resists the abort attack without NIZK. The TC-to-NTC payment protocol enhances the paradigm by utilizing Turing-complete functionality to render the abort attack infeasible. Through rigorous security analysis, we show that EC2P is secure and achieves variable-amount payment unlinkability while resisting the abort attack. We implement EC2P on the Ethereum and Bitcoin test networks. Our evaluation demonstrates that EC2P outperforms existing schemes in both communication and computation overhead, reducing communication costs by three orders of magnitude compared to existing variable-amount methods.
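For background, the classic puzzle-solve paradigm that EC2P's NTC-to-TC protocol replaces can be reduced to a hash puzzle: the payment is released only to whoever reveals the preimage. The sketch below shows that baseline mechanism, not EC2P's semi-blind construction.

```python
import hashlib, secrets

def make_puzzle():
    """Payee samples a secret; its hash is the puzzle locking the payment."""
    secret = secrets.token_bytes(32)
    return secret, hashlib.sha256(secret).digest()

def redeem(puzzle, claimed_secret):
    """The payment is released only on revealing the matching preimage."""
    return hashlib.sha256(claimed_secret).digest() == puzzle

secret, puzzle = make_puzzle()
print(redeem(puzzle, secret))        # True: correct preimage unlocks the funds
print(redeem(puzzle, b'\x00' * 32))  # False: wrong preimage is rejected
```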
Citations: 0
PRECIOUS: Approximate Real-Time Computing in MLC-MRAM Based Heterogeneous CMPs
IF 3.8 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-07-21 · DOI: 10.1109/TC.2025.3590809 · Vol. 74(10), pp. 3476-3489
Sangeet Saha;Shounak Chakraborty;Sukarn Agarwal;Magnus Själander;Klaus McDonald-Maier
Enhancing quality of service (QoS) in approximate-computing (AC) based real-time systems without violating power limits is becoming increasingly challenging as multicore computing platforms become heterogeneous, owing to the contradictory constraints of power consumption and time criticality. To satisfy these constraints and optimise system QoS, AC tasks should be judiciously mapped onto such platforms. However, prior approaches rarely considered the problem of AC task deployment on heterogeneous platforms, and most of them neglect runtime architectural phenomena, which can be accounted for, together with the approximation tolerance of the applications, to enhance QoS. We present PRECIOUS, a novel hybrid offline-online approach that first schedules AC real-time tasks on a heterogeneous multicore, with the objective of maximising QoS, and determines the appropriate cluster for each task, constrained by a system-wide power limit, deadlines, and task dependencies. At runtime, PRECIOUS introduces novel architectural techniques for the AC tasks, which execute on a heterogeneous platform equipped with a multilevel-cell (MLC)-MRAM based last-level cache; energy efficiency and performance are improved by prudently leveraging the storage density of MLC-MRAM while mitigating its high write latency and write energy. Our novel block management for the MLC-MRAM cache further improves system performance, which we exploit opportunistically to enhance system QoS and to turn off processor cores during dynamically generated slack. PRECIOUS-Offline achieves up to 76% QoS for a specific task set, surpassing prior art, whereas PRECIOUS-Online enhances QoS by 9.0% by reducing the cache miss rate by 19% on a 64-core heterogeneous system, without incurring any energy overhead over a conventional MRAM-based cache design.
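As a toy illustration of trading approximation for QoS under a power cap (a greedy knapsack-style heuristic with hypothetical task data, not the PRECIOUS scheduler), each task starts at its cheapest level and upgrades by best QoS-per-watt while the budget allows:

```python
def select_levels(tasks, power_budget):
    """Pick one (qos, power) level per task, greedily by QoS gained per watt.

    tasks: {name: [(qos, power), ...]} with levels sorted by rising power.
    """
    choice = {t: levels[0] for t, levels in tasks.items()}  # cheapest first
    budget = power_budget - sum(p for _, p in choice.values())
    upgrades = [((q - levels[0][0]) / (p - levels[0][1]), t, (q, p))
                for t, levels in tasks.items() for q, p in levels[1:]]
    for _, t, level in sorted(upgrades, key=lambda u: u[0], reverse=True):
        extra = level[1] - choice[t][1]
        if level[0] > choice[t][0] and 0 < extra <= budget:
            budget -= extra
            choice[t] = level  # upgrade: more QoS for the remaining watts
    return choice

tasks = {'A': [(0.5, 1.0), (0.9, 2.0)],   # (QoS, power) per approximation level
         'B': [(0.4, 1.0), (0.6, 2.5)]}
print(select_levels(tasks, power_budget=3.5))  # only A is upgraded
```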
Citations: 0
High-Radix/Mixed-Radix NTT Multiplication Algorithm/Architecture Co-Design Over Fermat Modulus
IF 3.8 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-07-21 · DOI: 10.1109/TC.2025.3590972 · Vol. 74(10), pp. 3519-3533
Yile Xing;Guangyan Li;Zewen Ye;Ryan W. L. Luk;Donglong Chen;Hong Yan;Ray C. C. Cheung
Polynomial multiplication using the Number Theoretic Transform (NTT) is crucial in lattice-based post-quantum cryptography (PQC) and fully homomorphic encryption (FHE), with the choice of modulus $q$ significantly affecting performance. Fermat moduli of the form $2^{2^{n}}+1$, such as 65537, offer efficiency gains due to simplified modular reduction and powers-of-2 twiddle factors in the NTT. While Fermat moduli have been directly applied or explored for incorporation into existing schemes, Fermat NTT-based polynomial multiplication designs have yet to fully exploit the benefits of Fermat moduli. This work presents a high-radix/mixed-radix NTT architecture tailored to Fermat moduli, which improves the utilization of the powers-of-2 twiddle factors at large transform sizes. In most cases, our design achieves a 30%–85% reduction in DSP area-time product (ATP) and a 70%–100% reduction in BRAM ATP compared to state-of-the-art designs with smaller or equivalent moduli, while maintaining competitive LUT and FF ATP, underscoring the potential of Fermat NTT-based polynomial multipliers in lattice-based cryptography.
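For concreteness, a plain radix-2 NTT multiplier over the Fermat prime 65537 is sketched below (3 is a primitive root of this modulus); it demonstrates NTT-based polynomial multiplication in software only, not the paper's high-radix/mixed-radix hardware architecture.

```python
P = 2**16 + 1  # Fermat prime F4 = 65537

def ntt(a, invert=False):
    """Iterative radix-2 NTT over Z_P; len(a) must be a power of two."""
    n = len(a)
    a = a[:]
    j = 0
    for i in range(1, n):  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w = pow(3, (P - 1) // length, P)  # primitive length-th root of unity
        if invert:
            w = pow(w, P - 2, P)
        for start in range(0, n, length):
            wn = 1
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * wn % P
                a[k] = (u + v) % P
                a[k + length // 2] = (u - v) % P
                wn = wn * w % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a

def polymul(f, g):
    """Multiply two polynomials with coefficients mod P via an 8-point NTT."""
    n = 8  # power of two >= len(f) + len(g) - 1
    F = ntt(f + [0] * (n - len(f)))
    G = ntt(g + [0] * (n - len(g)))
    return ntt([x * y % P for x, y in zip(F, G)], invert=True)

print(polymul([1, 2, 3], [4, 5]))  # -> [4, 13, 22, 15, 0, 0, 0, 0]
```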
Citations: 0