Graph Neural Networks (GNNs) require high-capacity, low-latency memory systems to process large graphs. A hierarchical hybrid memory architecture combining high-capacity Non-Volatile Memory (NVM) and low-latency DRAM offers a promising solution. However, the inherent sparsity of graph data results in poor locality for GNN memory requests, leading to low DRAM cache hit rates and numerous misses, which significantly impairs the hybrid memory system's performance. A critical issue is that DRAM misses in serial access mode incur substantial latency, and while parallel access mode can mitigate this for misses, it introduces long-tail latency and wastes bandwidth for DRAM hits. In this paper, we address these issues from two aspects: increasing the cache hit rate and decreasing the miss latency. We propose two predictors: a future data access predictor that enables accurate prefetching to DRAM, thereby improving cache hit rates, and a data location predictor that determines whether data resides in DRAM or NVM, optimizing the choice between serial and parallel access modes to reduce miss latency. By integrating these predictors, we achieve efficient data access in both DRAM and NVM. Our experiments show a 49.5% reduction in memory delay and a 38.1% increase in memory bandwidth utilization compared to the baseline.
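To make the serial-versus-parallel decision concrete, the following Python sketch shows one simple way a data-location predictor could be realized with per-page saturating counters; the counter scheme, page size, and all names are illustrative assumptions, not the predictor design proposed in the paper.

```python
# Minimal sketch (not the paper's actual design): a toy "data location
# predictor" that guesses whether an address is DRAM-resident and picks
# the serial or parallel access mode accordingly.

class ToyLocationPredictor:
    def __init__(self, num_counters=1024):
        # 2-bit saturating counters indexed by a hash of the page address.
        self.counters = [2] * num_counters  # start weakly "in DRAM"
        self.num_counters = num_counters

    def _index(self, addr, page_size=4096):
        return (addr // page_size) % self.num_counters

    def predict_in_dram(self, addr):
        return self.counters[self._index(addr)] >= 2

    def update(self, addr, was_dram_hit):
        # Train the counter with the observed outcome of the access.
        i = self._index(addr)
        if was_dram_hit:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def issue_request(addr, predictor):
    # Predicted DRAM hit -> serial access (probe DRAM first, NVM only on miss).
    # Predicted miss -> parallel access (probe DRAM and NVM together).
    return "serial" if predictor.predict_in_dram(addr) else "parallel"


pred = ToyLocationPredictor()
print(issue_request(0x1000, pred))   # "serial" until misses are observed
pred.update(0x1000, was_dram_hit=False)
pred.update(0x1000, was_dram_hit=False)
print(issue_request(0x1000, pred))   # now "parallel"
```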
{"title":"Latency Optimization in Hybrid Memory System for GNNs","authors":"Zhaoyang Zeng;Yujuan Tan;Wei Chen;Jiali Li;Zhuoxin Bai;Ao Ren;Duo Liu;Xianzhang Chen","doi":"10.1109/TC.2025.3648646","DOIUrl":"https://doi.org/10.1109/TC.2025.3648646","url":null,"abstract":"Graph Neural Networks (GNNs) require high-capacity, low-latency memory systems to process large graphs. A hierarchical hybrid memory architecture combining high-capacity Non-Volatile Memory (NVM) and low-latency DRAM offers a promising solution. However, the inherent sparsity of graph data results in poor locality for GNN memory requests, leading to low DRAM cache hit rates and numerous misses, which significantly impairs the hybrid memory system’s performance. A critical issue is that DRAM misses in serial access mode incur substantial latency. While parallel access mode can mitigate this for misses, it introduces long-tail latency and wastes bandwidth for DRAM hits. In this paper, we focus on addressing these issues from two aspects: increasing the cache hit rate and decreasing the miss latency. We mainly propose two predictors: a future data access predictor that enables accurate prefetching to DRAM, thereby improving cache hit rates, and a data location predictor that determines whether data resides in DRAM or NVM, optimizing the choice between serial and parallel access modes to reduce miss latency. By integrating these predictors, we achieve efficient data access in both DRAM and NVM. Our experiments show a 49.5% reduction in memory delay and a 38.1% increase in memory bandwidth utilization compared to baseline.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"75 3","pages":"1183-1196"},"PeriodicalIF":3.8,"publicationDate":"2025-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For image-related deep learning tasks, the first step often involves reading data from external storage and preprocessing it on the CPU. As accelerators become faster and the number of accelerators per compute node grows, the gap in compute and data-transfer capability between accelerators and CPUs widens, and data reading and preprocessing progressively become the bottleneck of these tasks. Our work, DDLP, addresses the data computing and transfer bottleneck of deep learning preprocessing using Computable Storage Devices (CSDs). DDLP allows the CPU and CSD to preprocess in parallel from opposite ends of the dataset. To this end, we propose two adaptive dynamic selection strategies that let DDLP direct the accelerator to automatically read data from different sources; the two strategies trade off consistency against efficiency. DDLP achieves substantial computational overlap between CSD preprocessing, CPU preprocessing, accelerator computation, and accelerator data reading. In addition, DDLP leverages direct storage technology to enable efficient SSD-to-accelerator data transfer, and it replaces expensive CPU and DRAM resources with more energy-efficient CSDs, alleviating preprocessing bottlenecks while significantly reducing power consumption. Extensive experimental results show that DDLP improves learning speed by up to 23.5% on the ImageNet dataset while reducing energy consumption by 19.7% and CPU and DRAM usage by 37.6%; it also improves learning speed by up to 27.6% on the CIFAR-10 dataset.
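The "preprocess from both ends" idea can be illustrated with a small Python sketch in which a CPU worker and a CSD worker consume sample indices from opposite ends of the dataset until their cursors meet; the threading model and function names are hypothetical and stand in for DDLP's actual CPU/CSD coordination.

```python
# Illustrative sketch of dual-pronged preprocessing (not DDLP's code):
# one worker takes samples from the front of the index range, the other
# from the back, and both stop when the two cursors cross.
import threading

def dual_pronged_preprocess(num_samples, cpu_fn, csd_fn):
    lock = threading.Lock()
    front, back = 0, num_samples - 1
    results = {}

    def worker(take_front, fn):
        nonlocal front, back
        while True:
            with lock:
                if front > back:
                    return
                if take_front:
                    i, front = front, front + 1
                else:
                    i, back = back, back - 1
            results[i] = fn(i)  # preprocess sample i on the CPU or on the CSD

    t_cpu = threading.Thread(target=worker, args=(True, cpu_fn))
    t_csd = threading.Thread(target=worker, args=(False, csd_fn))
    t_cpu.start(); t_csd.start(); t_cpu.join(); t_csd.join()
    return results

# Toy usage: "preprocessing" just tags which side handled each sample.
out = dual_pronged_preprocess(10, lambda i: ("cpu", i), lambda i: ("csd", i))
print(out)
```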
{"title":"Dual-Pronged Deep Learning Preprocessing on Heterogeneous Platforms With CPU, Accelerator and CSD","authors":"Jia Wei;Xingjun Zhang;Witold Pedrycz;Longxiang Wang;Jie Zhao","doi":"10.1109/TC.2025.3649209","DOIUrl":"https://doi.org/10.1109/TC.2025.3649209","url":null,"abstract":"For image-related deep learning tasks, the first step often involves reading data from external storage and performing preprocessing on the CPU. As accelerator speed increases and the number of single compute node accelerators increases, the computing and data transfer capabilities gap between accelerators and CPUs gradually increases. Data reading and preprocessing become progressively the bottleneck of these tasks. Our work, DDLP, addresses the data computing and transfer bottleneck of deep learning preprocessing using Computable Storage Devices (CSDs). DDLP allows the CPU and CSD to efficiently parallelize preprocessing from both ends of the datasets, respectively. To this end, we propose two adaptive dynamic selection strategies to make DDLP control the accelerator to automatically read data from different sources. The two strategies trade-off between consistency and efficiency. DDLP achieves sufficient computational overlap between CSD data preprocessing and CPU preprocessing, accelerator computation, and accelerator data reading. In addition, DDLP leverages direct storage technology to enable efficient SSD-to-accelerator data transfer. In addition, DDLP reduces the use of expensive CPU and DRAM resources with more energy-efficient CSDs, alleviating preprocessing bottlenecks while significantly reducing power consumption. Extensive experimental results show that DDLP can improve learning speed by up to 23.5% on ImageNet Dataset while reducing energy consumption by 19.7% and CPU and DRAM usage by 37.6%. DDLP also improves the learning speed by up to 27.6% on the Cifar-10 dataset.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"75 3","pages":"1209-1223"},"PeriodicalIF":3.8,"publicationDate":"2025-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work, we present a single-event upset (SEU) analysis for Forksheet FET (FSFET) based CMOS circuits. We then present an array-level power and performance analysis along with a Vmin evaluation for the FSFET-based SRAM. Physics-based TCAD and industry-standard BSIM-CMG compact models are calibrated for accurate circuit analysis in SPICE. The impact of varying heavy-ion radiation (HIR) doses and strike orientations is investigated for the FSFETs. The robustness of the CMOS inverter against HIR is also reported in terms of failure time (t$_{fail}$) and output voltage swing ($\Delta$V$_{Drop}$). For the SRAM, we determine the critical linear energy transfer (LET). For the FSFET, the individual n-/p-FETs are more vulnerable to irradiation incident on nearby devices. At the circuit level, compared with perpendicular strikes, $\Delta$V$_{Drop}$ increases by 1.25 V and 2.75 V for oblique and transverse incidences, respectively, at a dose of 2.0 MeV cm$^2$/mg; t$_{fail}$ also increases by 43% and 60%, and the SRAM critical LET decreases by 85% and 57.5%, respectively. The array-level SRAM evaluation shows that the FSFET enables reliable operation with low power consumption, impressive noise margins, and low minimum operating voltage (Vmin) values. FSFET SRAM power dissipation during read and write operations is as low as 7.02 $\mu$W and 3.00 $\mu$W, respectively. At V$_{DD}$ = 0.70 V, the noise margins for hold, read, and write operations are 289.27 mV, 122.89 mV, and 297.79 mV, and the Vmin values for read and write operations are 0.30 V and 0.35 V, respectively.
{"title":"Evaluation of Radiation Resilience, Performance, and Vmin of Sub-3 nm FSFET Based SRAM Arrays","authors":"Hafeez Raza;Mahdi Benkhelifa;Koshal Kumar;Shivendra Singh Parihar;Yogesh Singh Chauhan;Hussam Amrouch;Avinash Lahgere","doi":"10.1109/TC.2025.3649150","DOIUrl":"https://doi.org/10.1109/TC.2025.3649150","url":null,"abstract":"In this work, we present single-event upset (SEU) analysis for Forksheet FET (FSFET) based CMOS circuits. Next, we present an array-level power and performance analysis along with the V<sub>min</sub> evaluation for the FSFET-based SRAM. Physics-based TCAD and industry-standard BSIM-CMG compact models are calibrated for accurate circuit analysis in SPICE. The impact of varying Heavy-Ion Radiation (HIR) doses and strike orientations is investigated for the FSFETs. The robustness of CMOS inverter against HIR is also reported in terms of failure time (t<sub>fail</sub>) and output voltage swing (<inline-formula><tex-math>$Delta$</tex-math></inline-formula>V<sub>Drop</sub>). For the SRAM, we determine the critical Linear Energy Transfer (LET). For FSFET, the individual n-/p-FETs are more vulnerable to the irradiation incident on nearby devices. At the circuit level, in comparison to perpendicular strikes, the <inline-formula><tex-math>$Delta$</tex-math></inline-formula>V<sub>Drop</sub> increases by 1.25<roman> </roman>V and 2.75<roman> </roman>V respectively, for oblique and transverse incidences, at a dose of 2.0<roman> </roman>MeVcm<inline-formula><tex-math>$^2$</tex-math></inline-formula>/mg. The t<sub>fail</sub> also increases by 43% and 60% and the SRAM critical LET also decreases by 85% and 57.5%, respectively. The array level SRAM evaluation shows that the FSFET enables reliable operation with low-power consumption, impressive noise margins, and low minimum-operating voltage (V<sub>min</sub>) values. FSFET SRAM power dissipation during the read and write operations is as low as 7.02<roman> </roman><inline-formula><tex-math>$mu$</tex-math></inline-formula>W, and 3.00<roman> </roman><inline-formula><tex-math>$mu$</tex-math></inline-formula>W respectively. At V<sub>DD</sub><inline-formula><tex-math>$=$</tex-math></inline-formula>0.70<roman> </roman>V, the noise margins for hold, read, and write operations are 289.27<roman> </roman>mV, 122.89<roman> </roman>mV, and 297.79<roman> </roman>mV. The V<sub>min</sub> for read and write operations are 0.30<roman> </roman>V and 0.35<roman> </roman>V respectively.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"75 3","pages":"1197-1208"},"PeriodicalIF":3.8,"publicationDate":"2025-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the era of Deep Learning, hardware acceleration has become essential for meeting the immense computational demands of modern applications. In many Machine Learning applications, Generalized Matrix Multiplication (GEMM) built on dot products is a ubiquitous and computationally intensive operation. This paper introduces two innovative microarchitectures for executing a fused FP8 $m$-way dot product with dynamic range scaling and FP32 accumulation. Both microarchitectures have been synthesized in a 3 nm technology node at 3.6 GHz and were designed to deliver power- and area-efficient performance, targeting a 4-cycle latency for $m = 4, 8$ and 5+ cycles for larger $m$ values. The first design, termed dot product with late accumulation, computes the dot product in the first cycles, expanding intermediate products to a fixed-point format (2 cycles for $m = 4, 8$ and 3+ cycles for $m > 8$), before using an additional two cycles for accumulation. This approach enables the reuse of a modified, FMA-capable FP32 adder. The second design, dot product with early accumulation, employs a dedicated FP8 datapath that computes the FP8 sum of products while concurrently aligning the FP32 accumulator, followed by the addition of the significands (2 cycles for $m = 4, 8$ and 3+ cycles for $m > 8$) and then two cycles for normalization and a single rounding operation. This design aligns the addends (products and accumulator) relative to an "anchor" for efficient, arithmetically fused, $m$-way FP8 dot product computation. Comparative analysis with previous proposals reveals that, despite challenges in establishing a fair comparison, our designs achieve significant area savings.
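A purely numerical Python sketch of the operation being accelerated (not the hardware datapath) may help: inputs are quantized to a simplified E4M3-style FP8 grid, the $m$ products are summed exactly, and a single rounding is applied when the scaled result is folded into the FP32 accumulator. The quantizer below is an assumption-laden model that omits NaN, saturation, and subnormal handling.

```python
# Behavioural sketch of an m-way FP8 dot product with scaling and FP32
# accumulation; the single np.float32() cast at the end models the
# "fused" property of one rounding for the whole sum.
import math
import numpy as np

def quantize_fp8_e4m3(x, mantissa_bits=3, min_exp=-6, max_exp=8):
    # Simplified E4M3-like rounding: snap x to the nearest representable
    # value in its binade (no NaN/inf/saturation handling).
    if x == 0.0:
        return 0.0
    e = max(min(math.floor(math.log2(abs(x))), max_exp), min_exp)
    step = 2.0 ** (e - mantissa_bits)      # spacing of the grid at exponent e
    return round(x / step) * step

def fused_dot_fp8_fp32(a, b, scale, acc):
    qa = [quantize_fp8_e4m3(x) for x in a]
    qb = [quantize_fp8_e4m3(x) for x in b]
    exact = math.fsum(x * y for x, y in zip(qa, qb))   # products kept exact
    # One rounding at the end: scale, add the accumulator, round to FP32.
    return np.float32(exact * scale + float(acc))

acc = np.float32(0.0)
acc = fused_dot_fp8_fp32([0.5, -1.25, 3.0, 0.1], [2.0, 0.75, -0.5, 4.0],
                         scale=0.25, acc=acc)
print(acc)
```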
{"title":"Fused FP8 Many-Terms Dot Product With Scaling and FP32 Accumulation","authors":"David R. Lutz;Anisha Saini;Mairin Kroes;Thomas Elmer;Harsha Valsaraju;Javier D. Bruguera","doi":"10.1109/TC.2025.3648544","DOIUrl":"https://doi.org/10.1109/TC.2025.3648544","url":null,"abstract":"In the era of Deep Learning, hardware acceleration has become essential for meeting the immense computational demands of modern applications. In many Machine Learning applications, Generalized Matrix Multiplication (GEMM) with dot product is an ubiquitous and computationally intensive operation. This paper introduces two innovative microarchitectures for executing a fused FP8 <inline-formula><tex-math>$m$</tex-math></inline-formula>‐way dot product with dynamic range scaling and FP32 accumulation. Both microarchitectures have been synthesized in a 3 nm technology node at 3.6 GHz, and were designed to deliver power- and area-efficient performance, targeting a 4-cycle latency for <inline-formula><tex-math>$m = 4,8$</tex-math></inline-formula> and 5+ cycles for larger <inline-formula><tex-math>$m$</tex-math></inline-formula> values. The first design – termed <i>dot product with late accumulation</i> – computes the dot product in the first cycles, then expands intermediate products to a fixed-point format (2 cycles for <inline-formula><tex-math>$m = 4, 8$</tex-math></inline-formula> and 3+ cycles for <inline-formula><tex-math>$m gt 8$</tex-math></inline-formula>), before using an additional two cycles for accumulation. This approach enables the reuse of a modified, FMA-capable FP32 adder. The second design – <i>dot product with early accumulation</i> – employs a dedicated FP8 datapath that concurrently computes the FP8 sum-of-products while aligning the FP32 accumulator, followed by the addition of the significands (2 cycles for <inline-formula><tex-math>$m = 4, 8$</tex-math></inline-formula> and 3+ cycles for <inline-formula><tex-math>$m gt 8$</tex-math></inline-formula>). This is then followed by two cycles for normalization and a single rounding operation. This design aligns addends (products and accumulator) from an “anchor” for efficient, arithmetically fused, <inline-formula><tex-math>$m$</tex-math></inline-formula>-way FP8 dot product computation. Comparative analysis with previous proposals reveals that, despite challenges in establishing a fair comparison, our designs achieve significant area savings.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"75 3","pages":"1171-1182"},"PeriodicalIF":3.8,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The popularity of cloud storage makes it easy to share data with others, for example to access and edit electronic medical records. To control such editing, sanitizable signatures were developed to allow designated editors to modify restricted parts of the data with sanitizing keys. In particular, attribute-based sanitizable signatures (ABSS) have been explored for fine-grained editing control, i.e., editors are determined by the owner through a policy. However, existing ABSS schemes can neither protect the anonymity of the owner, because signatures are verified under the owner's public key, nor protect data confidentiality, because data is stored in plaintext on the cloud server. Moreover, they do not consider the sanitizing-key exposure issue. To this end, we propose FSEdit, a privacy-preserving and security-enhanced controllable editing framework for cloud storage. Specifically, we introduce an attribute-based sanitizable and puncturable signed encryption (AB-SPSE) primitive for FSEdit, where encrypted data can be anonymously verified via the owner's attributes, and its admissible blocks can be modified by policy-authorized editors. Meanwhile, the editors' sanitizing keys can be further punctured to guarantee forward secrecy. We design two novel building blocks, namely an attribute-based equivalence-class signature and an attribute-based puncturable combined encryption and signature scheme, and then construct FSEdit by leveraging them to instantiate AB-SPSE in asymmetric pairings. Finally, we show the security and efficiency of FSEdit through extensive security analysis and experimental comparisons with existing ABSS schemes.
{"title":"FSEdit: Privacy-Preserving and Security-Enhanced Controllable Editing Framework for Cloud Storage","authors":"Qinlong Huang;Caiqun Shi;Xiyu Liang","doi":"10.1109/TC.2025.3648284","DOIUrl":"https://doi.org/10.1109/TC.2025.3648284","url":null,"abstract":"The popularity of cloud storage makes it easy to share others’ data, such as accessing and editing electronic medical records. To control the editing, sanitizable signatures were developed to allow designated editors to modify restricted parts of the data with sanitizing keys. In particular, attribute-based sanitizable signature (ABSS) has been explored for fine-grained editing control, i.e., editors are determined by the owner through a policy. However, existing ABSS schemes can neither protect the anonymity of the owner due to signature verification under the owner’s public key, nor protect the data confidentiality due to the plaintext storage in the cloud server. Moreover, they do not consider the sanitizing key exposure issue. To this end, we propose FSEdit, a privacy-preserving and security-enhanced controllable editing framework for cloud storage. Specifically, we introduce an attribute-based sanitizable and puncturable signed encryption (AB-SPSE) primitive for FSEdit, where encrypted data can be anonymously verified via the owner’s attributes, and its admissible blocks can be modified by policy-authorized editors. Meanwhile, the editors’ sanitizing keys can be further punctured to guarantee forward secrecy. We design two novel building blocks, namely attribute-based equivalence-class signature, and attribute-based puncturable combined encryption and signature, and then construct FSEdit by leveraging them to instantiate AB-SPSE in asymmetric pairings. Finally, we show the security and efficiency of FSEdit through extensive security analysis and experimental results over existing ABSS schemes.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"75 3","pages":"1156-1170"},"PeriodicalIF":3.8,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large language models (LLMs) rely on self-attention for contextual understanding, demanding high-throughput inference and large-scale token parallelism (LTPP). Existing dynamic sparsity accelerators falter under LTPP scenarios due to stage-isolated optimizations. Revisiting the end-to-end sparsity acceleration flow, we identify an overlooked opportunity: cross-stage coordination can substantially reduce redundant computation and memory access. We propose STAR, a cross-stage compute- and memory-efficient algorithm-hardware co-design tailored for Transformer inference under LTPP. STAR introduces leading-zero-based sparsity prediction using log-domain add-only operations to minimize prediction overhead. It further employs distributed sorting and a sorted-updating FlashAttention mechanism, guided by a coordinated tiling strategy that enables fine-grained stage interaction for improved memory efficiency and latency. These optimizations are supported by a dedicated STAR accelerator architecture, which achieves up to $9.2\times$ speedup and $71.2\times$ better energy efficiency than an A100, and surpasses state-of-the-art accelerators by up to $16.1\times$ in energy efficiency and $27.1\times$ in area efficiency. Further, we deploy STAR onto a multi-core spatial architecture, optimizing dataflow and execution orchestration for ultra-long sequence processing. Architectural evaluation shows that, compared to the baseline design, Spatial-STAR achieves a $20.1\times$ throughput improvement.
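The flavor of a log-domain, add-only importance predictor can be sketched as follows; this toy version uses element-wise floating-point exponents as a leading-zero-style magnitude proxy and is only illustrative, since STAR's actual prediction, thresholds, and tiling differ.

```python
# Toy add-only sparsity prediction for attention: approximate the
# magnitude of each q.k score from exponents alone, so the predictor
# needs additions and comparisons but no multiplications.
import math

def exp_of(x):
    # Integer exponent of |x| (a proxy for the leading-one position).
    return math.frexp(abs(x))[1] if x != 0.0 else -126

def predict_topk_keys(q, keys, k):
    scores = []
    for idx, key in enumerate(keys):
        # Add exponents instead of multiplying values; the max term acts
        # as a cheap proxy for the dot-product magnitude.
        proxy = max(exp_of(a) + exp_of(b) for a, b in zip(q, key))
        scores.append((proxy, idx))
    scores.sort(reverse=True)
    return [idx for _, idx in scores[:k]]

q = [0.9, -0.01, 0.3]
keys = [[1.0, 0.2, -0.1], [0.01, 0.02, 0.01], [0.5, -0.6, 0.9]]
print(predict_topk_keys(q, keys, k=2))  # indices of keys predicted important
```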
{"title":"Designing Spatial Architectures for Sparse Attention: STAR Accelerator via Cross-Stage Tiling","authors":"Huizheng Wang;Taiquan Wei;Hongbin Wang;Zichuan Wang;Xinru Tang;Zhiheng Yue;Shaojun Wei;Yang Hu;Shouyi Yin","doi":"10.1109/TC.2025.3648055","DOIUrl":"https://doi.org/10.1109/TC.2025.3648055","url":null,"abstract":"Large language models (LLMs) rely on self-attention for contextual understanding, demanding high-throughput inference and large-scale token parallelism (LTPP). Existing dynamic sparsity accelerators falter under LTPP scenarios due to stage-isolated optimizations. Revisiting the end-to-end sparsity acceleration flow, we identify an overlooked opportunity: cross-stage coordination can substantially reduce redundant computation and memory access. We propose STAR, a cross-stage compute- and memory-efficient algorithm–hardware co-design tailored for Transformer inference under LTPP. STAR introduces a leading-zero-based sparsity prediction using log-domain add-only operations to minimize prediction overhead. It further employs distributed sorting and a sorted updating FlashAttention mechanism, guided by a coordinated tiling strategy that enables fine-grained stage interaction for improved memory efficiency and latency. These optimizations are supported by a dedicated STAR accelerator architecture, achieving up to <inline-formula><tex-math>$9.2times$</tex-math></inline-formula> speedup and <inline-formula><tex-math>$71.2times$</tex-math></inline-formula> energy efficiency over A100, and surpassing SOTA accelerators by up to <inline-formula><tex-math>$16.1times$</tex-math></inline-formula> energy and <inline-formula><tex-math>$27.1times$</tex-math></inline-formula> area efficiency gains. Further, we deploy STAR onto a multi-core spatial architecture, optimizing dataflow and execution orchestration for ultra-long sequence processing. Architectural evaluation shows that, compared to the baseline design, Spatial-STAR achieves a <inline-formula><tex-math>$20.1times$</tex-math></inline-formula> throughput improvement.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"75 3","pages":"1125-1140"},"PeriodicalIF":3.8,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning on 3D point clouds plays a vital role in a wide range of applications such as AR/VR visualization, 3D cloth virtual try-on, and game rendering. Because some applications require low latency, point cloud services are also deployed in datacenters with powerful GPUs. While queries to point cloud services exhibit varied workload-change patterns due to differing degrees of sparsity, current batching-based serving schemes result in either long latency or low throughput. We propose a scheme called Volans to address the above challenges and effectively support point cloud services. Volans comprises a workload predictor, a topology deployer, and a progress-aware scheduler. The predictor grids the input query and estimates the workload changes. The deployer then splits the model into several stages and determines the batch size for each stage based on the workload changes. The scheduler reduces QoS violations when queries run slower due to unpredicted workload spikes. Experiments show that Volans improves the peak supported throughput by up to 31.1% while maintaining the required 99th-percentile latencies, compared to state-of-the-art techniques.
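As a rough illustration of the "grid the query" step, the sketch below voxelizes a point cloud and uses the number of occupied voxels as a workload proxy when forming a batch; the voxel size, budget, and batching rule are hypothetical, not Volans' implementation.

```python
# Toy workload estimation for point cloud queries: sparser clouds occupy
# fewer voxels, so the occupied-voxel count serves as a cheap workload proxy.
import math

def estimate_workload(points, voxel_size=0.05):
    occupied = {tuple(math.floor(c / voxel_size) for c in p) for p in points}
    return len(occupied)

def choose_batch_size(workloads, budget=50_000):
    # Greedily batch queries while the summed workload proxy fits the budget.
    batch_total, batch_len = 0, 0
    for w in workloads:
        if batch_total + w > budget and batch_len > 0:
            break
        batch_total += w
        batch_len += 1
    return batch_len

queries = [[(0.1, 0.2, 0.3), (0.12, 0.2, 0.31)], [(1.0, 1.0, 1.0)]]
workloads = [estimate_workload(q) for q in queries]
print(workloads, choose_batch_size(workloads))
```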
{"title":"QoS Awareness and Improved Throughput of Point Cloud Services With Dynamic Workloads","authors":"Kaihua Fu;Jiuchen Shi;Yao Chen;Quan Chen;Weng-Fai Wong;Wei Wang;Bingsheng He;Minyi Guo","doi":"10.1109/TC.2025.3648132","DOIUrl":"https://doi.org/10.1109/TC.2025.3648132","url":null,"abstract":"Deep learning on 3D point clouds plays a vital role in a wide range of applications such as AR/VR visualization, 3D cloth virtual try-on, and game rendering. As some applications require low latency, the point cloud services are also deployed on datacenter with powerful GPUs. While the queries of point cloud services show various workload change patterns due to different degrees of sparsity, current batching-based serving schemes result in either long latency or low throughput. We propose a scheme called Volans to address the above challenges and effectively support point cloud services. Volans comprises a workload predictor, a topology deployer, and a progress-aware scheduler. The predictor grids the input query and estimates the workload changes. Afterward, the deployer splits the model into several stages and determines the batch size for each stage based on the workload changes. The scheduler reduces the QoS violation when queries run slower due to unpredicted workload spikes. Experiments show that Volans enhances the peak supported throughput by up to 31.1% while maintaining the required 99%-ile latencies compared to state-of-the-art techniques.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"75 3","pages":"1141-1155"},"PeriodicalIF":3.8,"publicationDate":"2025-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Core decomposition is a widely used hierarchical analysis algorithm for large-scale graphs. It computes the decomposition by iteratively peeling vertices, along with their adjacent edges, into different hierarchies. Given the timeliness requirements of modern applications, many researchers have introduced accelerators, particularly GPUs, to improve the computational efficiency of graph algorithms. However, the empty, sparse, and numerous hierarchies in large graphs lead to inefficient computation and parallelism, causing not only unnecessary searches for a hierarchy's vertices but also significant thread wastage when peeling off the adjacent edges of those vertices. In this paper, we propose an adaptive parallel framework for core decomposition, named AdaptiveCore. First, it improves vertex-searching efficiency by adaptively skipping empty hierarchies and reducing the search space. Moreover, it greatly improves thread utilization by adaptively allocating the available threads to peel off adjacent edges. Comprehensive experiments show that, compared with state-of-the-art works, the proposed framework achieves an average speedup of $7.1\times$ on the GPU platform and up to $2.0\times$ on the multi-core CPU platform.
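For reference, the textbook sequential peeling algorithm that AdaptiveCore parallelizes can be written in a few lines of Python; this is the baseline algorithm only, not the adaptive GPU kernel design described in the paper.

```python
# Sequential core decomposition by iterative peeling: at level k, repeatedly
# remove vertices whose remaining degree is <= k, assigning them core number k.
from collections import defaultdict

def core_decomposition(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    core = {}
    remaining = set(adj)
    k = 0
    while remaining:
        # One "hierarchy": every vertex whose current degree is <= k.
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue
            core[v] = k
            remaining.discard(v)
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        peel.append(u)
    return core

# Triangle 1-2-3 with pendant vertex 4: vertex 4 has core 1, the rest core 2.
print(core_decomposition([(1, 2), (2, 3), (3, 1), (3, 4)]))
```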
{"title":"AdaptiveCore: Adaptive Parallel Core Decomposition Framework","authors":"Chen Zhao;Zhigao Zheng;Hao Huang;Hao Liu;Dacheng Tao","doi":"10.1109/TC.2025.3646191","DOIUrl":"https://doi.org/10.1109/TC.2025.3646191","url":null,"abstract":"Core decomposition is a widely used hierarchical analysis algorithm for large-scale graphs. It achieves this decomposition by iteratively peeling the vertices along with their adjacency edges off into different hierarchies. With the timeliness requirements of modern applications, many researchers have introduced accelerators, particularly GPUs, to improve the computational efficiency of graph algorithms. However, the empty, sparse, and numerous hierarchies in large graphs lead to inefficient computation and parallelism, not only including unnecessary searching for the hierarchy’s vertices, but also significant thread wastage when peeling off the adjacency edges of these vertices. In this paper, we propose an adaptive parallel framework for core decomposition, named <i>AdaptiveCore</i>. First, it improves vertex searching efficiency by adaptively skipping the empty hierarchies and reducing the search space. Moreover, it greatly improves thread utilization by adaptively allocating the available threads to peel off the adjacency edges. Comprehensive experiments show that, compared with the state-of-the-art works, the proposed framework achieves an average speedup of <inline-formula><tex-math>$7.1times$</tex-math></inline-formula> on the GPU platform and up to <inline-formula><tex-math>$2.0times$</tex-math></inline-formula> on the multi-core CPU platform.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"75 3","pages":"1111-1124"},"PeriodicalIF":3.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The practical applications of quantum computing are currently limited by the small number of available qubits. Recent advances in quantum hardware have introduced mid-circuit measurements and resets, enabling the reuse of measured qubits and thus reducing the qubit requirements for executing quantum algorithms. In this work, we present a systematic study of dynamic quantum circuit compilation, a process that transforms static quantum circuits into their dynamic equivalents with fewer qubits through qubit reuse. We establish the first graph-based framework for optimizing qubit-reuse compilation. In particular, we characterize the task of finding the optimal compilation strategy for maximizing qubit reuse using binary integer programming and provide efficient heuristic algorithms for devising general compilation strategies. We conduct a thorough analysis of quantum circuits with practical relevance and offer their optimal qubit-reuse compilation strategies. We also perform a comparative analysis against state-of-the-art approaches, demonstrating the superior performance of our methods in both structured and random quantum circuits. Our framework lays a rigorous foundation for understanding dynamic quantum circuit compilation via qubit reuse, holding significant promise for the practical implementation of large-scale quantum algorithms on quantum computers with limited resources.
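A heavily simplified view of qubit reuse treats each logical qubit as an interval from its first gate to its measurement and greedily reuses measured-and-reset qubits, as in the Python sketch below. This is an illustrative interval-scheduling model only: it ignores the gate-dependency (causal) constraints that the paper's graph-based formulation and binary integer program actually handle.

```python
# Greedy lower-bound-style sketch of qubit reuse as interval scheduling.
import heapq

def min_physical_qubits(lifetimes):
    """lifetimes: list of (first_gate_time, measure_time) per logical qubit."""
    free_at = []  # min-heap of times at which a physical qubit becomes reusable
    count = 0
    for start, end in sorted(lifetimes):
        if free_at and free_at[0] <= start:
            heapq.heapreplace(free_at, end)   # reuse a measured-and-reset qubit
        else:
            count += 1                         # allocate a fresh physical qubit
            heapq.heappush(free_at, end)
    return count

# Four logical qubits; two of them can run on hardware freed by earlier ones,
# so only two physical qubits are needed in this simplified model.
print(min_physical_qubits([(0, 3), (1, 5), (3, 7), (6, 9)]))  # -> 2
```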
{"title":"Dynamic Quantum Circuit Compilation","authors":"Kun Fang;Munan Zhang;Ruqi Shi;Yinan Li","doi":"10.1109/TC.2025.3643826","DOIUrl":"https://doi.org/10.1109/TC.2025.3643826","url":null,"abstract":"The practical applications of quantum computing are currently limited by the small number of available qubits. Recent advances in quantum hardware have introduced mid-circuit measurements and resets, enabling the reuse of measured qubits and thus reducing the qubit requirements for executing quantum algorithms. In this work, we present a systematic study of dynamic quantum circuit compilation, a process that transforms static quantum circuits into their dynamic equivalents with fewer qubits through qubit reuse. We establish the first graph-based framework for optimizing qubit-reuse compilation. In particular, we characterize the task of finding the optimal compilation strategy for maximizing qubit reuse using binary integer programming and provide efficient heuristic algorithms for devising general compilation strategies. We conduct a thorough analysis of quantum circuits with practical relevance and offer their optimal qubit-reuse compilation strategies. We also perform a comparative analysis against state-of-the-art approaches, demonstrating the superior performance of our methods in both structured and random quantum circuits. Our framework lays a rigorous foundation for understanding dynamic quantum circuit compilation via qubit reuse, holding significant promise for the practical implementation of large-scale quantum algorithms on quantum computers with limited resources.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"75 2","pages":"748-759"},"PeriodicalIF":3.8,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145963443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}