
Journal of Systems Architecture: Latest Publications

HPA: Manipulating deep reinforcement learning via adversarial interaction
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-22 | DOI: 10.1016/j.sysarc.2026.103685
Kanghua Mo, Zhengxin Zhang, Yuanzhi Zhang, Yucheng Long, Zhengdao Li
Recent studies have demonstrated that policy manipulation attacks on deep reinforcement learning (DRL) systems can lead to the learning of abnormal policies by victim agents. However, existing work typically assumes that the attacker can manipulate multiple components of the training process, such as reward functions, environment dynamics, or state information. In IoT-enabled smart societies, where AI-driven systems operate in interconnected and data-sensitive environments, such assumptions raise serious concerns regarding security and privacy. This paper investigates a novel policy manipulation attack in competitive multi-agent reinforcement learning under significantly weaker assumptions, where the attacker only requires access to the victim’s training settings and, in some cases, the learned policy outputs during training. We propose the honeypot policy attack (HPA), in which an adversarial agent induces the victim to learn an attacker-specified target policy by deliberately taking suboptimal actions. To this end, we introduce a honeypot reward estimation mechanism that quantifies the amount of reward sacrifice required by the adversarial agent to influence the victim’s learning process, and adapts this sacrifice according to the degree of policy manipulation. Extensive experiments on three representative competitive games demonstrate that HPA is both effective and stealthy, exposing previously unexplored vulnerabilities in DRL-based systems deployed in IoT-driven smart environments. To the best of our knowledge, this work presents the first policy manipulation attack that does not rely on explicit tampering with internal components of DRL systems, but instead operates solely through admissible adversarial interactions, offering new insights into security challenges faced by emerging AIoT ecosystems.
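The honeypot reward estimation described above can be illustrated with a small sketch. Assuming a value-based adversary (the paper does not publish this interface; all function and variable names below are hypothetical), the adversary compares its best achievable return with the return of actions that steer the victim toward the attacker-specified target policy, and only sacrifices reward up to a budget scaled by the desired degree of manipulation.

```python
# Hypothetical sketch of a honeypot-style reward-sacrifice decision (names are
# illustrative, not the paper's API). The adversary gives up at most
# manipulation_degree * budget reward per step, and among the affordable
# actions picks the one that most strongly nudges the victim toward the
# attacker-specified target policy.
import numpy as np

def honeypot_action(adv_q_values, steering_scores, manipulation_degree, budget):
    """adv_q_values    : adversary's estimated return per action
    steering_scores    : how strongly each action pushes the victim toward the target policy
    manipulation_degree: float in [0, 1], how aggressively to manipulate
    budget             : maximum tolerated reward sacrifice per step"""
    best = adv_q_values.max()
    sacrifice = best - adv_q_values                      # reward given up by each action
    affordable = sacrifice <= manipulation_degree * budget   # greedy action is always affordable
    candidates = np.where(affordable, steering_scores, -np.inf)
    action = int(candidates.argmax())                    # strongest affordable nudge
    return action, float(sacrifice[action])

q = np.array([1.0, 0.6, 0.2])
steer = np.array([0.1, 0.9, 0.95])
print(honeypot_action(q, steer, manipulation_degree=0.5, budget=1.0))  # -> (1, 0.4)
```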
Citations: 0
GAS: A scheduling primitive dependency analysis-based cost model for tensor program optimization
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-21 | DOI: 10.1016/j.sysarc.2026.103721
Yonghua Hu, Anxing Xie, Yaohua Wang, Zhe Li, Zenghua Cheng, Junyang Tang
Automatically generating high-performance tensor programs has become a promising approach for deploying deep neural networks. A key challenge lies in designing an effective cost model to navigate the vast scheduling search space. Existing approaches typically fall into two categories, each with limitations: offline learning cost models rely on large pre-collected datasets, which may be incomplete or device-specific, and online learning cost models depend on handcrafted features, requiring substantial manual effort and expertise.
We propose GAS, a lightweight framework for generating tensor programs for deep learning applications. GAS reformulates feature extraction as a sequence-dependent analysis of scheduling primitives. Our cost model integrates three key factors to uncover performance-critical insights within scheduling sequences: (1) decision factors allocation, quantifying entropy and skewness of scheduling primitive factors to capture their dominance; (2) primitive contribution weights, measuring the relative impact of primitives on overall performance; and (3) structural semantic alignment, capturing correlations between scheduling primitive factors and hardware parallelism mechanisms. This approach reduces the complexity of handcrafted feature engineering and extensive pre-training datasets, significantly improving both efficiency and scalability. Experimental results on NVIDIA GPUs demonstrate that GAS achieves average speedups of 3.79× over AMOS and 2.22× over Ansor, while also consistently outperforming other state-of-the-art tensor compilers.
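As an illustration of the "decision factors allocation" component, the sketch below computes the entropy and skewness of a normalized tiling split, the kind of scheduling-primitive factor assignment the abstract describes. This is not the GAS implementation; the factor values and the exact skewness formula are illustrative assumptions.

```python
# Illustrative sketch: summarise a scheduling primitive's factor assignment,
# e.g. a tiling split such as [4, 8, 2, 16], by the Shannon entropy and the
# skewness of its normalised shares, so that dominant and balanced splits can
# be told apart by a cost model.
import math

def factor_features(factors):
    total = sum(factors)
    p = [f / total for f in factors]                       # normalised shares
    entropy = -sum(x * math.log(x) for x in p if x > 0)    # Shannon entropy of the shares
    mean = sum(p) / len(p)
    std = math.sqrt(sum((x - mean) ** 2 for x in p) / len(p))
    # Fisher skewness of the share distribution (0 for a perfectly balanced split).
    skew = 0.0 if std == 0 else sum((x - mean) ** 3 for x in p) / (len(p) * std ** 3)
    return entropy, skew

print(factor_features([4, 8, 2, 16]))   # one dominant factor -> lower entropy, positive skew
print(factor_features([8, 8, 8, 8]))    # balanced split      -> maximal entropy, zero skew
```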
Citations: 0
EdgeTrust-Shard: Hierarchical blockchain architecture for federated learning in cross-chain IoT ecosystems
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-21 | DOI: 10.1016/j.sysarc.2026.103701
Tuan-Dung Tran, Phuong-Dai Bui, Van-Hau Pham
Enabling AI-driven real-time distributed computing on the edge-cloud continuum requires overcoming a critical dependability challenge: resource-constrained IoT devices cannot participate in Byzantine-resilient federated learning due to a 1940-fold memory gap, with robust aggregation methods demanding 512MB–2GB while microcontrollers offer only 264KB SRAM. We present EdgeTrust-Shard, a novel system architecture designed for dependability, security, and scalability in edge AI. It enables real-time Byzantine-resilient federated learning on commodity microcontrollers by distributing computational complexity across the network topology. The framework’s contributions include optimal M = √N clustering for O(N) communication, a Multi-Factor Proof-of-Performance consensus mechanism providing quadratic Byzantine suppression with proven O(T^(-1/2)) convergence, and platform-optimized cryptography delivering a 3.4-fold speedup for real-time processing. A case study using a hybrid physical-simulation deployment demonstrates the system’s efficacy, achieving 93.9–94.7% accuracy across Byzantine attack scenarios at 30% adversary presence within a 140KB memory footprint on Raspberry Pi Pico nodes. By outperforming adapted state-of-the-art blockchain-FL systems like FedChain and BlockFL by up to 9.3 percentage points, EdgeTrust-Shard provides a critical security enhancement for the edge-cloud continuum, transforming passive IoT data sources into dependable participants in distributed trust computations for next-generation applications such as smart cities and industrial automation.
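A minimal sketch of the square-root clustering rule mentioned above: with N devices, roughly M = √N clusters are formed so that each round needs one report per device plus one upload per cluster head. The round-robin assignment and the message accounting are assumptions for illustration, not the paper's protocol.

```python
# Minimal sketch of sqrt(N) clustering for hierarchical aggregation.
# Function and variable names are illustrative.
import math

def sqrt_clustering(device_ids):
    n = len(device_ids)
    m = max(1, round(math.sqrt(n)))            # number of clusters, M = sqrt(N)
    clusters = [[] for _ in range(m)]
    for i, dev in enumerate(device_ids):
        clusters[i % m].append(dev)            # simple round-robin assignment
    # Messages per round: every device reports once to its cluster head,
    # every head reports once to the global aggregator.
    messages = n + m
    return clusters, messages

clusters, msgs = sqrt_clustering([f"pico-{i}" for i in range(100)])
print(len(clusters), msgs)                      # 10 clusters, 110 messages per round
```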
Citations: 0
GenMClass: Design and comparative analysis of genome classifier-on-chip platform
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-17 | DOI: 10.1016/j.sysarc.2026.103702
Daria Bromot, Yehuda Kra, Zuher Jahshan, Esteban Garzón, Adam Teman, Leonid Yavits
We propose GenMClass, a genome classification system-on-chip (SoC) implementing two different classification approaches and comprising two separate classification engines: GenDNN, a DNN accelerator that classifies DNA reads converted to images using a classification neural network, and ETCAM, a similarity-search-capable Error Tolerant Content Addressable Memory that classifies genomes by k-mer matching. Classification operations are controlled by an embedded RISC-V processor. The GenMClass classification platform was designed and manufactured in a commercial 65 nm process. We conduct a comparative analysis of ETCAM and GenDNN classification efficiency as well as their performance, silicon area, and power consumption using silicon measurements. The GenMClass SoC occupies 3.4 mm² and its total power consumption (assuming both GenDNN and ETCAM perform classification at the same time) is 144 mW. This allows GenMClass to be used as a portable classifier for pathogen surveillance during pandemics, food safety and environmental monitoring, and agricultural pathogen and antimicrobial resistance control, in the field or at points of care.
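The k-mer matching performed by the ETCAM engine can be sketched in plain software as set intersection between a read's k-mers and each reference genome's k-mers, with the read assigned to the class sharing the most k-mers. The sequences and k value below are toy data; the hardware similarity search with error tolerance is not modeled.

```python
# Toy sketch of k-mer based read classification (software only, no
# content-addressable memory and no error tolerance).
def kmers(seq, k=5):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify_read(read, references, k=5):
    read_kmers = kmers(read, k)
    # Count shared k-mers per reference class and return the best match.
    scores = {name: len(read_kmers & kmers(ref, k)) for name, ref in references.items()}
    return max(scores, key=scores.get), scores

references = {
    "pathogen_A": "ACGTACGTGGTTACGTAACG",
    "pathogen_B": "TTGGCCAATTGGCCAATTGG",
}
print(classify_read("ACGTACGTGGTTACGTA", references))   # -> pathogen_A with 13 shared k-mers
```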
Citations: 0
AIMD: AI-powered android malware detection for securing AIoT devices and networks using graph embedding and ensemble learning
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-16 | DOI: 10.1016/j.sysarc.2026.103707
Santosh K. Smmarwar, Rahul Priyadarshi, Pratik Angaitkar, Subodh Mishra, Rajkumar Singh Rathore
The rapid evolution of Artificial Intelligence of Things (AIoT) is accelerating the development of smart societies, where interconnected consumer electronics such as smartphones, IoT devices, smart meters, and surveillance systems play a crucial role in optimizing operational efficiency and service delivery. However, this hyper-connected digital ecosystem is increasingly vulnerable to sophisticated Android malware attacks that exploit system weaknesses, disrupt services, and compromise data privacy and integrity. These malware variants leverage advanced evasion techniques, including permission abuse, dynamic runtime manipulation, and memory-based obfuscation, rendering traditional detection methods ineffective. The key challenges in securing AIoT-driven smart societies include managing high-dimensional feature spaces, detecting dynamically evolving malware behaviours, and ensuring real-time classification performance. To address these issues, this paper proposes an AI-powered Android Malware Detection (AIMD) framework designed for AIoT-enabled smart society environments. The framework extracts multi-level features (permissions, intents, API calls, and obfuscated memory patterns) from Android APK files and employs graph embedding techniques (DeepWalk and Node2Vec) for dimensionality reduction. Feature selection is optimized using the Red Deer Algorithm (RDA), a metaheuristic approach, while classification is performed through an ensemble of machine learning models (Support Vector Machine, Decision Tree, Random Forest, Extra Trees) enhanced by bagging, boosting, stacking, and soft voting techniques. Experimental evaluations on CICInvesAndMal2019 and CICMalMem2022 datasets demonstrate the effectiveness of the proposed system, achieving malware detection accuracies of 98.78% and 99.99%, respectively. By integrating AI-driven malware detection into AIoT infrastructures, this research advances cybersecurity resilience, safeguarding smart societies against emerging threats in an increasingly connected world.
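A hedged sketch of the ensemble stage only, using scikit-learn's soft-voting combination of the four listed classifiers on synthetic data in place of the CICInvesAndMal2019 features; the graph-embedding and Red Deer feature-selection steps are not reproduced, and all parameters are illustrative.

```python
# Sketch of the soft-voting ensemble over SVM, decision tree, random forest
# and extra trees, trained on synthetic stand-in features.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=64, n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=0)),   # probabilities needed for soft voting
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
print("benign/malware accuracy:", ensemble.score(X_te, y_te))
```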
Citations: 0
MECSim: A comprehensive simulation platform for multi-access edge computing
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-15 | DOI: 10.1016/j.sysarc.2026.103706
Akhirul Islam, Manojit Ghose
The rapid growth of CPU-intensive and latency-sensitive applications has intensified the need for efficient resource management within edge computing environments. While existing simulators such as iFogSim, EdgeCloudSim, and PureEdgeSim have contributed significantly to edge computing research, they lack comprehensive support for modeling modern hardware heterogeneity, energy-aware mechanisms, service providers’ economic models, dependent task modeling, and reliability-driven task management. This paper presents MECSim (multi-access edge computing simulator), an enhanced simulation framework that extends PureEdgeSim to enable realistic modeling of heterogeneous, cooperative, and fault-tolerant edge computing ecosystems. MECSim supports multi-data-center clusters along with dynamic voltage and frequency scaling (DVFS) capable user devices for energy-efficient operation. The framework further integrates dependent-task modeling, cost and profit evaluation for service providers, and reliability mechanisms via transient-failure simulation, caching, and task replication. We have implemented five state-of-the-art approaches, demonstrating the effectiveness of our simulation platform and building confidence in its practical utility to handle diverse system architectures. With its extensible architecture and comprehensive modeling capabilities, MECSim provides a promising platform for future research on energy-efficient, profit-driven, and fault-tolerant task offloading and scheduling in heterogeneous MEC environments. The results also demonstrate that MECSim achieves a 44.13% (on average) reduction in simulation time compared to EdgeCloudSim. In addition, we have conducted experiments using dispersion-aware metrics to quantify variability and stability across 50 independent runs, thereby enabling a more robust and reliable performance evaluation.
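The DVFS capability described above typically rests on a simple energy model: dynamic power scales as C * V^2 * f while execution time scales as 1/f, so lower voltage/frequency settings trade latency for energy. The sketch below applies such a model with made-up (uncalibrated) voltage/frequency pairs; it is not MECSim's internal model.

```python
# Sketch of a DVFS energy model for a simulated user device: classic CMOS
# dynamic power plus a static term, integrated over the task's execution time.
def task_energy(cycles, freq_hz, volt, capacitance=1e-9, static_power=0.05):
    time_s = cycles / freq_hz                         # execution time at this setting
    dyn_power = capacitance * volt ** 2 * freq_hz     # dynamic power ~ C * V^2 * f
    return time_s, (dyn_power + static_power) * time_s

dvfs_levels = [(1.0e9, 1.10), (0.6e9, 0.90), (0.3e9, 0.75)]   # illustrative (frequency, voltage)
for f, v in dvfs_levels:
    t, e = task_energy(cycles=3e8, freq_hz=f, volt=v)
    print(f"{f/1e9:.1f} GHz: {t*1e3:6.1f} ms, {e*1e3:6.2f} mJ")   # lower f -> slower but cheaper
```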
Citations: 0
Precision boundary modeling for area-efficient Block Floating Point accumulation
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-13 | DOI: 10.1016/j.sysarc.2026.103704
Jun He, Jing Feng, Xin Ju, Yasong Cao, Zhongdi Luo, Jianchao Yang, Jingkui Yang, Gang Li, Jian Cheng, Dong Chen, Mei Wen
Block Floating Point (BFP) is extensively employed for the low-precision quantization of deep network weights and activations to attain advantages in both hardware efficiency and performance. Nevertheless, when the precision of weights and activations is diminished to below 8 bits, the required high-precision floating-point accumulation becomes a dominant hardware bottleneck in the BFP processing element (PE). To address this challenge, we introduce a framework based on the Frobenius norm Retention Ratio (FnRR) to explore the precision boundaries for BFP accumulation, and extend it to a hierarchical chunk-based accumulation scheme. Comprehensive experiments across representative CNN and LLM models demonstrate that our predicted precision boundaries maintain performance closely matching FP32 baselines, while further precision reduction leads to substantial accuracy degradation, validating the effectiveness of our boundary determination. Guided by this analysis, we present a corresponding hardware for BFP computation. This design achieves 13.7%–25.2% improvements in area and power efficiency compared with FP32 accumulation under identical quantization settings, and delivers up to 10.3× area and 11.0× power reductions relative to conventional BFP implementations.
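One plausible reading of a Frobenius-norm retention ratio for BFP, shown below as an assumption rather than the paper's exact definition, is the ratio ||Q(X)||_F / ||X||_F between a block quantized with a shared exponent and a limited-width mantissa and the original block; values near 1 indicate the chosen precision retains most of the block's energy.

```python
# Illustrative sketch: a block of values shares one exponent and each value is
# stored as a signed mant_bits-bit integer times the block's LSB weight.
# FnRR here is the Frobenius-norm ratio of the quantised block to the original.
import numpy as np

def bfp_quantize(block, mant_bits):
    shared_exp = np.floor(np.log2(np.max(np.abs(block)) + 1e-30)) + 1   # shared block exponent
    scale = 2.0 ** (shared_exp - (mant_bits - 1))                       # LSB weight of the block
    q_int = np.clip(np.round(block / scale),
                    -(2 ** (mant_bits - 1)), 2 ** (mant_bits - 1) - 1)  # signed mantissa range
    return q_int * scale

def fnrr(block, mant_bits):
    return np.linalg.norm(bfp_quantize(block, mant_bits)) / np.linalg.norm(block)

rng = np.random.default_rng(0)
block = rng.standard_normal(64).astype(np.float32)
for bits in (3, 5, 8):
    print(bits, "mantissa bits -> FnRR", round(float(fnrr(block, bits)), 4))
```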
Citations: 0
Peak-memory-aware partitioning and scheduling for multi-tenant DNN model inference
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-09 | DOI: 10.1016/j.sysarc.2026.103696
Jaeho Lee, Ju Min Lee, Haeeun Jeong, Hyunho Kwon, Youngsok Kim, Yongjun Park, Hanjun Kim
As Deep Neural Networks (DNNs) are widely used in various applications, multiple DNN inference models start to run on a single GPU. The simultaneous execution of multiple DNN models can overwhelm the GPU memory with increasing model size, leading to unexpected out-of-memory (OOM) errors. To avoid OOM errors, existing systems attempt to schedule models at either model-level or layer-level granularity. However, model-level scheduling schemes utilize memory inefficiently because they preallocate memory based on the model’s peak memory demand, and layer-level scheduling schemes suffer from high scheduling overhead due to overly fine-grained scheduling units. This work proposes a new peak-memory-aware DNN model partitioning compiler and scheduler, called Quilt. The Quilt compiler partitions a DNN model into multiple tasks based on their peak memory usage, and the Quilt scheduler orchestrates the tasks of multiple models without OOM errors. Additionally, the compiler generates a memory pool for tensors shared between partitioned tasks, reducing CPU–GPU communication overhead when consecutively executing the tasks. Compared to the model-level and layer-level scheduling schemes, Quilt reduces overall latency by 25.4% and 37.7%, respectively, while preventing OOM errors. Moreover, Quilt achieves up to 10.8% faster overall inference latency than the state-of-the-art Triton inference server for 6 DNN models.
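The peak-memory-aware partitioning idea can be sketched as a greedy pass that groups consecutive layers into tasks whose memory demand stays under a per-task budget, so a scheduler can preallocate per-task peaks instead of whole-model peaks. The memory model (sum of per-layer working sets) and all numbers are simplifying assumptions, not Quilt's actual compiler pass.

```python
# Hypothetical sketch of peak-memory-aware partitioning: close the current
# task whenever adding the next layer would exceed the per-task memory budget.
def partition_by_peak_memory(layer_mem_mb, budget_mb):
    tasks, current, current_mem = [], [], 0.0
    for name, mem in layer_mem_mb:
        if current and current_mem + mem > budget_mb:
            tasks.append((current, current_mem))      # close the current task
            current, current_mem = [], 0.0
        current.append(name)
        current_mem += mem
    if current:
        tasks.append((current, current_mem))
    return tasks

layers = [("conv1", 120), ("conv2", 300), ("conv3", 80), ("fc1", 700), ("fc2", 60)]
for names, peak in partition_by_peak_memory(layers, budget_mb=800):
    print(names, "task peak:", peak, "MB")            # two tasks with peaks 500 MB and 760 MB
```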
Citations: 0
A PCM-based hybrid online learning architecture with adaptive threshold sign-based backpropagation and pulse-aware conductance control
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-08 | DOI: 10.1016/j.sysarc.2026.103683
Zhenhao Jiao, Xiaogang Chen, Tao Hong, Shunfen Li, Xi Li, Weibang Dai, Chengcai Tu, Zhitang Song
The increasing demand for low-latency, energy-efficient online learning in edge devices has driven the exploration of neuromorphic computing and hybrid analog–digital architectures. In this work, we propose a phase-change memory (PCM)-based hybrid architecture for in-situ online learning, which integrates parallel analog matrix–vector multiplication with adaptive digital control. The system features two key innovations: (1) an adaptive threshold sign-based backpropagation (ATSBP) algorithm that dynamically adjusts quantization thresholds for activations and error signals based on real-time feedback from mini-batch statistics, and (2) a pulse-aware conductance control scheme that enables precise conductance tuning of PCM devices using experimentally calibrated pulse-conductance mappings. These mechanisms jointly reduce unnecessary write operations and enhance robustness against device nonidealities such as nonlinearity and drift. Through systematic validation, we demonstrate that our hybrid architecture significantly improves convergence stability and energy efficiency in online learning scenarios, without sacrificing classification accuracy. The proposed system highlights a promising pathway toward scalable, hardware-friendly neuromorphic learning on edge platforms.
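The adaptive threshold sign-based update can be sketched as follows: a threshold derived from mini-batch gradient statistics suppresses small gradients (avoiding PCM writes), and the surviving gradients contribute only their sign, so each written weight receives a fixed-size potentiation or depression pulse. The threshold rule and names below are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of an adaptive-threshold, sign-based weight update.
import numpy as np

def atsbp_update(weights, grads, lr, scale=1.0):
    threshold = scale * np.mean(np.abs(grads))        # threshold adapted per mini-batch
    mask = np.abs(grads) >= threshold                 # only these gradients trigger device writes
    pulses = np.sign(grads) * mask                    # -1, 0, or +1 per weight
    writes = int(mask.sum())                          # writes issued this step
    return weights - lr * pulses, writes

rng = np.random.default_rng(1)
w = rng.standard_normal(8)
g = rng.standard_normal(8) * np.array([1, 0.01, 1, 0.02, 1, 0.05, 1, 0.01])
w_new, writes = atsbp_update(w, g, lr=0.01)
print("writes this step:", writes, "of", g.size)       # small gradients are skipped entirely
```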
Citations: 0
FP8ApproxLib: An FPGA-based approximate multiplier library for 8-bit floating point
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-08 | DOI: 10.1016/j.sysarc.2026.103686
Ruiqi Chen, Yangxintong Lyu, Han Bao, Shidi Tang, Jindong Li, Yanxiang Zhu, Ming Ling, Bruno da Silva
The 8-bit floating-point (FP8) data format has been increasingly adopted in neural network (NN) computations due to its superior dynamic range compared to traditional 8-bit integer (INT8). Nevertheless, the heavy reliance on multiplication in neural network workloads leads to considerable energy consumption, even with FP8, particularly in the context of FPGA-based deployments. To this end, this paper presents FP8ApproxLib, an FPGA-based approximate multiplier library for FP8. Firstly, we conduct a bit-level analysis of the prior approximation method and introduce improvements to reduce the resulting computational error. Based on these, we implement a fine-grained optimized design on mainstream FPGAs (Altera and AMD) using primitives and templates combined with physical layout constraints. Moreover, an automated tool is developed to support user configuration and generate HDL code. We then evaluate the accuracy and hardware efficiency of the FP8 approximate multipliers. The results show that our proposed method achieves an average error reduction of 53.15% (36.74%–72.82%) compared to the prior FP8 approximation method. Moreover, compared to prior 8-bit approximate multipliers, our FP8 designs exhibit the lowest resource utilization. Finally, we integrate the design into the inference phase of three representative NN models (CNN, LLM, and Diffusion), demonstrating its excellent power efficiency. This is the first FP8 approximate multiplier design with architecture-aware fine-grained optimization and deployment for modern FPGA platforms, which can serve as a benchmark for future designs and comparisons of FPGA-based low-precision floating-point approximate multipliers. The code of this work is available in our GitLab.
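For context, the family of techniques floating-point approximate multipliers typically build on replaces the exact mantissa product with a cheaper mantissa addition (Mitchell-style). The sketch below shows that approximation on ordinary floats; it is not the paper's FPGA design and ignores FP8 encoding details such as E4M3 bit fields, saturation, and subnormals.

```python
# Toy sketch of a Mitchell-style mantissa-addition approximation to
# floating-point multiplication, to illustrate the accuracy/cost trade-off.
import math

def approx_fp_mul(a, b):
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    ma, ea = math.frexp(abs(a))          # a = ma * 2**ea with ma in [0.5, 1)
    mb, eb = math.frexp(abs(b))
    # Rewrite each operand as (1 + f) * 2**(e-1) with f in [0, 1), then use
    # (1 + fa) * (1 + fb) ~= 1 + fa + fb   (drops the fa*fb cross term).
    fa, fb = 2 * ma - 1, 2 * mb - 1
    f = fa + fb
    exp = (ea - 1) + (eb - 1)
    if f >= 1.0:                          # carry into the exponent
        f, exp = f - 1.0, exp + 1
    return sign * (1.0 + f) * 2.0 ** exp

for x, y in [(1.5, 2.75), (-0.375, 5.0), (3.25, 3.25)]:
    exact, approx = x * y, approx_fp_mul(x, y)
    print(f"{x} * {y}: exact {exact}, approx {approx}, rel.err {(approx - exact) / exact:+.3f}")
```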
Citations: 0