Practicable live container migrations in high performance computing clouds: Diskless, iterative, and connection-persistent
Pub Date: 2024-05-09 | DOI: 10.1016/j.sysarc.2024.103157
Jordi Guitart
Checkpoint/Restore techniques have been widely used by the High Performance Computing (HPC) community for failure recovery. Given the current trend in HPC toward containerization, which delivers fast, customized, portable, flexible, and reproducible deployment of workloads, as well as efficient and reliable sharing and management of HPC Cloud infrastructures, Checkpoint/Restore must be integrated with containerization so that the application's freeze time is minimal and live migration becomes practicable. Although current Checkpoint/Restore tools (such as CRIU) support several options to accomplish this, most of them are rarely exploited in HPC Clouds and, consequently, their potential performance impact is barely known. This paper therefore explores CRIU's advanced features to implement diskless, iterative (pre-copy and post-copy) migrations of containers with external network namespaces and established TCP connections, so that memory-intensive and connection-persistent HPC applications can live-migrate. Our extensive experiments characterizing the performance impact of those features demonstrate that properly configured live migrations incur low application downtime and memory/disk usage and are indeed feasible in containerized HPC Clouds.
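As a concrete illustration of the pre-copy sequence the abstract describes, the sketch below drives CRIU from Python. The subcommands and flags shown (pre-dump, --track-mem, --prev-images-dir, --tcp-established, --page-server) are real CRIU options, but the PID, paths, address, and iteration count are placeholders, and a real containerized deployment would go through the container runtime rather than raw criu invocations.

```python
# Hedged sketch of CRIU pre-copy live migration; all values are placeholders.
import subprocess

def criu(*args):
    subprocess.run(["criu", *args], check=True)

pid, imgs = "1234", "/tmp/ckpt"

# Iterative pre-copy: dump memory while the application keeps running,
# each round capturing only pages dirtied since the previous round.
criu("pre-dump", "-t", pid, "-D", f"{imgs}/1", "--track-mem")
criu("pre-dump", "-t", pid, "-D", f"{imgs}/2",
     "--prev-images-dir", "../1", "--track-mem")

# Final dump: freeze the application, capture the remaining dirty pages and
# the established TCP connections, and stream pages to a remote page server
# (diskless mode) instead of writing them to local disk.
criu("dump", "-t", pid, "-D", f"{imgs}/final",
     "--prev-images-dir", "../2", "--tcp-established",
     "--page-server", "--address", "192.0.2.10", "--port", "27")
```

The downtime the paper measures corresponds to the final dump plus the restore on the destination; the pre-dump rounds and the page server are what keep that window small.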
{"title":"Practicable live container migrations in high performance computing clouds: Diskless, iterative, and connection-persistent","authors":"Jordi Guitart","doi":"10.1016/j.sysarc.2024.103157","DOIUrl":"https://doi.org/10.1016/j.sysarc.2024.103157","url":null,"abstract":"<div><p>Checkpoint/Restore techniques had been thoroughly used by the High Performance Computing (HPC) community in the context of failure recovery. Given the current trend in HPC to use containerization to obtain fast, customized, portable, flexible, and reproducible deployments of their workloads, as well as efficient and reliable sharing and management of HPC Cloud infrastructures, there is a need to integrate Checkpoint/Restore with containerization in such a way that the freeze time of the application is minimal and live migrations are practicable. Whereas current Checkpoint/Restore tools (such as CRIU) support several options to accomplish this, most of them are rarely exploited in HPC Clouds and, consequently, their potential impact on the performance is barely known. Therefore, this paper explores the use of CRIU’s advanced features to implement diskless, iterative (pre-copy and post-copy) migrations of containers with external network namespaces and established TCP connections, so that memory-intensive and connection-persistent HPC applications can live-migrate. Our extensive experiments to characterize the performance impact of those features demonstrate that properly-configured live migrations incur low application downtime and memory/disk usage and are indeed feasible in containerized HPC Clouds.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"152 ","pages":"Article 103157"},"PeriodicalIF":4.5,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1383762124000948/pdfft?md5=cc1942d37d08df364ee498e16b1e96a9&pid=1-s2.0-S1383762124000948-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140917817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy-efficient scheduling for parallel applications with reliability and time constraints on heterogeneous distributed systems
Pub Date: 2024-05-06 | DOI: 10.1016/j.sysarc.2024.103173
Hongzhi Xu, Binlian Zhang, Chen Pan, Keqin Li
Reliability is a crucial system metric, and many safety-critical applications have both reliability requirements and deadline constraints. In addition, to protect the environment and reduce operating costs, energy consumption should be minimized as much as possible. This paper considers parallel applications on heterogeneous distributed systems and proposes two algorithms that minimize energy consumption while meeting the deadline and satisfying the reliability requirement of the applications. The first algorithm, minimizing scheduling length while satisfying the reliability requirement (MSLSRR), first transforms the reliability requirement of the application into reliability requirements for its tasks and then assigns each task to the processor with the earliest finish time. Since the reliability achieved by MSLSRR is often higher than the application's requirement, and the scheduling length is also shorter than the deadline, a second algorithm, improving energy efficiency (IEE), is designed: it redefines the minimum reliability requirement of each task and applies dynamic voltage and frequency scaling (DVFS) for energy conservation. The proposed algorithms are compared with existing algorithms on real parallel applications. Experimental results demonstrate that the proposed algorithms consume the least energy.
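The abstract does not spell out the fault and energy models; the sketch below uses the formulation common in this literature (transient faults whose rate grows exponentially as frequency drops, dynamic power proportional to a power of frequency), so every constant here is an assumed placeholder rather than a value from the paper.

```python
# Assumed standard DVFS reliability/energy model, not the paper's exact one.
import math

LAMBDA0 = 1e-6   # fault rate at maximum frequency (assumed)
D = 3.0          # sensitivity of the fault rate to frequency scaling (assumed)
F_MIN = 0.3      # normalized minimum frequency (assumed)

def reliability(c, f):
    """R(f) = exp(-lambda(f) * c / f) for a task with WCET c at f = 1."""
    lam = LAMBDA0 * 10 ** (D * (1 - f) / (1 - F_MIN))
    return math.exp(-lam * c / f)

def energy(c, f, p_ind=0.05, c_ef=1.0, m=3):
    """Frequency-independent power plus dynamic power c_ef * f**m."""
    return (p_ind + c_ef * f ** m) * (c / f)

# Lowering f saves energy but lowers reliability, which is why MSLSRR must
# first turn the application-level requirement into per-task floors that
# IEE can then trade against DVFS energy savings.
for f in (1.0, 0.7, 0.5):
    print(f, round(reliability(10, f), 9), round(energy(10, f), 3))
```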
{"title":"Energy-efficient scheduling for parallel applications with reliability and time constraints on heterogeneous distributed systems","authors":"Hongzhi Xu , Binlian Zhang , Chen Pan , Keqin Li","doi":"10.1016/j.sysarc.2024.103173","DOIUrl":"https://doi.org/10.1016/j.sysarc.2024.103173","url":null,"abstract":"<div><p>Reliability is a crucial index of the system, and many safety-critical applications have reliability requirements and deadline constraints. In addition, in order to protect the environment and reduce system operating costs, it is necessary to minimize energy consumption as much as possible. This paper considers parallel applications on heterogeneous distributed systems and proposes two algorithms to minimize energy consumption for meeting the deadline and satisfying the reliability requirement of the applications. The first algorithm is called minimizing scheduling length while satisfying the reliability requirement (MSLSRR). It first transforms the reliability requirement of the application into the reliability requirement of the task and then assigns the task to the processor with the earliest finish time. Since the reliability generated by MSLSRR is often higher than the reliability requirement of the application, and the scheduling length is also less than the deadline, an algorithm called improving energy efficiency (IEE) is designed, which redefined the minimum reliability requirement for the task and applied dynamic voltage and frequency scaling (DVFS) technique for energy conservation. The proposed algorithms are compared with existing algorithms by using real parallel applications. Experimental results demonstrate that the proposed algorithms consume the least energy.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"152 ","pages":"Article 103173"},"PeriodicalIF":4.5,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140906110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Flash controller-based secure execution environment for protecting code confidentiality
Pub Date: 2024-05-06 | DOI: 10.1016/j.sysarc.2024.103172
Zheng Zhang, Jingfeng Xue, Tian Chen, Yuhang Zhao, Weizhi Meng
With the rapid evolution of the Internet-of-Things (IoT), billions of IoT devices have connected to the Internet, collecting information via tags and sensors. For an IoT device, both the application code itself and the data collected by sensors can be of great commercial value. Protecting them is challenging because IoT devices are prone to compromise due to the inevitable vulnerabilities of commodity operating systems. A Trusted Execution Environment (TEE) protects sensitive data by running security-sensitive workloads in a secure world; however, this solution does not work for most IoT devices, which are limited in resources.
In this paper, we propose the Flash Controller-based Secure Execution Environment (FCSEE), an approach that protects security-sensitive code and data on IoT devices using the flash controller. Our approach constructs a secure execution environment on the target flash memory by modifying the execution logic of its controller, leveraging it as a co-processor that executes security-sensitive workloads on behalf of the host device. By extending the original functionality of the flash firmware, FCSEE also provides several much-needed security primitives to protect sensitive data. We constructed a prototype based on a Trans-Flash (TF) card and implemented a proof of its confidentiality. Our evaluation results indicate that FCSEE can confidentially execute security-sensitive workloads from the host and efficiently protect its sensitive data.
{"title":"Flash controller-based secure execution environment for protecting code confidentiality","authors":"Zheng Zhang , Jingfeng Xue , Tian Chen , Yuhang Zhao , Weizhi Meng","doi":"10.1016/j.sysarc.2024.103172","DOIUrl":"https://doi.org/10.1016/j.sysarc.2024.103172","url":null,"abstract":"<div><p>With the rapid evolution of Internet-of-Things (IoT), billions of IoT devices have connected to the Internet, collecting information via tags and sensors. For an IoT device, the application code itself and data collected by sensors can be of great commercial value. It is challenging to protect them because IoT devices are prone to compromise due to the inevitable vulnerabilities of commodity Operating Systems. Trusted Execution Environment (TEE) is one of the solutions that protects sensitive data by running security-sensitive workloads in a secure world. However, this solution does not work for most of the IoT devices that are limited in resources.</p><p>In this paper, we propose Flash Controller-based Secure Execution Environment (FCSEE), an approach to protect security-sensitive code and data for IoT devices using the flash controller. Our approach constructs a secure execution environment on the target flash memory by modifying the execution logic of its controller, leveraging it as a co-processor to execute security-sensitive workloads of the host device. By extending the original functionality of the flash firmware, FCSEE also provides several much-needed security primitives to protect sensitive data. We constructed a prototype based on a Trans-Flash (TF) card and implemented a proof of its confidentiality. Our evaluation results indicate that FCSEE can confidentially execute security-sensitive workloads from the host and efficiently protect its sensitive data.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"152 ","pages":"Article 103172"},"PeriodicalIF":4.5,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1383762124001097/pdfft?md5=ddc214324da00a88a4c83e6123dfe876&pid=1-s2.0-S1383762124001097-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140900856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Timing-accurate scheduling and allocation for parallel I/O operations in real-time systems
Pub Date: 2024-05-06 | DOI: 10.1016/j.sysarc.2024.103158
Yuanhai Zhang, Shuai Zhao, Gang Chen, Haoyu Luo, Kai Huang
In industrial real-time systems, I/O operations are often required to be both timing predictable, i.e., to finish before the deadline to ensure safety, and timing accurate, i.e., to start at or close to an ideal time instant for optimal I/O performance. For I/O-intensive systems, however, such strict timing requirements raise significant challenges for scheduling, since execution conflicts are widespread if every I/O operation is scheduled at its ideal time instant. Existing methods mainly focus on a single I/O device and apply simple heuristics that cannot effectively resolve execution conflicts, undermining both timing predictability and accuracy. This paper proposes novel scheduling and allocation methods that maximize timing accuracy while guaranteeing the predictability of the system. First, for a single I/O device, a fine-grained schedule is constructed using Mixed Integer Linear Programming (MILP) to optimize the timing accuracy of the I/O operations. Then, for systems containing multiple I/O devices of the same type, two novel allocation methods are proposed to realize parallel timing-accurate I/O control: the first uses MILP to further improve the timing accuracy of the system, whereas the second is a heuristic that provides competitive results with low overheads. Experimental results show the proposed methods outperform the state-of-the-art in timing predictability and accuracy by 37% and 25% on average (up to 5.56× and 33%), respectively.
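To make the single-device MILP step concrete, here is a minimal sketch in PuLP: it minimizes the total deviation of I/O start times from their ideal instants subject to pairwise non-overlap constraints. The operations, horizon, and objective are illustrative assumptions; the paper's full formulation (deadlines, periods, device model) is richer.

```python
# Toy MILP: timing-accurate scheduling of I/O ops on one device (PuLP/CBC).
import pulp

ops = {"io1": (0, 4), "io2": (3, 5), "io3": (6, 2)}  # ideal start, duration
H = 30  # scheduling horizon, also used as the big-M constant

prob = pulp.LpProblem("timing_accurate_io", pulp.LpMinimize)
start = {k: pulp.LpVariable(f"s_{k}", lowBound=0, upBound=H) for k in ops}
dev = {k: pulp.LpVariable(f"d_{k}", lowBound=0) for k in ops}
prob += pulp.lpSum(dev.values())  # minimize total deviation from ideal starts

for k, (ideal, _) in ops.items():
    prob += dev[k] >= start[k] - ideal  # linearized |start - ideal|
    prob += dev[k] >= ideal - start[k]

names = list(ops)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        a, b = names[i], names[j]
        o = pulp.LpVariable(f"o_{a}_{b}", cat="Binary")  # 1 if a runs before b
        prob += start[a] + ops[a][1] <= start[b] + H * (1 - o)
        prob += start[b] + ops[b][1] <= start[a] + H * o

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({k: start[k].value() for k in ops})
```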
{"title":"Timing-accurate scheduling and allocation for parallel I/O operations in real-time systems","authors":"Yuanhai Zhang , Shuai Zhao , Gang Chen , Haoyu Luo , Kai Huang","doi":"10.1016/j.sysarc.2024.103158","DOIUrl":"https://doi.org/10.1016/j.sysarc.2024.103158","url":null,"abstract":"<div><p>In industrial real-time systems, the I/O operations are often required to be both <em>timing predictable</em>, i.e., finish before the deadline to ensure safety, and <em>timing accurate</em>, i.e., start at or close to an ideal time instant for optimal I/O performance. However, for I/O-extensive systems, such strict timing requirements raise significant challenges for the scheduling of I/O operations, where execution conflicts widely exist if the I/O operations are scheduled at their ideal time instants. Existing methods mainly focus on one I/O device and apply simple heuristics to schedule I/O operations, which cannot effectively resolve execution conflicts, hence, undermining both timing predictability and accuracy. This paper proposes novel scheduling and allocation methods to maximize the timing accuracy while guaranteeing the predictability of the system. First, on one I/O device, a fine-grained schedule using Mixed Integer Linear Programming (MILP) is constructed that optimizes the timing accuracy of the I/O operations. Then, for systems containing multiple I/O devices of the same type, two novel allocations are proposed to realize parallel timing-accurate I/O control. The first utilizes MILP to further improve the timing accuracy of the system, whereas the second is a heuristic that provides competitive results with low overheads. Experimental results show the proposed methods outperform the state-of-the-art in terms of both timing predictability and accuracy by 37% and 25% on average (up to 5.56x and 33%), respectively.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"152 ","pages":"Article 103158"},"PeriodicalIF":4.5,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140951388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BGS: Accelerate GNN training on multiple GPUs
Pub Date: 2024-05-04 | DOI: 10.1016/j.sysarc.2024.103162
Yujuan Tan, Zhuoxin Bai, Duo Liu, Zhaoyang Zeng, Yan Gan, Ao Ren, Xianzhang Chen, Kan Zhong
Emerging Graph Neural Networks (GNNs) have made significant progress in processing graph-structured data, yet existing GNN frameworks face scalability issues when training large-scale graphs on multiple GPUs. Frequent feature-data transfers between CPUs and GPUs are a major bottleneck, and current caching schemes do not fully account for the characteristics of multi-GPU environments, leading to inefficient feature extraction. To address these challenges, we propose BGS, an auxiliary framework designed to accelerate GNN training from a data perspective in multi-GPU environments. First, we introduce a novel training-set partition algorithm that assigns an independent training subset to each GPU, enhancing the spatial locality of node accesses and thus the efficiency of the feature-caching strategy. Second, since GPUs can communicate at high speed via NVLink connections, we design a feature-cache placement strategy suited to multi-GPU environments that improves the overall hit rate by placing reasonable redundant caches on each GPU. Evaluations on two representative GNN models, GCN and GraphSAGE, show that BGS significantly improves the hit rate of the feature-caching strategy in multi-GPU environments and substantially reduces data-loading time, achieving a performance improvement of 1.5× to 6.2× over the baseline.
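The caching idea can be sketched as follows; the partitioner and the hotness metric here (precomputed access counts) are stand-ins, since the abstract does not fully specify BGS's algorithms.

```python
# Assumed sketch of per-GPU feature caching with a redundancy budget.
import numpy as np

def partition_train_nodes(train_nodes, num_gpus, rng):
    """Give each GPU an independent training subset (random split here;
    BGS uses a locality-aware partition algorithm)."""
    return np.array_split(rng.permutation(train_nodes), num_gpus)

def build_caches(parts, access_counts, cache_slots, redundancy=0.2):
    """Fill most of each GPU's cache with nodes hot in its own partition,
    and the remaining `redundancy` share with globally hot nodes that a
    GPU can also serve to peers over NVLink."""
    global_hot = np.argsort(-access_counts)
    local_slots = int(cache_slots * (1 - redundancy))
    caches = []
    for part in parts:
        local_hot = part[np.argsort(-access_counts[part])][:local_slots]
        chosen = set(local_hot.tolist())
        extra = [n for n in global_hot if n not in chosen]
        extra = np.array(extra[:cache_slots - len(local_hot)],
                         dtype=local_hot.dtype)
        caches.append(np.concatenate([local_hot, extra]))
    return caches

rng = np.random.default_rng(0)
counts = rng.integers(1, 100, size=1000)          # toy access frequencies
parts = partition_train_nodes(np.arange(1000), num_gpus=4, rng=rng)
caches = build_caches(parts, counts, cache_slots=100)
```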
{"title":"BGS: Accelerate GNN training on multiple GPUs","authors":"Yujuan Tan , Zhuoxin Bai , Duo Liu , Zhaoyang Zeng , Yan Gan , Ao Ren , Xianzhang Chen , Kan Zhong","doi":"10.1016/j.sysarc.2024.103162","DOIUrl":"10.1016/j.sysarc.2024.103162","url":null,"abstract":"<div><p>Emerging Graph Neural Networks (GNNs) have made significant progress in processing graph-structured data, yet existing GNN frameworks face scalability issues when training large-scale graph data using multiple GPUs. Frequent feature data transfers between CPUs and GPUs are a major bottleneck, and current caching schemes have not fully considered the characteristics of multi-GPU environments, leading to inefficient feature extraction. To address these challenges, we propose BGS, an auxiliary framework designed to accelerate GNN training from a data perspective in multi-GPU environments. Firstly, we introduce a novel training set partition algorithm, assigning independent training subsets to each GPU to enhance the spatial locality of node access, thus optimizing the efficiency of the feature caching strategy. Secondly, considering that GPUs can communicate at high speeds via NVLink connections, we designed a feature caching placement strategy suitable for multi-GPU environments. This strategy aims to improve the overall hit rate by setting reasonable redundant caches on each GPU. Evaluations on two representative GNN models, GCN and GraphSAGE, show that BGS significantly improves the hit rate of feature caching strategies in multi-GPU environments and substantially reduces the time overhead of data loading, achieving a performance improvement of 1.5 to 6.2 times compared to the baseline.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"153 ","pages":"Article 103162"},"PeriodicalIF":4.5,"publicationDate":"2024-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141035294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic zone redistribution for key-value stores on zoned namespaces SSDs
Pub Date: 2024-05-03 | DOI: 10.1016/j.sysarc.2024.103159
Doeun Kim, Jinyoung Kim, Kihan Choi, Hyuck Han, Minsoo Ryu, Sooyong Kang
Recently, the zoned namespaces (ZNS) interface has been introduced for solid-state drives (SSDs), and commercial ZNS SSDs are starting to be used for LSM-tree-based KV-stores such as RocksDB, whose log-structured write pattern aligns well with the intra-zone sequential-write constraint of ZNS SSDs. The host software for ZNS SSDs, including ZenFS for RocksDB, considers the lifetime of data when allocating zones so as to expedite zone reclamation. It also uses a lock-based synchronization mechanism to prevent concurrent writes to a zone, together with a contention-avoidance policy that skips 'locked' zones to increase write throughput. However, this policy seriously undermines the lifetime-based zone-allocation strategy, increasing write amplification in KV-stores that support parallel compaction. In this paper, we delve into the underlying causes of this phenomenon and propose a novel zone-management scheme, Dynamic Zone Redistribution (DZR), that addresses the root cause and thereby achieves both high throughput and low write amplification. Experimental results using micro- and macro-benchmarks show that DZR significantly reduces write amplification compared with ZenFS while preserving (or even increasing) write throughput.
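The conflict the paper diagnoses can be shown in a few lines: a lifetime-matching first pass and a lock-avoiding fallback pull in opposite directions. This is a simplified, assumption-laden sketch of the policy interaction, not ZenFS's or DZR's actual code.

```python
# Toy zone allocator: lifetime matching vs. contention avoidance.
import threading

class Zone:
    def __init__(self, zid):
        self.zid = zid
        self.lifetime = None          # lifetime hint of data in the zone
        self.lock = threading.Lock()  # one writer at a time (ZNS constraint)

def allocate(zones, file_lifetime):
    # Pass 1: an unlocked zone whose lifetime matches the file's, so the
    # zone's data tends to die together and reclamation stays cheap.
    for z in zones:
        if z.lifetime == file_lifetime and z.lock.acquire(blocking=False):
            return z
    # Pass 2 (contention avoidance): any unlocked zone. Under parallel
    # compaction this mixes lifetimes within a zone, which is the root
    # cause of the write amplification that DZR's redistribution targets.
    for z in zones:
        if z.lock.acquire(blocking=False):
            if z.lifetime is None:
                z.lifetime = file_lifetime
            return z
    return None  # every zone is busy; the caller must block
```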
{"title":"Dynamic zone redistribution for key-value stores on zoned namespaces SSDs","authors":"Doeun Kim , Jinyoung Kim , Kihan Choi , Hyuck Han , Minsoo Ryu , Sooyong Kang","doi":"10.1016/j.sysarc.2024.103159","DOIUrl":"https://doi.org/10.1016/j.sysarc.2024.103159","url":null,"abstract":"<div><p>Recently, the zoned namespaces (ZNS) interface has been introduced as a new interface for solid-state drives (SSD), and commercial ZNS SSDs are starting to be used for LSM-tree-based KV-stores, including RocksDB, whose log-structured write characteristics well align with the intra-zone sequential write constraint of the ZNS SSDs. The host software for ZNS SSDs, including ZenFS for RocksDB, considers the lifetime of data when allocating zones to expedite zone reclamation. It also uses a lock-based synchronization mechanism to prevent concurrent writes to a zone, together with a contention avoidance policy that avoids allocating ‘locked’ zones to increase write throughput. However, this policy seriously damages the lifetime-based zone allocation strategy, leading to increased write amplification in KV-stores that support parallel compaction. In this paper, we delve into the underlying causes of this phenomenon and propose a novel zone management scheme, Dynamic Zone Redistribution (DZR), that can be effectively used for such KV-stores. DZR enables both high throughput and low write amplification by effectively addressing the root cause. Experimental results using micro- and macro-benchmarks show that DZR significantly reduces write amplification compared with ZenFS while preserving (or even increasing) write throughput.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"152 ","pages":"Article 103159"},"PeriodicalIF":4.5,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140823738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CEIU: Consistent and Efficient Incremental Update mechanism for mobile systems on flash storage
Pub Date: 2024-05-03 | DOI: 10.1016/j.sysarc.2024.103151
Ruiqing Lei, Xianzhang Chen, Duo Liu, Chunlin Song, Yujuan Tan, Ao Ren
The ever-growing size and frequent updates of mobile applications incur high network and storage costs. Hence, mobile systems often employ incremental-update algorithms, typically HDiffPatch, to upgrade applications. However, we find that existing incremental-update algorithms not only generate a significant amount of redundant data accesses but also lack consistency guarantees for the whole application package. In this paper, we present a novel Consistent and Efficient Incremental Update (CEIU) mechanism for upgrading mobile applications. First, CEIU reduces the memory consumption and file accesses of incremental updates by reusing the indexes of blocks in the old image rather than copying the blocks. Second, CEIU employs a two-level journaling mechanism to ensure the consistency of the whole package and the subfiles of the new image. We implement the proposed mechanism in the Linux kernel on top of the TI-LFAT file system and evaluate it with real-world applications. The experimental results show that the proposed mechanism reduces memory footprint by 30%–80% compared with HDiffPatch, the state-of-the-art incremental-update algorithm. It also significantly reduces the recovery time when a power failure or system crash occurs.
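The index-reuse idea can be made concrete with a toy directive format (invented here for illustration; it is not HDiffPatch's or CEIU's on-disk format): the patch describes the new image as references to old-image blocks plus literal new blocks, so only genuinely new data is ever written.

```python
# Toy patch application by block-index reuse rather than block copying.
def materialize(directives):
    """directives: ("reuse", old_block_idx) entries become remaps in the
    file system's block index (no data movement); ("write", data) entries
    are the only blocks physically written to flash."""
    remaps, writes = [], []
    for new_idx, (kind, val) in enumerate(directives):
        if kind == "reuse":
            remaps.append((new_idx, val))   # new block -> old block index
        else:
            writes.append((new_idx, val))   # literal bytes from the patch
    return remaps, writes

remaps, writes = materialize([
    ("reuse", 0), ("write", b"\x00" * 4096), ("reuse", 7),
])
# Only one 4 KiB block is written; blocks 0 and 7 of the old image are
# referenced in place, illustrating the copy avoidance behind the savings.
```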
{"title":"CEIU: Consistent and Efficient Incremental Update mechanism for mobile systems on flash storage","authors":"Ruiqing Lei , Xianzhang Chen , Duo Liu , Chunlin Song , Yujuan Tan , Ao Ren","doi":"10.1016/j.sysarc.2024.103151","DOIUrl":"https://doi.org/10.1016/j.sysarc.2024.103151","url":null,"abstract":"<div><p>The ever-growing sizes and frequent updating of mobile applications cause high network and storage cost for updating. Hence, emerging mobile systems often employ incremental update algorithms, typically HDiffPatch, to upgrade mobile applications. However, we find that existing incremental update algorithms not only generate a significant amount of redundant data accesses, but also lacks of consistency guarantees for the whole package of application. In this paper, we present a novel Consistent and Efficient Incremental Update (CEIU) mechanism for upgrading mobile applications. Firstly, CEIU reduces the memory consumption and file access of incremental updates by reusing the indexes of blocks in the old image rather than copying the blocks. Secondly, CEIU employs a two-level journaling mechanism to ensure the consistency of the whole package and subfiles of the new image. We implement the proposed mechanism in Linux kernel based on TI-LFAT file system and evaluate it with real-world applications. The experimental results show that the proposed mechanism can reduce 30%–80% memory footprints in comparison with HDiffPatch, the state-of-the-art incremental update algorithm. It also significantly reduces the recovery time when power failure or system crash occurs.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"152 ","pages":"Article 103151"},"PeriodicalIF":4.5,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140842735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BFL-SA: Blockchain-based federated learning via enhanced secure aggregation
Pub Date: 2024-05-03 | DOI: 10.1016/j.sysarc.2024.103163
Yizhong Liu, Zixiao Jia, Zixu Jiang, Xun Lin, Jianwei Liu, Qianhong Wu, Willy Susilo
Federated learning, which involves a central server and multiple clients, keeps data local but still raises privacy concerns such as data exposure and participation privacy. Secure aggregation, especially with pairwise masking, preserves privacy without loss of accuracy. Yet issues persist: security under malicious adversary models, fault tolerance at the central server, and trust in decryption keys. Resolving these challenges is vital for advancing secure federated-learning systems. In this paper, we present BFL-SA, a blockchain-based federated-learning scheme with enhanced secure aggregation, which addresses these challenges by integrating blockchain consensus, publicly verifiable secret sharing, and an overdue-gradients aggregation module. These enhancements significantly boost security and fault tolerance while improving data-utilization efficiency in the secure-aggregation process. Our security analysis demonstrates that BFL-SA achieves secure aggregation even under malicious models. In experimental comparisons, BFL-SA exhibits fast secure aggregation and achieves 100% model-aggregation accuracy.
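For readers unfamiliar with pairwise masking, the primitive BFL-SA builds on, here is a minimal sketch (the blockchain, public-verifiability, and overdue-gradient layers are omitted, and the shared seeds are toy values rather than a real key agreement):

```python
# Pairwise masking: per-pair masks cancel exactly in the aggregate.
import numpy as np

def masked_update(i, update, clients, seeds, dim):
    """Client i adds +PRG(s_ij) for j > i and -PRG(s_ij) for j < i."""
    masked = update.astype(np.float64).copy()
    for j in clients:
        if j == i:
            continue
        prg = np.random.default_rng(seeds[frozenset((i, j))])
        mask = prg.standard_normal(dim)
        masked += mask if i < j else -mask
    return masked

dim, clients = 4, [0, 1, 2]
seeds = {frozenset((i, j)): 1000 + 10 * i + j
         for i in clients for j in clients if i < j}
updates = [np.full(dim, float(i + 1)) for i in clients]

# The server only ever sees masked vectors, yet the sum is exact.
agg = sum(masked_update(i, updates[i], clients, seeds, dim) for i in clients)
assert np.allclose(agg, sum(updates))
```

The toy run also shows why dropouts and server faults are delicate (an absent client leaves unmatched masks in the sum), which is the gap that BFL-SA's publicly verifiable secret sharing is meant to close.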
{"title":"BFL-SA: Blockchain-based federated learning via enhanced secure aggregation","authors":"Yizhong Liu , Zixiao Jia , Zixu Jiang , Xun Lin , Jianwei Liu , Qianhong Wu , Willy Susilo","doi":"10.1016/j.sysarc.2024.103163","DOIUrl":"https://doi.org/10.1016/j.sysarc.2024.103163","url":null,"abstract":"<div><p>Federated learning, involving a central server and multiple clients, aims to keep data local but raises privacy concerns like data exposure and participation privacy. Secure aggregation, especially with pairwise masking, preserves privacy without accuracy loss. Yet, issues persist like security against malicious models, central server fault tolerance, and trust in decryption keys. Resolving these challenges is vital for advancing secure federated learning systems. In this paper, we present BFL-SA, a blockchain-based federated learning scheme via enhanced secure aggregation, which addresses key challenges by integrating blockchain consensus, publicly verifiable secret sharing, and an overdue gradients aggregation module. These enhancements significantly boost security and fault tolerance while improving the efficiency of data utilization in the secure aggregation process. After security analysis, we have demonstrated that BFL-SA achieves secure aggregation even in malicious models. Through experimental comparative analysis, BFL-SA exhibits rapid secure aggregation speed and achieves 100% model aggregation accuracy.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"152 ","pages":"Article 103163"},"PeriodicalIF":4.5,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140900855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Day–Night architecture: Development of an ultra-low power RISC-V processor for wearable anomaly detection
Pub Date: 2024-05-03 | DOI: 10.1016/j.sysarc.2024.103161
Eunjin Choi, Jina Park, Kyeongwon Lee, Jae-Jin Lee, Kyuseung Han, Woojoo Lee
In healthcare, anomaly detection has emerged as a central application. This study presents an ultra-low-power processor tailored for wearable devices dedicated to anomaly detection. Introducing a unique Day–Night architecture, the processor is bifurcated into two distinct segments, the Day segment and the Night segment, each of which functions autonomously. The Day segment, catering to generic wearable applications, is designed to remain largely inactive, awakening only for specific tasks; because it contains the Main-CPU and the system interconnect, both major power consumers, keeping it asleep yields considerable power savings. Conversely, the Night segment is dedicated to real-time anomaly detection using sensor-data analytics. It comprises a Sub-CPU and a minimal set of IPs, operating continuously but with minimized power consumption. To further enhance this architecture, the paper presents an ultra-lightweight RISC-V core, the All-Night core, specialized for anomaly-detection applications and replacing the traditional Sub-CPU. To validate the Day–Night architecture, we developed a prototype processor and implemented it on an FPGA board, together with an anomaly-detection application optimized for the prototype to showcase its functionality. Finally, synthesizing the processor prototype in a 45 nm process technology confirmed an energy reduction of up to 57%.
{"title":"Day–Night architecture: Development of an ultra-low power RISC-V processor for wearable anomaly detection","authors":"Eunjin Choi , Jina Park , Kyeongwon Lee , Jae-Jin Lee , Kyuseung Han , Woojoo Lee","doi":"10.1016/j.sysarc.2024.103161","DOIUrl":"https://doi.org/10.1016/j.sysarc.2024.103161","url":null,"abstract":"<div><p>In healthcare, anomaly detection has emerged as a central application. This study presents an ultra-low power processor tailored for wearable devices dedicated to anomaly detection. Introducing a unique <em>Day–Night</em> architecture, the processor is bifurcated into two distinct segments: The <em>Day</em> segment and the <em>Night</em> segment, both of which function autonomously. The Day segment, catering to generic wearable applications, is designed to remain largely inactive, awakening only for specific tasks. This approach leads to considerable power savings by incorporating the Main-CPU and system interconnect, both major power consumers. Conversely, the Night segment is dedicated to real-time anomaly detection using sensor data analytics. It comprises a Sub-CPU and a minimal set of IPs, operating continuously but with minimized power consumption. To further enhance this architecture, the paper presents an ultra-lightweight RISC-V core, <em>All-Night</em> core, specialized for anomaly detection applications, replacing the traditional Sub-CPU. To validate the Day–Night architecture, we developed a prototype processor and implemented it on an FPGA board. An anomaly detection application, optimized for this prototype, was also developed to showcase its functional prowess. Finally, when we synthesized the processor prototype using 45 nm process technology, it affirmed our assertion of achieving an energy reduction of up to 57%.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"152 ","pages":"Article 103161"},"PeriodicalIF":4.5,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1383762124000985/pdfft?md5=3dbd70dbdc0c83e130f00806758df490&pid=1-s2.0-S1383762124000985-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140879289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ALERT: A lightweight defense mechanism for enhancing DNN robustness against T-BFA
Pub Date: 2024-05-01 | DOI: 10.1016/j.sysarc.2024.103160
Xiaohui Wei, Xiaonan Wang, Yumin Yan, Nan Jiang, Hengshan Yue
DNNs have become pervasive in many security-critical scenarios such as autonomous vehicles and medical diagnosis. Recent studies reveal the susceptibility of DNNs to various adversarial attacks, among which weight Bit-Flip Attacks (BFA) are emerging as a significant security concern. Moreover, Targeted Bit-Flip Attacks (T-BFA), a novel variant of BFA, can stealthily alter specific source-target classifications while preserving accurate classification of non-target classes, posing an even more severe threat. However, because they do not adequately account for T-BFA's "targeted" characteristic, existing defense mechanisms tend to over-protect or over-modify the network, leading to significant defense overheads or non-negligible reductions in DNN accuracy.
In this work, we propose ALERT, A Lightweight defense mechanism for Enhancing DNN Robustness against T-BFA while maintaining network accuracy. First, building on a full understanding of the key factors that dominate misclassification among source-target class pairs, we propose a Source-Target-Aware Searching (STAS) method to accurately identify the weights vulnerable under T-BFA. Second, leveraging the intrinsic redundancy of DNNs, we propose a weight random-switch mechanism that reduces the exposure of vulnerable weights, thereby weakening the expected impact of T-BFA; to strike a delicate balance between enhancing robustness and preserving network accuracy, we develop a metric to meticulously select candidate weights. Finally, to further enhance DNN robustness, we present a lightweight runtime monitoring mechanism that detects T-BFA through weight-signature verification and dynamically optimizes the weight random-switch strategy accordingly. Evaluation results demonstrate that our proposed method effectively enhances the robustness of DNNs against T-BFA while maintaining network accuracy. Compared with the baseline, our method can tolerate 6.7× more flipped bits with negligible accuracy loss (<0.1% in ResNet-50).
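To make the underlying threat concrete: BFA-style attacks flip individual bits of quantized weights in memory, and a single flip in the sign/MSB position moves an int8 weight by half its representable range. A toy numpy illustration (of the fault model only, not the attack search or the defense):

```python
# One bit flip in an int8 weight causes a large, targeted value shift.
import numpy as np

w = np.int8(23)          # a benign-looking quantized weight
bit = 7                  # the most significant (sign) bit
flipped = (np.uint8(w) ^ np.uint8(1 << bit)).astype(np.int8)
print(w, "->", flipped)  # 23 -> -105
```

ALERT's weight random-switch mechanism counters this by reducing how predictably any given vulnerable weight is exposed to such flips.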
{"title":"ALERT: A lightweight defense mechanism for enhancing DNN robustness against T-BFA","authors":"Xiaohui Wei, Xiaonan Wang, Yumin Yan, Nan Jiang, Hengshan Yue","doi":"10.1016/j.sysarc.2024.103160","DOIUrl":"https://doi.org/10.1016/j.sysarc.2024.103160","url":null,"abstract":"<div><p>DNNs have become pervasive in many security–critical scenarios such as autonomous vehicles and medical diagnoses. Recent studies reveal the susceptibility of DNNs to various adversarial attacks, among which weight Bit-Flip Attacks (BFA) is emerging as a significant security concern. Moreover, Targeted Bit-Flip Attacks (T-BFA), as a novel variant of BFA, can stealthily alter specific source–target classifications while preserving accurate classifications of non-target classes, posing a more severe threat. However, due to the inadequate consideration for T-BFA’s “targeted” characteristic, existing defense mechanisms tend to perform over-protection/-modification to the network, leading to significant defense overheads or non-negligible DNN accuracy reduction.</p><p>In this work, we propose <u><em>ALERT</em></u>, <u><em>A</em></u> <u><em>L</em></u>ightweight defense mechanism for <u><em>E</em></u>nhancing DNN <u><em>R</em></u>obustness against <u><em>T</em></u>-BFA while maintaining network accuracy. Firstly, fully understanding the key factors that dominate the misclassification among source–target class pairs, we propose a Source-Target-Aware Searching (STAS) method to accurately identify the vulnerable weights under T-BFA. Secondly, leveraging the intrinsic redundancy characteristic of DNNs, we propose a weight random switch mechanism to reduce the exposure of vulnerable weights, thereby weakening the expected impact of T-BFA. Striking a delicate balance between enhancing robustness and preserving network accuracy, we develop a metric to meticulously select candidate weights. Finally, to further enhance the DNN robustness, we present a lightweight runtime monitoring mechanism for detecting T-BFA through weight signature verification, and dynamically optimize the weight random switch strategy accordingly. Evaluation results demonstrate that our proposed method effectively enhances the robustness of DNNs against T-BFA while maintaining network accuracy. Compared with the baseline, our method can tolerate <span><math><mrow><mn>6</mn><mo>.</mo><mn>7</mn><mo>×</mo></mrow></math></span> more flipped bits with negligible accuracy loss (<span><math><mrow><mo><</mo><mn>0</mn><mo>.</mo><mn>1</mn><mtext>%</mtext></mrow></math></span> in ResNet-50).</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"152 ","pages":"Article 103160"},"PeriodicalIF":4.5,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140893404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}