Coarse-grained reconfigurable architectures for radio baseband processing: A survey
Pub Date: 2024-07-23 | DOI: 10.1016/j.sysarc.2024.103243
Zohaib Hassan, Aleksandr Ometov, Elena Simona Lohan, Jari Nurmi
Emerging communication technologies, such as 5G and beyond, have introduced diverse requirements that demand high performance and energy efficiency at all levels. Furthermore, the real-time requirements of different services vary significantly, increasing baseband processor design complexity and the demand for flexible hardware platforms. This paper identifies the key characteristics of hardware platforms for baseband processing and describes the processing limitations of traditional architectures. We examine the Coarse-Grained Reconfigurable Architecture (CGRA) as a prospective hardware platform and highlight the characteristic features, relative to traditionally employed architectures, that make it a suitable candidate for incorporation as a domain-specific accelerator in baseband processing applications. We survey various CGRAs from the last two decades (2004-2023) and analyze their distinct architectural features, which can serve as a reference when designing CGRAs for baseband processing applications. Moreover, we investigate the existing challenges in developing CGRAs for baseband processing and explore their potential solutions. We also provide an overview of emerging research directions for CGRAs and how they can contribute to the development of advanced baseband processors. Lastly, we highlight a conceptual RISC-V+CGRA framework as a potential direction for integrating CGRAs into future baseband processing systems.
{"title":"Coarse-grained reconfigurable architectures for radio baseband processing: A survey","authors":"Zohaib Hassan, Aleksandr Ometov, Elena Simona Lohan, Jari Nurmi","doi":"10.1016/j.sysarc.2024.103243","DOIUrl":"10.1016/j.sysarc.2024.103243","url":null,"abstract":"<div><p>Emerging communication technologies, such as 5G and beyond, have introduced diverse requirements that demand high performance and energy efficiency at all levels. Furthermore, the real-time requirements of different services vary significantly — increasing the baseband processor design complexity and demand for flexible hardware platforms. This paper identifies the key characteristics of hardware platforms for baseband processing and describes the existing processing limitations in traditional architectures. In this paper, Coarse-Grained Reconfigurable Architecture (CGRA) is examined as a prospective hardware platform and its characteristic features are highlighted as compared to traditionally employed architectures that make it a suitable candidate for incorporation as a domain-specific accelerator in baseband processing applications. We survey various CGRAs from the last two decades (2004-2023) and analyze their distinct architectural features which can serve as a reference while designing CGRAs for baseband processing applications. Moreover, we investigate the existing challenges toward developing CGRAs for baseband processing and explore their potential solutions. We also provide an overview of the emerging research directions for CGRA and how they can contribute toward the development of advanced baseband processors. Lastly, we highlight a conceptual RISC-V+CGRA framework that can serve as a potential direction toward integrating CGRA in future baseband processing systems.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"154 ","pages":"Article 103243"},"PeriodicalIF":3.7,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1383762124001802/pdfft?md5=7be2071289f906c1ad056f84de0a2459&pid=1-s2.0-S1383762124001802-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141783452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Virtualized real-time workloads in containers and virtual machines
Pub Date: 2024-07-23 | DOI: 10.1016/j.sysarc.2024.103238
Luca Abeni
Real-time virtualization is currently a hot topic, and there is much ongoing research on real-time Virtual Machines and hypervisors. However, most of the previous research focused either on reducing the latencies introduced by the virtualization stack (hypervisor, host Operating System, Virtual Machine scheduling, etc.) or on analyzing the virtual CPU scheduling algorithms. Only a few works have investigated the impact of the guest Operating System architecture on real-time performance or considered multiple performance metrics (latency, schedulability, startup times, resource consumption) at the same time. This paper compares various features of different virtualization technologies and guest Operating Systems, evaluating their suitability for serving real-time applications. The results indicate that solutions based on KVM (with an appropriate microvm) and the OSv unikernel can be considered viable alternatives to more traditional VMs or containers.
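The latency metric named above is typically measured with a cyclictest-style periodic probe. The sketch below illustrates the idea in Python as a minimal assumption-laden reference, not the paper's benchmark (which runs inside guests with tighter native timers): sleep to a series of nominal deadlines and record how late each wakeup actually lands.

```python
# Minimal cyclictest-style wakeup-latency probe (a sketch, not the paper's
# actual benchmark): sleep toward fixed-period deadlines and record how far
# past each deadline the process actually wakes up.
import time
import statistics

PERIOD_NS = 1_000_000  # 1 ms nominal period
SAMPLES = 10_000

def measure_wakeup_latency(period_ns=PERIOD_NS, samples=SAMPLES):
    latencies_us = []
    next_wakeup = time.monotonic_ns() + period_ns
    for _ in range(samples):
        # Sleep until the intended wakeup time.
        delay = next_wakeup - time.monotonic_ns()
        if delay > 0:
            time.sleep(delay / 1e9)
        # Latency = how late we woke up relative to the deadline.
        latencies_us.append((time.monotonic_ns() - next_wakeup) / 1e3)
        next_wakeup += period_ns
    return latencies_us

if __name__ == "__main__":
    lat = measure_wakeup_latency()
    print(f"avg {statistics.mean(lat):.1f} us, "
          f"p99 {statistics.quantiles(lat, n=100)[98]:.1f} us, "
          f"max {max(lat):.1f} us")
```

Running such a probe bare-metal, in a container, and inside each VM configuration is one way to compare the stacks on equal footing; the maximum (worst-case) value matters more than the average for real-time suitability.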
{"title":"Virtualized real-time workloads in containers and virtual machines","authors":"Luca Abeni","doi":"10.1016/j.sysarc.2024.103238","DOIUrl":"10.1016/j.sysarc.2024.103238","url":null,"abstract":"<div><p>Real-time virtualization is currently a hot topic, and there is much ongoing research on real-time Virtual Machines and hypervisors. However, most of the previous research focused either on reducing the latencies introduced by the virtualization stack (hypervisor, host Operating System, Virtual Machine scheduling, etc...) or analyzing the virtual CPU scheduling algorithms. Only a few works investigated the impact of the guest Operating System architecture on real-time performance or considered multiple performance metrics (latency, schedulability, startup times, resource consumption) at the same time. This paper compares various features of different virtualization technologies and guest Operating Systems, evaluating their suitability for serving real-time applications. The results indicate that solutions based on KVM (and an appropriate microvm) and the OSv unikernel can be considered viable alternatives to more traditional VMs or containers.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"154 ","pages":"Article 103238"},"PeriodicalIF":3.7,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1383762124001759/pdfft?md5=47eb3b51125a16edd19d29b9dd7294a7&pid=1-s2.0-S1383762124001759-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141783431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating single event upsets in deep neural networks for semantic segmentation: An embedded system perspective
Pub Date: 2024-07-20 | DOI: 10.1016/j.sysarc.2024.103242
Jon Gutiérrez-Zaballa, Koldo Basterretxea, Javier Echanobe
As the deployment of artificial intelligence (AI) algorithms on edge devices becomes increasingly prevalent, enhancing the robustness and reliability of autonomous AI-based perception and decision systems is becoming as relevant as precision and performance, especially in application areas considered safety-critical, such as autonomous driving and aerospace. This paper delves into the robustness assessment of embedded Deep Neural Networks (DNNs), focusing in particular on the impact of parameter perturbations produced by single event upsets (SEUs) on convolutional neural networks (CNNs) for image semantic segmentation. By scrutinizing the layer-by-layer and bit-by-bit sensitivity of various encoder–decoder models to soft errors, this study thoroughly investigates the vulnerability of segmentation DNNs to SEUs and evaluates the consequences of techniques such as model pruning and parameter quantization on the robustness of compressed models aimed at embedded implementations. The findings offer valuable insights into the mechanisms underlying SEU-induced failures, making it possible to assess the robustness of DNNs once they have been trained. Moreover, based on the collected data, we propose a set of practical, lightweight error mitigation techniques with no memory or computational overhead, suitable for resource-constrained deployments. The code used to perform the fault injection (FI) campaign is available at https://github.com/jonGuti13/TensorFI2, while the code implementing the proposed techniques is available at https://github.com/jonGuti13/parameterProtection.
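To make the bit-by-bit sensitivity analysis concrete, the sketch below shows one common way an SEU on a stored parameter is emulated in software: flipping a single bit of an IEEE-754 float32 weight. This is a minimal illustration, not the authors' TensorFI2-based campaign linked above; the function name and tensor shape are hypothetical.

```python
# Sketch of a single-event-upset (SEU) fault injection on a float32 weight
# tensor, in the spirit of bit-by-bit sensitivity analysis. A minimal
# illustration, not the authors' actual TensorFI2-based FI campaign.
import numpy as np

def flip_bit(weights: np.ndarray, flat_index: int, bit: int) -> np.ndarray:
    """Return a copy of `weights` with one bit flipped in one parameter.

    In IEEE-754 float32, bit 31 is the sign, bits 30-23 the exponent, and
    bits 22-0 the mantissa; exponent flips are typically the most harmful.
    """
    assert weights.dtype == np.float32 and 0 <= bit <= 31
    corrupted = weights.copy()
    view = corrupted.reshape(-1).view(np.uint32)   # reinterpret the raw bits
    view[flat_index] ^= np.uint32(1) << bit        # flip the chosen bit
    return corrupted

# Example: flip the most significant exponent bit of the first weight.
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
w_faulty = flip_bit(w, flat_index=0, bit=30)
print(w.reshape(-1)[0], "->", w_faulty.reshape(-1)[0])
```

Sweeping `flat_index` over all parameters and `bit` over all 32 positions, then re-running inference, yields the kind of layer-wise and bit-wise vulnerability map the paper analyzes.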
{"title":"Evaluating single event upsets in deep neural networks for semantic segmentation: An embedded system perspective","authors":"Jon Gutiérrez-Zaballa , Koldo Basterretxea , Javier Echanobe","doi":"10.1016/j.sysarc.2024.103242","DOIUrl":"10.1016/j.sysarc.2024.103242","url":null,"abstract":"<div><p>As the deployment of artificial intelligence (AI) algorithms at edge devices becomes increasingly prevalent, enhancing the robustness and reliability of autonomous AI-based perception and decision systems is becoming as relevant as precision and performance, especially in applications areas considered safety-critical such as autonomous driving and aerospace. This paper delves into the robustness assessment in embedded Deep Neural Networks (DNNs), particularly focusing on the impact of parameter perturbations produced by single event upsets (SEUs) on convolutional neural networks (CNN) for image semantic segmentation. By scrutinizing the layer-by-layer and bit-by-bit sensitivity of various encoder–decoder models to soft errors, this study thoroughly investigates the vulnerability of segmentation DNNs to SEUs and evaluates the consequences of techniques like model pruning and parameter quantization on the robustness of compressed models aimed at embedded implementations. The findings offer valuable insights into the mechanisms underlying SEU-induced failures that allow for evaluating the robustness of DNNs once trained in advance. Moreover, based on the collected data, we propose a set of practical lightweight error mitigation techniques with no memory or computational cost suitable for resource-constrained deployments. The code used to perform the fault injection (FI) campaign is available at <span><span>https://github.com/jonGuti13/TensorFI2</span><svg><path></path></svg></span>, while the code to implement proposed techniques is available at <span><span>https://github.com/jonGuti13/parameterProtection</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"154 ","pages":"Article 103242"},"PeriodicalIF":3.7,"publicationDate":"2024-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1383762124001796/pdfft?md5=7c38696062f91b81f83a171baa075ab6&pid=1-s2.0-S1383762124001796-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141783432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time intelligent on-device monitoring of heart rate variability with PPG sensors
Pub Date: 2024-07-18 | DOI: 10.1016/j.sysarc.2024.103240
Jingye Xu, Yuntong Zhang, Mimi Xie, Wei Wang, Dakai Zhu
Heart rate variability (HRV) is a vital sign with the potential to predict stress and various diseases, including heart attack and arrhythmia. Typically, hospitals utilize electrocardiogram (ECG) devices to capture the heart’s bioelectrical signals, which are then used to calculate HRV values. However, this method is costly and inconvenient due to the requirement for stable connections to the body. In recent years, photoplethysmography (PPG) sensors, which collect reflective light signals, have gained attention as a cost-effective alternative for measuring heart health. However, accurately estimating HRV using PPG signals remains a challenging task due to the inherent sensitivity of PPG sensors. To address the challenges, this paper presents an on-device, low-cost machine learning-based system that aims to achieve high-accuracy HRV estimation in real-time. Firstly, we propose a novel unified performance and resource-aware neural network (UP-RaNN) search method that leverages grid search techniques to identify a neural network model that can deliver both high HRV accuracy and smooth operation on resource-limited devices. Secondly, we design a real-time HRV monitoring system using a resource-limited, ultra-low-power microcontroller unit (MCU). This system utilizes the neural network model obtained through the UP-RaNN to provide HRV readings from PPG data in real-time. Thirdly, we evaluate the proposed UP-RaNN method and the real-time HRV monitoring system by comparing its performance to state-of-the-art studies. Moreover, the system is enhanced with adaptive reconfiguration capability, enabling it to improve energy efficiency and adapt to varying demands during runtime. The results demonstrate that when deployed on an MSP430FR5994 development board running at 8 MHz, the trained deep neural network model obtained through our proposed UP-RaNN achieves HRV estimation in just 0.3 s per inference. Additionally, the model exhibits a better mean absolute percentage error (∼5.8%) than the state-of-the-art HRV estimation methods using PPG, while significantly reducing model complexity and computational time.
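For context on the quantity being estimated, the sketch below computes the classic time-domain HRV statistics from a series of inter-beat intervals. It is a plain reference calculation under the standard definitions of SDNN and RMSSD, not the paper's UP-RaNN model, and the synthetic data is illustrative only.

```python
# Reference calculation of time-domain HRV metrics from inter-beat
# intervals (IBIs) - the target quantities an on-device model estimates
# from raw PPG. Standard definitions, not the paper's UP-RaNN network.
import numpy as np

def hrv_time_domain(ibi_ms: np.ndarray) -> dict:
    """SDNN and RMSSD from a series of inter-beat intervals in milliseconds."""
    diffs = np.diff(ibi_ms)
    return {
        "mean_hr_bpm": 60_000.0 / ibi_ms.mean(),    # average heart rate
        "sdnn_ms": ibi_ms.std(ddof=1),              # overall variability
        "rmssd_ms": np.sqrt(np.mean(diffs ** 2)),   # beat-to-beat variability
    }

# Example with synthetic IBIs around 800 ms (~75 bpm).
rng = np.random.default_rng(0)
ibis = 800 + rng.normal(0, 40, size=300)
print(hrv_time_domain(ibis))
```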
{"title":"Real-time intelligent on-device monitoring of heart rate variability with PPG sensors","authors":"Jingye Xu, Yuntong Zhang, Mimi Xie, Wei Wang, Dakai Zhu","doi":"10.1016/j.sysarc.2024.103240","DOIUrl":"10.1016/j.sysarc.2024.103240","url":null,"abstract":"<div><p>Heart rate variability (HRV) is a vital sign with the potential to predict stress and various diseases, including heart attack and arrhythmia. Typically, hospitals utilize electrocardiogram (ECG) devices to capture the heart’s bioelectrical signals, which are then used to calculate HRV values. However, this method is costly and inconvenient due to the requirement for stable connections to the body. In recent years, photoplethysmography (PPG) sensors, which collect reflective light signals, have gained attention as a cost-effective alternative for measuring heart health. However, accurately estimating HRV using PPG signals remains a challenging task due to the inherent sensitivity of PPG sensors. To address the challenges, this paper presents an on-device, low-cost machine learning-based system that aims to achieve high-accuracy HRV estimation in real-time. Firstly, we propose a novel unified performance and resource-aware neural network (UP-RaNN) search method that leverages grid search techniques to identify a neural network model that can deliver both high HRV accuracy and smooth operation on resource-limited devices. Secondly, we design a real-time HRV monitoring system using a resource-limited, ultra-low-power microcontroller unit (MCU). This system utilizes the neural network model obtained through the UP-RaNN to provide HRV readings from PPG data in real-time. Thirdly, we evaluate the proposed UP-RaNN method and the real-time HRV monitoring system by comparing its performance to state-of-the-art studies. Moreover, the system is enhanced with adaptive reconfiguration capability, enabling it to improve energy efficiency and adapt to varying demands during runtime. The results demonstrate that when deployed on an MSP430FR5994 development board running at 8 MHz, the trained deep neural network model obtained through our proposed UP-RaNN achieves HRV estimation in just 0.3 s per inference. Additionally, the model exhibits a better mean absolute percentage error (<span><math><mo>∼</mo></math></span> 5.8%) than the state-of-the-art HRV estimation methods using PPG, while significantly reducing model complexity and computational time.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"154 ","pages":"Article 103240"},"PeriodicalIF":3.7,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141783373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GPU implementation of the Frenet Path Planner for embedded autonomous systems: A case study in the F1tenth scenario
Pub Date: 2024-07-16 | DOI: 10.1016/j.sysarc.2024.103239
Filippo Muzzini, Nicola Capodieci, Federico Ramanzin, Paolo Burgio
Autonomous vehicles are increasingly utilized in safety-critical and time-sensitive settings like urban environments and competitive racing. Planning maneuvers ahead is pivotal in these scenarios, where the onboard compute platform determines the vehicle’s future actions. This paper introduces an optimized implementation of the Frenet Path Planner, a renowned path planning algorithm, accelerated through GPU processing. Unlike existing methods, our approach accelerates the entire algorithm, encompassing both path generation and collision avoidance. We measure the execution time of our implementation, showcasing significant enhancements over the CPU baseline (up to a 22x speedup). Furthermore, we assess the influence of different precision types (double, float, half) on trajectory accuracy, probing the balance between completion speed and computational precision. Moreover, we analyze the impact on execution time of Nvidia Unified Memory and of the interference caused by other processes running on the same system. We also evaluate our implementation using the F1tenth simulator and in a real race scenario. The results position our implementation as a strong candidate for the new state-of-the-art implementation of the Frenet Path Planner algorithm.
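The Frenet Path Planner's candidate trajectories are commonly built from quintic polynomials that connect lateral start and end states over a time horizon. The sketch below shows that core building block as a plain CPU reference under stated assumptions; the paper's contribution is generating and collision-checking many such candidates in parallel on the GPU.

```python
# Core building block of a Frenet-style planner: a quintic polynomial d(t)
# matching lateral boundary conditions (offset, velocity, acceleration) over
# horizon T. A CPU reference sketch, not the paper's GPU implementation.
import numpy as np

def quintic_coeffs(d0, dd0, ddd0, dT, ddT, dddT, T):
    """Solve for a0..a5 of d(t) = sum(a_i * t^i) matching start/end states."""
    a0, a1, a2 = d0, dd0, ddd0 / 2.0
    A = np.array([[T**3,    T**4,     T**5],
                  [3*T**2,  4*T**3,   5*T**4],
                  [6*T,     12*T**2,  20*T**3]])
    b = np.array([dT   - (a0 + a1*T + a2*T**2),
                  ddT  - (a1 + 2*a2*T),
                  dddT - 2*a2])
    a3, a4, a5 = np.linalg.solve(A, b)
    return np.array([a0, a1, a2, a3, a4, a5])

# One candidate: return from a 0.5 m lateral offset to the lane center in 3 s,
# starting and ending at rest.
coeffs = quintic_coeffs(0.5, 0.0, 0.0, 0.0, 0.0, 0.0, T=3.0)
t = np.linspace(0.0, 3.0, 31)
d = np.polyval(coeffs[::-1], t)   # np.polyval wants highest-order term first
print(d[0], d[-1])                # 0.5 ... ~0.0
```

A full planner enumerates many (target offset, horizon) pairs, scores each candidate for smoothness and collision risk, and picks the cheapest feasible one, which is exactly the embarrassingly parallel workload that suits a GPU.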
{"title":"GPU implementation of the Frenet Path Planner for embedded autonomous systems: A case study in the F1tenth scenario","authors":"Filippo Muzzini , Nicola Capodieci , Federico Ramanzin , Paolo Burgio","doi":"10.1016/j.sysarc.2024.103239","DOIUrl":"10.1016/j.sysarc.2024.103239","url":null,"abstract":"<div><p>Autonomous vehicles are increasingly utilized in safety-critical and time-sensitive settings like urban environments and competitive racing. Planning maneuvers ahead is pivotal in these scenarios, where the onboard compute platform determines the vehicle’s future actions. This paper introduces an optimized implementation of the Frenet Path Planner, a renowned path planning algorithm, accelerated through GPU processing. Unlike existing methods, our approach expedites the entire algorithm, encompassing path generation and collision avoidance. We gauge the execution time of our implementation, showcasing significant enhancements over the CPU baseline (up to 22x of speedup). Furthermore, we assess the influence of different precision types (double, float, half) on trajectory accuracy, probing the balance between completion speed and computational precision. Moreover, we analyzed the impact on the execution time caused by the use of Nvidia Unified Memory and by the interference caused by other processes running on the same system. We also evaluate our implementation using the F1tenth simulator and in a real race scenario. The results position our implementation as a strong candidate for the new state-of-the-art implementation for the Frenet Path Planner algorithm.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"154 ","pages":"Article 103239"},"PeriodicalIF":3.7,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1383762124001760/pdfft?md5=766650cf5ed2c70a29ce4dfeae8db630&pid=1-s2.0-S1383762124001760-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141783433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IEmu: Interrupt modeling from the logic hidden in the firmware
Pub Date: 2024-07-16 | DOI: 10.1016/j.sysarc.2024.103237
Yuan Wei, Yongjun Wang, Lei Zhou, Xu Zhou, Zhiyuan Jiang
The security of embedded firmware has become a critical issue in light of the rapid development of the Internet of Things. Current security analysis approaches, such as dynamic analysis, still face bottlenecks and difficulties due to the wide variety of devices and systems. Recent dynamic analysis approaches for embedded firmware have attempted to provide a general solution but rely heavily on detailed device manuals. Meanwhile, approaches that do not rely on manuals introduce randomness in interrupt triggering, which weakens emulation fidelity and dynamic analysis efficiency. In this paper, we propose a redundant-check-based interrupt modeling and security analysis method for embedded firmware that does not rely on commercial manuals. The method reverse engineers the control flow of the firmware binary and accurately extracts the correct interrupt-triggering rules to emulate the firmware. We have implemented a functional prototype on QEMU, called IEmu, and evaluated it with 26 firmware images on different MCUs. Our results demonstrate significant advantages over the recent state-of-the-art approach. On average, IEmu improves interrupt path exploration efficiency by 2.4 times and fuzz testing coverage by 19%. IEmu recovered the interrupt-triggering logic documented in the manuals, and emulated three firmware images on which the state-of-the-art emulator has limitations, finding vulnerabilities in them.
{"title":"IEmu: Interrupt modeling from the logic hidden in the firmware","authors":"Yuan Wei, Yongjun Wang, Lei Zhou, Xu Zhou, Zhiyuan Jiang","doi":"10.1016/j.sysarc.2024.103237","DOIUrl":"10.1016/j.sysarc.2024.103237","url":null,"abstract":"<div><p>The security of embedded firmware has become a critical issue in light of the rapid development of the Internet of Things. Current security analysis approaches, such as dynamic analysis, still face bottlenecks and difficulties due to the wide variety of devices and systems. Recent dynamic analysis approaches for embedded firmware have attempted to provide a general solution but heavily rely on detailed device manuals. Meanwhile, approaches that do not rely on manuals have randomness in interrupt triggering, which weakens emulation fidelity and dynamic analysis efficiency. In this paper, we propose a redundant-check-based embedded firmware interrupt modeling and security analysis method that does not rely on commercial manuals. This method involves reverse engineering the control flow of firmware binary and accurately extracting the correct interrupt triggering rules to emulate embedded firmware. We have implemented functional prototypes on QEMU, called <span>IEmu</span>, and evaluated it with 26 firmware in different MCUs. Our results demonstrate significant advantages compared to the recent state-of-the-art approach. On average, <span>IEmu</span> has improved interrupt path exploration efficiency by 2.4 times and fuzz testing coverage by 19%. <span>IEmu</span> restored the interrupt triggering logic in the manual, and emulated three firmware where the state-of-the-art emulator have limitations and found vulnerabilities.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"154 ","pages":"Article 103237"},"PeriodicalIF":3.7,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141783434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient iNTRU-based public key authentication keyword searchable encryption in cloud computing
Pub Date: 2024-07-11 | DOI: 10.1016/j.sysarc.2024.103231
Yunfei Yao, Huiyan Chen, Ke Wang, Haoyang Yu, Yu Wang, Qingnan Wang
With the popularity of cloud computing, large amounts of personal and corporate data are outsourced to the cloud. This centralized storage and processing mode not only improves the efficiency of data processing but also brings challenges for data security and privacy. Given the development of quantum computing, traditional public key encryption may no longer be secure in the future. It is therefore particularly important to study public key authenticated keyword searchable encryption (PEAKS) schemes that can resist quantum attacks. Such a scheme provides a double guarantee for data in cloud storage: first, it ensures the confidentiality of the data, so that the original content cannot be decrypted even by the cloud service provider; second, it allows users to perform keyword searches on encrypted data without decrypting it. This balances security and efficiency for data search and analysis in cloud computing environments. Recently, Genise et al. designed a gadget-based iNTRU trapdoor, which has the advantages of small size and high efficiency. We therefore design an efficient and secure public key authenticated keyword searchable encryption scheme based on the iNTRU lattice. The overall running time of the scheme is just over 300 ms.
{"title":"Efficient iNTRU-based public key authentication keyword searchable encryption in cloud computing","authors":"Yunfei Yao , Huiyan Chen , Ke Wang , Haoyang Yu , Yu Wang , Qingnan Wang","doi":"10.1016/j.sysarc.2024.103231","DOIUrl":"10.1016/j.sysarc.2024.103231","url":null,"abstract":"<div><p>With the popularity of cloud computing, a large number of personal and corporate data are outsourced to the cloud. This centralized storage and processing mode not only improves the efficiency of data processing, but also brings challenges to data security and privacy. In view of the development of quantum computing, the traditional public key encryption may no longer be secure in the future. Therefore, it is particularly important to study the public key authenticated keyword searchable encryption (PEAKS) scheme which can resist quantum. This scheme can provide double guarantee for the data in cloud storage: one is to ensure the confidentiality of the data, so that the original content cannot be decrypted even within the cloud service provider; the other is to allow users to perform keyword searches on encrypted data without decrypting the data. This provides a balance of security and efficiency for data search and analysis in cloud computing environment. Recently, Genise et al. designed a Gadget-based iNTRU trapdoor, which has the advantages of small size and high efficiency. Therefore, we design an efficient and secure public key authentication keyword searchable encryption scheme based on iNTRU lattice. The overall running time of this scheme is only more than 300ms.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"154 ","pages":"Article 103231"},"PeriodicalIF":3.7,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1383762124001681/pdfft?md5=787f51baa5f439d50792c9b2134f0118&pid=1-s2.0-S1383762124001681-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141694319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heterogeneous and plaintext checkable signcryption for integrating IoT in healthcare system
Pub Date: 2024-07-09 | DOI: 10.1016/j.sysarc.2024.103235
Abdalla Hadabi, Zheng Qu, Kuo-Hui Yeh, Chien-Ming Chen, Saru Kumari, Hu Xiong
Preserving the confidentiality and integrity of data transmission is paramount in Internet of Things (IoT)-based healthcare systems. Current encryption techniques that allow plaintext checks primarily serve a specific cryptosystem, lacking the adaptability to work with diverse systems incorporating various cryptographic methods. To address this, we present a unique online/offline heterogeneous signcryption scheme with plaintext checkable encryption (HOOSC-PCE). This approach enables the transition of signcrypted messages from an identity-based cryptosystem (IBC) to a public key infrastructure (PKI) system, improving information interoperability. A key aspect of our scheme is its capacity to allow cloud servers to perform plaintext queries, facilitating efficient searches over encrypted data using plaintext keywords. Furthermore, the signcryption process is divided into online and offline phases: the online phase handles tasks that require fewer resources, while the offline phase carries out the more resource-intensive preparatory tasks. We prove the HOOSC-PCE scheme secure in the Random Oracle Model (ROM). Compared to similar work, it reduces computation costs by 46.39%, 19.45%, 18.73%, and 13.25% for the offline encryption, online encryption, decryption, and search algorithms, respectively. The results indicate that HOOSC-PCE is secure and efficient, confirming its feasibility for IoT-based healthcare systems.
{"title":"Heterogeneous and plaintext checkable signcryption for integrating IoT in healthcare system","authors":"Abdalla Hadabi , Zheng Qu , Kuo-Hui Yeh , Chien-Ming Chen , Saru Kumari , Hu Xiong","doi":"10.1016/j.sysarc.2024.103235","DOIUrl":"10.1016/j.sysarc.2024.103235","url":null,"abstract":"<div><p>Preserving the confidentiality and integrity of data transmission is paramount in the Internet of Things (IoT)-based healthcare systems. Current encryption techniques that allow plaintext checks primarily serve a specific cryptosystem, lacking the adaptability to work with a diverse system incorporating various cryptographic methods. To address this, we present a unique online/offline heterogeneous signcryption scheme with plaintext checkable encryption (HOOSC-PCE). This approach enables the transition of signcrypted messages from an identity-based cryptosystem (IBC) to a public key infrastructure (PKI) system, improving information interoperability. A key aspect of our scheme is its capacity to allow cloud servers to perform plaintext queries, facilitating efficient data searches using plaintext keywords over encrypted data. Furthermore, the signcryption process is divided into online and offline phases. The online phase handles tasks that require fewer resources, while the offline phase carries out more resource-intensive preparatory tasks. We rigorously tested the HOOSC-PCE scheme’s security by proving it secure under the Random Oracle Model (ROM). Meanwhile, compared to similar work, it effectively reduces computation costs by 46.39%, 19.45%, 18.73%, and 13.25% across offline encryption, online encryption, decryption, and search algorithms. The results indicate that the HOOSC-PCE is secure and efficient, confirming its feasibility for IoT-based healthcare systems.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"154 ","pages":"Article 103235"},"PeriodicalIF":3.7,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141694851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimization of block-scaled integer GeMMs for efficient DNN deployment on scalable in-order vector processors
Pub Date: 2024-07-08 | DOI: 10.1016/j.sysarc.2024.103236
Nitish Satya Murthy, Francky Catthoor, Marian Verhelst
A continuing rise in DNN usage in distributed and embedded use cases has demanded more efficient hardware execution in the field. Low-precision GeMMs with optimized data formats have played a key role in more memory- and computationally-efficient networks. A recent trend is block-scaled representations, stemming from tight HW-SW co-optimization, which compress network size by sharing an exponent per data block. Prior work mostly focuses on deploying such block-scaled GeMM operations on domain-specific accelerators for maximum efficiency, at the cost of flexibility and ease of deployment. In this work, we exploit and optimize the deployment of block-scaled GeMMs on fully-programmable in-order vector processors using ARM SVE. We define a systematic methodology for performing design space exploration to optimally match the workload specifications with processor vector lengths, different microkernels, and block sizes and shapes. We introduce efficient intrinsics-based microkernels with effective loop unrolling, and data-transfer-efficient fused requantization strategies, to maximize kernel performance while also supporting several deployment configurations. We enable generalized block-scaled kernel deployments through tunable block sizes and shapes, which helps accommodate different accuracy-speed trade-off requirements. Utilizing 2D activation blocks instead of conventional 1D blocks, the static and dynamic BS-INT8 configurations yielded average speedups of 3.8x and 2.9x over FP32 models, respectively, at no accuracy loss for CNN classification tasks on the CIFAR10/100 datasets.
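To illustrate the data format involved, the following sketch quantizes a float matrix to INT8 with one shared power-of-two exponent per 2D block and dequantizes it back. It is a numpy reference for the round trip under an assumed block size and scaling rule, not the paper's ARM SVE microkernels or its exact requantization scheme.

```python
# Sketch of a block-scaled INT8 format: each 2D block of a matrix shares one
# power-of-two exponent. A numpy reference for the quantize/dequantize round
# trip under assumed block sizes, not the paper's SVE kernels.
import numpy as np

def quantize_block_scaled(x: np.ndarray, block=(8, 8)):
    """Quantize a 2D float matrix to int8 with one shared exponent per block."""
    rows, cols = x.shape
    assert rows % block[0] == 0 and cols % block[1] == 0
    q = np.empty_like(x, dtype=np.int8)
    exps = np.empty((rows // block[0], cols // block[1]), dtype=np.int8)
    for bi in range(0, rows, block[0]):
        for bj in range(0, cols, block[1]):
            blk = x[bi:bi+block[0], bj:bj+block[1]]
            # Shared exponent so the block's max maps near int8 full scale.
            e = int(np.ceil(np.log2(np.abs(blk).max() / 127.0 + 1e-30)))
            q[bi:bi+block[0], bj:bj+block[1]] = np.clip(
                np.round(blk / 2.0**e), -128, 127).astype(np.int8)
            exps[bi // block[0], bj // block[1]] = e
    return q, exps

def dequantize(q, exps, block=(8, 8)):
    scales = np.repeat(np.repeat(2.0 ** exps.astype(np.float32),
                                 block[0], axis=0), block[1], axis=1)
    return q.astype(np.float32) * scales

x = np.random.randn(16, 16).astype(np.float32)
q, e = quantize_block_scaled(x)
print("max abs error:", np.abs(dequantize(q, e) - x).max())
```

Because scales are powers of two, rescaling inside a GeMM reduces to cheap shifts, and the block shape (1D row blocks vs. the 2D blocks the paper favors) becomes a tunable accuracy-speed knob.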
{"title":"Optimization of block-scaled integer GeMMs for efficient DNN deployment on scalable in-order vector processors","authors":"Nitish Satya Murthy, Francky Catthoor, Marian Verhelst","doi":"10.1016/j.sysarc.2024.103236","DOIUrl":"10.1016/j.sysarc.2024.103236","url":null,"abstract":"<div><p>A continuing rise in DNN usage in distributed and embedded use cases has demanded more efficient hardware execution in the field. Low-precision GeMMs with optimized data formats have played a key role in more memory and computationally-efficient networks. Recently trending formats are block-scaled representations stemming from tight HW-SW co-optimization, that compress network size by sharing exponents per data block. Prior work mostly focuses on deploying such block-scaled GeMM operations on domain-specific accelerators for optimum efficiency at the cost of flexibility and ease of deployment. In this work, we exploit and optimize the deployment of block-scaled GeMMs on fully-programmable in-order vector processors using ARM SVE. We define a systematic methodology for performing design space exploration to optimally match the workload specifications with processor vector-lengths, different microkernels, block sizes and shapes. We introduce efficient intrinsics-based microkernels with effective loop unrollings, and data-transfer efficient fused requantization strategies to maximize kernel performance, while also ensuring several deployment configurations. We enable generalized block-scaled kernel deployments through tunable block sizes and shapes, which helps in accommodating different accuracy-speed trade-off requirements. Utilizing 2D activation blocks instead of conventional 1D blocks, the static and dynamic BS-INT8 configurations yielded on average 3.8x and 2.9x faster speedups over FP32 models respectively, at no accuracy loss for CNN classification tasks on CIFAR10/100 datasets.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"154 ","pages":"Article 103236"},"PeriodicalIF":3.7,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141694195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Providing spatial isolation for Mixed-Criticality Systems
Pub Date: 2024-07-08 | DOI: 10.1016/j.sysarc.2024.103234
E. Tinto, T. Vardanega
Hard real-time systems, characterized by stringent timeliness requirements, occur in an increasing variety of industrial sectors. Some such domains carry important safety-critical concerns, notably avionics, space, and automotive. One common design trend across those domains seeks to reduce the number of embedded computing devices by integrating software applications of different criticality levels onto one and the same onboard computer. A safety-savvy design approach, however, requires isolation among components of different criticality, to prevent unintended reciprocal interference across them. Isolation is traditionally achieved through partitioning. Partitioning, however, incurs low resource utilization, as cautionary margins inflate partition budgets beyond their anticipated needs. This situation has prompted research into alternative approaches to integration that can safely afford higher levels of utilization. The Mixed-Criticality (MC) approach, which concentrates on the CPU scheduling problem, has yielded a large body of research results showing considerable gains in sustained utilization, but it has yet to meet all of the isolation requirements of safety-critical systems. This work presents a solution that augments a state-of-the-art MC solution with efficient and effective spatial isolation capabilities. Experimental results show that our solution provides adequate guarantees of temporal and spatial isolation with very small runtime overhead.
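For background on the utilization gains that MC scheduling targets, the sketch below implements the standard EDF-VD admission test of Baruah et al. for dual-criticality implicit-deadline task sets. This is background material recalled from the MC literature, not the mechanism of this paper (whose contribution is the spatial-isolation layer added on top of such scheduling), and the task parameters are illustrative only.

```python
# Standard EDF-VD admission test (after Baruah et al.) for dual-criticality
# implicit-deadline sporadic tasks. Background sketch only; not this paper's
# spatial-isolation mechanism.
from dataclasses import dataclass

@dataclass
class Task:
    period: float
    wcet_lo: float          # budget assumed in normal (LO) mode
    wcet_hi: float = 0.0    # inflated budget for HI-criticality tasks
    hi: bool = False        # criticality level

def edf_vd_schedulable(tasks):
    u_lo_lo = sum(t.wcet_lo / t.period for t in tasks if not t.hi)
    u_hi_lo = sum(t.wcet_lo / t.period for t in tasks if t.hi)
    u_hi_hi = sum(t.wcet_hi / t.period for t in tasks if t.hi)
    if u_lo_lo + u_hi_lo > 1.0:      # not even LO-mode feasible under EDF
        return False, None
    x_min = u_hi_lo / (1.0 - u_lo_lo)                          # LO-mode bound
    x_max = (1.0 - u_hi_hi) / u_lo_lo if u_lo_lo > 0 else 1.0  # HI-mode bound
    ok = x_min <= min(x_max, 1.0)
    return ok, (x_min if ok else None)   # x scales HI tasks' virtual deadlines

tasks = [Task(10, 2), Task(20, 3),                 # LO-criticality tasks
         Task(50, 5, wcet_hi=12, hi=True),         # HI-criticality tasks
         Task(100, 8, wcet_hi=20, hi=True)]
print(edf_vd_schedulable(tasks))
```

Tests like this admit task sets whose pessimistic (HI) budgets would overflow a strictly partitioned schedule, which is exactly the sustained-utilization gain the abstract refers to; what they do not provide is the spatial (memory) isolation this paper adds.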
{"title":"Providing spatial isolation for Mixed-Criticality Systems","authors":"E. Tinto, T. Vardanega","doi":"10.1016/j.sysarc.2024.103234","DOIUrl":"https://doi.org/10.1016/j.sysarc.2024.103234","url":null,"abstract":"<div><p>Hard real-time systems, characterized by stringent timeliness requirements, occur in an increasing variety of industrial sectors. Some such domains carry important safety-critical concerns, notably avionics, space, and automotive. One common design trend across those domains seeks to reduce the number of computing devices embedded in them by integrating software applications of different criticality levels into one and the same onboard computer. A safety-savvy design approach however requires isolation among components of different criticality, to prevent unintended reciprocal interference across them. Isolation is traditionally achieved through partitioning. Partitioning, however, incurs low resource utilization as cautionary margins are used to inflate partition budgets over their anticipated needs. This situation has prompted research into alternative ways to integration that can safely afford higher levels of utilization. The Mixed-Criticality (MC) approach, which concentrates on the CPU scheduling problem, has yielded a large body of research results that show considerable gains in sustained utilization, but it has yet to meet all of the isolation requirements of safety-critical systems. This work presents a solution to augment a state-of-the-art MC solution with efficient and effective spatial isolation capabilities. Experimental results show that our solution provides adequate guarantees of temporal <em>and</em> spatial isolation with very small runtime overhead.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"154 ","pages":"Article 103234"},"PeriodicalIF":3.7,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1383762124001711/pdfft?md5=03d8377729455f5bea108a124190cff6&pid=1-s2.0-S1383762124001711-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141607689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}