Pub Date: 2024-06-01 | DOI: 10.1016/j.sysarc.2024.103191
Jörg Keller , Saskia Imhof , Peter Sobe
Error correction and erasure codes on the one hand and steganographic channels on the other use related methods, but they are usually investigated separately. We detail an idea from the literature for a steganographic channel inside a transmission protected by an error correction code and experimentally investigate it with respect to bandwidth, robustness, and detectability. We expand this construction to provide an example of multi-level steganography, i.e., a steganographic channel within a steganographic channel. Furthermore, we investigate the advantages in bandwidth and stealthiness that reversibility of such a steganographic channel brings, together with a new proposal for a covert channel in error-corrected data.
Title: Error correction and erasure codes for robust network steganography (Journal of Systems Architecture, Volume 153, Article 103191)
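The construction described above can be sketched concretely: with a single-error-correcting code, the covert sender deliberately introduces one correctable error, the overt receiver's decoder removes it, and the covert receiver reads the error position as the hidden symbol. Below is a minimal Python sketch using a Hamming(7,4) code (my own illustration of the general idea, not the authors' construction or code):

```python
# Hamming(7,4): parity bits at positions 1, 2, 4 (1-indexed), chosen so that
# the XOR of the positions of all set bits in a valid codeword is zero.
def hamming74_encode(d):
    """d: list of 4 data bits -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def syndrome(c):
    """Position (1..7) of a single-bit error, 0 if the codeword is valid."""
    s = 0
    for i, bit in enumerate(c, start=1):
        if bit:
            s ^= i
    return s

def embed(c, pos):
    """Covert sender: flip the bit at position pos (1..7) to hide a symbol."""
    c = c[:]
    c[pos - 1] ^= 1
    return c

def extract_and_correct(c):
    """Covert receiver doubles as the ECC decoder: the syndrome is both the
    hidden symbol and the position to correct."""
    pos = syndrome(c)
    if pos:
        c = embed(c, pos)  # flipping again corrects the error
    data = [c[2], c[4], c[5], c[6]]
    return pos, data
```

Each codeword can covertly carry one of seven error positions (or "no error"), i.e., up to about 3 bits per 7-bit codeword, at the price of consuming the code's correction capability; that is exactly the bandwidth/robustness trade-off the abstract investigates.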
Pub Date: 2024-05-31 | DOI: 10.1016/j.sysarc.2024.103192
Shihao Hong , Yeh-Ching Chung
Resistive random access memory (ReRAM) is a promising technology for AI Processing-in-Memory (PIM) hardware because of its compatibility with CMOS, its small footprint, and its ability to complete matrix–vector multiplication (MVM) workloads inside the memory device itself. However, duplicate weights and inputs introduce redundant computations when an MVM has to be split into smaller-granularity sequential sub-works in practice. Recent studies have proposed repetition-pruning to address this issue, but the buffer allocation strategy for enhancing buffer device utilization remains understudied. In preliminary experiments observing the input patterns of neural layers across different datasets, the similarity of the repetitions allows us to transfer a buffer allocation strategy obtained from a small dataset to computation with a large dataset. Hence, this paper proposes a practical compute-reuse mechanism for ReRAM-based PIM, called CRPIM, which replaces repetitive computations with buffering and reading. Moreover, the subsequent buffer allocation problem is resolved at both the inter-layer and intra-layer levels. Our experimental results demonstrate that CRPIM significantly reduces ReRAM cell usage and execution time while keeping buffer and energy overhead modest.
Title: CRPIM: An efficient compute-reuse scheme for ReRAM-based Processing-in-Memory DNN accelerators (Journal of Systems Architecture, Volume 153, Article 103192)
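The compute-reuse idea can be modeled in a few lines (my own toy sketch, not the CRPIM implementation): the weight tile of each "crossbar" is stationary, so when the same input tile recurs across a batch, the buffered partial product is read back instead of recomputed.

```python
# Toy compute-reuse for tiled matrix-vector multiplication: one cache per
# column tile (the stationary "crossbar"), keyed by the input tile's values.
def mvm_with_reuse(W, X, tile=2):
    """W: m x n weight matrix; X: batch of n-vectors (lists of lists)."""
    m, n = len(W), len(W[0])
    out = [[0.0] * m for _ in X]
    stats = {"compute": 0, "reuse": 0}
    for start in range(0, n, tile):        # one column tile per "crossbar"
        cache = {}                         # input tile -> partial product
        for b, x in enumerate(X):
            xt = tuple(x[start:start + tile])
            if xt not in cache:
                cache[xt] = [sum(W[i][start + j] * xt[j]
                                 for j in range(len(xt)))
                             for i in range(m)]
                stats["compute"] += 1
            else:
                stats["reuse"] += 1        # served by a buffer read
            for i in range(m):
                out[b][i] += cache[xt][i]
    return out, stats
```

The buffer allocation question the abstract raises is then which tiles deserve a cache entry when buffer capacity is limited; this sketch simply caches everything.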
Pub Date: 2024-05-31 | DOI: 10.1016/j.sysarc.2024.103194
Longbao Dai , Jing Mei , Zhibang Yang , Zhao Tong , Cuibin Zeng , Keqin Li
With the arrival of 5G technology and the popularization of the Internet of Things (IoT), mobile edge computing (MEC) has great potential in handling delay-sensitive and compute-intensive (DSCI) applications. Meanwhile, the need for reduced latency and improved energy efficiency in terminal devices is becoming increasingly urgent. However, users are affected by channel conditions and bursty computational demands in dynamic MEC environments, which can lead to longer task response times. Therefore, finding an efficient task offloading method in stochastic systems is crucial for optimizing system energy consumption. Additionally, the delay due to frequent user–MEC interactions cannot be overlooked. In this article, we first frame task offloading as a dynamic optimization problem. The goal is to minimize the system’s long-term energy consumption while ensuring the long-term stability of the task queue. Using the Lyapunov optimization technique, the task processing deadline problem is converted into a stability control problem for a virtual queue. Then, a novel Lyapunov-guided deep reinforcement learning (DRL) delay-aware offloading algorithm (LyD2OA) is designed. LyD2OA determines the task offloading scheme online and adaptively offloads tasks over links with better network quality. Meanwhile, it ensures that deadlines are not violated when offloading tasks in poor communication environments. In addition, we perform a rigorous mathematical analysis of the performance of LyD2OA and prove the existence of upper bounds on the virtual queue. It is theoretically proven that LyD2OA enables the system to realize the trade-off between energy consumption and delay. Finally, extensive simulation experiments verify that LyD2OA performs well in minimizing energy consumption while keeping latency low.
Title: Lyapunov-guided deep reinforcement learning for delay-aware online task offloading in MEC systems (Journal of Systems Architecture, Volume 153, Article 103194)
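The Lyapunov machinery referred to above can be illustrated with a minimal drift-plus-penalty controller (a generic sketch with made-up numbers, not LyD2OA itself, which layers a DRL agent on top of this structure): a virtual queue Q tracks backlog, and each slot the controller picks the option minimizing V*energy + Q*(arrivals - service), so the weight V trades energy against queue stability.

```python
# Drift-plus-penalty step: greedily minimize V*energy + Q*(arrivals - service),
# then apply the standard virtual-queue update Q <- max(Q + arrivals - service, 0).
def step(Q, arrivals, options, V):
    """options: list of (energy_cost, service_rate) tuples for this slot."""
    best = min(options, key=lambda o: V * o[0] + Q * (arrivals - o[1]))
    energy, service = best
    Q_next = max(Q + arrivals - service, 0.0)
    return Q_next, energy

def run(T, arrivals_seq, options, V):
    """Simulate T slots; return final backlog and total energy spent."""
    Q, total_energy = 0.0, 0.0
    for t in range(T):
        Q, e = step(Q, arrivals_seq[t], options, V)
        total_energy += e
    return Q, total_energy
```

With a cheap-but-slow local option and a costly-but-fast offload option, the controller lets backlog build while it is cheap and offloads once Q grows, which is the energy/delay trade-off the abstract proves bounds for.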
Pub Date: 2024-05-31 | DOI: 10.1016/j.sysarc.2024.103195
Biao Han , Cao Xu , Yahui Li , Xiaoyan Wang , Peng Xun
Video streaming has dominated Internet traffic over the past few years, spurring innovations in transport protocols. The QUIC protocol has advantages over TCP, such as faster connection setup and alleviating head-of-line blocking. Multi-path transport protocols like Multipath QUIC (MPQUIC) have been proposed to aggregate the bandwidth of multiple links and provide reliable transmission in poor network conditions. However, fully reliable transmission incurs unnecessary retransmission costs for MPQUIC, degrading performance, especially for real-time video streaming. Partially reliable transmission, which supports both reliable and unreliable delivery, can perform better by trading off data reliability against timeliness. In this paper, we introduce MPR-QUIC, a multi-path partially reliable transmission protocol for QUIC. Building on MPQUIC, MPR-QUIC extends unreliable transmission to provide partially reliable transmission over multiple paths. Dedicated schedulers based on priority and deadline, respectively, are designed in MPR-QUIC for video streaming optimization. Video frames with high priority are transmitted first, since low-priority frames cannot be decoded until the high-priority frames they depend on arrive. Additionally, to alleviate rebuffering and freezing of the video, as many frames as possible should be delivered before their deadlines. We evaluate MPR-QUIC experimentally on a testbed and in emulations. Results show that the rebuffer time of MPR-QUIC decreases by 60% to 80% compared to state-of-the-art multi-path transmission solutions, and the completion ratio of transmitted data blocks is increased by almost 100%.
Title: MPR-QUIC: Multi-path partially reliable transmission for priority and deadline-aware video streaming (Journal of Systems Architecture, Volume 153, Article 103195)
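The priority- and deadline-aware scheduling policy described above can be sketched as a small sender-side queue (my illustration of the policy, not MPR-QUIC code): the highest-priority pending frame goes out first, and frames whose deadline has already passed are dropped rather than transmitted, since delivering them late only wastes bandwidth.

```python
import heapq
import itertools

class FrameScheduler:
    """Pop frames highest-priority first; silently drop expired frames."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # FIFO tie-break within a priority

    def push(self, frame_id, priority, deadline):
        # Lower number = higher priority (e.g. I-frames before B-frames).
        heapq.heappush(self._heap,
                       (priority, next(self._seq), deadline, frame_id))

    def pop(self, now):
        while self._heap:
            prio, _, deadline, frame_id = heapq.heappop(self._heap)
            if deadline >= now:       # still useful before its deadline
                return frame_id
            # else: deadline missed, drop instead of retransmitting
        return None
```

A real multi-path scheduler additionally chooses which path carries each frame; this sketch covers only the ordering/dropping decision.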
Pub Date: 2024-05-31 | DOI: 10.1016/j.sysarc.2024.103184
Rui Shi , Yang Yang , Yingjiu Li , Huamin Feng , Hwee Hwa Pang , Robert H. Deng
An anonymous transit pass system allows passengers to access transport services within fixed time periods, with their privileges automatically deactivating upon time expiration. Although existing transit pass systems are deployable on powerful devices like PCs, their adaptation to more user-friendly devices, such as mobile phones with smart cards, is inefficient due to their reliance on heavyweight operations like bilinear maps. In this paper, we introduce an innovative anonymous transit pass system, dubbed AnoPas, optimized for deployment on mobile phones with smart cards, where the smart card is responsible for crucial lightweight operations and the mobile phone handles key-independent and time-consuming tasks. Group signatures with time-bound keys (GS-TBK) serve as our core component, representing a new variant of standard group signatures for the secure use of time-based digital services, preserving users’ privacy while providing flexible authentication services. We first construct a practical GS-TBK scheme using tag-based signatures and then apply it to the design of AnoPas. We achieve the most efficient passing protocol among state-of-the-art anonymous transit pass and GS-TBK schemes. We also present an implementation showing that our passing protocol takes around 38.6 ms on a smart card and around 33.6 ms on a mobile phone.
Title: AnoPas: Practical anonymous transit pass from group signatures with time-bound keys (Journal of Systems Architecture, Volume 153, Article 103184)
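The group-signature machinery (GS-TBK) cannot be condensed into a few lines, so the sketch below illustrates only the time-bound aspect in the simplest symmetric-key setting (entirely my own stand-in using HMAC, with none of the anonymity properties of AnoPas): the pass is bound to a validity epoch derived from the issue time, so verification fails automatically once the epoch has passed, with no revocation list needed.

```python
import hmac
import hashlib

def epoch(t, epoch_len=3600):
    """Map a timestamp to its validity epoch (here: one-hour epochs)."""
    return t // epoch_len

def issue_pass(master_key, user_id, t):
    """Derive an epoch-bound key, then bind the pass to the user with it."""
    k = hmac.new(master_key, str(epoch(t)).encode(), hashlib.sha256).digest()
    return hmac.new(k, user_id.encode(), hashlib.sha256).hexdigest()

def verify_pass(master_key, user_id, tag, now):
    """Recompute from the *current* epoch: an expired pass cannot verify."""
    expected = issue_pass(master_key, user_id, now)
    return hmac.compare_digest(expected, tag)
```

Note the deliberate simplification: this toy reveals `user_id` to the verifier, which is exactly what the group-signature construction in the paper avoids while keeping the same automatic-expiry behavior.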
Pub Date: 2024-05-29 | DOI: 10.1016/j.sysarc.2024.103190
Jiayun Yan , Jie Chen , Chen Qian , Anmin Fu , Haifeng Qian
In cloud computing, a current challenge lies in managing massive data, which computationally overburdens data users. Outsourced computation can effectively ease the memory and computation pressure of such overburdened data storage. We propose an outsourced unbounded decryption scheme under standard assumptions and in the standard model for large-data settings, based on inner product computation. Security analysis shows that it achieves adaptive security. In the scheme, the data owner transmits encrypted data to a third-party cloud server, which carries out the bulk of the computation over the data. The processed data is then handed over to the data user for decryption. In addition, there is no need to fix a prior bound on the length of the plaintext vector: the encryption algorithm can run without the length of the input data being determined before the setup phase, i.e., our scheme works in the unbounded setting. Through theoretical analysis, the storage overhead and communication cost of the data users remain independent of the ciphertext size. The experimental results indicate that efficiency is greatly enhanced, with about 0.03 s of computation for data users, at the expense of increased computing time on the cloud server.
Title: Efficient and privacy-preserving outsourced unbounded inner product computation in cloud computing (Journal of Systems Architecture, Volume 153, Article 103190)
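The paper's scheme builds on inner-product functional encryption; as a much simpler stand-in that shows the general client/server outsourcing pattern, here is a sketch using one-time additive blinding (my own illustration, not the proposed construction): the client masks x with a random vector r, the server computes the heavy inner product on x + r, and the client unblinds with the correction term for r, which it can prepare offline.

```python
import random

def client_blind(x):
    """Mask the private vector x with a fresh random vector r."""
    r = [random.randint(-1000, 1000) for _ in x]
    return [xi + ri for xi, ri in zip(x, r)], r

def server_compute(w, masked_x):
    """Heavy work on the server; it only ever sees x + r, never x."""
    return sum(wi * mi for wi, mi in zip(w, masked_x))

def client_unblind(w, r, masked_result):
    """Cheap correction: <w, x+r> - <w, r> = <w, x>."""
    return masked_result - sum(wi * ri for wi, ri in zip(w, r))
```

Unlike the scheme in the paper, this toy needs a fresh mask per query and offers no adaptive-security guarantee; it only demonstrates how an inner product can be delegated without revealing the input vector.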
Pub Date: 2024-05-28 | DOI: 10.1016/j.sysarc.2024.103187
Xianyu He, Runyu Zhang, Pengpeng Tian, Lening Zhou, Min Lian, Chaoshu Yang
Emerging Persistent Memory (PM) usually has the serious drawback of expensive write activities. Thus, existing PM-oriented B+-trees mainly concentrate on alleviating the write overhead (i.e., reducing PM writes and flush instructions). Unfortunately, due to improper data organization in the sorted leaf node, existing solutions cause massive data migration when data insertion or node splitting occurs. In this paper, we propose a write-optimized PM-oriented B+-tree with aligned flush and selective migration, called WPB+-tree, to solve the above problems. WPB+-tree first adopts a buffer-assisted mechanism that temporarily stores newly inserted data to reduce the overhead of entry shifts. Second, WPB+-tree employs a selective migration of node entries so that less than half of the data migrates when a node is split. Moreover, existing PM-oriented B+-trees usually employ a coarse-grained lock to avoid thread conflicts, which can severely degrade concurrency efficiency. Thus, we further propose a fine-grained lock technique for WPB+-tree, namely the parallel-efficient WPB+-tree (WOPE), to improve concurrency efficiency. We implement the proposed WPB+-tree and WOPE on Linux and conduct extensive evaluations on actual persistent memory, where WOPE achieves 23.5%, 30.7%, and 15.3% performance improvements (insert, read, and scan) over straightforward solutions (i.e., SSB-Tree, Fast&Fair, and wB+-tree), and a 10.1% performance improvement over WPB+-tree, on average.
Title: WOPE: A write-optimized and parallel-efficient B+-tree for persistent memory (Journal of Systems Architecture, Volume 153, Article 103187)
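The buffer-assisted insertion and split-time migration can be sketched with a toy leaf node (my illustration of the general idea, not the WPB+-tree code): inserts are appended to an unsorted buffer so no sorted entries need shifting, and on overflow the node splits so that only the upper half of the merged entries migrates to the new node.

```python
class Leaf:
    """Toy B+-tree leaf: sorted region plus an append-only insert buffer."""
    CAPACITY = 8

    def __init__(self, sorted_entries=None):
        self.sorted = sorted_entries or []   # kept in key order
        self.buffer = []                     # unsorted recent inserts

    def insert(self, key):
        self.buffer.append(key)              # append: no entry shifting
        if len(self.sorted) + len(self.buffer) > self.CAPACITY:
            return self.split()              # returns the new right sibling
        return None

    def split(self):
        merged = sorted(self.sorted + self.buffer)
        mid = len(merged) // 2
        self.sorted, self.buffer = merged[:mid], []
        return Leaf(merged[mid:])            # only the upper half migrates

    def keys(self):
        """Logical view: sorted region merged with the buffer."""
        return sorted(self.sorted + self.buffer)
```

Lookups pay a small extra cost (scanning the buffer) in exchange for write-cheap inserts, which is the right trade on PM, where writes and flushes dominate.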
Pub Date: 2024-05-28 | DOI: 10.1016/j.sysarc.2024.103188
Wenyan Yan , Bin Fu , Jing Huang , Ruiqi Lu , Renfa Li , Guoqi Xie
The automotive Electrical/Electronic (E/E) architecture with Time-Sensitive Networking (TSN) as the backbone network and Controller Area Network (CAN) as the intra-domain network has attracted extensive research attention. In this architecture, the CAN-TSN gateway serves as a vital hub for communication between the CAN and TSN networks. However, with frequent information exchange between domains, multiple real-time applications inevitably compete for the same network resources. The limited availability of schedule table entries and bandwidth allocation pose challenges in scheduling design. To mitigate transmission conflicts at the CAN-TSN gateway, this paper proposes a CAN-to-TSN scheduler consisting of two primary stages. The first stage introduces the Message Aggregation Optimization (MAO) algorithm to aggregate multiple CAN messages into a single TSN message, ultimately decreasing the communication overhead and the number of schedule table entries. The second stage proposes the Exploratory Message Scheduling Optimization (EMSO) algorithm based on MAO. EMSO disaggregates and reassembles CAN messages with tight deadlines within the currently unscheduled TSN message to improve the acceptance ratio of CAN messages. Experimental results demonstrate that EMSO achieves an average CAN-message acceptance ratio 4.3% higher in preemptive mode and 8.2% higher in non-preemptive mode in TSN than state-of-the-art algorithms.
Title: A conflict-free CAN-to-TSN scheduler for CAN-TSN gateway (Journal of Systems Architecture, Volume 153, Article 103188)
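The aggregation stage can be illustrated with a greedy packer (my sketch of the general idea, not the published MAO algorithm): CAN frames are taken in deadline order and packed into one TSN message until either the payload budget is exhausted or the next frame's deadline is too far from the message's earliest deadline to share a transmission slot.

```python
def aggregate(can_frames, tsn_payload, window):
    """Greedily pack CAN frames into TSN messages.

    can_frames: list of (size_bytes, deadline) pairs.
    tsn_payload: payload budget per TSN message, in bytes.
    window: max deadline spread allowed inside one TSN message.
    """
    messages, current, used = [], [], 0
    for size, deadline in sorted(can_frames, key=lambda f: f[1]):
        fits = used + size <= tsn_payload
        close = not current or deadline - current[0][1] <= window
        if current and not (fits and close):
            messages.append(current)       # close the current TSN message
            current, used = [], 0
        current.append((size, deadline))
        used += size
    if current:
        messages.append(current)
    return messages
```

Every closed message consumes one schedule table entry, so fewer, fuller messages directly reduce the entry pressure the abstract describes.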
Pub Date : 2024-05-24DOI: 10.1016/j.sysarc.2024.103186
Héctor Martínez , Sandra Catalán , Adrián Castelló , Enrique S. Quintana-Ortí
We present high-performance, multi-threaded implementations of three GEMM-based convolution algorithms for multicore processors with ARM and RISC-V architectures. The codes are integrated into CONVLIB, a library with the following unique features: (1) scripts to automatically generate a key component of GEMM, known as the micro-kernel, which is typically written in assembly language; (2) a modified analytical model that automatically tunes the algorithms to the underlying cache architecture; (3) the ability to select four hyper-parameters (micro-kernel, cache parameters, parallel loop, and GEMM algorithm) dynamically between calls to the library, without recompiling it; and (4) a driver to identify the best hyper-parameters. In addition, we provide a detailed performance evaluation of the convolution algorithms on five ARM and RISC-V processors, and we publicly release the codes.
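The core idea of casting convolution as GEMM can be shown with the classic im2col lowering: input patches are unfolded into columns so the whole convolution reduces to one matrix multiplication. This NumPy sketch illustrates the lowering only; it is not CONVLIB code, and the layout (channels-first, valid padding) is an assumption.

```python
import numpy as np

def im2col_conv(x, w, stride=1):
    """2D convolution via im2col lowering.
    x: (C, H, W) input; w: (K, C, R, S) filters; valid padding."""
    C, H, W = x.shape
    K, _, R, S = w.shape
    Ho = (H - R) // stride + 1
    Wo = (W - S) // stride + 1
    # Unfold each receptive field into one column of a (C*R*S, Ho*Wo) matrix.
    cols = np.empty((C * R * S, Ho * Wo), dtype=x.dtype)
    col = 0
    for i in range(Ho):
        for j in range(Wo):
            patch = x[:, i*stride:i*stride+R, j*stride:j*stride+S]
            cols[:, col] = patch.ravel()
            col += 1
    # A single GEMM: (K, C*R*S) x (C*R*S, Ho*Wo) -> (K, Ho*Wo).
    out = w.reshape(K, -1) @ cols
    return out.reshape(K, Ho, Wo)
```

In an optimized library the GEMM on the last line is where the hand-tuned micro-kernel and cache-aware blocking do their work, which is why generating and tuning that component pays off across architectures.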
{"title":"Parallel GEMM-based convolutions for deep learning on multicore ARM and RISC-V architectures","authors":"Héctor Martínez , Sandra Catalán , Adrián Castelló , Enrique S. Quintana-Ortí","doi":"10.1016/j.sysarc.2024.103186","DOIUrl":"10.1016/j.sysarc.2024.103186","url":null,"abstract":"<div><p>We present high performance, multi-threaded implementations of three GEMM-based convolution algorithms for multicore processors with ARM and RISC-V architectures. The codes are integrated into CONVLIB, a library that has the following unique features: (1) scripts to automatically generate a key component of GEMM, known as the micro-kernel, which is typically written in assembly language; (2) a modified analytical model to automatically tune the algorithms to the underlying cache architecture; (3) the ability to select four hyper-parameters: micro-kernel, cache parameters, parallel loop, and GEMM algorithm dynamically between calls to the library, without recompiling it; and (4) a driver to identify the best hyper-parameters. In addition, we provide a detailed performance evaluation of the convolution algorithms, on five ARM and RISC-V processors, and we publicly release the codes.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"153 ","pages":"Article 103186"},"PeriodicalIF":4.5,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141142502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-05-23DOI: 10.1016/j.sysarc.2024.103183
Danial Shiraly , Ziba Eslami , Nasrollah Pakniat
The advent of cloud computing has made cloud server outsourcing increasingly popular among data owners. However, storing sensitive data on cloud servers raises serious challenges for the security and privacy of data. Public Key Authenticated Encryption with Keyword Search (PAEKS) is an effective method that protects information confidentiality while supporting keyword searches. Identity-Based Authenticated Encryption with Keyword Search (IBAEKS) is a PAEKS variant in identity-based settings, designed to solve the intractable certificate management problem. To the best of our knowledge, only two IBAEKS schemes exist in the literature, both presented with weak security models that leave them vulnerable to what are known as Fully Chosen Keyword attacks. Moreover, the existing IBAEKS schemes rely on the time-consuming bilinear pairing operation, which significantly increases their computational cost. To overcome these issues, in this paper, we first propose an enhanced security model for IBAEKS and compare it with existing models. We then prove that the existing IBAEKS schemes are not secure in our enhanced model. We also propose an efficient pairing-free dIBAEKS scheme and prove that it is secure under the enhanced security model. Finally, we compare our proposed scheme with related constructions to demonstrate its overall superiority.
{"title":"Designated-tester Identity-Based Authenticated Encryption with Keyword Search with applications in cloud systems","authors":"Danial Shiraly , Ziba Eslami , Nasrollah Pakniat","doi":"10.1016/j.sysarc.2024.103183","DOIUrl":"10.1016/j.sysarc.2024.103183","url":null,"abstract":"<div><p>The advent of cloud computing has made cloud server outsourcing increasingly popular among data owners. However, the storage of sensitive data on cloud servers engenders serious challenges for the security and privacy of data. Public Key Authenticated Encryption with Keyword Search (PAEKS) is an effective method that protects information confidentiality and supports keyword searches. Identity-Based Authenticated Encryption with Keyword Search (IBAEKS) is a PAEKS variant in identity-based settings, designed for solving the intractable certificate management problem. To the best of our knowledge, only two IBAEKS schemes exist in the literature, both presented with weak security models that make them vulnerable against what is known as Fully Chosen Keyword attacks. Moreover, the existing IBAEKS schemes are based on the time-consuming bilinear pairing operation, leading to a significant increase in computational cost. To overcome these issues, in this paper, we first propose an enhanced security model for IBAEKS and compare it with existing models. We then prove that the existing IBAEKS schemes are not secure in our enhanced model. We also propose an efficient pairing-free dIBAEKS scheme and prove that it is secure under the enhanced security model. Finally, we compare our proposed scheme with related constructions to indicate its overall superiority.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"152 ","pages":"Article 103183"},"PeriodicalIF":4.5,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141132233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}