
Latest Publications from the International Journal of Embedded and Real-Time Communication Systems (IJERTCS)

Segment-Level FP-Scheduling in FreeRTOS
IF 0.7 · Q4 · Computer Science, Software Engineering · Pub Date: 2022-08-01 · DOI: 10.1109/RTCSA55878.2022.00026
R. Edmaier, Niklas Ueter, Jian-Jia Chen
In the domain of embedded systems, modern SoCs (System-on-Chips) increasingly employ dedicated hardware to improve the performance of specialized tasks. These performance benefits come at the cost of increased coordination complexity, as multiple tasks access the various hardware units in varying alternating sequences. For example, a task may first execute on a processor and then continue execution on a GPU. The problem becomes even harder under real-time constraints, i.e., when execution must complete within formally guaranteed time bounds. Real-time constraints can lead to severe resource under-utilization if the scheduling algorithms are not properly designed. A solution to this problem is self-suspension with segment-level fixed-priority scheduling: tasks are divided into successive, alternating segments of computation and self-suspension, and a task self-suspends when it tries to access a hardware resource that is already held by another task. In this paper, we propose and discuss different implementations of the segmented self-suspension task model in the FreeRTOS real-time operating system. Moreover, we evaluate the overhead of the different implementations on the OM40007 IoT module from NXP.
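The segmented model the abstract describes, where tasks alternate between computation and self-suspension and the scheduler picks among computation segments by fixed priority, can be illustrated with a toy time-stepped simulation. This is a sketch of the task model only, not the paper's FreeRTOS implementation; all task parameters are invented:

```python
def simulate(tasks, horizon):
    """Time-stepped simulation of segment-level fixed-priority scheduling.

    tasks: dict name -> {"prio": int (lower value = higher priority),
                         "segments": [("C", n) or ("S", n), ...]}
           where "C" is a computation segment and "S" a self-suspension.
    Returns dict name -> completion time (None if unfinished by `horizon`).
    """
    state = {n: [list(s) for s in t["segments"]] for n, t in tasks.items()}
    done = {n: None for n in tasks}
    for t in range(horizon):
        # suspension segments progress in parallel: they need no processor
        for n, segs in state.items():
            if segs and segs[0][0] == "S":
                segs[0][1] -= 1
                if segs[0][1] == 0:
                    segs.pop(0)
        # among tasks whose head segment is computation, run the highest priority
        ready = [n for n, segs in state.items() if segs and segs[0][0] == "C"]
        if ready:
            n = min(ready, key=lambda x: tasks[x]["prio"])
            segs = state[n]
            segs[0][1] -= 1
            if segs[0][1] == 0:
                segs.pop(0)
        for n, segs in state.items():
            if not segs and done[n] is None:
                done[n] = t + 1
    return done
```

Note how, while the high-priority task sits in its suspension segment (e.g., waiting on an accelerator), the processor is handed to a lower-priority computation segment; this is exactly the under-utilization that segment-level scheduling recovers.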
Citations: 0
Welcome Message from the RTCSA 2022 Chairs
IF 0.7 · Q4 · Computer Science, Software Engineering · Pub Date: 2022-08-01 · DOI: 10.1109/rtcsa55878.2022.00005
Citations: 0
Exploiting Binary Equilibrium for Efficient LDPC Decoding in 3D NAND Flash
IF 0.7 · Q4 · Computer Science, Software Engineering · Pub Date: 2022-08-01 · DOI: 10.1109/RTCSA55878.2022.00018
Hsiang-Sen Hsu, Li-Pin Chang
3D NAND flash is prone to bit errors due to severe charge leakage. Modern SSDs adopt LDPC codes for bit error management, but LDPC decoding can incur high read latency because the reference voltage is adjusted iteratively. Bit scrambling helps reduce inter-cell interference, and under scrambling, ones and zeros contribute equally to the raw data. We observed that as bit errors develop, the 0-bit ratio in raw data deviates from 50%. Inspired by this property, we propose a method for fast adjustment of the reference voltage, involving a placement step and a fine-tuning step. Our method uses only a few hundred bytes of RAM yet improves average read latency over existing methods by up to 24%.
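The observation that scrambled data drifts away from a 50% zero-bit ratio suggests a simple closed-loop adjustment of the read reference voltage. The sketch below mimics the two-step shape described (one coarse placement move, then small fine-tuning steps) on a toy cell model; the gains, step sizes, and threshold model are illustrative assumptions, not the paper's algorithm:

```python
def zero_ratio(thresholds, vref):
    # In this toy model a cell reads as 0 when its threshold voltage
    # lies below the reference voltage.
    return sum(1 for v in thresholds if v < vref) / len(thresholds)

def tune_vref(thresholds, vref, coarse_gain=2.0, step=0.01, tol=0.002,
              max_iter=200):
    # Placement step: one coarse move proportional to the observed imbalance.
    vref -= coarse_gain * (zero_ratio(thresholds, vref) - 0.5)
    # Fine-tuning step: small fixed moves until zeros and ones balance.
    for _ in range(max_iter):
        dev = zero_ratio(thresholds, vref) - 0.5
        if abs(dev) <= tol:
            break
        vref -= step if dev > 0 else -step
    return vref
```

Because charge leakage shifts the whole threshold distribution, the deviation of the zero ratio from 50% directly encodes the direction and rough magnitude of the needed voltage correction, which is what makes a single coarse move effective.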
Citations: 2
Enabling Real-time AI Inference on Mobile Devices via GPU-CPU Collaborative Execution
IF 0.7 · Q4 · Computer Science, Software Engineering · Pub Date: 2022-08-01 · DOI: 10.1109/RTCSA55878.2022.00027
Hao Li, J. Ng, T. Abdelzaher
AI-powered mobile applications are becoming increasingly popular due to recent advances in machine intelligence. They include, but are not limited to, mobile sensing, virtual assistants, and augmented reality. Mobile AI models, especially Deep Neural Networks (DNNs), are usually executed locally, as sensory data are collected and generated by end devices. This imposes a heavy computational burden on resource-constrained mobile phones, where a set of DNN jobs with deadline constraints is typically waiting for execution. Existing AI inference frameworks process incoming DNN jobs in sequential order, which does not optimally support mobile users' real-time interactions with AI services. In this paper, we propose a framework that achieves real-time inference by exploiting heterogeneous mobile SoCs, which contain both a CPU and a GPU. Considering the characteristics of DNN models, we optimally partition execution between the mobile GPU and CPU. We present a dynamic programming-based approach to solve the formulated real-time DNN partitioning and scheduling problem. The proposed framework has several desirable properties: 1) computational resources on mobile devices are better utilized; 2) inference performance is optimized in terms of deadline miss rate; and 3) no inference accuracy is sacrificed. Evaluation results on an off-the-shelf mobile phone show that, compared to several baselines, our framework provides better real-time support for AI inference tasks on mobile platforms.
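A dynamic-programming partition of a DNN layer chain between CPU and GPU, of the general kind the abstract describes, can be sketched as follows. The per-layer costs and the single data-transfer constant are invented, and the paper's formulation additionally handles deadlines across multiple jobs:

```python
def partition(cpu, gpu, switch_cost):
    """Minimize end-to-end latency of a layer chain split between CPU and GPU.

    cpu[i], gpu[i]: per-layer execution times; switch_cost: data-transfer
    penalty whenever consecutive layers run on different processors.
    Returns (best latency, per-layer assignment as "cpu"/"gpu").
    """
    n = len(cpu)
    INF = float("inf")
    best = [[INF, INF] for _ in range(n)]    # best[i][d]: finish time of layer i on device d
    choice = [[0, 0] for _ in range(n)]      # predecessor device for backtracking
    best[0] = [cpu[0], gpu[0]]
    for i in range(1, n):
        for d, t in enumerate((cpu[i], gpu[i])):
            for p in (0, 1):
                cand = best[i - 1][p] + t + (switch_cost if p != d else 0)
                if cand < best[i][d]:
                    best[i][d] = cand
                    choice[i][d] = p
    # backtrack the optimal assignment from the cheaper final device
    d = 0 if best[n - 1][0] <= best[n - 1][1] else 1
    total = best[n - 1][d]
    plan = []
    for i in range(n - 1, -1, -1):
        plan.append("cpu" if d == 0 else "gpu")
        d = choice[i][d]
    return total, plan[::-1]
```

The DP runs in O(n) time for n layers, so the partition can be recomputed cheaply when deadlines or measured layer costs change at runtime.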
Citations: 2
QoS Guaranteed Resource Allocation for Coexisting eMBB and URLLC Traffic in 5G Industrial Networks
IF 0.7 · Q4 · Computer Science, Software Engineering · Pub Date: 2022-08-01 · DOI: 10.1109/RTCSA55878.2022.00015
Dawei Shen, Tianyu Zhang, Jiachen Wang, Qingxu Deng, Song Han, X. Hu
Fifth-generation (5G) cellular networks are increasingly considered for industrial applications such as factory automation systems. In 5G networks, Enhanced Mobile Broadband (eMBB) and Ultra-Reliable Low-Latency Communication (URLLC) are two essential services. eMBB services require high data rates with certain lower bounds, while URLLC traffic is subject to strict latency and reliability requirements. Existing approaches to scheduling coexisting eMBB and URLLC traffic all assume that URLLC traffic preempts eMBB traffic immediately upon arrival, which can adversely impact the achievable eMBB data rates. Furthermore, none of the prior work considers guaranteeing minimum data rate requirements imposed on certain eMBB traffic. This paper proposes a new model to capture the URLLC and eMBB requirements and introduces a novel framework, QoSG-RA, to perform network resource allocation for coexisting eMBB and URLLC traffic. QoSG-RA builds on a hybrid offline/online approach: offline resource allocation ensures that the Quality of Service (QoS) requirements of eMBB and URLLC traffic are satisfied, while online resource allocation maximizes fairness in data rates among eMBB traffic based on runtime information. QoSG-RA is able to (i) meet the latency and reliability requirements of URLLC traffic, and (ii) maximize the data rates of eMBB traffic in a fair way while fulfilling their minimum data rate requirements. Experimental results demonstrate the effectiveness of QoSG-RA compared to the state-of-the-art.
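The two-phase idea, guarantee every flow's minimum rate first and then share the remaining capacity fairly among eMBB flows, can be illustrated with a small max-min water-filling sketch. Flow names and numbers are invented; this is not QoSG-RA itself:

```python
def allocate(total, flows):
    """Split `total` capacity among flows with guaranteed minimums.

    flows: dict name -> (minimum, demand). Every minimum is reserved first;
    the remainder is distributed max-min fairly, capped at each demand.
    """
    alloc = {n: m for n, (m, d) in flows.items()}
    remaining = total - sum(alloc.values())
    if remaining < 0:
        raise ValueError("minimum requirements exceed capacity")
    active = {n for n, (m, d) in flows.items() if alloc[n] < d}
    while remaining > 1e-9 and active:
        share = remaining / len(active)       # equal share this round
        for n in sorted(active):
            give = min(share, flows[n][1] - alloc[n])
            alloc[n] += give
            remaining -= give                 # leftover from capped flows recirculates
        active = {n for n in active if alloc[n] < flows[n][1] - 1e-9}
    return alloc
```

Flows whose demand is met early drop out of the active set, and their unused share is redistributed in later rounds, which is what makes the allocation max-min fair subject to the minimums.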
Citations: 3
IPDeN: Real-Time deflection-based NoC with in-order flits delivery
IF 0.7 · Q4 · Computer Science, Software Engineering · Pub Date: 2022-08-01 · DOI: 10.1109/RTCSA55878.2022.00023
Yilian Ribot González, Geoffrey Nelissen, E. Tovar
In deflection-based Networks-on-Chip (NoCs), when several flits entering a router contend for the same output port, one of the flits is routed to the desired output and the others are deflected to alternative outputs. This approach reduces power consumption and silicon footprint compared to virtual-channel (VC) based solutions. However, because of the non-deterministic number of deflections that flits may suffer while traversing the network, flits may arrive out of order at their destinations. In this work, we present IPDeN, a novel deflection-based NoC that ensures in-order flit delivery. To avoid costly reordering mechanisms at the destination of each communication flow, we propose a solution based on a single small buffer added to each router that prevents flits from overtaking other flits belonging to the same communication flow. We also develop a worst-case traversal time (WCTT) analysis for packets transmitted over IPDeN. We implemented IPDeN in Verilog and synthesized it for an FPGA platform. We show that an IPDeN router requires about 3 times fewer hardware resources than routers that use VCs. Experimental results show that worst-case and average packet communication times are reduced compared to the state-of-the-art.
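The overtaking-prevention idea, hold a flit in a small per-router buffer until every earlier flit of its flow has departed, can be sketched with per-flow sequence counters. This is a behavioral toy model, not IPDeN's hardware design:

```python
class InOrderRouter:
    """Toy model of per-flow overtaking prevention with one hold buffer.

    Flits may arrive out of order (e.g., after different deflection paths);
    a flit departs only when all earlier flits of its flow have already
    departed, otherwise it waits in the buffer.
    """
    def __init__(self):
        self.next_seq = {}   # flow id -> next sequence number allowed to depart
        self.buffer = []     # held flits as (flow, seq) pairs

    def arrive(self, flow, seq):
        """Accept one flit; return the list of flits that depart as a result."""
        departed = []
        self.buffer.append((flow, seq))
        progress = True
        while progress:                      # a departure may unblock successors
            progress = False
            for flit in sorted(self.buffer):
                f, s = flit
                if s == self.next_seq.get(f, 0):
                    self.buffer.remove(flit)
                    self.next_seq[f] = s + 1
                    departed.append(flit)
                    progress = True
        return departed
```

Because ordering is enforced per flow, flits of different flows never block one another, which keeps the required buffer small.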
Citations: 3
Performance Acceleration of Secure Machine Learning Computations for Edge Applications
IF 0.7 · Q4 · Computer Science, Software Engineering · Pub Date: 2022-08-01 · DOI: 10.1109/RTCSA55878.2022.00021
Zi-Jie Lin, Chuan-Chi Wang, Chia-Heng Tu, Shih-Hao Hung
Edge appliances built around machine learning applications have been gradually adopted in a wide variety of fields, such as intelligent transportation, banking, and medical diagnosis. Privacy-preserving computation approaches can be used on smart appliances to secure the privacy of sensitive data, including application data and the parameters of machine learning models. Nevertheless, this data privacy is achieved at the cost of execution time: a secure machine learning application runs several orders of magnitude slower than its plaintext counterpart, and the performance gap is especially pronounced on edge appliances. To improve the execution efficiency of secure applications, this work targets CrypTen, an open-source software framework widely used for building secure machine learning applications with the Secure Multi-Party Computation (SMPC) based privacy-preserving approach. We analyze the performance characteristics of secure machine learning applications built with CrypTen, and the analysis reveals that communication overhead hinders their execution. To tackle the issue, a communication library, OpenMPI, is added to the CrypTen framework as a new communication backend, boosting application performance by up to 50%. We further develop a hybrid communication scheme that combines the OpenMPI backend with CrypTen's original communication backend. The experimental results show that, compared to the original CrypTen framework, the enhanced framework provides better performance for small-size data (up to 50% speedup for LeNet5 on the MNIST dataset) while maintaining similar performance for large-size data (AlexNet on CIFAR-10).
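One plausible way to realize a hybrid communication scheme of the kind described is to dispatch each transfer to one of two transports depending on payload size. The class below is an illustrative stand-in only: the two send callables and the threshold are assumptions, not CrypTen or OpenMPI APIs:

```python
class HybridBackend:
    """Route each payload to one of two transports by size.

    `small_send` and `large_send` are caller-supplied callables standing in
    for, e.g., a socket-based backend and an MPI-based backend; the paper's
    actual selection policy is not specified here.
    """
    def __init__(self, small_send, large_send, threshold_bytes=64 * 1024):
        self.small_send = small_send
        self.large_send = large_send
        self.threshold = threshold_bytes

    def send(self, payload: bytes):
        # small messages are latency-bound, large ones bandwidth-bound,
        # so each goes to the transport that handles its regime better
        if len(payload) < self.threshold:
            return self.small_send(payload)
        return self.large_send(payload)
```

The threshold would in practice be calibrated per platform by profiling both transports, which matches the abstract's emphasis on measured communication overhead.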
Citations: 0
Anytime-Lidar: Deadline-aware 3D Object Detection
IF 0.7 · Q4 · Computer Science, Software Engineering · Pub Date: 2022-08-01 · DOI: 10.1109/RTCSA55878.2022.00010
Ahmet Soyyigit, Shuochao Yao, H. Yun
In this work, we present a novel scheduling framework enabling anytime perception for deep neural network (DNN) based 3D object detection pipelines. We focus on the computationally expensive region proposal network (RPN) and per-category multi-head detector components, which are common in 3D object detection pipelines, and make them deadline-aware. We propose a scheduling algorithm that intelligently selects a subset of these components to make an effective time/accuracy trade-off on the fly. We minimize the accuracy loss of skipping some neural network sub-components by projecting previously detected objects onto the current scene through estimation. We apply our approach to a state-of-the-art 3D object detection network, PointPillars, and evaluate its performance on the Jetson Xavier AGX using the nuScenes dataset. Compared to the baselines, our approach significantly improves the network's accuracy under various deadline constraints.
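Deadline-driven selection of detector sub-components can be approximated by a greedy knapsack-style heuristic: run the heads with the best accuracy gain per unit cost until the time budget is exhausted. The head names, costs, and gains below are invented, and the paper's scheduler is more sophisticated than this sketch:

```python
def select_heads(heads, budget):
    """Pick detection heads to run within a time budget.

    heads: list of (name, cost, gain) tuples; `gain` is the expected
    accuracy contribution of running that head this frame.
    Greedy by gain-per-cost ratio; returns (chosen names, time spent).
    """
    chosen = []
    spent = 0.0
    # most valuable per unit of compute first
    for name, cost, gain in sorted(heads, key=lambda h: h[2] / h[1],
                                   reverse=True):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent
```

Skipped heads are where the abstract's projection step comes in: their categories are covered by propagating the previous frame's detections instead of recomputing them.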
Citations: 3
Energy-Adaptive Real-time Sensing for Batteryless Devices
IF 0.7 · Q4 · Computer Science, Software Engineering · Pub Date: 2022-08-01 · DOI: 10.1109/RTCSA55878.2022.00028
Mohsen Karimi, Yidi Wang, Hyoseung Kim
Batteryless energy harvesting devices are recognized as a promising solution thanks to their low maintenance requirements and ability to work in harsh environments. However, these devices must harvest energy from ambient sources while periodically executing real-time sensing tasks under data freshness constraints, which is especially challenging because the energy sources are often unreliable and intermittent. In this paper, we develop an energy-adaptive real-time sensing framework for batteryless devices. The framework includes a lightweight machine learning-based energy predictor that can run on microcontroller devices and predicts energy availability and intensity from energy traces. Using these predictions, the framework adapts the schedule of real-time tasks by taking into account the predicted energy supply and the resulting age of information of each task, in order to achieve continuous sensing operation and satisfy given data freshness requirements. We discuss various design choices for adaptive scheduling and evaluate their performance in the context of batteryless devices. Experimental results show that the proposed adaptive real-time approach outperforms recent static and reactive approaches in both energy utilization and data freshness.
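The core adaptation loop, stretch the sensing period when predicted harvest is low but never past the freshness bound, can be captured in a few lines. This is a hedged sketch under invented parameters; the paper's framework additionally uses a learned energy predictor and full age-of-information accounting:

```python
def next_period(predicted_power, task_energy, base_period, max_period):
    """Choose the next sensing period (seconds).

    predicted_power: forecast harvest rate (J/s); task_energy: energy per
    sensing run (J); base_period: the nominal period; max_period: the
    longest period allowed by the data-freshness constraint.
    """
    if predicted_power <= 0:
        return max_period                 # no harvest expected: stretch to the bound
    # interval at which consumption exactly matches the predicted harvest
    sustainable = task_energy / predicted_power
    # never run faster than needed, never violate freshness
    return min(max(sustainable, base_period), max_period)
```

Clamping at `max_period` is what encodes the freshness constraint: even under a pessimistic energy forecast, the sample age is bounded, and surplus energy simply restores the nominal period.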
引用次数: 3
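The adaptive scheduling idea described in the abstract above — choose the next sensing task by weighing the predicted energy supply against each task's age of information (AoI) — can be sketched roughly as follows. All names, the task parameters, and the greedy max-normalized-age policy are illustrative assumptions, not the authors' actual implementation:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SensingTask:
    name: str
    energy_cost: float      # energy units consumed per execution
    freshness_bound: float  # maximum tolerable age of information (seconds)
    last_run: float = 0.0   # time the task last refreshed its data

def age_of_information(task: SensingTask, now: float) -> float:
    """Time elapsed since the task last produced a fresh sample."""
    return now - task.last_run

def pick_next_task(tasks: List[SensingTask], now: float,
                   predicted_energy: float) -> Optional[SensingTask]:
    """Greedy energy-adaptive policy: among tasks affordable under the
    predicted energy budget, run the one closest to violating its
    freshness bound (largest normalized age of information)."""
    affordable = [t for t in tasks if t.energy_cost <= predicted_energy]
    if not affordable:
        return None  # sleep and wait for more harvested energy
    return max(affordable,
               key=lambda t: age_of_information(t, now) / t.freshness_bound)

# Example: two hypothetical sensing tasks under a tight energy budget.
tasks = [SensingTask("temp", energy_cost=2.0, freshness_bound=10.0),
         SensingTask("cam",  energy_cost=8.0, freshness_bound=60.0)]
chosen = pick_next_task(tasks, now=9.0, predicted_energy=5.0)
# "cam" costs 8.0 > 5.0 budget, so "temp" (age 9/10 of its bound) is chosen
```

A static schedule would run tasks at fixed periods regardless of harvested energy; the point of the adaptive approach is that the predictor's output gates which tasks are even candidates, while AoI decides among them.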
RTCSA 2022 Organizers
IF 0.7 Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2022-08-01. DOI: 10.1109/rtcsa55878.2022.00006. Cited 0 times.
Journal: International Journal of Embedded and Real-Time Communication Systems (IJERTCS)