A comprehensive exploration of approximate DNN models with a novel floating-point simulation framework
Pub Date: 2024-05-25, DOI: 10.1016/j.peva.2024.102423
Myeongjin Kwak, Jeonggeun Kim, Yongtae Kim
This paper introduces TorchAxf, a framework for fast simulation of diverse approximate deep neural network (DNN) models, including spiking neural networks (SNNs). The proposed framework utilizes various approximate adders and multipliers, supports industry-standard reduced-precision floating-point formats, such as bfloat16, and accommodates user-customized precision representations. Leveraging GPU acceleration on the PyTorch framework, TorchAxf accelerates approximate DNN training and inference. In addition, it allows seamless integration of arbitrary approximate arithmetic algorithms with C/C++ behavioral models to emulate approximate DNN hardware accelerators.
We utilize the proposed TorchAxf framework to assess twelve popular DNN models under approximate multiply-and-accumulate (MAC) operations. Through comprehensive experiments, we determine the suitable degree of floating-point arithmetic approximation for these DNN models without significant accuracy loss and offer the optimal reduced-precision formats for each DNN model. Additionally, we demonstrate that approximate-aware re-training can rectify errors and enhance pre-trained DNN models under reduced-precision formats. Furthermore, TorchAxf, operating on GPU, remarkably reduces simulation time for complex DNN models using approximate arithmetic by up to 131.38× compared to the baseline optimized CPU implementation. Finally, we compare the proposed framework with state-of-the-art frameworks to highlight its superiority.
{"title":"A comprehensive exploration of approximate DNN models with a novel floating-point simulation framework","authors":"Myeongjin Kwak, Jeonggeun Kim, Yongtae Kim","doi":"10.1016/j.peva.2024.102423","DOIUrl":"https://doi.org/10.1016/j.peva.2024.102423","url":null,"abstract":"<div><p>This paper introduces <em>TorchAxf</em><span><sup>1</sup></span>, a framework for fast simulation of diverse approximate deep neural network (DNN) models, including spiking neural networks (SNNs). The proposed framework utilizes various approximate adders and multipliers, supports industrial standard reduced precision floating-point formats, such as <span>bfloat16</span>, and accommodates user-customized precision representations. Leveraging GPU acceleration on the PyTorch framework, <em>TorchAxf</em> accelerates approximate DNN training and inference. In addition, it allows seamless integration of arbitrary approximate arithmetic algorithms with C/C++ behavioral models to emulate approximate DNN hardware accelerators.</p><p>We utilize the proposed <em>TorchAxf</em> framework to assess twelve popular DNN models under approximate multiply-and-accumulate (MAC) operations. Through comprehensive experiments, we determine the suitable degree of floating-point arithmetic approximation for these DNN models without significant accuracy loss and offer the optimal reduced precision formats for each DNN model. Additionally, we demonstrate that approximate-aware re-training can rectify errors and enhance pre-trained DNN models under reduced precision formats. Furthermore, <em>TorchAxf</em>, operating on GPU, remarkably reduces simulation time for complex DNN models using approximate arithmetic by up to 131.38<span><math><mo>×</mo></math></span> compared to the baseline optimized CPU implementation. Finally, we compare the proposed framework with state-of-the-art frameworks to highlight its superiority.</p></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"165 ","pages":"Article 102423"},"PeriodicalIF":2.2,"publicationDate":"2024-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141239841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance analysis of a collision channel with abandonments
Pub Date: 2024-05-25, DOI: 10.1016/j.peva.2024.102424
Dieter Fiems, Tuan Phung-Duc
We consider a Markovian retrial queueing system with customer collisions and abandonment in the context of carrier-sense multiple access systems. Using z-transform techniques, we find a set of first-order differential equations for the probability generating functions of the orbit size when the server is empty, busy, or in the collision phase. We then rely on series expansion techniques to extract approximations for relevant performance measures from this set of differential equations. More precisely, we construct a numerical algorithm to calculate the terms in the series expansions of various factorial moments of the orbit size. To improve the accuracy of our series expansion approach, we apply Wynn’s epsilon algorithm, which not only speeds up convergence but also extends the region of convergence. We illustrate the accuracy of our approach by means of some numerical examples and find that the method is both fast and accurate for a wide range of parameter values.
{"title":"Performance analysis of a collision channel with abandonments","authors":"Dieter Fiems , Tuan Phung-Duc","doi":"10.1016/j.peva.2024.102424","DOIUrl":"https://doi.org/10.1016/j.peva.2024.102424","url":null,"abstract":"<div><p>We consider a Markovian retrial queueing system with customer collisions and abandonment in the context of carrier-sense multiple access systems. Using <span><math><mi>z</mi></math></span>-transform techniques, we find a set of first-order differential equations for the probability generating functions of the orbit size when the server is empty, busy, or in the collision phase. We then rely on series expansion techniques to extract approximations for relevant performance measures from this set of differential equations. More precisely, we construct a numerical algorithm to calculate the terms in the series expansions of various factorial moments of the orbit size. To improve the accuracy of our series expansion approach, we apply Wynn’s epsilon algorithm which not only speeds up convergence, but also extends the region of convergence. We illustrate the accuracy of our approach by means of some numerical examples, and find that the method is both fast and accurate for a wide range of the parameter values.</p></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"165 ","pages":"Article 102424"},"PeriodicalIF":2.2,"publicationDate":"2024-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141286199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic load balancing in energy packet networks
Pub Date: 2024-04-23, DOI: 10.1016/j.peva.2024.102414
A. Bušić, J. Doncel, J.M. Fourneau
Energy Packet Networks (EPNs) model the interaction between renewable sources that generate energy according to a random process and communication devices that consume energy. The network is formed by cells and, in each cell, there is a queue that handles energy packets and another queue that handles data packets. We assume Poisson arrivals of energy packets and of data packets to all the cells and exponential service times. We consider an EPN model with dynamic load balancing, where a cell without data packets can poll other cells to migrate jobs. This migration can only take place when there is enough energy in both interacting cells, in which case a batch of data packets is transferred and the required energy is consumed (i.e., it disappears). Data packets also consume energy to be routed to the next station. Our main result shows that the steady-state distribution of jobs in the queues admits a product-form solution provided that a stable solution of a fixed-point equation exists. We prove sufficient conditions for irreducibility; under these conditions, and when the fixed-point equation has a solution, the Markov chain is ergodic. We also provide sufficient conditions for the existence of a solution of the fixed-point equation. We then focus on layered networks and study the polling rates that must be set to achieve fair load balancing, i.e., such that, within the same layer, all queues handling data packets have the same load. Our numerical experiments illustrate that dynamic load balancing exhibits several desirable properties, such as improved performance and fair load balancing.
{"title":"Dynamic load balancing in energy packet networks","authors":"A. Bušić , J. Doncel , J.M. Fourneau","doi":"10.1016/j.peva.2024.102414","DOIUrl":"10.1016/j.peva.2024.102414","url":null,"abstract":"<div><p>Energy Packet Networks (EPNs) model the interaction between renewable sources generating energy following a random process and communication devices that consume energy. This network is formed by cells and, in each cell, there is a queue that handles energy packets and another queue that handles data packets. We assume Poisson arrivals of energy packets and of data packets to all the cells and exponential service times. We consider an EPN model with a dynamic load balancing where a cell without data packets can poll other cells to migrate jobs. This migration can only take place when there is enough energy in both interacting cells, in which case a batch of data packets is transferred and the required energy is consumed (i.e. it disappears). We consider that data packet also consume energy to be routed to the next station. Our main result shows that the steady-state distribution of jobs in the queues admits a product form solution provided that a stable solution of a fixed point equation exists. We prove sufficient conditions for irreducibility. Under these conditions and when the fixed point equation has a solution, the Markov chain is ergodic. We also provide sufficient conditions for the existence of a solution of the fixed point equation. We then focus on layered networks and we study the polling rates that must be set to achieve a fair load balancing, i.e., such that, in the same layer, the load of the queues handling data packets is the same. Our numerical experiments illustrate that dynamic load balancing satisfies several interesting properties such as performance improvement or fair load balancing.</p></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"165 ","pages":"Article 102414"},"PeriodicalIF":2.2,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0166531624000191/pdfft?md5=fb96a067de593ae411502a62f32b10ab&pid=1-s2.0-S0166531624000191-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140777407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Network slicing: Is it worth regulating in a network neutrality context?
Pub Date: 2024-04-22, DOI: 10.1016/j.peva.2024.102422
Yassine Hadjadj-Aoul, Maël Le Treust, Patrick Maillé, Bruno Tuffin
Network slicing is a key component of 5G-and-beyond networks, but it raises many questions about the associated business model and the need for regulation, given its uneasy co-existence with the network neutrality debate. We propose in this paper a slicing model for heterogeneous users/applications where a service provider may purchase a slice in a wireless network and offer a “premium” service whose improved quality stems from higher prices, which lead to less demand and less congestion than the basic service offered by the network owner, a scheme known as Paris Metro Pricing. Using game theory, we obtain the economically optimal slice size and the prices charged by all actors. We also compare with the case of a unique “pipe” (no premium service), corresponding to a fully neutral scenario, and with the case of vertical integration, to evaluate the impact of slicing on all actors and identify the “best” economic scenario and the eventual need for regulation.
{"title":"Network slicing: Is it worth regulating in a network neutrality context?","authors":"Yassine Hadjadj-Aoul , Maël Le Treust , Patrick Maillé , Bruno Tuffin","doi":"10.1016/j.peva.2024.102422","DOIUrl":"https://doi.org/10.1016/j.peva.2024.102422","url":null,"abstract":"<div><p>Network slicing is a key component of 5G-and-beyond networks but induces many questions related to an associated business model and its need to be regulated due to its difficult co-existence with the network neutrality debate. We propose in this paper a slicing model in the case of heterogeneous users/applications where a service provider may purchase a slice in a wireless network and offer a “premium” service where the improved quality stems from higher prices leading to less demand and less congestion than the basic service offered by the network owner, a scheme known as Paris Metro Pricing. We obtain thanks to game theory the economically-optimal slice size and prices charged by all actors. We also compare with the case of a unique “pipe” (no premium service) corresponding to a fully-neutral scenario and with the case of vertical integration to evaluate the impact of slicing on all actors and identify the “best” economic scenario and the eventual need for regulation.</p></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"165 ","pages":"Article 102422"},"PeriodicalIF":2.2,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0166531624000270/pdfft?md5=5fc9a43434d102c218d3c6bee4bb9f76&pid=1-s2.0-S0166531624000270-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140645416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analyzing the age of information in prioritized status update systems under an interruption-based hybrid discipline
Pub Date: 2024-04-15, DOI: 10.1016/j.peva.2024.102415
Tamer E. Fahim, Sherif I. Rabia, Ahmed H. Abd El-Malek, Waheed K. Zahra
Motivated by real-life applications, research interest has recently been directed towards prioritized status update systems, which prioritize update streams according to their timeliness constraints. The preferential service treatment between priority classes is commonly based on the classical disciplines of preemption and non-preemption. However, both disciplines fail to satisfy all classes evenly. In this work, an interruption-based hybrid preemptive/non-preemptive discipline is proposed for a single-buffer system modeled as an M/M/1/2 priority queueing system. Each class being served (resp. buffered) can be preempted unless its recorded number of service preemptions reaches a predetermined in-service (resp. in-waiting) threshold. The thresholds between classes are the controlling parameters of the whole system’s performance. Using the stochastic hybrid system approach, the age of information (AoI) performance metric is analyzed in terms of its statistical average and higher-order moments, considering a general number of priority classes. Closed-form results are also obtained for some special cases, giving analytical insight into AoI stability under heavy loading conditions. The average AoI and its dispersion are numerically investigated for a three-class network. The significance of the proposed model lies in achieving a compromise satisfaction between all priority classes through a thorough adjustment of its threshold parameters; two approaches are proposed to clarify this adjustment. It turns out that the proposed hybrid discipline compensates for the limited buffer resource, achieving more promising performance with low design complexity and low cost. Moreover, the proposed scheme can operate under a wider span of the total offered load, through which overall network satisfaction can be optimized under legitimate constraints on the age-sensitive classes.
{"title":"Analyzing the age of information in prioritized status update systems under an interruption-based hybrid discipline","authors":"Tamer E. Fahim , Sherif I. Rabia , Ahmed H. Abd El-Malek , Waheed K. Zahra","doi":"10.1016/j.peva.2024.102415","DOIUrl":"https://doi.org/10.1016/j.peva.2024.102415","url":null,"abstract":"<div><p>Motivated by real-life applications, a special research-work interest has been recently directed towards the prioritized status update systems, which prioritize the update streams according to their timeliness constraints. The preferential service treatment between priority classes is commonly based on classical disciplines, preemption and non-preemption. However, both disciplines fail to give an even satisfaction between all classes. In our work, an interruption-based hybrid preemptive/non-preemptive discipline is proposed under a single-buffer system modeled as an M/M/1/2 priority queueing system. Each class being served (resp. buffered) can be preempted unless its recorded number of service preemptions reaches the predetermined in-service (resp. in-waiting) threshold. All thresholds between classes are the controlling parameters of the whole system’s performance. Using the stochastic hybrid system approach, the age of information (AoI) performance metric is analyzed in terms of its statistical average along with the higher-order moments, considering a general number of priority classes. Closed-form results are also obtained for some special cases, giving analytical insights about the AoI stability in heavy loading conditions. The average AoI and its dispersion are numerically investigated for the case of a three-class network. The significance of the proposed model is manifested in achieving a compromise satisfaction between all priority classes by a thorough adjustment of its threshold parameters. Two approaches are proposed to clarify the adjustment of these parameters. It turned out that the proposed hybrid discipline compensates for the limited buffer resource, achieving more promising performance with low design complexity and low cost. Moreover, the proposed scheme can operate under a wider span of the total offered load, through which the whole network satisfaction can be optimized under some legitimate constraints on the age-sensitive classes.</p></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"165 ","pages":"Article 102415"},"PeriodicalIF":2.2,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140640916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhanced performance prediction of ATL model transformations
Pub Date: 2024-04-05, DOI: 10.1016/j.peva.2024.102413
Raffaela Groner, Peter Bellmann, Stefan Höppner, Patrick Thiam, Friedhelm Schwenker, Hans A. Kestler, Matthias Tichy
Model transformation languages are domain-specific languages used to define transformations of models. These transformations translate one modeling formalism into another or simply update a given model. Such transformations are often described declaratively and are often implemented based on very small models that cover the language of the input model. As a result, transformation developers are often unable to assess the time required to transform a larger model.
Hence, we propose a machine-learning-based prediction approach that takes a set of model characteristics as input and predicts the execution time of a transformation defined in the Atlas Transformation Language (ATL). In our previous work (Groner et al., 2023), we showed that support vector regression, in combination with a model characterization based on the number of model elements, the number of references, and the number of attributes, is the best choice in terms of usability and prediction accuracy for the transformations considered in our experiments.
A major weakness of our previous approach is that it fails to predict the performance of transformations that also transform attribute values of arbitrary length, such as string values. Therefore, we investigate in this work whether an extension of our feature sets that describes the average size of string attributes can help to overcome this weakness.
Our results show that the random forest approach, in combination with model characterizations based on the number of model elements, the number of references, the number of attributes, and the average size of string attributes filtered by the 85th percentile of their variance, is the best choice in terms of the simplicity of describing a model and the quality of the obtained prediction. With this combination, we obtained a mean absolute percentage error (MAPE) of 5.07% over all modules and a MAPE of 4.82% over all modules excluding the transformation for which our previous approach failed. In comparison, our previous approach yielded a MAPE of 38.48% over all modules and a MAPE of 4.45% over all modules excluding the transformation for which it failed.
{"title":"Enhanced performance prediction of ATL model transformations","authors":"Raffaela Groner , Peter Bellmann , Stefan Höppner , Patrick Thiam , Friedhelm Schwenker , Hans A. Kestler , Matthias Tichy","doi":"10.1016/j.peva.2024.102413","DOIUrl":"https://doi.org/10.1016/j.peva.2024.102413","url":null,"abstract":"<div><p>Model transformation languages are domain-specific languages used to define transformations of models. These transformations consist of the translation from one modeling formalism into another or just the updating of a given model. Such transformations are often described declaratively and are often implemented based on very small models that cover the language of the input model. As a result, transformation developers are often unable to assess the time required to transform a larger model.</p><p>Hence, we propose a prediction approach based on machine learning which uses a set of model characteristics as input and provides a prediction of the execution time of a transformation defined in the Atlas Transformation Language (ATL). In our previous work (Groner et al., 2023), we already showed that support vector regression in combination with a model characterization based on the number of model elements, the number of references, and the number of attributes is the best choice in terms of usability and prediction accuracy for the transformations considered in our experiments.</p><p>A major weakness of our previous approach is that it fails to predict the performance of transformations that also transform attribute values of arbitrary length, such as string values. Therefore, we investigate in this work whether an extension of our feature sets that describes the average size of string attributes can help to overcome this weakness.</p><p>Our results show that the random forest approach in combination with model characterizations based on the number of model elements, the number of references, the number of attributes, and the average size of string attributes filtered by the 85th percentile of their variance is the best choice in terms of the simple way to describe a model and the quality of the obtained prediction. With this combination, we obtained a mean absolute percentage error (MAPE) of 5.07% over all modules and a MAPE of 4.82% over all modules excluding the transformation for which our previous approach failed. Whereas, we obtained previously a MAPE of 38.48% over all modules and a MAPE of 4.45% over all modules excluding the transformation for which our previous approach failed.</p></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"164 ","pages":"Article 102413"},"PeriodicalIF":2.2,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S016653162400018X/pdfft?md5=58a866ca1d0c949f2646b2162533ef3f&pid=1-s2.0-S016653162400018X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140555333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Age- and deviation-of-information of hybrid time- and event-triggered systems: What matters more, determinism or resource conservation?
Pub Date: 2024-03-19, DOI: 10.1016/j.peva.2024.102412
Mahsa Noroozi, Markus Fidler
Age-of-information is a metric that quantifies the freshness of information obtained by sampling a remote sensor. In signal-agnostic sampling, sensor updates are triggered at certain times without being conditioned on the actual sensor signal. Optimal update policies have been researched and it is accepted that periodic updates achieve smaller age-of-information than random updates. We contribute a study of a signal-aware policy, where updates are triggered randomly by a defined sensor event. By definition, this implies random updates and as a consequence inferior age-of-information. Considering a notion of deviation-of-information as a signal-aware metric, our results show, however, that event-triggered systems can perform equally well as time-triggered systems while causing smaller mean network utilization. We use the stochastic network calculus to derive bounds of age- and deviation-of-information that are exceeded at most with a small, defined probability. We include simulation results that confirm the tail decay of the bounds. We also evaluate a hybrid time- and event-triggered policy where the event-triggered system is complemented by a minimal and a maximal update interval.
{"title":"Age- and deviation-of-information of hybrid time- and event-triggered systems: What matters more, determinism or resource conservation?","authors":"Mahsa Noroozi, Markus Fidler","doi":"10.1016/j.peva.2024.102412","DOIUrl":"10.1016/j.peva.2024.102412","url":null,"abstract":"<div><p>Age-of-information is a metric that quantifies the freshness of information obtained by sampling a remote sensor. In signal-agnostic sampling, sensor updates are triggered at certain times without being conditioned on the actual sensor signal. Optimal update policies have been researched and it is accepted that periodic updates achieve smaller age-of-information than random updates. We contribute a study of a signal-aware policy, where updates are triggered randomly by a defined sensor event. By definition, this implies random updates and as a consequence inferior age-of-information. Considering a notion of deviation-of-information as a signal-aware metric, our results show, however, that event-triggered systems can perform equally well as time-triggered systems while causing smaller mean network utilization. We use the stochastic network calculus to derive bounds of age- and deviation-of-information that are exceeded at most with a small, defined probability. We include simulation results that confirm the tail decay of the bounds. We also evaluate a hybrid time- and event-triggered policy where the event-triggered system is complemented by a minimal and a maximal update interval.</p></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"164 ","pages":"Article 102412"},"PeriodicalIF":2.2,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0166531624000178/pdfft?md5=8cf8b229cde1a9343b74d531721bff2d&pid=1-s2.0-S0166531624000178-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140170536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stepwise migration of a monolith to a microservice architecture: Performance and migration effort evaluation
Pub Date: 2024-03-12, DOI: 10.1016/j.peva.2024.102411
Diogo Faustino, Nuno Gonçalves, Manuel Portela, António Rito Silva
Due to scalability requirements and the split of large software development projects into small agile teams, there is a current trend toward migrating monolith systems to the microservice architecture. However, splitting the monolith into microservices, encapsulating them behind well-defined interfaces, and introducing inter-microservice communication add a cost in terms of performance. In this paper, we describe a case study of the migration of a monolith to a microservice architecture, where a modular monolith architecture is used as an intermediate step. The impact on migration effort and performance is measured for both steps. The current state of the art analyses the migration of monolith systems to a microservice architecture, but we observed that migration effort and performance issues are already significant in the migration to a modular monolith. Therefore, a clear distinction is established between the two steps, which may inform software architects when planning the migration of monolith systems. In particular, we consider the trade-offs of doing the complete migration or just migrating to a modular monolith.
{"title":"Stepwise migration of a monolith to a microservice architecture: Performance and migration effort evaluation","authors":"Diogo Faustino , Nuno Gonçalves , Manuel Portela , António Rito Silva","doi":"10.1016/j.peva.2024.102411","DOIUrl":"https://doi.org/10.1016/j.peva.2024.102411","url":null,"abstract":"<div><p>Due to scalability requirements and the split of large software development projects into small agile teams, there is a current trend toward the migration of monolith systems to the microservice architecture. However, the split of the monolith into microservices, its encapsulation through well-defined interfaces, and the introduction of inter-microservice communication add a cost in terms of performance. In this paper, we describe a case study of the migration of a monolith to a microservice architecture, where a modular monolith architecture is used as an intermediate step. The impact on migration effort and performance is measured for both steps. Current state-of-the-art analyses the migration of monolith systems to a microservice architecture, but we observed that migration effort and performance issues are already significant in the migration to a modular monolith. Therefore, a clear distinction is established for each of the steps, which may inform software architects on the planning of the migration of monolith systems. In particular, we consider the trade-offs of doing all the migration process or just migrating to a modular monolith.</p></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"164 ","pages":"Article 102411"},"PeriodicalIF":2.2,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140141848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The impact of load comparison errors on the power-of-d load balancing
Pub Date: 2024-02-28, DOI: 10.1016/j.peva.2024.102408
Sanidhay Bhambay, Arpan Mukhopadhyay, Thirupathaiah Vasantam
We consider a system with n unit-rate servers where jobs arrive according to a Poisson process with rate nλ (λ < 1). In the standard Power-of-d or Pod scheme with d ≥ 2, for each incoming job, a dispatcher samples d servers uniformly at random and sends the incoming job to the least loaded of the d sampled servers. However, in practice, load comparisons may not always be accurate. In this paper, we analyse the effects of noisy load comparisons on the performance of the Pod scheme. To test the robustness of the Pod scheme against load comparison errors, we assume an adversarial setting where, in the event of an error, the adversary assigns the incoming job to the worst possible server, i.e., the server with the maximum load among the d sampled servers. We consider two error models: load-dependent and load-independent errors. In the load-dependent error model, the adversary has limited power in that it is able to cause an error with probability ϵ ∈ [0,1] only when the difference between the minimum and the maximum queue lengths of the d sampled servers is bounded by a constant threshold g ≥ 0. For this type of error, we show that, in the large system limit, the benefits of the Pod scheme are retained even if g and ϵ are arbitrarily large, as long as the system is heavily loaded, i.e., λ is close to 1. In the load-independent error model, the adversary is assumed to be more powerful in that it can cause an error with probability ϵ independent of the loads of the sampled servers. For this model, we show that the performance benefits of the Pod scheme are retained only if ϵ ≤ 1/d; for ϵ > 1/d we show that the stability region of the system shrinks and the system performs poorly in comparison to the random scheme. Our mean-field analysis uses a new approach to characterise fixed points which neither have closed-form solutions nor admit any recursion. Furthermore, we develop a generic approach to prove tightness and stability for any state-dependent load balancing scheme.
{"title":"The impact of load comparison errors on the power-of-d load balancing","authors":"Sanidhay Bhambay , Arpan Mukhopadhyay , Thirupathaiah Vasantam","doi":"10.1016/j.peva.2024.102408","DOIUrl":"https://doi.org/10.1016/j.peva.2024.102408","url":null,"abstract":"<div><p>We consider a system with <span><math><mi>n</mi></math></span> unit-rate servers where jobs arrive according a Poisson process with rate <span><math><mrow><mi>n</mi><mi>λ</mi></mrow></math></span> (<span><math><mrow><mi>λ</mi><mo><</mo><mn>1</mn></mrow></math></span>). In the standard <em>Power-of-</em><span><math><mi>d</mi></math></span> or Pod scheme with <span><math><mrow><mi>d</mi><mo>≥</mo><mn>2</mn></mrow></math></span>, for each incoming job, a dispatcher samples <span><math><mi>d</mi></math></span> servers uniformly at random and sends the incoming job to the least loaded of the <span><math><mi>d</mi></math></span> sampled servers. However, in practice, load comparisons may not always be accurate. In this paper, we analyse the effects of noisy load comparisons on the performance of the Pod scheme. To test the robustness of the Pod scheme against load comparison errors, we assume an adversarial setting where, in the event of an error, the adversary assigns the incoming job to the worst possible server, i.e., the server with the maximum load among the <span><math><mi>d</mi></math></span> sampled servers. We consider two error models: <em>load-dependent</em> and <em>load-independent</em> errors. In the load-dependent error model, the adversary has limited power in that it is able to cause an error with probability <span><math><mrow><mi>ϵ</mi><mo>∈</mo><mrow><mo>[</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>]</mo></mrow></mrow></math></span> only when the difference in the minimum and the maximum queue lengths of the <span><math><mi>d</mi></math></span> sampled servers is bounded by a constant threshold <span><math><mrow><mi>g</mi><mo>≥</mo><mn>0</mn></mrow></math></span>. For this type of errors, we show that, in the large system limit, the benefits of the Pod scheme are retained even if <span><math><mi>g</mi></math></span> and <span><math><mi>ϵ</mi></math></span> are arbitrarily large as long as the system is heavily loaded, i.e., <span><math><mi>λ</mi></math></span> is close to 1. In the load-independent error model, the adversary is assumed to be more powerful in that it can cause an error with probability <span><math><mi>ϵ</mi></math></span> independent of the loads of the sampled servers. For this model, we show that the performance benefits of the Pod scheme are retained only if <span><math><mrow><mi>ϵ</mi><mo>≤</mo><mn>1</mn><mo>/</mo><mi>d</mi></mrow></math></span>; for <span><math><mrow><mi>ϵ</mi><mo>></mo><mn>1</mn><mo>/</mo><mi>d</mi></mrow></math></span> we show that the stability region of the system reduces and the system performs poorly in comparison to the <em>random scheme</em>. Our mean-field analysis uses a new approach to characterise fixed points which neither have closed form solutions nor admit any recursion. 
Furthermore, we develop a generic approach to prove tightness and stability for any state-dependent load balancing scheme.</p></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"164 ","pages":"Article 102408"},"PeriodicalIF":2.2,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0166531624000130/pdfft?md5=e219034bb5ef6f93c589b57673e3885d&pid=1-s2.0-S0166531624000130-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139993482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
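The load-independent error model is straightforward to simulate for finite n as a sanity check on the mean-field predictions. A Gillespie-style sketch with invented parameters (the paper's results concern the n → ∞ limit, so finite-n numbers are only indicative):

```python
import random

def pod_with_errors(n=100, lam=0.9, d=2, eps=0.3, events=200_000, seed=7):
    """Gillespie simulation of Power-of-d under the load-independent error
    model: with probability eps the dispatcher sends an arrival to the MOST
    loaded of the d sampled servers (the adversarial choice), otherwise to
    the least loaded. Returns the time-averaged number of jobs in the system."""
    random.seed(seed)
    q = [0] * n
    jobs = 0
    t = area = 0.0
    for _ in range(events):
        busy = sum(1 for x in q if x > 0)
        rate = n * lam + busy                 # total event rate of the CTMC
        dt = random.expovariate(rate)
        area += jobs * dt
        t += dt
        if random.random() < n * lam / rate:  # arrival
            sample = random.sample(range(n), d)
            pick = max if random.random() < eps else min
            i = pick(sample, key=q.__getitem__)
            q[i] += 1
            jobs += 1
        else:                                 # departure from a busy server
            i = random.choice([j for j, x in enumerate(q) if x > 0])
            q[i] -= 1
            jobs -= 1
    return area / t

# With d = 2 the predicted threshold is eps = 1/d = 0.5:
print(pod_with_errors(eps=0.3), pod_with_errors(eps=0.7))
```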
A dependence graph pattern mining method for processor performance analysis
Pub Date: 2024-02-28, DOI: 10.1016/j.peva.2024.102409
Yawen Zheng, Chenji Han, Tingting Zhang, Fuxin Zhang, Jian Wang
As the complexity of processor microarchitectures and applications increases, obtaining performance-optimization knowledge, such as critical dependent chains, becomes more challenging. To tackle this issue, this paper employs pattern mining methods to analyze the critical path of processor micro-execution dependence graphs. We propose a high average-utility pattern mining algorithm called Dependence Graph Miner (DG-Miner) based on the characteristics of dependence graphs. DG-Miner overcomes the limitations of current pattern mining algorithms for dependence graph pattern mining by supporting variable utility, candidate generation via endpoint matching, an adjustable upper bound, and a concise pattern judgment mechanism. Experiments reveal that, compared with existing upper-bound candidate generation methods, the adjustable upper bound reduces the number of candidate patterns by 28.14% and the running time by 27% on average. The concise pattern judgment mechanism enhances the conciseness of mining results by 16.31% and reduces the running time by 39.82%. Furthermore, DG-Miner aids in identifying critical dependent chains, critical program regions, and performance exceptions.
{"title":"A dependence graph pattern mining method for processor performance analysis","authors":"Yawen Zheng , Chenji Han , Tingting Zhang , Fuxin Zhang , Jian Wang","doi":"10.1016/j.peva.2024.102409","DOIUrl":"https://doi.org/10.1016/j.peva.2024.102409","url":null,"abstract":"<div><p>As the complexity of processor microarchitecture and applications increases, obtaining performance optimization knowledge, such as critical dependent chains, becomes more challenging. To tackle this issue, this paper employs pattern mining methods to analyze the critical path of processor micro-execution dependence graphs. We propose a high average utility pattern mining algorithm called Dependence Graph Miner (DG-Miner) based on the characteristics of dependence graphs. DG-Miner overcomes the limitations of current pattern mining algorithms for dependence graph pattern mining by offering support for variable utility, candidate generation using endpoint matching, the adjustable upper bound, and the concise pattern judgment mechanism. Experiments reveal that, compared with existing upper bound candidate generation methods, the adjustable upper bound reduces the number of candidate patterns by 28.14% and the running time by 27% on average. The concise pattern judgment mechanism enhances the conciseness of mining results by 16.31% and reduces the running time by 39.82%. Furthermore, DG-Miner aids in identifying critical dependent chains, critical program regions, and performance exceptions.</p></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"164 ","pages":"Article 102409"},"PeriodicalIF":2.2,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140014628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}