Pub Date: 2024-02-20 | DOI: 10.1016/j.jpdc.2024.104866
Mohammed M. Alani
With the promise of higher throughput and better response times, 6G networks are a significant enabler for the evolution of smart cities. The rapidly growing reliance on connected devices within the smart-city context encourages malicious actors to target these devices to achieve various malicious goals. In this paper, we present a novel defense technique that creates a cloud-based virtualized honeypot/twin designed to receive malicious traffic through an edge-based, machine learning-enabled detection system. The proposed system performs early identification of malicious traffic at a software-defined network-enabled edge routing point to divert that traffic away from the 6G-enabled smart-city endpoints. Testing of the proposed system showed an accuracy exceeding 99.8%, with an F1 score of 0.9984.
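The gating logic described in this abstract (detect at the edge, divert suspect flows to the honeypot twin) can be sketched as follows; the feature names, weights, and threshold are illustrative assumptions, not the paper's actual model:

```python
# Sketch of the edge-side gating idea: an ML detector scores each flow at the
# SDN edge, and flows scored as malicious are diverted to the cloud honeypot
# twin instead of the real smart-city endpoint.

def classify_flow(features, threshold=0.5):
    """Toy stand-in for the trained detector: returns True if 'malicious'.

    The weighted sum over two assumed features is purely illustrative; a real
    deployment would call a trained model's prediction method instead.
    """
    score = 0.7 * features["syn_rate"] + 0.3 * features["payload_entropy"]
    return score >= threshold

def route_flow(features):
    """Divert suspected-malicious flows to the honeypot twin."""
    return "honeypot" if classify_flow(features) else "endpoint"
```

In an SDN setting, the `route_flow` decision would be installed as a flow rule at the edge routing point rather than evaluated per packet.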
{"title":"HoneyTwin: Securing smart cities with machine learning-enabled SDN edge and cloud-based honeypots","authors":"Mohammed M. Alani","doi":"10.1016/j.jpdc.2024.104866","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104866","url":null,"abstract":"<div><p>With the promise of higher throughput, and better response times, 6G networks provide a significant enabler for smart cities to evolve. The rapidly-growing reliance on connected devices within the smart city context encourages malicious actors to target these devices to achieve various malicious goals. In this paper, we present a novel defense technique that creates a cloud-based virtualized honeypot/twin that is designed to receive malicious traffic through edge-based machine learning-enabled detection system. The proposed system performs early identification of malicious traffic in a software defined network-enabled edge routing point to divert that traffic away from the 6G-enabled smart city endpoints. Testing of the proposed system showed an accuracy exceeding 99.8%, with an <span><math><msub><mrow><mi>F</mi></mrow><mrow><mn>1</mn></mrow></msub></math></span> score of 0.9984.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139942060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-18 | DOI: 10.1016/j.jpdc.2024.104867
Wenjie Tang, Yiping Yao, Lizhen Ou, Kai Chen
Publish–subscribe communication is a fundamental service for message passing between decoupled applications in distributed simulation. Because it can introduce abundant unnecessary data transfer, interest-matching services are needed to filter irrelevant message traffic. Frequent matching demands during simulation execution make interest matching a bottleneck as the simulation scale increases. Contemporary algorithms built for serial processing inadequately leverage multicore processor-based parallel resources, and existing parallel algorithmic improvements are insufficient for large-scale simulations. Therefore, we propose a hierarchical sort-based parallel algorithm for dynamic interest matching that embeds all update and subscription regions into two full binary trees, thereby transforming the region-matching task into one of node matching. It exploits the association between adjacent nodes and the hierarchical relation between parent-child nodes to eliminate redundant operations, and achieves incremental parallel matching that compares only changed regions. We analyze the time and space complexity of this process. The new algorithm performs better and is more scalable than state-of-the-art algorithms.
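As an illustration of sort-based interest matching, the following sketch solves a simplified one-dimensional version of the problem by sorting region endpoints and sweeping; the paper's tree embedding, incremental updates, and parallelization are not reproduced here:

```python
def sort_based_matches(updates, subs):
    """Return index pairs (u, s) of overlapping update/subscription intervals.

    Simplified 1-D sort-based interest matching: sort all endpoints once, then
    sweep, keeping 'active' sets of currently open intervals. Each region is a
    closed interval (lo, hi); at equal coordinates, starts sort before ends so
    touching intervals count as overlapping.
    """
    events = []  # (coordinate, is_end, kind, index)
    for i, (lo, hi) in enumerate(updates):
        events.append((lo, 0, "u", i))
        events.append((hi, 1, "u", i))
    for j, (lo, hi) in enumerate(subs):
        events.append((lo, 0, "s", j))
        events.append((hi, 1, "s", j))
    events.sort()

    active_u, active_s, pairs = set(), set(), set()
    for _, is_end, kind, idx in events:
        if is_end:
            (active_u if kind == "u" else active_s).discard(idx)
        elif kind == "u":
            pairs.update((idx, s) for s in active_s)  # new update vs open subs
            active_u.add(idx)
        else:
            pairs.update((u, idx) for u in active_u)  # new sub vs open updates
            active_s.add(idx)
    return pairs
```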
{"title":"Hierarchical sort-based parallel algorithm for dynamic interest matching","authors":"Wenjie Tang, Yiping Yao, Lizhen Ou, Kai Chen","doi":"10.1016/j.jpdc.2024.104867","DOIUrl":"10.1016/j.jpdc.2024.104867","url":null,"abstract":"<div><p>Publish–subscribe communication is a fundamental service used for message-passing between decoupled applications in distributed simulation. When abundant unnecessary data transfer is introduced, interest-matching services are needed to filter irrelevant message traffic. Frequent demands during simulation execution makes interest matching a bottleneck with increased simulation scale. Contemporary algorithms built for serial processing inadequately leverage multicore processor-based parallel resources. Parallel algorithmic improvements are insufficient for large-scale simulations. Therefore, we propose a hierarchical sort-based parallel algorithm for dynamic interest matching that embeds all update and subscription regions into two full binary trees, thereby transferring the region-matching task to one of node-matching. It utilizes the association between adjacent nodes and the hierarchical relation between parent‒child nodes to eliminate redundant operations, and achieves incremental parallel matching that only compares changed regions. We analyze the time and space complexity of this process. 
The new algorithm performs better and is more scalable than state-of-the-art algorithms.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139923545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-15 | DOI: 10.1016/j.jpdc.2024.104863
Anne Benoit , Thomas Herault , Lucas Perotin , Yves Robert , Frédéric Vivien
This work revisits I/O bandwidth-sharing strategies for HPC applications. When several applications post concurrent I/O operations, well-known approaches include serializing these operations or fair-sharing the bandwidth across them (FairShare). Another recent approach, I/O-Sets, assigns priorities to the applications, which are classified into different sets based upon the average length of their iterations. We introduce several new bandwidth-sharing strategies, some of them simple greedy algorithms and some more complicated to implement, and we compare them with existing ones. Our new strategies do not rely on any a priori knowledge of the behavior of the applications, such as the length of work phases, the volume of I/O operations, or some expected periodicity. We introduce a rigorous framework, namely steady-state windows, which enables us to derive bounds on the competitive ratio of all bandwidth-sharing strategies for three different objectives: minimum yield, platform utilization, and global efficiency. To the best of our knowledge, this work is the first to provide a quantitative assessment of the online competitiveness of any bandwidth-sharing strategy. This theory-oriented assessment is complemented by a comprehensive set of simulations, based upon both synthetic and realistic traces. The main conclusion is that two of our simple, low-complexity greedy strategies significantly outperform serialization, FairShare, and I/O-Sets, and we recommend that the I/O community implement them for further assessment.
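To make the serialization-versus-fair-sharing trade-off concrete, a minimal single-link model (an illustrative assumption, not the paper's steady-state-windows framework) can compare completion times under the two classic strategies:

```python
def serialized_completions(volumes, bandwidth):
    """Completion times when concurrent I/O ops run one after another (FCFS)."""
    t, out = 0.0, []
    for v in volumes:
        t += v / bandwidth
        out.append(t)
    return out

def fairshare_completions(volumes, bandwidth):
    """Completion times when the link bandwidth is split equally among
    the operations still in progress (processor-sharing / water-filling)."""
    remaining = sorted((v, i) for i, v in enumerate(volumes))
    done_at = [0.0] * len(volumes)
    t, prev, n = 0.0, 0.0, len(remaining)
    for k, (v, i) in enumerate(remaining):
        share = bandwidth / (n - k)      # ops still running share equally
        t += (v - prev) / share          # time until next-smallest op finishes
        done_at[i] = t
        prev = v
    return done_at
```

Both strategies finish all operations at the same makespan (total volume over bandwidth), but they distribute the individual completion times very differently, which is exactly what yield-style objectives measure.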
{"title":"Revisiting I/O bandwidth-sharing strategies for HPC applications","authors":"Anne Benoit , Thomas Herault , Lucas Perotin , Yves Robert , Frédéric Vivien","doi":"10.1016/j.jpdc.2024.104863","DOIUrl":"10.1016/j.jpdc.2024.104863","url":null,"abstract":"<div><p>This work revisits I/O bandwidth-sharing strategies for HPC applications. When several applications post concurrent I/O operations, well-known approaches include serializing these operations (<figure><img></figure>) or fair-sharing the bandwidth across them (<span>FairShare</span>). Another recent approach, I/O-Sets, assigns priorities to the applications, which are classified into different sets based upon the average length of their iterations. We introduce several new bandwidth-sharing strategies, some of them simple greedy algorithms, and some of them more complicated to implement, and we compare them with existing ones. Our new strategies do not rely on any a-priori knowledge of the behavior of the applications, such as the length of work phases, the volume of I/O operations, or some expected periodicity. We introduce a rigorous framework, namely <em>steady-state windows</em>, which enables to derive bounds on the competitive ratio of all bandwidth-sharing strategies for three different objectives: minimum yield, platform utilization, and global efficiency. To the best of our knowledge, this work is the first to provide a quantitative assessment of the online competitiveness of any bandwidth-sharing strategy. This theory-oriented assessment is complemented by a comprehensive set of simulations, based upon both synthetic and realistic traces. 
The main conclusion is that two of our simple and low-complexity greedy strategies significantly outperform <figure><img></figure>, <span>FairShare</span> and I/O-Sets, and we recommend that the I/O community would implement them for further assessment.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139878546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-12 | DOI: 10.1016/S0743-7315(24)00023-6
{"title":"Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)","authors":"","doi":"10.1016/S0743-7315(24)00023-6","DOIUrl":"https://doi.org/10.1016/S0743-7315(24)00023-6","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0743731524000236/pdfft?md5=8661326c859cab793505056ef1edee51&pid=1-s2.0-S0743731524000236-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139726370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-08 | DOI: 10.1016/j.jpdc.2024.104855
Ricardo Quislant, Eladio Gutierrez, Oscar Plata
Time series analysis is a key technique for extracting and predicting events in domains as diverse as epidemiology, genomics, neuroscience, environmental sciences, and economics. Matrix Profile, a state-of-the-art algorithm for time series analysis, finds the most similar and dissimilar subsequences in a time series in deterministic time, and it is exact. Matrix Profile has low arithmetic intensity and operates on large amounts of time series data, which can be an issue in terms of memory requirements. On the other hand, Hardware Transactional Memory (HTM) is an optimistic synchronization method that executes transactions speculatively in parallel while tracking memory accesses to detect and resolve conflicts.
This work evaluates one of the best implementations of Matrix Profile, exploring multiple multiprocessor variants and proposing new implementations that consider a variety of synchronization methods (HTM, locks, barriers) as well as algorithm organizations. We analyze these variants using real datasets, both short and large, in terms of speedup and memory requirements, the latter being a major issue when dealing with very large time series. The experimental evaluation shows that our proposals achieve up to 100× speedup over the sequential algorithm for 128 threads, and up to 3× over the baseline, while keeping memory requirements low and even independent of the number of threads.
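A naive definition-level sketch of the Matrix Profile (plain Euclidean distance, O(n²m) time, no z-normalization, unlike production implementations such as STOMP or SCAMP) clarifies what the evaluated parallel variants compute:

```python
import math

def matrix_profile(series, m):
    """Naive matrix profile with plain Euclidean distance.

    For every length-m subsequence, record the distance to its nearest
    non-overlapping neighbour. An exclusion zone skips trivial matches
    (subsequences that overlap the query itself).
    """
    n = len(series) - m + 1
    profile = [math.inf] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) < m:          # exclusion zone: skip trivial matches
                continue
            d = math.dist(series[i:i + m], series[j:j + m])
            profile[i] = min(profile[i], d)
    return profile
```

A repeated pattern shows up as a near-zero profile value at both of its occurrences, which is how motifs are discovered; the profile's maxima flag discords (anomalies).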
{"title":"Exploring multiprocessor approaches to time series analysis","authors":"Ricardo Quislant, Eladio Gutierrez, Oscar Plata","doi":"10.1016/j.jpdc.2024.104855","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104855","url":null,"abstract":"<div><p>Time series analysis is a key technique for extracting and predicting events in domains as diverse as epidemiology, genomics, neuroscience, environmental sciences, economics, etc. <em>Matrix Profile</em>, a state-of-the-art algorithm to perform time series analysis, finds out the most similar and dissimilar subsequences in a time series in deterministic time and it is exact. Matrix Profile has low arithmetic intensity and it operates on large amounts of time series data, which can be an issue in terms of memory requirements. On the other hand, Hardware Transactional Memory (HTM) is an alternative optimistic synchronization method that executes transactions speculatively in parallel while keeping track of memory accesses to detect and resolve conflicts.</p><p>This work evaluates one of the best implementations of Matrix Profile exploring multiple multiprocessor variants and proposing new implementations that consider a variety of synchronization methods (HTM, locks, barriers), as well as algorithm organizations. We analyze these variants using real datasets, both short and large, in terms of speedup and memory requirements, the latter being a major issue when dealing with very large time series. 
The experimental evaluation shows that our proposals can achieve up to 100× speedup over the sequential algorithm for 128 threads, and up to 3× over the baseline, while keeping memory requirements low and even independent of the number of threads.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0743731524000194/pdfft?md5=a25b14cc13a327c9c4b6c5f9abde8126&pid=1-s2.0-S0743731524000194-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139732906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-07 | DOI: 10.1016/j.jpdc.2024.104854
Qiliang Li , Min Lyu , Liangliang Xu , Yinlong Xu
The RAID2.0 architecture, which uses dozens or even hundreds of disks, is widely adopted for large-capacity data storage. However, limited resources such as memory and CPU force RAID2.0 to execute batch recovery for disk failures. Traditional random data placement and recovery schemes result in highly skewed I/O access within a batch, which slows down recovery. To address this issue, we propose DR-RAID, an efficient reconstruction scheme that balances local rebuilding workloads across all surviving disks within a batch. We dynamically select a batch of tasks with almost balanced read loads and make intra-batch adjustments for tasks with multiple options for reading source chunks. Furthermore, we use a bipartite graph model to achieve a uniform distribution of write loads. DR-RAID can be applied with homogeneous or heterogeneous disk rebuilding bandwidth. Experimental results demonstrate that in offline rebuilding, DR-RAID enhances the rebuilding throughput by up to 61.90% compared to the random data placement scheme. With varied rebuilding bandwidth, the improvement can reach up to 65.00%.
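The intra-batch load-balancing idea can be illustrated with a simple greedy sketch; the task structure and greedy rule below are assumptions for illustration, not DR-RAID's bipartite-graph formulation:

```python
def balance_rebuild_reads(tasks, disks):
    """Greedily assign rebuild-read tasks to surviving disks.

    Each task is (task_id, read_cost, candidate_disks): a chunk that can be
    read from any of several source replicas, which is the freedom a balanced
    scheme exploits. Largest-cost tasks are placed first, each on its
    least-loaded candidate disk, so read loads stay nearly even.
    """
    load = {d: 0 for d in disks}
    assignment = {}
    for task_id, cost, candidates in sorted(tasks, key=lambda t: -t[1]):
        disk = min(candidates, key=lambda d: load[d])
        assignment[task_id] = disk
        load[disk] += cost
    return assignment, load
```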
{"title":"Fast recovery for large disk enclosures based on RAID2.0: Algorithms and evaluation","authors":"Qiliang Li , Min Lyu , Liangliang Xu , Yinlong Xu","doi":"10.1016/j.jpdc.2024.104854","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104854","url":null,"abstract":"<div><p>The RAID2.0 architecture, which uses dozens or even hundreds of disks, is widely adopted for large-capacity data storage. However, limited resources like memory and CPU cause RAID2.0 to execute batch recovery for disk failures. The traditional random data placement and recovery schemes result in highly skewed I/O access within a batch, which slows down the recovery speed. To address this issue, we propose DR-RAID, an efficient reconstruction scheme that balances local rebuilding workloads across all surviving disks within a batch. We dynamically select a batch of tasks with almost balanced read loads and make intra-batch adjustments for tasks with multiple solutions of reading source chunks. Furthermore, we use a bipartite graph model to achieve a uniform distribution of write loads. DR-RAID can be applied with homogeneous or heterogeneous disk rebuilding bandwidth. Experimental results demonstrate that in offline rebuilding, DR-RAID enhances the rebuilding throughput by up to 61.90% compared to the random data placement scheme. 
With varied rebuilding bandwidth, the improvement can reach up to 65.00%.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139732543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-05 | DOI: 10.1016/j.jpdc.2024.104853
B. Naresh Kumar Reddy , Aruru Sai Kumar
Adaptive routing is effective in maintaining high processor performance in a multiprocessor system-on-chip, steering packets over minimal or non-minimal alternate routes to avoid congestion. However, many systems cannot deal with the fact that sending packets over an alternative path, rather than the shorter fixed-priority route, can result in packets arriving at the destination node out of order. This can occur if packets belonging to the same communication flow are adaptively routed through different paths. In real-world network systems, there are strategies and algorithms to handle out-of-order packets efficiently without requiring infinite memory. Techniques such as buffering, sliding windows, and sequence-number management are used to reorder packets while respecting the practical constraints of available memory and processing power; the specific method depends on the network protocol and the requirements of the application. In this paper, we propose a novel technique that improves the performance of multiprocessor systems-on-chip by implementing adaptive routing based on the Bat algorithm. The framework employs a five-stage pipelined router that receives and forwards each packet in the best direction in an adaptive mode. The Bat algorithm is used to enhance performance by optimizing the route over which packets are transmitted to the destination. Tests were carried out on various NoC sizes (6×6 and 8×8) under multimedia benchmarks, compared with other related algorithms, and implemented on a Kintex-7 FPGA board. The simulation results illustrate that the proposed algorithm reduces delay and improves throughput over traditional adaptive algorithms.
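A minimal sketch of the standard Bat algorithm (here minimizing a generic objective; all parameter values are illustrative defaults, not tuned for NoC routing) shows the optimization loop such a framework builds on:

```python
import random

def bat_minimize(f, dim, iters=200, n_bats=10, fmin=0.0, fmax=2.0,
                 loudness=0.9, pulse=0.5, seed=1):
    """Minimal Bat-algorithm sketch minimizing f over [-5, 5]^dim.

    Standard ingredients only: frequency-tuned velocities pulling bats toward
    the global best, a small random walk around the best bat, and
    loudness-gated acceptance of improving candidates.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    fit = [f(p) for p in pos]
    best = min(pos, key=f)[:]
    for _ in range(iters):
        for i in range(n_bats):
            freq = fmin + (fmax - fmin) * rng.random()
            vel[i] = [v + (x - b) * freq for v, x, b in zip(vel[i], pos[i], best)]
            cand = [x + v for x, v in zip(pos[i], vel[i])]
            if rng.random() > pulse:            # local walk around current best
                cand = [b + 0.01 * rng.gauss(0, 1) for b in best]
            fc = f(cand)
            if fc < fit[i] and rng.random() < loudness:
                pos[i], fit[i] = cand, fc
            if fc < f(best):
                best = cand[:]
    return best, f(best)
```

In a routing context, each candidate position would encode a route choice and `f` would score it by delay and congestion; here a plain numeric objective keeps the sketch self-contained.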
{"title":"Evaluating the effectiveness of Bat optimization in an adaptive and energy-efficient network-on-chip routing framework","authors":"B. Naresh Kumar Reddy , Aruru Sai Kumar","doi":"10.1016/j.jpdc.2024.104853","DOIUrl":"10.1016/j.jpdc.2024.104853","url":null,"abstract":"<div><p>Adaptive routing is effective in maintaining higher processor performance and avoids packets over minimal or non-minimal alternate routes without congestion for a multiprocessor system on chip. However, many systems cannot deal with the fact that sending packets over an alternative path rather than the shorter, fixed-priority route can result in packets arriving at the destination node out of order. This can occur if packets belonging to the same communication flow are adaptively routed through a different path. In real-world network systems, there are strategies and algorithms to efficiently handle out-of-order packets without requiring infinite memory. Techniques like buffering, sliding windows, and sequence number management are used to reorder packets while considering the practical constraints of available memory and processing power. The specific method used depends on the network protocol and the requirements of the application. In the proposed technique, a novel technique aimed at improving the performance of multiprocessor systems on chip by implementing adaptive routing based on the Bat algorithm. The framework employs 5 stage pipeline router, that completely gained and forward a packet at the perfect direction in an adaptive mode. Bat algorithm is used to enhance the performance, which can optimize route to transmit packets at the destination. A test was carried out on various NoC sizes (6 X 6 and 8 X 8) under multimedia benchmarks, compared with other related algorithms and implemented on Kintex-7 FPGA board. 
The outcomes of the simulation illustrate that the proposed algorithm reduces delay and improves the throughput over the other traditional adaptive algorithms.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139688940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-05 | DOI: 10.1016/j.jpdc.2024.104852
Barun Gorain, Partha Sarathi Mandal, Kaushik Mondal, Supantha Pandit
In the dispersion problem, a set of k co-located mobile robots must relocate themselves in distinct nodes of an unknown network. The network is modeled as an anonymous graph G = (V, E), where the graph's nodes are not labeled. The edges incident to a node v with degree d are labeled with port numbers in the range {0, 1, …, d−1} at v. The robots have unique IDs in the range [0, L], where L ≥ k, and are initially placed at a source node s. The task of the dispersion was traditionally achieved based on the assumption of two types of communication abilities: (a) when some robots are at the same node, they can communicate by exchanging messages between them, and (b) any two robots in the network can exchange messages between them. This paper investigates whether this communication ability among co-located robots is absolutely necessary to achieve dispersion. We establish that even in the absence of the ability of communication, the task of the dispersion by a set of mobile robots can be achieved in a much weaker model, where a robot at a node v has access to the following very restricted information at the beginning of any round: (1) am I alone at v? (2) did the number of robots at v increase or decrease compared to the previous round?
We propose a deterministic distributed algorithm that achieves the dispersion on any given graph G = (V, E) in time O(k log L + k² log Δ), where Δ is the maximum degree of a node in G. Further, each robot uses O(log L + log Δ) additional memory, i.e., memory other than the memory required to store its ID. We also prove that the task of the dispersion cannot be achieved by a set of mobile robots with o(log L + log Δ) additional memory.
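For contrast with the paper's communication-free model, the classical DFS-style dispersion, in which fully coordinating co-located robots traverse the graph together and settle one robot per newly visited node, can be sketched as:

```python
def dfs_dispersion(adj, source, k):
    """Classical DFS-style dispersion, shown only for contrast.

    All robots move as one group along a depth-first traversal from the
    source; the smallest remaining ID settles at each newly visited node.
    This assumes co-located robots can coordinate fully, which is exactly
    the communication ability the paper shows is unnecessary.
    `adj` maps node -> list of neighbours (ports in local order).
    """
    robots = list(range(k))        # IDs 0..k-1, initially all at `source`
    settled = {}                   # node -> robot ID settled there
    stack, visited = [source], set()
    while stack and robots:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        settled[node] = robots.pop(0)   # lowest remaining ID settles here
        for nb in reversed(adj[node]):
            if nb not in visited:
                stack.append(nb)
    return settled
```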
{"title":"Collaborative dispersion by silent robots","authors":"Barun Gorain , Partha Sarathi Mandal , Kaushik Mondal , Supantha Pandit","doi":"10.1016/j.jpdc.2024.104852","DOIUrl":"10.1016/j.jpdc.2024.104852","url":null,"abstract":"<div><p>In the dispersion problem, a set of <em>k</em> co-located mobile robots must relocate themselves in distinct nodes of an unknown network. The network is modeled as an anonymous graph <span><math><mi>G</mi><mo>=</mo><mo>(</mo><mi>V</mi><mo>,</mo><mi>E</mi><mo>)</mo></math></span>, where the graph's nodes are not labeled. The edges incident to a node <em>v</em> with degree <em>d</em> are labeled with port numbers in the range <span><math><mo>{</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>,</mo><mo>…</mo><mo>,</mo><mi>d</mi><mo>−</mo><mn>1</mn><mo>}</mo></math></span> at <em>v</em>. The robots have unique IDs in the range <span><math><mo>[</mo><mn>0</mn><mo>,</mo><mi>L</mi><mo>]</mo></math></span>, where <span><math><mi>L</mi><mo>≥</mo><mi>k</mi></math></span>, and are initially placed at a source node <em>s</em>. The task of the dispersion was traditionally achieved based on the assumption of two types of communication abilities: (a) when some robots are at the same node, they can communicate by exchanging messages between them, and (b) any two robots in the network can exchange messages between them. This paper investigates whether this communication ability among co-located robots is absolutely necessary to achieve dispersion. We establish that even in the absence of the ability of communication, the task of the dispersion by a set of mobile robots can be achieved in a much weaker model, where a robot at a node <em>v</em> has access to following very restricted information at the beginning of any round: (1) am I alone at <em>v</em>? 
(2) did the number of robots at <em>v</em> increase or decrease compared to the previous round?</p><p>We propose a deterministic distributed algorithm that achieves the dispersion on any given graph <span><math><mi>G</mi><mo>=</mo><mo>(</mo><mi>V</mi><mo>,</mo><mi>E</mi><mo>)</mo></math></span> in time <span><math><mi>O</mi><mrow><mo>(</mo><mi>k</mi><mi>log</mi><mo></mo><mi>L</mi><mo>+</mo><msup><mrow><mi>k</mi></mrow><mrow><mn>2</mn></mrow></msup><mi>log</mi><mo></mo><mi>Δ</mi><mo>)</mo></mrow></math></span>, where Δ is the maximum degree of a node in <em>G</em>. Further, each robot uses <span><math><mi>O</mi><mo>(</mo><mi>log</mi><mo></mo><mi>L</mi><mo>+</mo><mi>log</mi><mo></mo><mi>Δ</mi><mo>)</mo></math></span> additional memory, i.e., memory other than the memory required to store its id. We also prove that the task of the dispersion cannot be achieved by a set of mobile robots with <span><math><mi>o</mi><mo>(</mo><mi>log</mi><mo></mo><mi>L</mi><mo>+</mo><mi>log</mi><mo></mo><mi>Δ</mi><mo>)</mo></math></span> additional memory.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139688985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-01 | DOI: 10.1016/j.jpdc.2024.104851
Lizeth Patricia Aguirre Sanchez, Yao Shen, Minyi Guo
In recent decades, the exponential growth of applications has intensified traffic demands, posing challenges in ensuring optimal user experiences within modern networks. Traditional congestion avoidance and control mechanisms embedded in conventional routing struggle to adapt promptly to new-generation networks. Current routing approaches risk adverse outcomes such as (1) scalability constraints, (2) high convergence times, and (3) congestion due to inadequate real-time traffic prioritization. To address these issues, this paper introduces DQS, a QoS-driven routing optimization approach in Software-Defined Networking (SDN) that uses Deep Reinforcement Learning (DRL) to optimize routing and enhance QoS efficiency. DQS optimizes routing decisions by intelligently distributing traffic, guided by a multi-objective-function-driven DRL agent that considers both link and queue metrics. Despite network complexity, DQS sustains scalability while significantly reducing convergence times. Results from a Docker-based OpenFlow prototype highlight a substantial 20-30% reduction in end-to-end delay compared to baseline methods.
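The reward shape of a multi-objective routing agent can be illustrated with a toy tabular Q-learning sketch over candidate paths; the action space, costs, and hyperparameters below are assumptions, far simpler than DQS itself:

```python
import random

def train_path_selector(path_costs, episodes=500, alpha=0.2, eps=0.2, seed=0):
    """Toy stand-in for a DRL routing agent: tabular Q-learning over paths.

    Each action is a candidate path; the reward is the negative of a
    multi-objective cost (imagined here as delay plus queue occupancy,
    pre-combined into `path_costs`). Epsilon-greedy exploration with a
    single-step (bandit-style) Q update.
    """
    rng = random.Random(seed)
    q = [0.0] * len(path_costs)
    for _ in range(episodes):
        a = (rng.randrange(len(q)) if rng.random() < eps
             else max(range(len(q)), key=q.__getitem__))
        reward = -path_costs[a]          # lower combined cost -> higher reward
        q[a] += alpha * (reward - q[a])  # move Q-value toward observed reward
    return q
```

After training, the greedy policy (argmax over Q-values) selects the path with the lowest combined link/queue cost.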
{"title":"DQS: A QoS-driven routing optimization approach in SDN using deep reinforcement learning","authors":"Lizeth Patricia Aguirre Sanchez, Yao Shen, Minyi Guo","doi":"10.1016/j.jpdc.2024.104851","DOIUrl":"10.1016/j.jpdc.2024.104851","url":null,"abstract":"<div><p>In recent decades, the exponential growth of applications has intensified traffic demands, posing challenges in ensuring optimal user experiences within modern networks. Traditional congestion avoidance and control mechanisms embedded in conventional routing struggle to promptly adapt to new-generation networks. Current routing approaches risk-averse outcomes such as (1) scalability constraints, (2) high convergence times, and (3) congestion due to inadequate real-time traffic prioritization. To address these issues, this paper introduces a QoS-Driven Routing Optimization in Software-Defined Networking (SDN) using Deep Reinforcement Learning (DRL) to optimize routing and enhance QoS efficiency. Employing DRL, the proposed DQS optimizes routing decisions by intelligently distributing traffic, guided by a multi-objective function-driven DRL agent that considers both link and queue metrics. Despite the complexity of the network, DQS sustains scalability while significantly reducing convergence times. 
Through a Docker-based Openflow prototype, results highlight a substantial 20-30% reduction in end-to-end delay compared to baseline methods.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139665146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-02-01DOI: 10.1016/j.jpdc.2024.104850
Zengpeng Li, Huiqun Yu, Guisheng Fan, Jiayin Zhang, Jin Xu
The rapid development of Deep Neural Networks (DNNs) lays solid foundations for Internet of Things systems. However, mobile devices with limited processing capacity and short battery life face difficulties in executing complex DNNs. To satisfy different Quality of Service requirements, a feasible solution is offloading DNN layers to edge nodes and the cloud. The energy-efficient offloading problem for DNN-based applications with deadline and budget constraints in the edge-cloud environment is still an open and challenging issue. To this end, this paper proposes a Hybrid Chaotic Evolutionary Algorithm (HCEA) incorporating diversification and intensification strategies and a DVFS-enabled version of it (HCEA-DVFS). The Archimedes Optimization Algorithm-based diversification strategy exploits global and local guiding information to improve population diversity during the updating process and employs the Metropolis acceptance rule of Simulated Annealing to avoid premature convergence. The Genetic Algorithm-based chaotic intensification strategy is designed to enhance the local search capability of HCEA. Moreover, the Dynamic Voltage Frequency Scaling-enabled adjustment strategies can be embedded into HCEA to further reduce energy consumption by resetting frequency levels and reallocating DNN layers. Experimental results over four DNN-based applications demonstrate that HCEA-DVFS reduces energy consumption under different deadlines, budgets, and workloads by an average of 7.93, 9.68, 11.02, 11.84, and 19.38 percent compared with HCEA, PSO-GA, MCEA, AOA, and Greedy, respectively.
{"title":"Energy-efficient offloading for DNN-based applications in edge-cloud computing: A hybrid chaotic evolutionary approach","authors":"Zengpeng Li, Huiqun Yu, Guisheng Fan, Jiayin Zhang, Jin Xu","doi":"10.1016/j.jpdc.2024.104850","DOIUrl":"10.1016/j.jpdc.2024.104850","url":null,"abstract":"<div><p><span><span><span>The rapid development of Deep Neural Networks (DNNs) lays solid foundations for </span>Internet of Things<span> systems. However, mobile devices with limited processing capacity and short battery life face difficulties in executing complex DNNs. To satisfy different Quality of Service requirements, a feasible solution is offloading DNN layers to edge nodes and the cloud. The energy-efficient offloading problem for DNN-based applications with deadline and budget constraints in the edge-cloud environment is still an open and challenging issue. To this end, this paper proposes a Hybrid Chaotic </span></span>Evolutionary Algorithm<span> (HCEA) incorporating diversification and intensification strategies and a DVFS-enabled version of it (HCEA-DVFS). The Archimedes Optimization Algorithm-based diversification strategy exploits global and local guiding information to improve population diversity during the updating process and employs the Metropolis acceptance rule of Simulated Annealing to avoid premature convergence. The Genetic Algorithm-based chaotic intensification strategy is designed to enhance the local search capability of HCEA. Moreover, the </span></span>Dynamic Voltage Frequency Scaling-enabled adjustment strategies can be embedded into HCEA to further reduce energy consumption by resetting frequency levels and reallocating DNN layers.
Experimental results over four DNN-based applications demonstrate that HCEA-DVFS reduces energy consumption under different deadlines, budgets, and workloads by an average of 7.93, 9.68, 11.02, 11.84, and 19.38 percent compared with HCEA, PSO-GA, MCEA, AOA, and Greedy, respectively.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139664784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
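The abstract above mentions DVFS-enabled adjustment strategies that cut energy by resetting frequency levels. A common way to see why this helps is the CMOS approximation that dynamic energy for a fixed workload grows roughly quadratically with clock frequency, so the lowest frequency level that still meets a layer's deadline minimizes energy. The sketch below illustrates only that single adjustment step; the constant `k`, the frequency levels, and the function name are hypothetical and not taken from the paper.

```python
def pick_frequency(cycles, deadline, freq_levels, k=1e-27):
    """Choose the lowest frequency level that finishes `cycles` of work
    within `deadline` seconds, returning (frequency, energy estimate).

    Uses the rough CMOS model E = k * f^2 * cycles, so slower levels
    that still meet the deadline cost less energy. Purely illustrative.
    """
    for f in sorted(freq_levels):
        if cycles / f <= deadline:          # execution time at level f
            return f, k * f ** 2 * cycles
    f = max(freq_levels)                    # deadline infeasible: run at max
    return f, k * f ** 2 * cycles

# Relaxing the deadline lets the scheduler drop a DNN layer to a lower
# frequency level, shrinking the energy estimate for the same workload.
levels = [1e9, 2e9, 3e9]                    # 1, 2, 3 GHz
f_tight, e_tight = pick_frequency(2e9, deadline=1.0, freq_levels=levels)
f_loose, e_loose = pick_frequency(2e9, deadline=2.0, freq_levels=levels)
```

HCEA-DVFS as described also reallocates DNN layers between devices, edge, and cloud; this sketch covers only the per-layer frequency reset under a fixed placement.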