GraMeR: Graph Meta Reinforcement learning for multi-objective influence maximization
Pub Date: 2024-05-23 | DOI: 10.1016/j.jpdc.2024.104900
Sai Munikoti, Balasubramaniam Natarajan, Mahantesh Halappanavar
Influence maximization (IM) is the combinatorial problem of identifying a subset of seed nodes in a network (graph) which, when activated, provides a maximal spread of influence in the network for a given diffusion model and a budget on seed set size. IM has numerous applications such as viral marketing, epidemic control, sensor placement, and other network-related tasks. However, its practical use is limited by the computational complexity of current algorithms. Recently, deep reinforcement learning has been leveraged to solve IM in order to ease the computational burden. However, current approaches have serious limitations, including a narrow IM formulation that considers only influence via spread and ignores self-activation, low scalability to large graphs, and a lack of generalizability across graph families, leading to a large running time for every test network. In this work, we address these limitations through a unique approach that involves: (1) formulating a generic IM problem as a Markov decision process that handles both intrinsic and influence activations; (2) incorporating generalizability via meta-learning across graph families. Previous works have combined deep reinforcement learning with graph neural networks, but this work solves a more realistic IM problem and incorporates generalizability across graphs via meta reinforcement learning. Extensive experiments are carried out on various standard networks to validate the performance of the proposed Graph Meta Reinforcement learning (GraMeR) framework. The results indicate that GraMeR is multiple orders of magnitude faster and more general than conventional approaches when applied to small- and medium-scale graphs.
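To make the generic IM objective concrete, here is a minimal Python sketch of the spread function such an agent would optimize: an independent-cascade simulation extended with intrinsic (self) activation, plus the greedy seed-selection baseline that RL methods like GraMeR aim to replace. The diffusion model, probabilities, and graph are illustrative assumptions, not the paper's exact setup (requires networkx).

```python
import random
import networkx as nx

def expected_spread(G, seeds, p_influence=0.1, p_self=0.02, rounds=100):
    """Monte Carlo estimate of spread under an independent-cascade model
    extended with intrinsic (self) activation, as in the generic IM setting."""
    total = 0
    for _ in range(rounds):
        active = set(seeds)
        # intrinsic activation: non-seed nodes may switch on by themselves
        active |= {v for v in G if v not in active and random.random() < p_self}
        frontier = list(active)
        while frontier:
            nxt = []
            for u in frontier:
                for v in G.neighbors(u):
                    if v not in active and random.random() < p_influence:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / rounds

def greedy_seeds(G, budget):
    """Greedy baseline the RL agent is meant to replace (slow on big graphs)."""
    seeds = set()
    for _ in range(budget):
        best = max((v for v in G if v not in seeds),
                   key=lambda v: expected_spread(G, seeds | {v}, rounds=30))
        seeds.add(best)
    return seeds

G = nx.barabasi_albert_graph(200, 2)
print(greedy_seeds(G, budget=5))
```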
{"title":"GraMeR: Graph Meta Reinforcement learning for multi-objective influence maximization","authors":"Sai Munikoti , Balasubramaniam Natarajan , Mahantesh Halappanavar","doi":"10.1016/j.jpdc.2024.104900","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104900","url":null,"abstract":"<div><p>Influence maximization (IM) is a combinatorial problem of identifying a subset of seed nodes in a network (graph), which when activated, provide a maximal spread of influence in the network for a given diffusion model and a budget for seed set size. IM has numerous applications such as viral marketing, epidemic control, sensor placement and other network-related tasks. However, its practical uses are limited due to the computational complexity of current algorithms. Recently, deep reinforcement learning has been leveraged to solve IM in order to ease the computational burden. However, there are serious limitations in current approaches, including narrow IM formulation that only consider influence via spread and ignore self activation, low scalability to large graphs, and lack of generalizability across graph families leading to a large running time for every test network. In this work, we address these limitations through a unique approach that involves: (1) Formulating a generic IM problem as a Markov decision process that handles both intrinsic and influence activations; (2) incorporating generalizability via meta-learning across graph families. There are previous works that combine deep reinforcement learning with graph neural network but this work solves a more realistic IM problem and incorporates generalizability across graphs via meta reinforcement learning. Extensive experiments are carried out in various standard networks to validate performance of the proposed Graph Meta Reinforcement learning (GraMeR) framework. The results indicate that GraMeR is multiple orders faster and generic than conventional approaches when applied on small to medium scale graphs.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"192 ","pages":"Article 104900"},"PeriodicalIF":3.8,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141423534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large-scale and cooperative graybox parallel optimization on the supercomputer Fugaku
Pub Date: 2024-05-22 | DOI: 10.1016/j.jpdc.2024.104921
Lorenzo Canonne, Bilel Derbel, Miwako Tsuji, Mitsuhisa Sato
We design, develop, and analyze parallel variants of a state-of-the-art graybox optimization algorithm, namely Drils (Deterministic Recombination and Iterated Local Search), for attacking large-scale pseudo-boolean optimization problems on top of the large-scale computing facilities offered by the supercomputer Fugaku. We first adopt a Master/Worker design coupled with a fully distributed Island-based model, ending up with a number of hybrid OpenMP/MPI implementations of high-level parallel Drils versions. We show that such a design, although effective, can be substantially improved by enabling a more focused iteration-level cooperation mechanism between the core graybox components of the original serial Drils algorithm. Extensive experiments are conducted in order to provide a systematic analysis of the impact of the designed parallel algorithms on search behavior, and of their ability to compute high-quality solutions using an increasing number of CPU cores. Results using up to 1024 × 12-core NUMA nodes, and NK-landscapes with up to 10,000 binary variables, are reported, providing evidence of the relative strength of the designed hybrid cooperative graybox parallel search.
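For readers unfamiliar with the serial baseline, the following is a rough Python sketch of a Drils-style iterated local search: hill climbing, perturbation, and a crossover step between successive local optima. Uniform crossover stands in here for the graybox partition crossover, and the island-level MPI cooperation studied in the paper is omitted; everything below is an illustrative assumption.

```python
import random

def hill_climb(x, f):
    """First-improvement bit-flip local search on a pseudo-boolean objective f."""
    fx, improved = f(x), True
    while improved:
        improved = False
        for i in range(len(x)):
            x[i] ^= 1                # tentatively flip bit i
            fy = f(x)
            if fy > fx:
                fx, improved = fy, True
            else:
                x[i] ^= 1            # revert non-improving flip
    return x, fx

def drils_like(f, n, iters=100, alpha=0.05):
    """ILS skeleton: local search, perturbation, and a crossover step between
    the incumbent and the new local optimum (Drils uses partition crossover)."""
    x = [random.randint(0, 1) for _ in range(n)]
    best, fbest = hill_climb(x, f)
    for _ in range(iters):
        y = [b ^ (random.random() < alpha) for b in best]   # perturb incumbent
        y, fy = hill_climb(y, f)
        child = [a if random.random() < 0.5 else b          # uniform crossover
                 for a, b in zip(best, y)]
        child, fc = hill_climb(child, f)
        if max(fy, fc) > fbest:
            best, fbest = (y, fy) if fy >= fc else (child, fc)
    return best, fbest

# toy objective: onemax
print(drils_like(lambda x: sum(x), n=64)[1])
```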
{"title":"Large-scale and cooperative graybox parallel optimization on the supercomputer Fugaku","authors":"Lorenzo Canonne , Bilel Derbel , Miwako Tsuji , Mitsuhisa Sato","doi":"10.1016/j.jpdc.2024.104921","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104921","url":null,"abstract":"<div><p>We design, develop and analyze parallel variants of a state-of-the-art graybox optimization algorithm, namely <span>Drils</span> (Deterministic Recombination and Iterated Local Search), for attacking large-scale pseudo-boolean optimization problems on top of the large-scale computing facilities offered by the supercomputer Fugaku. We first adopt a Master/Worker design coupled with a fully distributed Island-based model, ending up with a number of hybrid OpenMP/MPI implementations of high-level parallel <span>Drils</span> versions. We show that such a design, although effective, can be substantially improved by enabling a more focused iteration-level cooperation mechanism between the core graybox components of the original serial <span>Drils</span> algorithm. Extensive experiments are conducted in order to provide a systematic analysis of the impact of the designed parallel algorithms on search behavior, and their ability to compute high-quality solutions using increasing number of CPU-cores. Results using up to 1024×12-cores NUMA nodes, and NK-landscapes with up to <span><math><mn>10</mn><mo>,</mo><mn>000</mn></math></span> binary variables are reported, providing evidence on the relative strength of the designed hybrid cooperative graybox parallel search.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"191 ","pages":"Article 104921"},"PeriodicalIF":3.8,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141090688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HBPB, applying reuse distance to improve cache efficiency proactively
Pub Date: 2024-05-20 | DOI: 10.1016/j.jpdc.2024.104919
Arthur M. Krause, Paulo C. Santos, Arthur F. Lorenzon, Philippe O.A. Navaux
Cache memories play a significant role in the performance, area, and energy consumption of modern processors, and this impact is expected to grow as on-die memories become larger. While caches are highly effective for cache-friendly access patterns, they introduce unnecessary delays and energy waste when they fail to serve the required data. Hence, cache bypassing techniques have been proposed to optimize the latency of cache-unfriendly memory accesses. In this scenario, we discuss HBPB, a history-based preemptive bypassing technique that accelerates cache-unfriendly accesses by bypassing the caches and thereby avoiding their lookup latency. By extensively evaluating different real-world applications and hardware cache configurations, we show that HBPB yields energy reductions of up to 75% and performance improvements of up to 50% compared to a version that does not apply cache bypassing. More importantly, we demonstrate that HBPB does not affect the performance of applications with cache-friendly access patterns.
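As a hedged illustration of what history-based preemptive bypassing can mean, the sketch below tracks per-instruction (PC) reuse behavior and predicts, before the next access, whether the fetched line should skip the cache. The table structure, reuse-distance proxy, and threshold are assumptions for illustration; the paper's actual microarchitectural mechanism may differ.

```python
from collections import defaultdict

class BypassPredictor:
    """History table indexed by the instruction (PC) that issued the access.
    If past accesses from a PC showed reuse distances larger than the cache
    capacity, future lines it fetches are routed around the cache."""
    def __init__(self, cache_lines, threshold=2):
        self.cache_lines = cache_lines
        self.threshold = threshold
        self.miss_streak = defaultdict(int)   # per-PC count of no-reuse accesses
        self.last_use = {}                    # address -> logical time of last use
        self.clock = 0

    def access(self, pc, addr):
        self.clock += 1
        dist = self.clock - self.last_use.get(addr, -10**9)
        self.last_use[addr] = self.clock
        if dist > self.cache_lines:           # no temporal reuse observed
            self.miss_streak[pc] += 1
        else:
            self.miss_streak[pc] = 0
        # preemptive decision for the next access from this PC: True => bypass
        return self.miss_streak[pc] >= self.threshold

pred = BypassPredictor(cache_lines=1024)
stream = [(0x40, a) for a in range(4096)]     # streaming scan: no reuse
print(sum(pred.access(pc, addr) for pc, addr in stream))  # mostly bypassed
```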
{"title":"HBPB, applying reuse distance to improve cache efficiency proactively","authors":"Arthur M. Krause, Paulo C. Santos, Arthur F. Lorenzon, Philippe O.A. Navaux","doi":"10.1016/j.jpdc.2024.104919","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104919","url":null,"abstract":"<div><p>Cache memories play a significant role in the performance, area, and energy consumption of modern processors, and this impact is expected to grow as on-die memories become larger. While caches are highly effective for cache-friendly access patterns, they introduce unnecessary delays and energy wastage when they fail to serve the required data. Hence, cache bypassing techniques have been proposed to optimize the latency of cache-unfriendly memory accesses. In this scenario, we discuss <em>HBPB</em>, a history-based preemptive bypassing technique that accelerates cache-unfriendly access through the reduced latency of bypassing the caches. By extensively evaluating different real-world applications and hardware cache configurations, we show that <em>HBPB</em> yields energy reductions of up to 75% and performance improvements of up to 50% compared to a version that does not apply cache bypassing. More importantly, we demonstrate that <em>HBPB</em> does not affect the performance of applications with cache-friendly access patterns.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"191 ","pages":"Article 104919"},"PeriodicalIF":3.8,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141078565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Balancing privacy and performance in federated learning: A systematic literature review on methods and metrics
Pub Date: 2024-05-17 | DOI: 10.1016/j.jpdc.2024.104918
Samaneh Mohammadi, Ali Balador, Sima Sinaei, Francesco Flammini
Federated learning (FL), as a novel paradigm in Artificial Intelligence (AI), ensures enhanced privacy by eliminating data centralization and brings learning directly to the edge, onto the user's device. Nevertheless, new privacy issues have been raised, particularly during training and the exchange of parameters between servers and clients. While several privacy-preserving FL solutions have been developed to mitigate potential breaches in FL architectures, their integration poses its own set of challenges. Incorporating these privacy-preserving mechanisms into FL at the edge computing level can increase both communication and computational overheads, which may, in turn, compromise data utility and learning performance metrics. This paper provides a systematic literature review of essential methods and metrics to support the most appropriate trade-offs between FL privacy and other performance-related application requirements such as accuracy, loss, convergence time, utility, communication, and computation overhead. We aim to provide an extensive overview of recent privacy-preserving mechanisms in FL used across various applications, placing a particular focus on quantitative privacy assessment approaches in FL and the necessity of achieving a balance between privacy and the other requirements of real-world FL applications. This review collects, classifies, and discusses relevant papers in a structured manner, emphasizing challenges, open issues, and promising research directions.
{"title":"Balancing privacy and performance in federated learning: A systematic literature review on methods and metrics","authors":"Samaneh Mohammadi , Ali Balador , Sima Sinaei , Francesco Flammini","doi":"10.1016/j.jpdc.2024.104918","DOIUrl":"10.1016/j.jpdc.2024.104918","url":null,"abstract":"<div><p>Federated learning (FL) as a novel paradigm in Artificial Intelligence (AI), ensures enhanced privacy by eliminating data centralization and brings learning directly to the edge of the user's device. Nevertheless, new privacy issues have been raised particularly during training and the exchange of parameters between servers and clients. While several privacy-preserving FL solutions have been developed to mitigate potential breaches in FL architectures, their integration poses its own set of challenges. Incorporating these privacy-preserving mechanisms into FL at the edge computing level can increase both communication and computational overheads, which may, in turn, compromise data utility and learning performance metrics. This paper provides a systematic literature review on essential methods and metrics to support the most appropriate trade-offs between FL privacy and other performance-related application requirements such as accuracy, loss, convergence time, utility, communication, and computation overhead. We aim to provide an extensive overview of recent privacy-preserving mechanisms in FL used across various applications, placing a particular focus on quantitative privacy assessment approaches in FL and the necessity of achieving a balance between privacy and the other requirements of real-world FL applications. This review collects, classifies, and discusses relevant papers in a structured manner, emphasizing challenges, open issues, and promising research directions.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"192 ","pages":"Article 104918"},"PeriodicalIF":3.8,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0743731524000820/pdfft?md5=2ee3078ecc6441a5efe38d3a7c047d80&pid=1-s2.0-S0743731524000820-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141033667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cost-aware quantum-inspired genetic algorithm for workflow scheduling in hybrid clouds
Pub Date: 2024-05-17 | DOI: 10.1016/j.jpdc.2024.104920
Mehboob Hussain, Lian-Fu Wei, Amir Rehman, Muqadar Ali, Syed Muhammad Waqas, Fakhar Abbas
Cloud computing delivers a desirable environment for users to run many kinds of applications in the cloud. Numerous of these applications (tasks), such as bioinformatics, astronomy, biodiversity, and image analysis, are deadline-sensitive. Such tasks must be properly allocated to virtual machines (VMs) to avoid deadline violations while reducing their execution time and cost. Because these objectives conflict, minimizing both a task's completion time and its execution cost is extremely difficult. Thus, we propose a Cost-aware Quantum-inspired Genetic Algorithm (CQGA) to minimize execution time and cost while meeting deadline constraints. CQGA is motivated by quantum computing and genetic algorithms. It combines quantum operators (measure, interference, and rotation) with genetic operators (selection, crossover, and mutation). Quantum operators provide better population diversity, quick convergence, time savings, and robustness. Genetic operators help produce new individuals with good fitness values and play a significant role in preserving the evolution quality of the population. In addition, CQGA uses a quantum bit as a probabilistic representation because it offers higher population diversity than other representations. The simulation results show that the proposed algorithm obtains outstanding convergence performance and lower maximum cost than benchmark algorithms.
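The quantum-inspired ingredients are easy to illustrate: each individual is a vector of rotation angles (Q-bits), "measurement" collapses it to a bitstring, and a rotation operator nudges the population toward the best measured solution. The sketch below shows only this core loop; the workflow cost/deadline model and the genetic crossover/mutation operators are omitted, and all parameters are illustrative assumptions.

```python
import math
import random

def measure(thetas):
    """Collapse a Q-bit individual to a bitstring: P(bit = 1) = sin^2(theta)."""
    return [1 if random.random() < math.sin(t) ** 2 else 0 for t in thetas]

def qiga(fitness, n_bits, pop=10, gens=50, delta=0.05):
    """Skeleton of a quantum-inspired GA: a rotation operator pulls each
    individual's angles toward the best measured solution so far."""
    population = [[math.pi / 4] * n_bits for _ in range(pop)]  # uniform superposition
    best, fbest = None, float("-inf")
    for _ in range(gens):
        for thetas in population:
            bits = measure(thetas)
            f = fitness(bits)
            if f > fbest:
                best, fbest = bits, f
        for thetas in population:                 # rotation toward the best
            for i, t in enumerate(thetas):
                target = math.pi / 2 if best[i] else 0.0
                thetas[i] = t + delta * (1 if target > t else -1)
    return best, fbest

# toy run: maximize the number of ones
print(qiga(lambda b: sum(b), n_bits=32)[1])
```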
{"title":"Cost-aware quantum-inspired genetic algorithm for workflow scheduling in hybrid clouds","authors":"Mehboob Hussain , Lian-Fu Wei , Amir Rehman , Muqadar Ali , Syed Muhammad Waqas , Fakhar Abbas","doi":"10.1016/j.jpdc.2024.104920","DOIUrl":"10.1016/j.jpdc.2024.104920","url":null,"abstract":"<div><p>Cloud computing delivers a desirable environment for users to run their different kinds of applications in a cloud. Numerous of these applications (tasks), such as bioinformatics, astronomy, biodiversity, and image analysis, are deadline-sensitive. Such tasks must be properly allocated to virtual machines (VMs) to avoid deadline violations, and they should reduce their execution time and cost. Due to the contradictory environment, minimizing the application task's completion time and execution cost is extremely difficult. Thus, we propose a Cost-aware Quantum-inspired Genetic Algorithm (CQGA) to minimize the execution time and cost by meeting the deadline constraints. CQGA is motivated by quantum computing and genetic algorithm. It combines quantum operators (measure, interference, and rotation) with genetic operators (selection, crossover, and mutation). Quantum operators are used for better population diversity, quick convergence, time-saving, and robustness. Genetic operators help to produce new individuals, have good fitness values for individuals, and play a significant role in preserving the evolution quality of the population. In addition, CQGA used a quantum bit as a probabilistic representation because it has higher population diversity attributes than other representations. The simulation outcome exhibits that the proposed algorithm can obtain outstanding convergence performance and reduced maximum cost than benchmark algorithms.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"191 ","pages":"Article 104920"},"PeriodicalIF":3.8,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141047734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CCFTL: A novel continuity compressed page-level flash address mapping method for SSDs
Pub Date: 2024-05-15 | DOI: 10.1016/j.jpdc.2024.104917
Liangkuan Su, Mingwei Lin, Jianpeng Zhang, Yubiao Pan
Given the distinctive characteristics of flash-based solid-state drives (SSDs) compared to traditional block storage devices, such as the out-of-place update scheme, a flash translation layer (FTL) has been introduced to hide these features. In the FTL, an address translation module implements the conversion from logical addresses to physical addresses. However, existing address mapping algorithms fail to fully exploit the mapping information generated by large I/O requests. Based on this observation, we first propose a novel continuity compressed page-level flash address mapping method (CCFTL). This method effectively compresses the mapping relationship between consecutive logical addresses and physical addresses, enabling the storage of more mapping information within the same mapping cache size. Next, we introduce a two-level LRU linked list to mitigate the issue of compressed mapping entries being split when handling write requests. Finally, our experiments show that CCFTL reduced average response times by 52.67%, 16.81%, and 12.71% compared to DFTL, TPFTL, and MFTL, respectively. As the mapping cache size decreases from 2 MB to 1 MB, and further to 256 KB, 128 KB, and eventually 64 KB, CCFTL's average response time degrades by less than 3% on average, while the other three algorithms degrade by 9.51% on average.
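The continuity-compression idea can be sketched directly: runs of consecutive logical pages mapped to consecutive physical pages collapse into single (lpn, ppn, length) extents, so one cache entry covers a whole large sequential write. This is a simplified illustration, not CCFTL itself; overwrite-induced extent splitting and the paper's two-level LRU cache are not modeled.

```python
import bisect

class ExtentMap:
    """Compressed page-level map: a run of consecutive logical pages mapped to
    consecutive physical pages is stored as one (lpn, ppn, length) extent."""
    def __init__(self):
        self.starts, self.extents = [], []   # sorted LPN starts; (lpn, ppn, len)

    def insert(self, lpn, ppn, length=1):
        # merge with a preceding extent that this run continues
        i = bisect.bisect_right(self.starts, lpn) - 1
        if i >= 0:
            s, p, n = self.extents[i]
            if lpn == s + n and ppn == p + n:      # contiguous in both spaces
                self.extents[i] = (s, p, n + length)
                return
        bisect.insort(self.starts, lpn)
        self.extents.insert(self.starts.index(lpn), (lpn, ppn, length))

    def lookup(self, lpn):
        i = bisect.bisect_right(self.starts, lpn) - 1
        if i >= 0:
            s, p, n = self.extents[i]
            if s <= lpn < s + n:
                return p + (lpn - s)
        return None    # mapping miss: fetch from the flash-resident map

m = ExtentMap()
for i in range(4):                       # sequential write: one extent, not four
    m.insert(lpn=100 + i, ppn=500 + i)
print(len(m.extents), m.lookup(102))     # -> 1 502
```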
{"title":"CCFTL: A novel continuity compressed page-level flash address mapping method for SSDs","authors":"Liangkuan Su , Mingwei Lin , Jianpeng Zhang , Yubiao Pan","doi":"10.1016/j.jpdc.2024.104917","DOIUrl":"10.1016/j.jpdc.2024.104917","url":null,"abstract":"<div><p>Given the distinctive characteristics of flash-based solid-state drives (SSDs), such as out-of-place update scheme, as compared to traditional block storage devices, a flash translation layer (FTL) has been introduced to hide these features. In the FTL, there is an address translation module that implements the conversion from logical addresses to physical addresses. However, existing address mapping algorithms currently fail to fully exploit the mapping information generated by large I/O requests. First, based on this observation, we propose a novel continuity compressed page-level flash address mapping method (CCFTL). This method effectively compresses the mapping relationship between consecutive logical addresses and physical addresses, enabling the storage of more mapping information within the same mapping cache size. Next, we introduce two-level LRU linked list to mitigate the issue of compressed mapping entry splitting that arises from handling write requests. Finally, our experiments show that CCFTL reduced average response times by 52.67%, 16.81%, and 12.71% compared to DFTL, TPFTL, and MFTL, respectively. As the mapping cache size decreases from 2 MB to 1 MB, then further decreases to 256 KB, 128 KB, and eventually down to 64 KB, CCFTL experiences an average decline ratio of less than 3% in average response time, while the other three algorithms show an average decline ratio of 9.51%.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"191 ","pages":"Article 104917"},"PeriodicalIF":3.8,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141046274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Federated variational generative learning for heterogeneous data in distributed environments
Pub Date: 2024-05-14 | DOI: 10.1016/j.jpdc.2024.104916
Wei Xie, Runqun Xiong, Jinghui Zhang, Jiahui Jin, Junzhou Luo
Training models across diverse clients with heterogeneous data samples can significantly impair the convergence of federated learning. Various novel federated learning methods address these challenges but often require significant communication resources and local computational capacity, leading to reduced global inference accuracy in scenarios with imbalanced label distribution and quantity skew. To tackle these challenges, we propose FedVGL, a Federated Variational Generative Learning method that directly trains a local generative model to learn the distribution of local features and improve global target model inference accuracy during aggregation, particularly under severe data heterogeneity. FedVGL facilitates distributed learning by sharing generators and latent vectors with the global server, aiding global target model training by mapping the local data distribution into a variational latent space for feature reconstruction. Additionally, FedVGL applies anonymization and encryption techniques to bolster privacy during generative model transmission and aggregation. Compared to vanilla federated learning, FedVGL minimizes communication overhead, demonstrating superior accuracy even with minimal communication rounds. It effectively mitigates model drift in scenarios with heterogeneous data, delivering improved target model training outcomes. Empirical results establish FedVGL's superiority over baseline federated learning methods under severe label imbalance and data skew conditions. In a label-based Dirichlet distribution setting with α = 0.01 and 10 clients on the MNIST dataset, FedVGL achieved accuracy above 97% with the VGG-9 target model.
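A minimal sketch of the sharing pattern, assuming a standard VAE in PyTorch: the client uploads only its decoder weights and latent codes, and the server decodes them into synthetic features for global training. The layer sizes are assumptions, and the training loop and FedVGL's anonymization/encryption steps are omitted.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Per-client generative model; only the decoder and sampled latents
    leave the device, never the raw features."""
    def __init__(self, d_in=784, d_z=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU())
        self.mu, self.logvar = nn.Linear(128, d_z), nn.Linear(128, d_z)
        self.dec = nn.Sequential(nn.Linear(d_z, 128), nn.ReLU(),
                                 nn.Linear(128, d_in), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.dec(z), mu, logvar

def client_payload(vae, x):
    """What a client uploads: decoder weights plus latent codes (no raw data)."""
    with torch.no_grad():
        z = vae.mu(vae.enc(x))
    return vae.dec.state_dict(), z

def server_reconstruct(dec_state, z, d_in=784, d_z=16):
    """Server rebuilds synthetic features to train the global target model."""
    dec = nn.Sequential(nn.Linear(d_z, 128), nn.ReLU(),
                        nn.Linear(128, d_in), nn.Sigmoid())
    dec.load_state_dict(dec_state)
    with torch.no_grad():
        return dec(z)

vae = VAE()
dec_state, z = client_payload(vae, torch.rand(32, 784))
synthetic = server_reconstruct(dec_state, z)
```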
{"title":"Federated variational generative learning for heterogeneous data in distributed environments","authors":"Wei Xie, Runqun Xiong, Jinghui Zhang, Jiahui Jin, Junzhou Luo","doi":"10.1016/j.jpdc.2024.104916","DOIUrl":"10.1016/j.jpdc.2024.104916","url":null,"abstract":"<div><p>Distributedly training models across diverse clients with heterogeneous data samples can significantly impact the convergence of federated learning. Various novel federated learning methods address these challenges but often require significant communication resources and local computational capacity, leading to reduced global inference accuracy in scenarios with imbalanced label data distribution and quantity skew. To tackle these challenges, we propose FedVGL, a Federated Variational Generative Learning method that directly trains a local generative model to learn the distribution of local features and improve global target model inference accuracy during aggregation, particularly under conditions of severe data heterogeneity. FedVGL facilitates distributed learning by sharing generators and latent vectors with the global server, aiding in global target model training from mapping local data distribution to the variational latent space for feature reconstruction. Additionally, FedVGL implements anonymization and encryption techniques to bolster privacy during generative model transmission and aggregation. In comparison to vanilla federated learning, FedVGL minimizes communication overhead, demonstrating superior accuracy even with minimal communication rounds. It effectively mitigates model drift in scenarios with heterogeneous data, delivering improved target model training outcomes. Empirical results establish FedVGL's superiority over baseline federated learning methods under severe label imbalance and data skew condition. In a Label-based Dirichlet Distribution setting with <em>α</em>=0.01 and 10 clients using the MNIST dataset, FedVGL achieved an exceptional accuracy over 97% with the VGG-9 target model.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"191 ","pages":"Article 104916"},"PeriodicalIF":3.8,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141036748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy-efficient triple modular redundancy scheduling on heterogeneous multi-core real-time systems
Pub Date: 2024-05-13 | DOI: 10.1016/j.jpdc.2024.104915
Hongzhi Xu, Binlian Zhang, Chen Pan, Keqin Li
The triple modular redundancy (TMR) fault-tolerance mechanism can provide almost perfect fault masking, which gives it great potential to enhance the reliability of real-time systems. However, multiple copies of a task are executed concurrently, which leads to a sharp increase in system energy consumption. In this work, the problem of scheduling parallel applications with TMR on heterogeneous multi-core platforms to minimize energy consumption is studied. First, the heterogeneous earliest finish time algorithm is improved, and an algorithm is designed to extend the execution time of the copies according to the given application's deadline constraints and reliability requirements. Second, based on the properties of TMR, an algorithm for minimizing the execution overhead of the third copy (MEOTC) is designed. Finally, considering the actual conditions of task execution, an online energy management (OEM) method is proposed. The proposed algorithms were compared with the state-of-the-art AFTSA algorithm, and the results show significant differences in energy consumption. Specifically, for light fault detection, MEOTC and OEM consumed 80% and 72% of AFTSA's energy, respectively; for heavy fault detection, they consumed 61% and 55%, respectively.
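The intuition behind minimizing the third copy's overhead can be shown in a few lines: run two replicas, compare, and pay for the third only when they disagree. This is an illustrative sketch of the general TMR trade-off, not the MEOTC scheduling algorithm itself, which also handles core heterogeneity, deadlines, and execution-time extension.

```python
import random

def run_copy(task, fault_rate):
    """Execute one replica; with probability fault_rate it returns a bad value."""
    good = task()
    return good if random.random() > fault_rate else good ^ 1  # flipped bit = fault

def tmr_lazy_third(task, fault_rate=0.05):
    """TMR variant in the spirit of deferring the third copy: execute two
    replicas, and spend energy on the third only on a mismatch."""
    a = run_copy(task, fault_rate)
    b = run_copy(task, fault_rate)
    if a == b:
        return a, 2                    # result, copies actually executed
    c = run_copy(task, fault_rate)
    return (a if a == c else b), 3     # majority vote among the three copies

result, copies = tmr_lazy_third(lambda: 1)
print(result, copies)                  # usually (1, 2): third copy rarely runs
```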
{"title":"Energy-efficient triple modular redundancy scheduling on heterogeneous multi-core real-time systems","authors":"Hongzhi Xu , Binlian Zhang , Chen Pan , Keqin Li","doi":"10.1016/j.jpdc.2024.104915","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104915","url":null,"abstract":"<div><p>Triple modular redundancy (TMR) fault tolerance mechanism can provide almost perfect fault-masking, which has the great potential to enhance the reliability of real-time systems. However, multiple copies of a task are executed concurrently, which will lead to a sharp increase in system energy consumption. In this work, the problem of parallel applications using TMR on heterogeneous multi-core platforms to minimize energy consumption is studied. First, the heterogeneous earliest finish time algorithm is improved, and then according to the given application's deadline constraints and reliability requirements, an algorithm to extend the execution time of the copies is designed. Secondly, based on the properties of TMR, an algorithm for minimizing the execution overhead of the third copy (MEOTC) is designed. Finally, considering the actual situation of task execution, an online energy management (OEM) method is proposed. The proposed algorithms were compared with the state-of-the-art AFTSA algorithm, and the results show significant differences in energy consumption. Specifically, for light fault detection, the energy consumption of the MEOTC and OEM algorithms was found to be 80% and 72% respectively, compared with AFTSA. For heavy fault detection, the energy consumption of MEOTC and OEM was measured at 61% and 55% respectively, compared with AFTSA.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"191 ","pages":"Article 104915"},"PeriodicalIF":3.8,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140951658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SSI−FL: Self-sovereign identity based privacy-preserving federated learning
Pub Date: 2024-05-07 | DOI: 10.1016/j.jpdc.2024.104907
Rakib Ul Haque, A.S.M. Touhidul Hasan, Mohammed Ali Mohammed Al-Hababi, Yuqing Zhang, Dianxiang Xu
Traditional federated learning (FL) raises security and privacy concerns such as identity fraud, data poisoning attacks, membership inference attacks, and model inversion attacks. In conventional FL, any entity can falsify its identity and initiate data poisoning attacks. Besides, adversaries (AD) holding the updated global model parameters can retrieve the plain text of the dataset by initiating membership inference attacks and model inversion attacks. To the best of our knowledge, this is the first work to propose a self-sovereign identity (SSI) and differential privacy (DP) based FL, namely SSI−FL, for addressing all the above issues. The first step in the SSI−FL framework involves establishing a secure connection based on blockchain-based SSI. This secure connection protects against unauthorized access attacks by any AD and ensures the transmitted data's authenticity, integrity, and availability. The second step applies DP to protect against model inversion attacks and membership inference attacks. The third step focuses on establishing FL with a novel hybrid deep learning model to achieve better scores than conventional methods. The SSI−FL performance analysis covers security, formal, scalability, and score analysis. Moreover, the proposed method outperforms all the state-of-the-art techniques.
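The DP step (the second step above) typically amounts to clipping each client's update and adding calibrated Gaussian noise before upload. The sketch below shows that standard mechanism; the clipping norm and noise multiplier are illustrative assumptions, and the blockchain-based SSI handshake and hybrid deep learning model are not shown.

```python
import numpy as np

def dp_sanitize(update, clip_norm=1.0, sigma=1.2, rng=np.random.default_rng()):
    """Clip a client's model update to bound its sensitivity, then add
    Gaussian noise -- the standard DP mechanism for protecting shared
    parameters against inversion and membership-inference attacks."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, sigma * clip_norm, size=update.shape)

# each client sanitizes before upload; the server averages as usual
clients = [np.random.randn(100) for _ in range(8)]
global_update = np.mean([dp_sanitize(u) for u in clients], axis=0)
```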
{"title":"SSI−FL: Self-sovereign identity based privacy-preserving federated learning","authors":"Rakib Ul Haque , A.S.M. Touhidul Hasan , Mohammed Ali Mohammed Al-Hababi , Yuqing Zhang , Dianxiang Xu","doi":"10.1016/j.jpdc.2024.104907","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104907","url":null,"abstract":"<div><p>Traditional federated learning (<span><math><mi>FL</mi></math></span>) raises security and privacy concerns such as identity fraud, data poisoning attacks, membership inference attacks, and model inversion attacks. In the conventional <span><math><mi>FL</mi></math></span>, any entity can falsify its identity and initiate data poisoning attacks. Besides, adversaries (<span><math><mi>AD</mi></math></span>) holding the updated global model parameters can retrieve the plain text of the dataset by initiating membership inference attacks and model inversion attacks. To the best of our knowledge, this is the first work to propose a self-sovereign identity (<span><math><mi>SSI</mi></math></span>) and differential privacy (<span><math><mi>DP</mi></math></span>) based <span><math><mi>FL</mi></math></span> namely <span><math><mi>SSI</mi><mo>−</mo><mi>FL</mi></math></span> for addressing all the above issues. The first step in the <span><math><mi>SSI</mi><mo>−</mo><mi>FL</mi></math></span> framework involves establishing a secure connection based on blockchain-based <span><math><mi>SSI</mi></math></span>. This secure connection protects against unauthorized access attacks of any <span><math><mi>AD</mi></math></span> and ensures the transmitted data's authenticity, integrity, and availability. The second step applies <span><math><mi>DP</mi></math></span> to protect against model inversion attacks and membership inference attacks. The third step focuses on establishing <span><math><mi>FL</mi></math></span> with a novel hybrid deep learning to achieve better scores than conventional methods. The <span><math><mi>SSI</mi><mo>−</mo><mi>FL</mi></math></span> performance analysis is done based on security, formal, scalability, and score analysis. Moreover, the proposed method outperforms all the state-of-art techniques.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"191 ","pages":"Article 104907"},"PeriodicalIF":3.8,"publicationDate":"2024-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140924508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Novel multi-cluster workflow system to support real-time HPC-enabled epidemic science: Investigating the impact of vaccine acceptance on COVID-19 spread
Pub Date: 2024-05-04 | DOI: 10.1016/j.jpdc.2024.104899
Parantapa Bhattacharya, Dustin Machi, Jiangzhuo Chen, Stefan Hoops, Bryan Lewis, Henning Mortveit, Srinivasan Venkatramanan, Mandy L. Wilson, Achla Marathe, Przemyslaw Porebski, Brian Klahn, Joseph Outten, Anil Vullikanti, Dawen Xie, Abhijin Adiga, Shawn Brown, Christopher Barrett, Madhav Marathe
We present MacKenzie, an HPC-driven multi-cluster workflow system that was used repeatedly to configure and execute fine-grained US national-scale epidemic simulation models during the COVID-19 pandemic. MacKenzie supported federal and Virginia policymakers, in real time, across a large number of "what-if" scenarios during the COVID-19 pandemic, and continues to be used to answer related questions as COVID-19 transitions to the endemic stage of the disease. MacKenzie is a novel HPC meta-scheduler that can execute US-scale simulation models and associated workflows that typically present significant big-data challenges. The meta-scheduler optimizes the total execution time of simulations in the workflow and helps improve overall human productivity.
As an exemplar of the kind of studies that can be conducted using MacKenzie, we present a modeling study to understand the impact of vaccine acceptance in controlling the spread of COVID-19 in the US. We use a 288-million-node synthetic social contact network (digital twin) spanning all 50 US states plus Washington DC, comprising 3300 counties, with 12 billion daily interactions. The highly resolved agent-based model used for the epidemic simulations incorporates realistic information about disease progression, vaccine uptake, production schedules, acceptance trends, prevalence, and social distancing guidelines. Computational experiments show that, for the simulation workload discussed above, MacKenzie scales well up to 10K CPU cores.
Our modeling results show that, compared with faster and accelerating vaccinations, slower vaccination rates due to vaccine hesitancy cause averted infections to drop from 6.7M to 4.5M, and averted total deaths to drop from 39.4K to 28.2K, across the US. This occurs despite the fact that the final vaccine coverage is the same in both scenarios. We also find that if vaccine acceptance could be increased by 10% in all states, averted infections could be increased from 4.5M to 4.7M (a 4.4% improvement) and total averted deaths could be increased from 28.2K to 29.9K (a 6% improvement) nationwide.
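To illustrate what a workflow meta-scheduler optimizes, here is a toy greedy list scheduler that assigns simulation jobs to the cluster with the earliest available slot, minimizing makespan. The cluster names and runtimes are hypothetical, and MacKenzie's real scheduler additionally handles queue waits, data movement, and multi-cluster heterogeneity.

```python
import heapq

def meta_schedule(jobs, clusters):
    """Greedy list scheduling: longest jobs first, each placed on the cluster
    that becomes free earliest, to keep total workflow makespan low."""
    ready = [(0.0, name) for name in clusters]      # (available time, cluster)
    heapq.heapify(ready)
    plan = []
    for job, runtime in sorted(jobs, key=lambda j: -j[1]):
        t, name = heapq.heappop(ready)
        plan.append((job, name, t, t + runtime))    # job, cluster, start, end
        heapq.heappush(ready, (t + runtime, name))
    return plan, max(end for *_, end in plan)

jobs = [("sim-%d" % i, rt) for i, rt in enumerate([30, 12, 45, 7, 22, 18])]
plan, makespan = meta_schedule(jobs, ["clusterA", "clusterB", "clusterC"])
print(makespan)
```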
{"title":"Novel multi-cluster workflow system to support real-time HPC-enabled epidemic science: Investigating the impact of vaccine acceptance on COVID-19 spread","authors":"Parantapa Bhattacharya , Dustin Machi , Jiangzhuo Chen , Stefan Hoops , Bryan Lewis , Henning Mortveit , Srinivasan Venkatramanan , Mandy L. Wilson , Achla Marathe , Przemyslaw Porebski , Brian Klahn , Joseph Outten , Anil Vullikanti , Dawen Xie , Abhijin Adiga , Shawn Brown , Christopher Barrett , Madhav Marathe","doi":"10.1016/j.jpdc.2024.104899","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104899","url":null,"abstract":"<div><p>We present MacKenzie, a HPC-driven multi-cluster workflow system that was used repeatedly to configure and execute fine-grained US national-scale epidemic simulation models during the COVID-19 pandemic. Mackenzie supported federal and Virginia policymakers, in real-time, for a large number of “what-if” scenarios during the COVID-19 pandemic, and continues to be used to answer related questions as COVID-19 transitions to the endemic stage of the disease. MacKenzie is a novel HPC meta-scheduler that can execute US-scale simulation models and associated workflows that typically present significant big data challenges. The meta-scheduler optimizes the total execution time of simulations in the workflow, and helps improve overall human productivity.</p><p>As an exemplar of the kind of studies that can be conducted using Mackenzie, we present a modeling study to understand the impact of vaccine-acceptance in controlling the spread of COVID-19 in the US. We use a 288 million node synthetic social contact network (digital twin) spanning all 50 US states plus Washington DC, comprised of 3300 counties, with 12 billion daily interactions. The highly-resolved agent-based model used for the epidemic simulations uses realistic information about disease progression, vaccine uptake, production schedules, acceptance trends, prevalence, and social distancing guidelines. Computational experiments show that, for the simulation workload discussed above, MacKenzie is able to scale up well to 10 K CPU cores.</p><p>Our modeling results show that, when compared to faster and accelerating vaccinations, slower vaccination rates due to vaccine hesitancy cause averted infections to drop from 6.7M to 4.5M, and averted total deaths to drop from 39.4 K to 28.2 K across the US. This occurs despite the fact that the final vaccine coverage is the same in both scenarios. We also find that if vaccine acceptance could be increased by 10% in all states, averted infections could be increased from 4.5M to 4.7M (a 4.4% improvement) and total averted deaths could be increased from 28.2 K to 29.9 K (a 6% improvement) nationwide.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"191 ","pages":"Article 104899"},"PeriodicalIF":3.8,"publicationDate":"2024-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140906213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}