Pub Date: 2024-05-15 | DOI: 10.1016/j.jpdc.2024.104917
Liangkuan Su, Mingwei Lin, Jianpeng Zhang, Yubiao Pan
Given the distinctive characteristics of flash-based solid-state drives (SSDs) compared to traditional block storage devices, such as the out-of-place update scheme, a flash translation layer (FTL) is introduced to hide these features. Within the FTL, an address translation module converts logical addresses into physical addresses. However, existing address mapping algorithms fail to fully exploit the mapping information generated by large I/O requests. Based on this observation, we first propose a novel continuity compressed page-level flash address mapping method (CCFTL), which compresses the mapping relationships between consecutive logical and physical addresses, enabling more mapping information to be stored within the same mapping cache size. Next, we introduce a two-level LRU linked list to mitigate the compressed-mapping-entry splitting caused by write requests. Finally, our experiments show that CCFTL reduces average response time by 52.67%, 16.81%, and 12.71% compared with DFTL, TPFTL, and MFTL, respectively. As the mapping cache size decreases from 2 MB to 1 MB, and further to 256 KB, 128 KB, and eventually 64 KB, CCFTL's average response time degrades by less than 3% on average, while the other three algorithms degrade by 9.51% on average.
Title: CCFTL: A novel continuity compressed page-level flash address mapping method for SSDs. Journal of Parallel and Distributed Computing, vol. 191, Article 104917.
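The continuity-compression idea summarized above can be illustrated with a small run-length sketch. The entry layout `(start_lpn, start_ppn, length)` and the function names are illustrative assumptions, not CCFTL's actual on-device structures:

```python
def compress_mappings(mappings):
    """Collapse a dict of lpn -> ppn into run-length entries: a run of
    consecutive logical pages mapped to consecutive physical pages is
    stored as one (start_lpn, start_ppn, length) tuple instead of
    `length` separate page-level entries (hypothetical layout)."""
    entries = []
    for lpn in sorted(mappings):
        ppn = mappings[lpn]
        if entries:
            start_lpn, start_ppn, length = entries[-1]
            # Extend the current run only if both addresses stay consecutive.
            if lpn == start_lpn + length and ppn == start_ppn + length:
                entries[-1] = (start_lpn, start_ppn, length + 1)
                continue
        entries.append((lpn, ppn, 1))
    return entries

def lookup(entries, lpn):
    """Translate a logical page number using the compressed entries."""
    for start_lpn, start_ppn, length in entries:
        if start_lpn <= lpn < start_lpn + length:
            return start_ppn + (lpn - start_lpn)
    return None  # cache miss: would trigger a fetch from the flash-resident table
```

A large sequential write of three pages thus costs one cached entry rather than three, which is why more mapping information fits in the same cache size.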
Pub Date: 2024-05-14 | DOI: 10.1016/j.jpdc.2024.104916
Wei Xie, Runqun Xiong, Jinghui Zhang, Jiahui Jin, Junzhou Luo
Training models across distributed clients with heterogeneous data samples can significantly impact the convergence of federated learning. Various novel federated learning methods address these challenges but often require significant communication resources and local computational capacity, leading to reduced global inference accuracy in scenarios with imbalanced label distributions and quantity skew. To tackle these challenges, we propose FedVGL, a Federated Variational Generative Learning method that directly trains a local generative model to learn the distribution of local features and improves global target model inference accuracy during aggregation, particularly under severe data heterogeneity. FedVGL facilitates distributed learning by sharing generators and latent vectors with the global server, aiding global target model training by mapping local data distributions to the variational latent space for feature reconstruction. Additionally, FedVGL applies anonymization and encryption techniques to bolster privacy during generative model transmission and aggregation. Compared to vanilla federated learning, FedVGL minimizes communication overhead, demonstrating superior accuracy even with minimal communication rounds. It effectively mitigates model drift in scenarios with heterogeneous data, delivering improved target model training outcomes. Empirical results establish FedVGL's superiority over baseline federated learning methods under severe label imbalance and data skew. In a label-based Dirichlet distribution setting with α=0.01 and 10 clients on the MNIST dataset, FedVGL achieved over 97% accuracy with the VGG-9 target model.
Title: Federated variational generative learning for heterogeneous data in distributed environments. Journal of Parallel and Distributed Computing, vol. 191, Article 104916.
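The label-based Dirichlet partition used in the evaluation (small α such as 0.01 yields severe label skew) can be reproduced with a short stdlib-only sketch. The function names and the sampling shortcut (Dirichlet via normalized Gamma draws) are our assumptions, not the paper's code:

```python
import random

def dirichlet(alpha, k, rng):
    """Sample a k-dim Dirichlet(alpha) vector via normalized Gamma draws."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(g)
    if s == 0.0:  # guard against float underflow at very small alpha
        return [1.0 / k] * k
    return [x / s for x in g]

def partition_by_label(labels, num_clients, alpha, seed=0):
    """Assign each sample index to a client; each label's per-client
    proportions are drawn from Dirichlet(alpha), so a small alpha
    concentrates a label on very few clients (severe heterogeneity)."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    by_label = {}
    for idx, y in enumerate(labels):
        by_label.setdefault(y, []).append(idx)
    for y, idxs in by_label.items():
        props = dirichlet(alpha, num_clients, rng)
        for idx in idxs:
            # Draw the owning client from this label's proportion vector.
            r, acc = rng.random(), 0.0
            for c, p in enumerate(props):
                acc += p
                if r <= acc or c == num_clients - 1:
                    clients[c].append(idx)
                    break
    return clients
```

With α near 0.01 most clients end up holding samples from only one or two labels, which is the regime where the reported accuracy gap appears.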
Pub Date: 2024-05-13 | DOI: 10.1016/j.jpdc.2024.104915
Hongzhi Xu, Binlian Zhang, Chen Pan, Keqin Li
The triple modular redundancy (TMR) fault-tolerance mechanism provides almost perfect fault masking, giving it great potential to enhance the reliability of real-time systems. However, executing multiple copies of a task concurrently leads to a sharp increase in system energy consumption. In this work, we study the problem of minimizing the energy consumption of parallel applications using TMR on heterogeneous multi-core platforms. First, we improve the heterogeneous earliest finish time algorithm and, given an application's deadline constraints and reliability requirements, design an algorithm that extends the execution time of the copies. Second, based on the properties of TMR, we design an algorithm for minimizing the execution overhead of the third copy (MEOTC). Finally, considering the actual conditions of task execution, we propose an online energy management (OEM) method. Compared with the state-of-the-art AFTSA algorithm, the results show significant differences in energy consumption: under light fault detection, MEOTC and OEM consume 80% and 72% of AFTSA's energy, respectively; under heavy fault detection, they consume 61% and 55%, respectively.
Title: Energy-efficient triple modular redundancy scheduling on heterogeneous multi-core real-time systems. Journal of Parallel and Distributed Computing, vol. 191, Article 104915.
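The voting behavior TMR relies on, and the intuition behind reducing the third copy's overhead, can be sketched as follows. This lazy-third-copy variant is only an illustration of the idea, not the paper's MEOTC algorithm, which instead minimizes the third copy's scheduled execution overhead under deadline and reliability constraints:

```python
def fault_tolerant_run(copy1, copy2, copy3):
    """Run two task copies eagerly and majority-vote; invoke the third
    copy only if the first two disagree. Illustrative sketch of why the
    third copy's cost is the natural target for energy optimization."""
    a, b = copy1(), copy2()
    if a == b:
        return a          # agreement: third copy's result is not needed
    c = copy3()           # disagreement: the third copy breaks the tie
    if c == a or c == b:
        return c
    raise RuntimeError("no majority: triple fault")
```

In true TMR all three copies run concurrently for fault masking; the sketch merely shows that the third result only matters on disagreement, which is why shrinking its overhead saves energy without hurting reliability much.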
Pub Date: 2024-05-07 | DOI: 10.1016/j.jpdc.2024.104907
Rakib Ul Haque, A.S.M. Touhidul Hasan, Mohammed Ali Mohammed Al-Hababi, Yuqing Zhang, Dianxiang Xu
Traditional federated learning (FL) raises security and privacy concerns such as identity fraud, data poisoning attacks, membership inference attacks, and model inversion attacks. In conventional FL, any entity can falsify its identity and initiate data poisoning attacks. Besides, adversaries (AD) holding the updated global model parameters can recover the plain text of the dataset by initiating membership inference and model inversion attacks. To the best of our knowledge, this is the first work to propose a self-sovereign identity (SSI) and differential privacy (DP) based FL, namely SSI-FL, that addresses all the above issues. The first step in the SSI-FL framework establishes a secure connection based on blockchain-based SSI; this connection protects against unauthorized access by any AD and ensures the authenticity, integrity, and availability of the transmitted data. The second step applies DP to protect against model inversion and membership inference attacks. The third step establishes FL with a novel hybrid deep learning model to achieve better scores than conventional methods. The SSI-FL performance analysis covers security, formal, scalability, and score analysis. Moreover, the proposed method outperforms all state-of-the-art techniques.
Title: SSI-FL: Self-sovereign identity based privacy-preserving federated learning. Journal of Parallel and Distributed Computing, vol. 191, Article 104907.
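The second step's use of differential privacy can be illustrated with the clip-and-add-Gaussian-noise mechanism that is standard in DP federated learning. Parameter names here are generic assumptions, not taken from SSI-FL:

```python
import random

def dp_noisy_update(update, clip_norm, noise_multiplier, rng=None):
    """Clip a client's model update to L2 norm `clip_norm`, then add
    Gaussian noise with std `noise_multiplier * clip_norm`: the usual
    Gaussian mechanism that limits what an adversary can infer about
    any single training record from the shared parameters."""
    rng = rng or random.Random()
    norm = sum(x * x for x in update) ** 0.5
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]
    sigma = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]
```

Clipping bounds each client's influence (the sensitivity), and the noise scale relative to that bound determines the privacy budget spent per round.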
Pub Date: 2024-05-04 | DOI: 10.1016/j.jpdc.2024.104899
Parantapa Bhattacharya, Dustin Machi, Jiangzhuo Chen, Stefan Hoops, Bryan Lewis, Henning Mortveit, Srinivasan Venkatramanan, Mandy L. Wilson, Achla Marathe, Przemyslaw Porebski, Brian Klahn, Joseph Outten, Anil Vullikanti, Dawen Xie, Abhijin Adiga, Shawn Brown, Christopher Barrett, Madhav Marathe
We present MacKenzie, an HPC-driven multi-cluster workflow system that was used repeatedly to configure and execute fine-grained US national-scale epidemic simulation models during the COVID-19 pandemic. MacKenzie supported federal and Virginia policymakers in real time across a large number of “what-if” scenarios during the pandemic, and continues to be used to answer related questions as COVID-19 transitions to the endemic stage of the disease. MacKenzie is a novel HPC meta-scheduler that can execute US-scale simulation models and associated workflows, which typically present significant big-data challenges. The meta-scheduler optimizes the total execution time of simulations in the workflow and helps improve overall human productivity.
As an exemplar of the kind of studies that can be conducted using MacKenzie, we present a modeling study of the impact of vaccine acceptance on controlling the spread of COVID-19 in the US. We use a 288 million-node synthetic social contact network (digital twin) spanning all 50 US states plus Washington DC, comprising 3,300 counties, with 12 billion daily interactions. The highly resolved agent-based model used for the epidemic simulations incorporates realistic information about disease progression, vaccine uptake, production schedules, acceptance trends, prevalence, and social distancing guidelines. Computational experiments show that, for this simulation workload, MacKenzie scales well up to 10K CPU cores.
Our modeling results show that, compared with faster and accelerating vaccinations, slower vaccination rates due to vaccine hesitancy cause averted infections to drop from 6.7M to 4.5M, and averted deaths to drop from 39.4K to 28.2K across the US, even though the final vaccine coverage is the same in both scenarios. We also find that if vaccine acceptance could be increased by 10% in all states, averted infections would increase from 4.5M to 4.7M (a 4.4% improvement) and averted deaths from 28.2K to 29.9K (a 6% improvement) nationwide.
Title: Novel multi-cluster workflow system to support real-time HPC-enabled epidemic science: Investigating the impact of vaccine acceptance on COVID-19 spread. Journal of Parallel and Distributed Computing, vol. 191, Article 104899.
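As a toy illustration of the vaccination counterfactuals the study runs at national scale with an agent-based digital twin, a deterministic SIR sketch with a daily vaccination rate shows the same qualitative effect of uptake speed. All parameters and function names here are invented for illustration and bear no relation to the study's calibrated model:

```python
def sir_with_vaccination(days, beta, gamma, vax_rate, n=1_000_000, i0=100):
    """Minimal deterministic SIR model with a constant daily vaccination
    rate moving susceptibles directly to the removed compartment.
    Returns cumulative infections over the horizon."""
    s, i, r = n - i0, float(i0), 0.0
    cumulative_infections = float(i0)
    for _ in range(days):
        new_inf = beta * s * i / n   # new infections this day
        new_rec = gamma * i          # recoveries this day
        new_vax = vax_rate * s       # susceptibles vaccinated this day
        s -= new_inf + new_vax
        i += new_inf - new_rec
        r += new_rec + new_vax
        cumulative_infections += new_inf
    return cumulative_infections
```

Faster uptake drains the susceptible pool earlier in the epidemic, so cumulative infections fall even when final coverage is similar, mirroring the direction of the study's averted-infection results.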
Pub Date: 2024-05-03 | DOI: 10.1016/S0743-7315(24)00075-3
Title: Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues). Journal of Parallel and Distributed Computing, vol. 190, Article 104911.
Pub Date: 2024-04-30 | DOI: 10.1016/j.jpdc.2024.104906
Shuling Shen, Xinlin Chen, Linhe Zhu
In this paper, a reaction-diffusion model is established to study the dynamics of rumor propagation. First, we consider the existence of positive equilibrium points and perform a stability analysis to derive the conditions under which Turing instability occurs. Second, we use multiscale analysis to derive the expression of the amplitude equation. The numerical simulations account for realistic conditions and show that controlling the spread rate of rumors and the number of new Internet users has a great effect on curbing the spread of online rumors. Furthermore, we demonstrate that the analysis of the amplitude equation plays a decisive role in the formation of Turing patterns. We also discuss how the Turing patterns change when the network structure changes and verify the rationality of the model via the Monte Carlo method. Finally, we consider two methods, one based on statistical principles and one on a convolutional neural network, to identify the parameters of the reaction-diffusion system with Turing instability from stable patterns. The statistics-based method offers superior accuracy, whereas the convolutional neural network approach significantly reduces recognition time and cost.
Title: Spatiotemporal dynamics analysis and parameter optimization of a network epidemic-like propagation model based on neural network method. Journal of Parallel and Distributed Computing, vol. 191, Article 104906.
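The reaction-diffusion dynamics at the heart of such models can be discretized with a standard explicit finite-difference step. This generic 1-D sketch illustrates the numerical scheme only; it is not the paper's specific rumor model or its network setting:

```python
def diffuse_react(u, d, dt, dx, reaction):
    """One explicit Euler step of u_t = d * u_xx + f(u) on a 1-D ring,
    using the centered second difference for the Laplacian. `reaction`
    is any local kinetics f(u); the spatial coupling is what enables
    Turing-type pattern formation when paired with suitable kinetics."""
    n = len(u)
    out = []
    for j in range(n):
        lap = (u[(j - 1) % n] - 2 * u[j] + u[(j + 1) % n]) / (dx * dx)
        out.append(u[j] + dt * (d * lap + reaction(u[j])))
    return out
```

For stability the explicit scheme needs d * dt / dx**2 <= 1/2; with a zero reaction term the step conserves total mass on the ring, a quick sanity check for the discretization.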
Pub Date: 2024-04-24 | DOI: 10.1016/j.jpdc.2024.104905
Chaoming Guo, Meijie Ma, Xiang-Jun Li, Guijuan Wang
The ω-Rabin number r_ω(G) and strong ω-Rabin number r*_ω(G) are two effective parameters for assessing the transmission latency and fault tolerance of an interconnection network G. Since determining the Rabin number of a general graph is NP-complete, we consider the Rabin number of the enhanced hypercube Q_{n,k}, a variant of the hypercube Q_n. For n ≥ k ≥ 5, we prove that r_ω(Q_{n,k}) = r*_ω(Q_{n,k}) = d(Q_{n,k}) for 1 ≤ ω < n − ⌊k/2⌋, and r_ω(Q_{n,k}) = r*_ω(Q_{n,k}) = d(Q_{n,k}) + 1 for n − ⌊k/2⌋ ≤ ω ≤ n + 1, where d(Q_{n,k}) is the diameter of Q_{n,k}. In addition, we present algorithms that construct internally disjoint paths of length at most r*_ω(Q_{n,k}) from a source vertex to ω (1 ≤ ω ≤ n + 1) destination vertices.
{"title":"The Rabin numbers of enhanced hypercubes","authors":"Chaoming Guo , Meijie Ma , Xiang-Jun Li , Guijuan Wang","doi":"10.1016/j.jpdc.2024.104905","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104905","url":null,"abstract":"<div><p>The <em>ω</em>-Rabin number <span><math><msub><mrow><mi>r</mi></mrow><mrow><mi>ω</mi></mrow></msub><mo>(</mo><mi>G</mi><mo>)</mo></math></span> and strong <em>ω</em>-Rabin number <span><math><msubsup><mrow><mi>r</mi></mrow><mrow><mi>ω</mi></mrow><mrow><mo>⁎</mo></mrow></msubsup><mo>(</mo><mi>G</mi><mo>)</mo></math></span> are two effective parameters to assess transmission latency and fault tolerance of an interconnection network <em>G</em>. As determining the Rabin number of a general graph is NP-complete, we consider the Rabin number of the enhanced hypercube <span><math><msub><mrow><mi>Q</mi></mrow><mrow><mi>n</mi><mo>,</mo><mi>k</mi></mrow></msub></math></span> which is a variant of the hypercube <span><math><msub><mrow><mi>Q</mi></mrow><mrow><mi>n</mi></mrow></msub></math></span>. 
For <span><math><mi>n</mi><mo>≥</mo><mi>k</mi><mo>≥</mo><mn>5</mn></math></span>, we prove that <span><math><msub><mrow><mi>r</mi></mrow><mrow><mi>ω</mi></mrow></msub><mo>(</mo><msub><mrow><mi>Q</mi></mrow><mrow><mi>n</mi><mo>,</mo><mi>k</mi></mrow></msub><mo>)</mo><mo>=</mo><msubsup><mrow><mi>r</mi></mrow><mrow><mi>ω</mi></mrow><mrow><mo>⁎</mo></mrow></msubsup><mo>(</mo><msub><mrow><mi>Q</mi></mrow><mrow><mi>n</mi><mo>,</mo><mi>k</mi></mrow></msub><mo>)</mo><mo>=</mo><mi>d</mi><mo>(</mo><msub><mrow><mi>Q</mi></mrow><mrow><mi>n</mi><mo>,</mo><mi>k</mi></mrow></msub><mo>)</mo></math></span> for <span><math><mn>1</mn><mo>≤</mo><mi>ω</mi><mo><</mo><mi>n</mi><mo>−</mo><mo>⌊</mo><mfrac><mrow><mi>k</mi></mrow><mrow><mn>2</mn></mrow></mfrac><mo>⌋</mo></math></span>; <span><math><msub><mrow><mi>r</mi></mrow><mrow><mi>ω</mi></mrow></msub><mo>(</mo><msub><mrow><mi>Q</mi></mrow><mrow><mi>n</mi><mo>,</mo><mi>k</mi></mrow></msub><mo>)</mo><mo>=</mo><msubsup><mrow><mi>r</mi></mrow><mrow><mi>ω</mi></mrow><mrow><mo>⁎</mo></mrow></msubsup><mo>(</mo><msub><mrow><mi>Q</mi></mrow><mrow><mi>n</mi><mo>,</mo><mi>k</mi></mrow></msub><mo>)</mo><mo>=</mo><mi>d</mi><mo>(</mo><msub><mrow><mi>Q</mi></mrow><mrow><mi>n</mi><mo>,</mo><mi>k</mi></mrow></msub><mo>)</mo><mo>+</mo><mn>1</mn></math></span> for <span><math><mi>n</mi><mo>−</mo><mo>⌊</mo><mfrac><mrow><mi>k</mi></mrow><mrow><mn>2</mn></mrow></mfrac><mo>⌋</mo><mo>≤</mo><mi>ω</mi><mo>≤</mo><mi>n</mi><mo>+</mo><mn>1</mn></math></span>, where <span><math><mi>d</mi><mo>(</mo><msub><mrow><mi>Q</mi></mrow><mrow><mi>n</mi><mo>,</mo><mi>k</mi></mrow></msub><mo>)</mo></math></span> is the diameter of <span><math><msub><mrow><mi>Q</mi></mrow><mrow><mi>n</mi><mo>,</mo><mi>k</mi></mrow></msub></math></span>. 
In addition, we present algorithms to construct internally disjoint paths of length at most <span><math><msup><mrow><msub><mrow><mi>r</mi></mrow><mrow><mi>ω</mi></mrow></msub></mrow><mrow><mo>⁎</mo></mrow></msup><mo>(</mo><msub><mrow><mi>Q</mi></mrow><mrow><mi>n</mi><mo>,</mo><mi>k</mi></mrow></msub><mo>)</mo></math></span> from a source vertex to other <em>ω</em> (<span><math><mn>1</mn><mo>≤</mo><mi>ω</mi><mo>≤</mo><mi>n</mi><mo>+</mo>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"191 ","pages":"Article 104905"},"PeriodicalIF":3.8,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140645879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
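The diameter d(Q_{n,k}) that the Rabin-number results above are stated against can be checked by brute force on small instances. The sketch below is illustrative only, not from the paper: it assumes one common definition of the enhanced hypercube (Q_n plus an edge from each vertex to the vertex with its k lowest bits complemented, so k = n gives the folded hypercube) and computes the diameter by BFS; the function name is hypothetical.

```python
from collections import deque

def enhanced_hypercube_diameter(n, k):
    """BFS diameter of the enhanced hypercube Q_{n,k} (assumed definition:
    Q_n plus an edge complementing the k lowest bits of each vertex)."""
    comp_mask = (1 << k) - 1          # complements bits 1..k
    size = 1 << n
    diameter = 0
    for src in range(size):           # feasible for small n: O(4^n) overall
        dist = [-1] * size
        dist[src] = 0
        q = deque([src])
        while q:
            v = q.popleft()
            # n hypercube neighbors plus the one complementary neighbor
            for w in [v ^ (1 << i) for i in range(n)] + [v ^ comp_mask]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
        diameter = max(diameter, max(dist))
    return diameter
```

For k = 1 the complementary edge duplicates an ordinary hypercube edge, so Q_{3,1} has the plain Q_3 diameter of 3, while the folded hypercubes Q_{3,3} and Q_{4,4} both have diameter 2.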
Pub Date : 2024-04-22DOI: 10.1016/j.jpdc.2024.104903
Samuel Ferraz , Vinicius Dias , Carlos H.C. Teixeira , Srinivasan Parthasarathy , George Teodoro , Wagner Meira Jr.
Subgraph enumeration is a compute-intensive procedure that lies at the core of Graph Pattern Mining (GPM) algorithms, whose goal is to extract subgraphs from larger graphs according to a given property. Scaling GPM algorithms on GPUs is challenging due to irregularity, high memory demand, and the non-trivial choice of enumeration paradigm. In this work we propose a depth-first-search subgraph exploration strategy (DFS-wide) that improves memory locality and access patterns across different enumeration paradigms. We design a warp-centric workflow for the problem that reduces divergence and ensures that accesses to graph data are coalesced. A weight-based dynamic workload redistribution is also proposed to mitigate load imbalance. We combine these strategies in a system called DuMato, which enables efficient implementations of several GPM algorithms via a common set of GPU primitives. Our experiments show that DuMato's optimizations are effective and that it enables exploring larger subgraphs than state-of-the-art systems.
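The depth-first enumeration that DFS-wide builds on can be sketched sequentially on the CPU. The function below is purely illustrative, not DuMato's GPU primitives or API: it enumerates connected vertex-induced k-subgraphs by growing a partial subgraph one frontier vertex at a time, pruning duplicates by restricting extensions to ids larger than the root.

```python
def connected_subgraphs(adj, k):
    """Depth-first enumeration of connected k-vertex induced subgraphs.

    adj maps vertex id -> set of neighbor ids. A subgraph is grown one
    frontier vertex at a time (depth-first); duplicates are pruned by
    only extending with vertices larger than the root and collecting
    results as a set of frozensets.
    """
    found = set()

    def dfs(sub, frontier):
        if len(sub) == k:
            found.add(frozenset(sub))
            return
        for v in sorted(frontier):
            # Fold v's neighbors into the frontier, minus vertices used.
            grown = (frontier | adj[v]) - set(sub) - {v}
            dfs(sub + [v], {w for w in grown if w > sub[0]})

    for root in adj:
        dfs([root], {w for w in adj[root] if w > root})
    return found
```

On a triangle `{0: {1, 2}, 1: {0, 2}, 2: {0, 1}}` with k = 3 this yields the single subgraph {0, 1, 2}; DuMato's contribution lies in mapping this kind of exploration onto warps with coalesced memory accesses and dynamic load balancing, which this sequential sketch does not attempt.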
{"title":"DuMato: An efficient warp-centric subgraph enumeration system for GPU","authors":"Samuel Ferraz , Vinicius Dias , Carlos H.C. Teixeira , Srinivasan Parthasarathy , George Teodoro , Wagner Meira Jr.","doi":"10.1016/j.jpdc.2024.104903","DOIUrl":"10.1016/j.jpdc.2024.104903","url":null,"abstract":"<div><p>Subgraph enumeration is a heavy-computing procedure that lies at the core of Graph Pattern Mining (GPM) algorithms, whose goal is to extract subgraphs from larger graphs according to a given property. Scaling GPM algorithms for GPUs is challenging due to irregularity, high memory demand, and non-trivial choice of enumeration paradigms. In this work we propose a depth-first-search subgraph exploration strategy (DFS-wide) to improve the memory locality and access patterns across different enumeration paradigms. We design a warp-centric workflow to the problem that reduces divergences and ensures that accesses to graph data are coalesced. A weight-based dynamic workload redistribution is also proposed to mitigate load imbalance. We put together these strategies in a system called DuMato, allowing efficient implementations of several GPM algorithms via a common set of GPU primitives. Our experiments show that DuMato's optimizations are effective and that it enables exploring larger subgraphs when compared to state-of-the-art systems.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"191 ","pages":"Article 104903"},"PeriodicalIF":3.8,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140758522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-22DOI: 10.1016/j.jpdc.2024.104902
Samuel Cajahuaringa , Leandro N. Zanotto , Sandro Rigo , Hervé Yviquel , Munir S. Skaf , Guido Araujo
Ion Mobility coupled with Mass Spectrometry (IM-MS) is a powerful analytical method for structurally characterizing complex molecules. In IM-MS, the sample under investigation is ionized and propelled by an electric field into a drift tube, where the ions collide with a buffer gas. The separation of the ions in the gas phase is then measured through differences in their rotationally averaged Collision Cross-Section (CCS) values. The effectiveness of the measured CCS for structural characterization depends critically on validation against theoretical calculations. This validation process relies on intensive molecular mechanics simulations, which can be computationally demanding, especially for large systems such as molecular assemblies and viruses. Therefore, reliable and fast CCS calculations are needed to help interpret IM-MS experimental data. This work presents the MassCCS software, which considerably increases CCS simulation performance by implementing a linked-cell-based algorithm and incorporating High-Performance Computing (HPC) techniques. We performed extensive tests varying the system size, shape, and number of CPU cores. Experimental results reveal single-node speedups of up to three orders of magnitude over Collision Simulator for Ion Mobility Spectrometry (CoSIMS) and High-Performance Collision Cross Section (HPCCS), two optimized solutions for CCS simulations. In addition, we extended MassCCS at the inter-node level by employing OpenMP Cluster (OMPC), an innovative programming model designed for the development of HPC applications. It streamlines the development process and simplifies software maintenance using only OpenMP directives. Notably, OMPC delivers performance comparable to a pure MPI implementation.
This enhancement enabled expensive CCS calculations using nitrogen buffer gas for large systems such as the human adenovirus (∼11 million atoms) in just ∼4 minutes, making MassCCS, to the best of our knowledge, the fastest software currently available for this task. MassCCS is available as free software for academic use at https://github.com/cces-cepid/massccs.
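Trajectory parallelization works because each collision trajectory is statistically independent: batches can be farmed out to workers and their per-batch averages combined. The toy sketch below is not MassCCS's algorithm (there is no real trajectory integration or buffer-gas model); it estimates the projected area of a single sphere, whose true value pi*r^2 is known, so the batch-and-average structure can be checked. Threads are used only to show the decomposition; in CPython, compute-bound batches need processes or MPI/OMPC ranks for actual speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def trajectory_batch(seed, n_samples, radius=1.0, box=2.0):
    """One worker's batch of independent probe 'trajectories'.

    Toy target: a sphere of the given radius, whose projected
    (collision) area is pi * radius**2 from any orientation. Each probe
    is a random line through a (2*box x 2*box) window; a hit counts as
    a collision, and the hit rate scales the window area.
    """
    rng = random.Random(seed)          # per-batch seed -> reproducible
    hits = 0
    for _ in range(n_samples):
        x = rng.uniform(-box, box)
        y = rng.uniform(-box, box)
        if x * x + y * y <= radius * radius:
            hits += 1
    return hits / n_samples * (2 * box) ** 2   # window area * hit rate

def parallel_area(n_workers=4, n_samples=50_000):
    """Fan the independent batches out to workers and average them."""
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        results = list(ex.map(trajectory_batch, range(n_workers),
                              [n_samples] * n_workers))
    return sum(results) / len(results)
```

With 4 x 50,000 probes the estimate lands within a few hundredths of pi; MassCCS applies the same independence at a far larger scale, distributing trajectory batches across cluster nodes via OMPC.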
{"title":"Ion-molecule collision cross-section calculations using trajectory parallelization in distributed systems","authors":"Samuel Cajahuaringa , Leandro N. Zanotto , Sandro Rigo , Hervé Yviquel , Munir S. Skaf , Guido Araujo","doi":"10.1016/j.jpdc.2024.104902","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104902","url":null,"abstract":"<div><p>Ion Mobility coupled with Mass Spectrometry (IM-MS) stands as a strong analytical method for structurally characterizing complex molecules. In IM-MS, the sample under investigation is ionized and propelled by an electric field into a drift tube, which collides against a buffer gas. The separation of the ion gas phase is then measured through the differences in their rotationally averaged Collision Cross-Section (CCS) values. The effectiveness of the measured Collision Cross-Section (CCS) for structural characterization critically depends on the validation against theoretical calculations. This validation process relies on intensive molecular mechanics simulations, which can be computationally demanding, especially for large systems such as molecular assemblies and viruses. Therefore, reliable and fast CCS calculations are needed to help interpret IM-MS experimental data. This work presents the MassCCS software, which considerably increases the CCS simulation performance by implementing a linked-cell-based algorithm, incorporating High-Performance Computing (HPC) techniques. We performed extensive tests regarding the system size, shape, and number of CPU cores. Experimental results reveal speedups up to 3 orders of magnitude faster than Collision Simulator for Ion Mobility Spectrometry (CoSIMS) and High-Performance Collision Cross Section (HPCCS), optimized solutions for CCS simulations, for a single node execution. In addition, we extended MassCCS at the inter-node level by employing OpenMP Cluster (OMPC). OMPC is an innovative programming model designed for the development of HPC applications. 
It streamlines the development process and simplifies software maintenance using only OpenMP directives. Notably, OMPC delivers a performance level comparable to a pure MPI implementation. This enhancement enabled expensive CCS calculations using nitrogen buffer gas for large systems such as human adenovirus with ∼11 million atoms in just ∼4 min, making MassCCS the most performant software nowadays, to the best of our knowledge. MassCCS is available as free software for Academic use at <span>https://github.com/cces-cepid/massccs</span><svg><path></path></svg>.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"191 ","pages":"Article 104902"},"PeriodicalIF":3.8,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140650835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}