Pub Date: 2024-05-07  DOI: 10.1016/j.jpdc.2024.104907
Rakib Ul Haque , A.S.M. Touhidul Hasan , Mohammed Ali Mohammed Al-Hababi , Yuqing Zhang , Dianxiang Xu
Traditional federated learning (FL) raises security and privacy concerns such as identity fraud, data poisoning attacks, membership inference attacks, and model inversion attacks. In conventional FL, any entity can falsify its identity and initiate data poisoning attacks. Besides, adversaries (AD) holding the updated global model parameters can retrieve the plain text of the dataset by initiating membership inference and model inversion attacks. To the best of our knowledge, this is the first work to propose a self-sovereign identity (SSI) and differential privacy (DP) based FL framework, namely SSI-FL, to address all the above issues. The first step in the SSI-FL framework establishes a secure connection based on blockchain-based SSI. This secure connection protects against unauthorized access by any AD and ensures the transmitted data's authenticity, integrity, and availability. The second step applies DP to protect against model inversion and membership inference attacks. The third step establishes FL with a novel hybrid deep learning model to achieve better scores than conventional methods. The performance analysis covers security, formal, scalability, and score analysis. Moreover, the proposed method outperforms all state-of-the-art techniques.
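The DP step can be illustrated with a generic DP-SGD-style sketch: each client clips its model update to bound sensitivity and adds Gaussian noise before sharing, so the aggregated model limits what an adversary can infer about any single record. The function name, `clip_norm`, and `noise_multiplier` values below are illustrative, not the exact calibration used by SSI-FL.

```python
import math
import random

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip a client's model update to bound its sensitivity, then add
    Gaussian noise -- a generic Gaussian-mechanism sketch, NOT the exact
    calibration of SSI-FL (clip_norm and noise_multiplier are illustrative)."""
    rng = rng or random.Random(0)
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / max(norm, 1e-12))  # rescale so ||update|| <= clip_norm
    return [x * scale + rng.gauss(0.0, noise_multiplier * clip_norm)
            for x in update]

# Server side: average the privatized updates from two hypothetical clients.
client_updates = [[3.0, 3.0, 3.0, 3.0], [-2.0, -2.0, -2.0, -2.0]]
noisy = [privatize_update(u) for u in client_updates]
global_step = [sum(col) / len(noisy) for col in zip(*noisy)]
```

With `noise_multiplier=0` the function reduces to pure norm clipping, which is a convenient way to sanity-check the sensitivity bound.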
Title: SSI−FL: Self-sovereign identity based privacy-preserving federated learning
Pub Date: 2024-05-04  DOI: 10.1016/j.jpdc.2024.104899
Parantapa Bhattacharya , Dustin Machi , Jiangzhuo Chen , Stefan Hoops , Bryan Lewis , Henning Mortveit , Srinivasan Venkatramanan , Mandy L. Wilson , Achla Marathe , Przemyslaw Porebski , Brian Klahn , Joseph Outten , Anil Vullikanti , Dawen Xie , Abhijin Adiga , Shawn Brown , Christopher Barrett , Madhav Marathe
We present MacKenzie, an HPC-driven multi-cluster workflow system that was used repeatedly to configure and execute fine-grained US national-scale epidemic simulation models during the COVID-19 pandemic. MacKenzie supported federal and Virginia policymakers, in real time, for a large number of "what-if" scenarios during the COVID-19 pandemic, and continues to be used to answer related questions as COVID-19 transitions to the endemic stage of the disease. MacKenzie is a novel HPC meta-scheduler that can execute US-scale simulation models and associated workflows that typically present significant big data challenges. The meta-scheduler optimizes the total execution time of simulations in the workflow and helps improve overall human productivity.
As an exemplar of the kind of studies that can be conducted using MacKenzie, we present a modeling study to understand the impact of vaccine acceptance in controlling the spread of COVID-19 in the US. We use a 288 million node synthetic social contact network (digital twin) spanning all 50 US states plus Washington DC, comprising 3,300 counties, with 12 billion daily interactions. The highly resolved agent-based model used for the epidemic simulations uses realistic information about disease progression, vaccine uptake, production schedules, acceptance trends, prevalence, and social distancing guidelines. Computational experiments show that, for the simulation workload discussed above, MacKenzie scales well up to 10K CPU cores.
Our modeling results show that, compared to faster and accelerating vaccinations, slower vaccination rates due to vaccine hesitancy cause averted infections to drop from 6.7M to 4.5M, and averted total deaths to drop from 39.4K to 28.2K across the US. This occurs despite the final vaccine coverage being the same in both scenarios. We also find that if vaccine acceptance could be increased by 10% in all states, averted infections could be increased from 4.5M to 4.7M (a 4.4% improvement) and total averted deaths from 28.2K to 29.9K (a 6% improvement) nationwide.
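The stated relative improvements can be checked directly from the abstract's own numbers:

```python
# Figures taken from the abstract above (infections in millions, deaths in thousands).
averted_infections_base, averted_infections_boost = 4.5, 4.7
averted_deaths_base, averted_deaths_boost = 28.2, 29.9

infection_gain = (averted_infections_boost - averted_infections_base) / averted_infections_base
death_gain = (averted_deaths_boost - averted_deaths_base) / averted_deaths_base

print(f"{infection_gain:.1%}")  # 4.4%, matching the stated improvement
print(f"{death_gain:.1%}")      # 6.0%
```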
Title: Novel multi-cluster workflow system to support real-time HPC-enabled epidemic science: Investigating the impact of vaccine acceptance on COVID-19 spread
Pub Date: 2024-05-03  DOI: 10.1016/S0743-7315(24)00075-3
Title: Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
Pub Date: 2024-04-30  DOI: 10.1016/j.jpdc.2024.104906
Shuling Shen , Xinlin Chen , Linhe Zhu
In this paper, a reaction-diffusion model is established to study the dynamic behavior of rumor propagation. Firstly, we consider the existence of the positive equilibrium points. Then, we perform a stability analysis to derive the conditions for the occurrence of Turing instability. Secondly, we use multiscale analysis to derive the expression of the amplitude equation. The numerical simulations take realistic conditions into account; they show that controlling the spread rate of rumors and the number of new Internet users has a great effect on curbing the spread of online rumors. Furthermore, it is proved that the analysis of the amplitude equation plays a decisive role in the formation of Turing patterns. We also discuss how the Turing patterns change when the network structure changes and verify the rationality of the model by the Monte Carlo method. Finally, we consider two methods, based on statistical principles and on a convolutional neural network respectively, to identify the parameters of the reaction-diffusion system with Turing instability by using stable patterns. The statistical method offers superior accuracy, whereas the convolutional neural network approach significantly reduces recognition time and cuts time costs.
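The basic mechanics of such a simulation can be sketched with a minimal 1-D reaction-diffusion integrator. The logistic reaction term `u*(1-u)` is a generic stand-in, not the paper's rumor-propagation kinetics, and `D`, `dt`, `dx` are illustrative values:

```python
# Minimal 1-D reaction-diffusion step (explicit Euler, periodic boundary).
# f(u) = u*(1-u) is a generic logistic stand-in for the reaction kinetics.
def step(u, D=0.1, dt=0.01, dx=1.0):
    n = len(u)
    return [
        u[i] + dt * (D * (u[(i - 1) % n] - 2 * u[i] + u[(i + 1) % n]) / dx**2
                     + u[i] * (1.0 - u[i]))
        for i in range(n)
    ]

u = [0.0] * 50
u[25] = 0.5                      # local seed of "rumor density"
for _ in range(200):
    u = step(u)
```

Running this shows the seed both growing locally (reaction) and spreading to neighboring sites (diffusion); pattern formation in the full 2-D/network model arises from the interplay of these same two terms.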
Title: Spatiotemporal dynamics analysis and parameter optimization of a network epidemic-like propagation model based on neural network method
Pub Date: 2024-04-24  DOI: 10.1016/j.jpdc.2024.104905
Chaoming Guo , Meijie Ma , Xiang-Jun Li , Guijuan Wang
The ω-Rabin number r_ω(G) and strong ω-Rabin number r_ω*(G) are two effective parameters to assess transmission latency and fault tolerance of an interconnection network G. As determining the Rabin number of a general graph is NP-complete, we consider the Rabin number of the enhanced hypercube Q_{n,k}, a variant of the hypercube Q_n. For n ≥ k ≥ 5, we prove that r_ω(Q_{n,k}) = r_ω*(Q_{n,k}) = d(Q_{n,k}) for 1 ≤ ω < n − ⌊k/2⌋, and r_ω(Q_{n,k}) = r_ω*(Q_{n,k}) = d(Q_{n,k}) + 1 for n − ⌊k/2⌋ ≤ ω ≤ n + 1, where d(Q_{n,k}) is the diameter of Q_{n,k}. In addition, we present algorithms to construct internally disjoint paths of length at most r_ω*(Q_{n,k}) from a source vertex to ω (1 ≤ ω ≤ n + 1) other destination vertices.
Title: The Rabin numbers of enhanced hypercubes
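The diameter d that anchors these bounds is easy to compute by BFS on small instances. The sketch below builds plain Q_n neighbors, plus an optional complementary edge for a variant hypercube; the specific "flip bits k..n" rule used here is one common convention for enhanced hypercubes and may differ from the paper's exact definition of Q_{n,k}:

```python
from collections import deque

def hypercube_neighbors(v, n, k=None):
    """Neighbors of vertex v (an n-bit integer) in Q_n; if k is given, also add
    a complementary edge flipping bits k..n -- one common enhanced-hypercube
    convention (the paper's exact Q_{n,k} definition may differ)."""
    nbrs = [v ^ (1 << i) for i in range(n)]
    if k is not None:
        mask = ((1 << n) - 1) ^ ((1 << (k - 1)) - 1)  # bits k..n set
        nbrs.append(v ^ mask)
    return nbrs

def diameter(n, k=None):
    # BFS from vertex 0; by vertex-transitivity its eccentricity is the diameter.
    dist = {0: 0}
    q = deque([0])
    while q:
        v = q.popleft()
        for w in hypercube_neighbors(v, n, k):
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return max(dist.values())
```

For plain Q_n this recovers d(Q_n) = n, and adding complementary edges shortens the diameter (e.g. k = 1 gives the folded hypercube with diameter ⌈n/2⌉), which is why Q_{n,k} improves transmission latency over Q_n.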
Pub Date: 2024-04-22  DOI: 10.1016/j.jpdc.2024.104903
Samuel Ferraz , Vinicius Dias , Carlos H.C. Teixeira , Srinivasan Parthasarathy , George Teodoro , Wagner Meira Jr.
Subgraph enumeration is a compute-heavy procedure that lies at the core of Graph Pattern Mining (GPM) algorithms, whose goal is to extract subgraphs from larger graphs according to a given property. Scaling GPM algorithms on GPUs is challenging due to irregularity, high memory demand, and the non-trivial choice of enumeration paradigms. In this work we propose a depth-first-search subgraph exploration strategy (DFS-wide) to improve memory locality and access patterns across different enumeration paradigms. We design a warp-centric workflow for the problem that reduces divergence and ensures that accesses to graph data are coalesced. A weight-based dynamic workload redistribution is also proposed to mitigate load imbalance. We put these strategies together in a system called DuMato, which allows efficient implementations of several GPM algorithms via a common set of GPU primitives. Our experiments show that DuMato's optimizations are effective and that it enables exploring larger subgraphs than state-of-the-art systems.
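The enumeration kernel at the heart of such systems can be sketched on a CPU with the classic ESU algorithm (Wernicke), which explores connected size-k subgraphs depth-first, each exactly once. This is a generic illustration of DFS-style subgraph enumeration, not DuMato's warp-centric implementation:

```python
def esu_count(adj, k):
    """Count connected induced subgraphs of size k via the ESU algorithm,
    a sequential sketch of the DFS-style enumeration that GPM systems
    parallelize; adj maps each vertex id to a set of neighbor ids."""
    count = 0
    def extend(sub, ext, root):
        nonlocal count
        if len(sub) == k:
            count += 1
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            # Exclusive neighbors of w: larger than root, not in or adjacent to sub.
            closed = set(sub).union(*(adj[u] for u in sub))
            grown = ext | {u for u in adj[w] if u > root and u not in closed}
            extend(sub | {w}, grown, root)
    for v in adj:
        extend({v}, {u for u in adj[v] if u > v}, v)
    return count

k4 = {i: {j for j in range(4) if j != i} for i in range(4)}  # complete graph K4
```

The recursion tree ESU builds is exactly the irregular, data-dependent workload that makes load balancing on GPUs hard, motivating DuMato's dynamic workload redistribution.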
Title: DuMato: An efficient warp-centric subgraph enumeration system for GPU
Pub Date: 2024-04-22  DOI: 10.1016/j.jpdc.2024.104902
Samuel Cajahuaringa , Leandro N. Zanotto , Sandro Rigo , Hervé Yviquel , Munir S. Skaf , Guido Araujo
Ion Mobility coupled with Mass Spectrometry (IM-MS) is a powerful analytical method for structurally characterizing complex molecules. In IM-MS, the sample under investigation is ionized and propelled by an electric field into a drift tube, where it collides with a buffer gas. The separation of ions in the gas phase is then measured through the differences in their rotationally averaged Collision Cross-Section (CCS) values. The effectiveness of the measured CCS for structural characterization critically depends on validation against theoretical calculations. This validation process relies on intensive molecular mechanics simulations, which can be computationally demanding, especially for large systems such as molecular assemblies and viruses. Therefore, reliable and fast CCS calculations are needed to help interpret IM-MS experimental data. This work presents the MassCCS software, which considerably increases CCS simulation performance by implementing a linked-cell-based algorithm that incorporates High-Performance Computing (HPC) techniques. We performed extensive tests regarding system size, shape, and number of CPU cores. Experimental results reveal single-node speedups of up to three orders of magnitude over Collision Simulator for Ion Mobility Spectrometry (CoSIMS) and High-Performance Collision Cross Section (HPCCS), two optimized solutions for CCS simulations. In addition, we extended MassCCS to the inter-node level by employing OpenMP Cluster (OMPC), an innovative programming model for developing HPC applications. OMPC streamlines development and simplifies software maintenance using only OpenMP directives, while delivering performance comparable to a pure MPI implementation. This enhancement enabled expensive CCS calculations with nitrogen buffer gas for large systems such as the human adenovirus (∼11 million atoms) in just ∼4 minutes, making MassCCS, to the best of our knowledge, the most performant software available. MassCCS is available as free software for academic use at https://github.com/cces-cepid/massccs.
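The linked-cell idea that drives the speedup can be sketched generically: bin particles into cells the size of the interaction cutoff, then test only pairs from the same or adjacent cells instead of all O(N²) pairs. This is an illustration of the technique, not MassCCS's actual data layout:

```python
from itertools import product
from math import floor, dist

def pairs_within_cutoff(points, rc):
    """Cell-list pair search: bin points into cells of side rc, then compare
    only points in the same or adjacent cells (generic linked-cell sketch)."""
    cells = {}
    for idx, p in enumerate(points):
        key = tuple(floor(c / rc) for c in p)
        cells.setdefault(key, []).append(idx)
    found = set()
    for key, members in cells.items():
        # A cell of side rc plus its 26 neighbors covers every pair within rc.
        for offset in product((-1, 0, 1), repeat=len(key)):
            nkey = tuple(c + o for c, o in zip(key, offset))
            for i in members:
                for j in cells.get(nkey, ()):
                    if i < j and dist(points[i], points[j]) <= rc:
                        found.add((i, j))
    return found
```

For roughly uniform particle densities this reduces the pair search from O(N²) to O(N), which is where the orders-of-magnitude gains for million-atom systems come from.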
Title: Ion-molecule collision cross-section calculations using trajectory parallelization in distributed systems
Pub Date: 2024-04-21  DOI: 10.1016/j.jpdc.2024.104904
Junyan Qian , Chuanfang Zhang , Zheng Wu , Hao Ding , Long Li
In multi-core processor systems, the Network-on-Chip (NoC) serves as a vital communication infrastructure. To ensure chip reliability in the presence of failures, this paper proposes a two-level topology reconfiguration algorithm with core-level redundancy. Initially, a heuristic topology reconfiguration method based on a greedy strategy performs local replacement of faulty processing elements (PEs) and generates an initial logical topology with shorter interconnection paths between PEs. Then, an intelligent optimization method based on a memetic algorithm optimizes the generated initial topology for better communication performance. The experimental results demonstrate that, compared to the current state-of-the-art algorithm, the proposed algorithm achieves average improvements of 13.92% and 30.83% on topologies of various sizes in terms of the distance factor (DF) and congestion factor (CF), which represent communication delay and traffic balance respectively. The proposed algorithm significantly enhances the communication performance of the target topology, mitigating communication latency and potential congestion problems.
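The first-level "local replacement" idea can be sketched as a greedy nearest-spare assignment on a 2-D mesh, using Manhattan distance as a stand-in for the interconnection-path cost; the paper's actual greedy strategy and its DF/CF metrics are more involved, and all coordinates below are illustrative:

```python
def greedy_repair(faulty, spares):
    """Greedily map each faulty PE to the nearest unused spare on a 2-D mesh
    (Manhattan distance) -- a sketch of the greedy local-replacement idea,
    not the paper's full two-level algorithm."""
    mapping = {}
    free = set(spares)
    for f in faulty:
        # Pick the closest remaining spare; shorter replacement paths keep
        # the logical topology's interconnection distances small.
        best = min(free, key=lambda s: abs(s[0] - f[0]) + abs(s[1] - f[1]))
        mapping[f] = best
        free.remove(best)
    return mapping

mapping = greedy_repair(faulty=[(1, 1), (2, 3)], spares=[(0, 3), (1, 0), (3, 3)])
```

A second-level memetic pass would then treat such a mapping as an initial individual and refine it with crossover plus local search to lower DF and CF further.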
Title: Efficient topology reconfiguration for NoC-based multiprocessors: A greedy-memetic algorithm
Pub Date: 2024-04-18  DOI: 10.1016/j.jpdc.2024.104901
Bieito Beceiro , Jorge González-Domínguez , Laura Morán-Fernández , Verónica Bolón-Canedo , Juan Touriño
Feature selection algorithms are essential in machine learning nowadays, as they remove irrelevant and redundant information to reduce the dimensionality of the data and improve the quality of subsequent analyses. The problem with current feature selection approaches is that they are computationally expensive when processing large datasets. This work presents parallel implementations for Nvidia GPUs of three widely used feature selection methods based on the Mutual Information (MI) metric: mRMR, JMI, and DISR. The publicly available code includes not only CUDA implementations of the general methods, but also an adaptation of them that works with low-precision fixed point to further increase their performance on GPUs. The experimental evaluation was carried out on two modern Nvidia GPUs (Turing T4 and Ampere A100) with highly satisfactory results, achieving speedups of up to 283x compared to state-of-the-art C implementations.
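The mRMR criterion these kernels accelerate can be sketched in a few lines of CPU code: repeatedly pick the feature maximizing its MI with the target minus its mean MI with already-selected features. This sequential sketch implements the standard mRMR criterion for discrete features and ignores the paper's CUDA and fixed-point optimizations:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """MI between two discrete sequences, in bits."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def mrmr(features, target, m):
    """Select m features by max-relevance (MI with target) minus
    min-redundancy (mean MI with already-selected features)."""
    selected = []
    remaining = list(range(len(features)))
    while len(selected) < m:
        def score(i):
            rel = mutual_information(features[i], target)
            red = (sum(mutual_information(features[i], features[j]) for j in selected)
                   / len(selected)) if selected else 0.0
            return rel - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Each selection round is dominated by many independent MI evaluations over (feature, feature) and (feature, target) pairs, which is precisely the embarrassingly parallel structure the GPU implementations exploit.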
{"title":"CUDA acceleration of MI-based feature selection methods","authors":"Bieito Beceiro , Jorge González-Domínguez , Laura Morán-Fernández , Verónica Bolón-Canedo , Juan Touriño","doi":"10.1016/j.jpdc.2024.104901","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104901","url":null,"abstract":"<div><p>Feature selection algorithms are necessary nowadays for machine learning as they are capable of removing irrelevant and redundant information to reduce the dimensionality of the data and improve the quality of subsequent analyses. The problem with current feature selection approaches is that they are computationally expensive when processing large datasets. This work presents parallel implementations for Nvidia GPUs of three highly-used feature selection methods based on the Mutual Information (MI) metric: mRMR, JMI and DISR. Publicly available code includes not only CUDA implementations of the general methods, but also an adaptation of them to work with low-precision fixed point in order to further increase their performance on GPUs. 
The experimental evaluation was carried out on two modern Nvidia GPUs (Turing T4 and Ampere A100) with highly satisfactory results, achieving speedups of up to 283x when compared to state-of-the-art C implementations.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0743731524000650/pdfft?md5=702120f16f21ee1ed938e87b7c2e0385&pid=1-s2.0-S0743731524000650-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140638743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
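The three methods accelerated in this work (mRMR, JMI, DISR) are all greedy selectors built on a discrete mutual-information kernel. A minimal sequential Python sketch of that kernel and the mRMR criterion follows (the CUDA versions parallelise exactly these histogram/reduction steps; function names are mine, not the paper's):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Discrete MI in nats between two equal-length sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        # p_xy * n * n / (px * py) == p(x,y) / (p(x) * p(y))
        mi += p_xy * math.log(p_xy * n * n / (px[x] * py[y]))
    return mi

def mrmr(features, target, k):
    """Greedy mRMR: at each step pick the feature maximising relevance
    MI(f, target) minus mean redundancy with already-selected features."""
    selected, remaining = [], list(range(len(features)))
    relevance = [mutual_information(f, target) for f in features]
    while len(selected) < k and remaining:
        def score(i):
            if not selected:
                return relevance[i]
            red = sum(mutual_information(features[i], features[j])
                      for j in selected)
            return relevance[i] - red / len(selected)
        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best)
    return selected
```

JMI and DISR differ only in the per-candidate score (pairwise joint MI rather than relevance minus redundancy), so they share the same MI kernel that dominates the GPU runtime.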
Pub Date : 2024-04-16
DOI: 10.1016/j.jpdc.2024.104898
Hala Ajmi , Fakhreddine Zayer , Amira Hadj Fredj, Hamdi Belgacem, Baker Mohammad, Naoufel Werghi, Jorge Dias
This paper introduces an innovative solution for improving the efficiency and speed of the Advanced Encryption Standard (AES) based cryptographic algorithm. The approach leverages in-memory computing (IMC) and is versatile for application across a broad spectrum of IoT applications, including robotic autonomous vehicles and various other scenarios. To achieve this goal, memristor (MR) designs are proposed to emulate the arithmetic operations required for different phases of the AES algorithm, enabling efficient in-memory processing. The key contributions of this work include: 1) The development of a 4 bit-MR state element for implementing different arithmetic operations in an AES hardware prototype; 2) The creation of a pipeline AES design for massive parallelism and MR integration compatibility; and 3) The hardware implementation of the AES-IMC based architecture using the MR emulator. The results show that AES-IMC performs better than existing architectures in terms of higher throughput and energy efficiency. Compared to conventional AES hardware, AES-IMC achieves a 30% power enhancement with comparable throughput. Additionally, when compared to state-of-the-art AES-based NVM engines, AES-IMC demonstrates comparable power dissipation and a 62% increase in throughput. The IMC architecture enables cost-effective real-time deployment of AES, leading to high-performance computing. By leveraging the power of in-memory computing, this system is able to provide improved computational efficiency and faster processing speeds, making it a promising solution for a wide range of applications in the field of autonomous driving and robotics.
{"title":"Efficient and lightweight in-memory computing architecture for hardware security","authors":"Hala Ajmi , Fakhreddine Zayer , Amira Hadj Fredj, Hamdi Belgacem, Baker Mohammad, Naoufel Werghi, Jorge Dias","doi":"10.1016/j.jpdc.2024.104898","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104898","url":null,"abstract":"<div><p>This paper introduces an innovative solution for improving the efficiency and speed of the Advanced Encryption Standard (AES) based cryptographic algorithm. The approach leverages in-memory computing (IMC) and is versatile for application across a broad spectrum of IoT applications, including robotic autonomous vehicles and various other scenarios. To achieve this goal, memristor (MR) designs are proposed to emulate the arithmetic operations required for different phases of the AES algorithm, enabling efficient in-memory processing. The key contributions of this work include; 1) The development of a 4 bit-MR state element for implementing different arithmetic operations in an AES hardware prototype; 2) The creation of a pipeline AES design for massive parallelism and MR integration compatibility; and 3) The hardware implementation of the AES-IMC based architecture using the MR emulator. The results show that AES-IMC performs better than existing architectures in terms of higher throughput and energy efficiency. Compared to conventional AES hardware, AES-IMC achieves a 30% power enhancement with comparable throughput. Additionally, when compared to state-of-the-art AES-based NVM engines, AES-IMC demonstrates comparable power dissipation and a 62% increase in throughput. The IMC architecture enables cost-effective real-time deployment of AES, leading to high-performance computing. By leveraging the power of in-memory computing, this system is able to provide improved computational efficiency and faster processing speeds, making it a promising solution for a wide range of applications in the field of autonomous driving and robotics. 
The potential benefits of this system include improved safety and security of unmanned devices, as well as enhanced performance and cost-effectiveness in a variety of computing environments.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140645848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
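The arithmetic that such memristor state elements emulate is GF(2^8) finite-field math, e.g. the multiplication underlying the AES MixColumns step. For orientation, here is a plain-software reference of that arithmetic as specified in FIPS-197 (this is the standard algorithm, not the paper's hardware design):

```python
def xtime(a):
    """Multiply by x (i.e. by 2) in GF(2^8), reducing modulo the AES
    polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a & 0xFF

def gmul(a, b):
    """Russian-peasant multiplication in GF(2^8): the field arithmetic
    behind the AES MixColumns transformation."""
    p = 0
    while b:
        if b & 1:
            p ^= a          # conditional add (XOR) of the current shift of a
        a = xtime(a)        # shift a by one power of x each round
        b >>= 1
    return p
```

FIPS-197's worked example {57} · {83} = {c1} makes a convenient sanity check for any in-memory implementation of this multiplier.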