
Journal of Parallel and Distributed Computing: Latest Articles

CCFTL: A novel continuity compressed page-level flash address mapping method for SSDs
IF 3.8 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-05-15 | DOI: 10.1016/j.jpdc.2024.104917
Liangkuan Su, Mingwei Lin, Jianpeng Zhang, Yubiao Pan

Compared with traditional block storage devices, flash-based solid-state drives (SSDs) have distinctive characteristics, such as the out-of-place update scheme, so a flash translation layer (FTL) is introduced to hide these features. The FTL contains an address translation module that converts logical addresses to physical addresses. However, existing address mapping algorithms fail to fully exploit the mapping information generated by large I/O requests. Based on this observation, we first propose a novel continuity compressed page-level flash address mapping method (CCFTL). This method compresses the mapping relationship between consecutive logical addresses and physical addresses, so that more mapping information can be stored within the same mapping cache size. Next, we introduce a two-level LRU linked list to mitigate the splitting of compressed mapping entries that arises when handling write requests. Finally, our experiments show that CCFTL reduces average response time by 52.67%, 16.81%, and 12.71% compared with DFTL, TPFTL, and MFTL, respectively. As the mapping cache size decreases from 2 MB to 1 MB, and then further to 256 KB, 128 KB, and eventually 64 KB, the average response time of CCFTL degrades by less than 3% on average, whereas the other three algorithms degrade by 9.51% on average.
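The abstract does not describe CCFTL's internal data layout, so the following is only a minimal Python sketch of the general idea under our own assumptions: a run of consecutive logical pages written to consecutive physical pages is stored as one extent `(lpn, ppn, length)`, a single-page out-of-place overwrite splits the extent it lands in (the entry-splitting effect the two-level LRU list is meant to mitigate, which is not modelled here), and runs that continue each other in both address spaces are merged. `Extent`, `CompressedMap`, and `map_run` are hypothetical names.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Extent:
    lpn: int     # first logical page number covered by this entry
    ppn: int     # physical page number that `lpn` maps to
    length: int  # number of consecutive pages in the run


class CompressedMap:
    """Hypothetical compressed page-level map: one extent per contiguous LPN->PPN run."""

    def __init__(self) -> None:
        self.extents: List[Extent] = []  # sorted by lpn, non-overlapping

    def lookup(self, lpn: int) -> Optional[int]:
        """Translate a logical page number, or return None if it is unmapped."""
        starts = [e.lpn for e in self.extents]
        i = bisect.bisect_right(starts, lpn) - 1
        if i >= 0 and lpn < self.extents[i].lpn + self.extents[i].length:
            e = self.extents[i]
            return e.ppn + (lpn - e.lpn)
        return None

    def map_run(self, lpn: int, ppn: int, length: int) -> None:
        """Record that `length` LPNs from `lpn` now live at consecutive PPNs from `ppn`."""
        end = lpn + length
        kept: List[Extent] = []
        for e in self.extents:
            e_end = e.lpn + e.length
            if e_end <= lpn or e.lpn >= end:
                kept.append(e)  # no overlap with the new run
                continue
            # Overlap: keep only the parts of the old extent outside [lpn, end).
            if e.lpn < lpn:
                kept.append(Extent(e.lpn, e.ppn, lpn - e.lpn))
            if e_end > end:
                kept.append(Extent(end, e.ppn + (end - e.lpn), e_end - end))
        kept.append(Extent(lpn, ppn, length))
        kept.sort(key=lambda e: e.lpn)

        # Merge neighbours that are contiguous in both logical and physical space.
        merged: List[Extent] = []
        for e in kept:
            last = merged[-1] if merged else None
            if last is not None and last.lpn + last.length == e.lpn and last.ppn + last.length == e.ppn:
                last.length += e.length
            else:
                merged.append(Extent(e.lpn, e.ppn, e.length))
        self.extents = merged


m = CompressedMap()
m.map_run(100, 5000, 64)  # one large sequential write -> a single entry
m.map_run(110, 9000, 1)   # out-of-place overwrite of one page splits it into three
print(len(m.extents), m.lookup(109), m.lookup(110), m.lookup(111))  # 3 5009 9000 5011
```

The update path rescans the whole extent list for clarity; a cache-resident implementation would need a cheaper index, and would also decide which extents to evict, which is where an LRU policy comes in.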

Journal of Parallel and Distributed Computing, Volume 191, Article 104917
Citations: 0
Federated variational generative learning for heterogeneous data in distributed environments
IF 3.8 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-05-14 | DOI: 10.1016/j.jpdc.2024.104916
Wei Xie, Runqun Xiong, Jinghui Zhang, Jiahui Jin, Junzhou Luo

Training models in a distributed manner across diverse clients with heterogeneous data samples can significantly impair the convergence of federated learning. Various novel federated learning methods address these challenges, but they often require substantial communication resources and local computational capacity, leading to reduced global inference accuracy in scenarios with imbalanced label distributions and quantity skew. To tackle these challenges, we propose FedVGL, a Federated Variational Generative Learning method that directly trains a local generative model to learn the distribution of local features and improves the inference accuracy of the global target model during aggregation, particularly under severe data heterogeneity. FedVGL facilitates distributed learning by sharing generators and latent vectors with the global server, which supports training of the global target model by mapping each client's local data distribution into a variational latent space for feature reconstruction. Additionally, FedVGL applies anonymization and encryption techniques to strengthen privacy during generative model transmission and aggregation. Compared with vanilla federated learning, FedVGL minimizes communication overhead and achieves superior accuracy even with very few communication rounds. It effectively mitigates model drift in scenarios with heterogeneous data, improving target model training outcomes. Empirical results show FedVGL's superiority over baseline federated learning methods under severe label imbalance and data skew. In a label-based Dirichlet distribution setting with α = 0.01 and 10 clients on the MNIST dataset, FedVGL achieved an accuracy above 97% with the VGG-9 target model.
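The evaluation setting quoted above (a label-based Dirichlet distribution with α = 0.01 over 10 clients) is commonly built as follows. This Python sketch shows that usual construction, assuming the paper follows the standard recipe; it does not implement FedVGL itself. For each class, client shares are drawn from Dirichlet(α, …, α) by normalizing independent Gamma(α, 1) samples, and that class's samples are split accordingly, so a small α concentrates each class on very few clients. `dirichlet_label_partition` is a hypothetical helper name.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def dirichlet_label_partition(labels: Sequence[int], num_clients: int,
                              alpha: float, seed: int = 0) -> List[List[int]]:
    """Split sample indices so each class is spread over clients by Dirichlet(alpha) shares."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Dirichlet(alpha, ..., alpha) sample = independent Gamma(alpha, 1) draws, normalized.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        shares = [d / total for d in draws] if total > 0 else [1.0 / num_clients] * num_clients
        # Convert cumulative shares into cut points over this class's shuffled indices.
        bounds, acc = [0], 0.0
        for s in shares[:-1]:
            acc += s
            bounds.append(min(len(idxs), int(round(acc * len(idxs)))))
        bounds.append(len(idxs))
        for c in range(num_clients):
            clients[c].extend(idxs[bounds[c]:bounds[c + 1]])
    return clients


labels = [i % 10 for i in range(1000)]  # toy stand-in for MNIST labels
parts = dirichlet_label_partition(labels, num_clients=10, alpha=0.01)
for c, idxs in enumerate(parts):
    print(c, len(idxs), sorted({labels[i] for i in idxs}))
```

With α = 0.01 most clients end up holding only one or two classes, which is the severe label imbalance the abstract refers to; raising α toward 100 approaches an IID split.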

Journal of Parallel and Distributed Computing, Volume 191, Article 104916
Citations: 0
Energy-efficient triple modular redundancy scheduling on heterogeneous multi-core real-time systems
IF 3.8 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-05-13 | DOI: 10.1016/j.jpdc.2024.104915
Hongzhi Xu, Binlian Zhang, Chen Pan, Keqin Li

The triple modular redundancy (TMR) fault-tolerance mechanism can provide almost perfect fault masking and therefore has great potential to enhance the reliability of real-time systems. However, executing multiple copies of a task concurrently sharply increases system energy consumption. This work studies how to minimize the energy consumption of parallel applications that use TMR on heterogeneous multi-core platforms. First, the heterogeneous earliest finish time (HEFT) algorithm is improved; then, according to the given application's deadline constraint and reliability requirement, an algorithm that extends the execution time of the copies is designed. Second, based on the properties of TMR, an algorithm for minimizing the execution overhead of the third copy (MEOTC) is designed. Finally, considering how tasks actually execute at run time, an online energy management (OEM) method is proposed. The proposed algorithms were compared with the state-of-the-art AFTSA algorithm, and the results show significant differences in energy consumption. Specifically, for light fault detection, the energy consumption of MEOTC and OEM was 80% and 72% of that of AFTSA, respectively. For heavy fault detection, the energy consumption of MEOTC and OEM was 61% and 55% of that of AFTSA, respectively.
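Two ingredients the abstract combines, masking a fault by majority vote over three copies and saving energy by stretching a copy into its slack, can be sketched in Python. This is a hedged illustration, not MEOTC or OEM: all function names are hypothetical, the normalized power model (dynamic power proportional to f³, hence energy per cycle proportional to f²) is our assumption, and the higher transient-fault rate at lower frequency, which the paper's reliability requirement has to account for, is ignored.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def tmr_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Majority vote over three replica outputs; None if no two copies agree."""
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three replica outputs")
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None


def third_copy_can_change_vote(first: Hashable, second: Hashable) -> bool:
    """If the first two copies already agree, the third cannot change the majority."""
    return first != second


def stretched_frequency(cycles: float, window: float, f_min: float, f_max: float) -> float:
    """Lowest normalized frequency that still finishes `cycles` within `window` time units."""
    f = cycles / window
    if f > f_max:
        raise ValueError("copy misses its deadline even at f_max")
    return max(f, f_min)


def dynamic_energy(cycles: float, f: float) -> float:
    """Normalized dynamic energy, assuming power ~ f**3 and runtime = cycles / f."""
    return cycles * f ** 2


print(tmr_vote([42, 7, 42]))  # 42: the single faulty copy is masked
f = stretched_frequency(cycles=100, window=200, f_min=0.2, f_max=1.0)
print(f, dynamic_energy(100, 1.0), dynamic_energy(100, f))  # 0.5 100.0 25.0
```

The vote shows why the third copy is the natural target for savings: whenever the first two copies agree, its result is not needed to decide the output.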

Journal of Parallel and Distributed Computing, Volume 191, Article 104915
Citations: 0
SSI-FL: Self-sovereign identity based privacy-preserving federated learning
IF 3.8 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-05-07 | DOI: 10.1016/j.jpdc.2024.104907
Rakib Ul Haque, A.S.M. Touhidul Hasan, Mohammed Ali Mohammed Al-Hababi, Yuqing Zhang, Dianxiang Xu

Traditional federated learning (FL) raises security and privacy concerns such as identity fraud, data poisoning attacks, membership inference attacks, and model inversion attacks. In conventional FL, any entity can falsify its identity and launch data poisoning attacks. Moreover, adversaries (AD) holding the updated global model parameters can recover the plaintext of the dataset through membership inference and model inversion attacks. To the best of our knowledge, this is the first work to propose an FL framework based on self-sovereign identity (SSI) and differential privacy (DP), namely SSI-FL, that addresses all of the above issues. The first step of the SSI-FL framework establishes a secure connection based on blockchain-based SSI. This secure connection protects against unauthorized access by any AD and ensures the authenticity, integrity, and availability of the transmitted data. The second step applies DP to protect against model inversion and membership inference attacks. The third step establishes FL with a novel hybrid deep learning model to achieve better scores than conventional methods. The performance of SSI-FL is evaluated through security analysis, formal analysis, scalability analysis, and score analysis. Moreover, the proposed method outperforms all the state-of-the-art techniques it is compared with.
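The abstract does not say which DP mechanism SSI-FL applies. A common choice in federated learning is to clip each client update to an L2-norm bound and add Gaussian noise scaled to that bound; the Python sketch below shows only that generic step. `clip_and_noise`, `clip_norm`, and `noise_multiplier` are illustrative assumptions, and privacy accounting (turning the noise level into an (ε, δ) guarantee) is omitted.

```python
import math
import random
from typing import List, Sequence


def clip_and_noise(update: Sequence[float], clip_norm: float,
                   noise_multiplier: float, rng: random.Random) -> List[float]:
    """Gaussian-mechanism step: L2-clip the update, then add N(0, (noise_multiplier*clip_norm)^2)."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0  # shrink only if norm exceeds the bound
    sigma = noise_multiplier * clip_norm                       # noise scales with the sensitivity bound
    return [x * scale + rng.gauss(0.0, sigma) for x in update]


rng = random.Random(0)
print(clip_and_noise([3.0, 4.0], clip_norm=1.0, noise_multiplier=0.0, rng=rng))  # clipped only
print(clip_and_noise([3.0, 4.0], clip_norm=1.0, noise_multiplier=1.1, rng=rng))  # clipped + noise
```

Clipping bounds how much any one client can move the aggregate, which is what lets a fixed noise scale provide a privacy guarantee against inversion and membership inference.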

Journal of Parallel and Distributed Computing, Volume 191, Article 104907
Citations: 0
Novel multi-cluster workflow system to support real-time HPC-enabled epidemic science: Investigating the impact of vaccine acceptance on COVID-19 spread
IF 3.8 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-05-04 | DOI: 10.1016/j.jpdc.2024.104899
Parantapa Bhattacharya, Dustin Machi, Jiangzhuo Chen, Stefan Hoops, Bryan Lewis, Henning Mortveit, Srinivasan Venkatramanan, Mandy L. Wilson, Achla Marathe, Przemyslaw Porebski, Brian Klahn, Joseph Outten, Anil Vullikanti, Dawen Xie, Abhijin Adiga, Shawn Brown, Christopher Barrett, Madhav Marathe

We present MacKenzie, an HPC-driven multi-cluster workflow system that was used repeatedly to configure and execute fine-grained, US national-scale epidemic simulation models during the COVID-19 pandemic. MacKenzie supported federal and Virginia policymakers in real time on a large number of “what-if” scenarios during the pandemic, and it continues to be used to answer related questions as COVID-19 transitions to the endemic stage. MacKenzie is a novel HPC meta-scheduler that can execute US-scale simulation models and associated workflows, which typically pose significant big-data challenges. The meta-scheduler optimizes the total execution time of the simulations in a workflow and helps improve overall human productivity.
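The abstract does not describe MacKenzie's scheduling policy. As a generic illustration of what a meta-scheduler that minimizes total execution time across clusters does, the Python sketch below applies a greedy longest-job-first heuristic: each simulation job goes to the cluster on which it would finish earliest. `schedule_lpt`, the job names, work units, and cluster speeds are all hypothetical, and real workflows add data staging and queue waits that this ignores.

```python
from typing import Dict, Tuple


def schedule_lpt(jobs: Dict[str, float], speeds: Dict[str, float]) -> Tuple[Dict[str, str], float]:
    """Greedy longest-job-first placement of jobs onto clusters of different speeds.

    jobs:   job name -> work (e.g. core-hours on a unit-speed cluster)
    speeds: cluster name -> relative throughput
    Returns (job -> cluster, makespan).
    """
    load = {c: 0.0 for c in speeds}  # time at which each cluster becomes free
    placement: Dict[str, str] = {}
    for name, work in sorted(jobs.items(), key=lambda kv: (-kv[1], kv[0])):
        # Choose the cluster on which this job would finish earliest (ties: by name).
        best = min(speeds, key=lambda c: (load[c] + work / speeds[c], c))
        load[best] += work / speeds[best]
        placement[name] = best
    return placement, max(load.values(), default=0.0)


placement, makespan = schedule_lpt({"j1": 4, "j2": 4, "j3": 2}, {"A": 1.0, "B": 2.0})
print(placement, makespan)  # {'j1': 'B', 'j2': 'A', 'j3': 'B'} 4.0
```

Longest-job-first list scheduling is a simple heuristic, not an optimal solver; it is shown only to make "optimizing total execution time across clusters" concrete.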

As an exemplar of the studies that can be conducted using MacKenzie, we present a modeling study of the impact of vaccine acceptance on controlling the spread of COVID-19 in the US. We use a 288-million-node synthetic social contact network (digital twin) spanning all 50 US states plus Washington, DC, comprising 3,300 counties and 12 billion daily interactions. The highly resolved agent-based model used for the epidemic simulations incorporates realistic information about disease progression, vaccine uptake, production schedules, acceptance trends, prevalence, and social distancing guidelines. Computational experiments show that, for this simulation workload, MacKenzie scales well up to 10K CPU cores.
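To show in miniature how vaccination enters an agent-based simulation on a contact network, here is a highly simplified Python sketch of a discrete-time SIR process. It is not the paper's model: the synchronous daily step, the fixed infectious period, a single per-contact transmission probability, and treating vaccinated agents as fully protected are all our assumptions, and `simulate_sir` is a hypothetical name.

```python
import random
from typing import Dict, Iterable, List, Set


def simulate_sir(adj: Dict[int, List[int]], seeds: Iterable[int], vaccinated: Set[int],
                 p_transmit: float, infectious_days: int, seed: int = 0) -> Set[int]:
    """Discrete-time SIR on a contact network; returns every agent that was ever infected."""
    rng = random.Random(seed)
    state = {v: "V" if v in vaccinated else "S" for v in adj}  # V = fully protected (assumption)
    days_left: Dict[int, int] = {}
    for s in seeds:
        if state[s] == "S":
            state[s] = "I"
            days_left[s] = infectious_days
    ever_infected = set(days_left)

    while days_left:
        # Each infectious agent tries once per day to infect each susceptible contact.
        newly: List[int] = []
        for v in list(days_left):
            for u in adj[v]:
                if state[u] == "S" and rng.random() < p_transmit:
                    state[u] = "I"
                    newly.append(u)
        # Advance the infectious clock; agents recover when it reaches zero.
        for v in list(days_left):
            days_left[v] -= 1
            if days_left[v] == 0:
                state[v] = "R"
                del days_left[v]
        # Agents infected today become infectious from tomorrow.
        for u in newly:
            days_left[u] = infectious_days
            ever_infected.add(u)
    return ever_infected


# Path of four agents 0-1-2-3 with certain transmission: vaccinating agent 2 shields agent 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(sorted(simulate_sir(path, [0], set(), p_transmit=1.0, infectious_days=1)))  # [0, 1, 2, 3]
print(sorted(simulate_sir(path, [0], {2}, p_transmit=1.0, infectious_days=1)))    # [0, 1]
```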

Our modeling results show that, compared with faster and accelerating vaccination, slower vaccination rates caused by vaccine hesitancy reduce averted infections from 6.7M to 4.5M and averted total deaths from 39.4K to 28.2K across the US, even though the final vaccine coverage is the same in both scenarios. We also find that if vaccine acceptance could be increased by 10% in all states, nationwide averted infections could rise from 4.5M to 4.7M (a 4.4% improvement) and total averted deaths from 28.2K to 29.9K (a 6% improvement).
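The relative improvements quoted for a 10% increase in vaccine acceptance follow directly from the reported counts; a quick Python check (`pct_change` is just an illustrative helper):

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


print(f"{pct_change(4.5, 4.7):.1f}%")    # averted infections (millions): 4.4%
print(f"{pct_change(28.2, 29.9):.1f}%")  # averted deaths (thousands): 6.0%
```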

Journal of Parallel and Distributed Computing, Volume 191, Article 104899
Citations: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.8 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-05-03 | DOI: 10.1016/S0743-7315(24)00075-3
Journal of Parallel and Distributed Computing, Volume 190, Article 104911 (open access PDF: https://www.sciencedirect.com/science/article/pii/S0743731524000753/pdfft?md5=04032493554b9c9a6c79c75f9a9aab5d&pid=1-s2.0-S0743731524000753-main.pdf)
Citations: 0
Spatiotemporal dynamics analysis and parameter optimization of a network epidemic-like propagation model based on neural network method
IF 3.8 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-04-30 | DOI: 10.1016/j.jpdc.2024.104906
Shuling Shen, Xinlin Chen, Linhe Zhu

In this paper, a reaction-diffusion model is established to study the dynamic behavior of rumor propagation. First, we examine the existence of positive equilibrium points. Then, we perform a stability analysis to determine the conditions under which Turing instability occurs. Next, we use multiscale analysis to derive the amplitude equation. The numerical simulations take realistic settings into account, and they show that controlling the rumor spreading rate and the number of new Internet users greatly helps curb the spread of online rumors. Furthermore, it is shown that the amplitude equation analysis plays a decisive role in the formation of Turing patterns. We also discuss Turing patterns when the network structure changes and verify the rationality of the model with the Monte Carlo method. Finally, we consider two methods, one based on statistical principles and one based on a convolutional neural network, to identify the parameters of the reaction-diffusion system with Turing instability from stable patterns. The statistical-principle-based method offers higher accuracy, whereas the convolutional-neural-network-based approach significantly reduces recognition time and thus time cost.
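The Turing-instability analysis mentioned above follows a standard pattern for two-species systems: linearize at the equilibrium (Jacobian J, diffusion coefficients d_u and d_v); the equilibrium must be stable without diffusion (tr J < 0, det J > 0), yet some spatial mode with eigenvalue λ ≥ 0 must make det(J − λ·diag(d_u, d_v)) negative. On a network, λ ranges over the eigenvalues of the graph Laplacian rather than over k². The Python sketch below checks these generic conditions; the example Jacobian is made up and is not the paper's rumor model, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def dispersion_det(J: Matrix2, d_u: float, d_v: float, lam: float) -> float:
    """det(J - lam*diag(d_u, d_v)). With tr J < 0 the trace stays negative for lam >= 0,
    so the mode with eigenvalue lam grows exactly when this determinant is negative."""
    (a, b), (c, d) = J
    return (a - d_u * lam) * (d - d_v * lam) - b * c


def turing_unstable(J: Matrix2, d_u: float, d_v: float) -> bool:
    """Stable without diffusion, but some continuous lam > 0 gives a negative determinant."""
    (a, b), (c, d) = J
    trace, det = a + d, a * d - b * c
    s = d_v * a + d_u * d
    return trace < 0 and det > 0 and s > 0 and s * s > 4 * d_u * d_v * det


def unstable_modes(J: Matrix2, d_u: float, d_v: float, lams: Sequence[float]) -> List[float]:
    """On a network only the Laplacian eigenvalues `lams` exist as spatial modes."""
    return [lam for lam in lams if dispersion_det(J, d_u, d_v, lam) < 0]


J = ((1.0, -1.0), (3.0, -2.0))  # made-up Jacobian: tr J = -1, det J = 1
print(turing_unstable(J, d_u=1.0, d_v=10.0))          # True: fast-diffusing inhibitor
print(turing_unstable(J, d_u=1.0, d_v=1.0))           # False: equal diffusion
print(unstable_modes(J, 1.0, 10.0, [0.0, 0.4, 5.0]))  # [0.4]
```

The discrete version matters on networks: even when `turing_unstable` is true, patterns appear only if some Laplacian eigenvalue actually falls inside the unstable band.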

Journal of Parallel and Distributed Computing, Volume 191, Article 104906
Citations: 0
The Rabin numbers of enhanced hypercubes
IF 3.8 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-04-24 | DOI: 10.1016/j.jpdc.2024.104905
Chaoming Guo, Meijie Ma, Xiang-Jun Li, Guijuan Wang
The $\omega$-Rabin number $r_\omega(G)$ and the strong $\omega$-Rabin number $r^*_\omega(G)$ are two effective parameters for assessing the transmission latency and fault tolerance of an interconnection network $G$. As determining the Rabin number of a general graph is NP-complete, we consider the Rabin number of the enhanced hypercube $Q_{n,k}$, a variant of the hypercube $Q_n$. For $n \ge k \ge 5$, we prove that $r_\omega(Q_{n,k}) = r^*_\omega(Q_{n,k}) = d(Q_{n,k})$ for $1 \le \omega < n - \lfloor k/2 \rfloor$, and $r_\omega(Q_{n,k}) = r^*_\omega(Q_{n,k}) = d(Q_{n,k}) + 1$ for $n - \lfloor k/2 \rfloor \le \omega \le n + 1$, where $d(Q_{n,k})$ is the diameter of $Q_{n,k}$. In addition, we present algorithms that construct internally disjoint paths of length at most $r^*_\omega(Q_{n,k})$ from a source vertex to $\omega$ other destination vertices ($1 \le \omega \le n + 1$), not necessarily distinct, in $Q_{n,k}$.
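To make the diameter $d(Q_{n,k})$ concrete, here is a minimal Python sketch that builds $Q_{n,k}$ on $n$-bit integers and measures its diameter by breadth-first search. It assumes the convention in which each vertex has one extra complementary edge that flips its $k$ lowest-order bits; papers index $k$ differently, so treat this as an assumption. Under that convention every edge joins $v$ to $v \oplus g$ for a fixed generator $g$, so the distance from $0$ to $x$ is $\min(|x|, 1 + |x \oplus M|)$ with $M$ the $k$-bit mask, and maximizing over $x$ gives $n - \lfloor k/2 \rfloor$. The function names are hypothetical, and this only illustrates the graph, not the authors' path-construction algorithms.

```python
from collections import deque


def enhanced_hypercube_neighbors(v: int, n: int, k: int) -> list[int]:
    """Neighbours of vertex v in Q_{n,k}; vertices are n-bit integers.

    Assumed convention: the n hypercube edges flip one bit each, and one
    extra complementary edge flips the k lowest-order bits at once.
    """
    mask = (1 << k) - 1
    return [v ^ (1 << i) for i in range(n)] + [v ^ mask]


def diameter(n: int, k: int) -> int:
    """Diameter of Q_{n,k} via breadth-first search from vertex 0.

    Each edge joins v to v XOR g for a fixed generator g, so Q_{n,k} is a
    Cayley graph of Z_2^n and therefore vertex-transitive: the eccentricity
    of vertex 0 equals the diameter.
    """
    dist = {0: 0}
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for w in enhanced_hypercube_neighbors(v, n, k):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values())


def diameter_closed_form(n: int, k: int) -> int:
    """Closed form n - floor(k/2), valid under the same assumed convention."""
    return n - k // 2


if __name__ == "__main__":
    # Compare BFS against the closed form for the range n >= k >= 5.
    for n in range(5, 9):
        for k in range(5, n + 1):
            print(n, k, diameter(n, k), diameter_closed_form(n, k))
```

For example, `diameter(7, 5)` returns 5, matching $7 - \lfloor 5/2 \rfloor$. BFS over all $2^n$ vertices is only practical for small $n$; the closed form is what one would use beyond that.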
Citations: 0
DuMato: An efficient warp-centric subgraph enumeration system for GPU
IF 3.8 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-04-22 DOI: 10.1016/j.jpdc.2024.104903
Samuel Ferraz , Vinicius Dias , Carlos H.C. Teixeira , Srinivasan Parthasarathy , George Teodoro , Wagner Meira Jr.

Subgraph enumeration is a computationally heavy procedure at the core of Graph Pattern Mining (GPM) algorithms, whose goal is to extract subgraphs with a given property from larger graphs. Scaling GPM algorithms on GPUs is challenging because of irregularity, high memory demand, and the non-trivial choice of enumeration paradigm. In this work, we propose a depth-first-search subgraph exploration strategy (DFS-wide) that improves memory locality and access patterns across different enumeration paradigms. We design a warp-centric workflow for the problem that reduces divergence and ensures that accesses to graph data are coalesced. We also propose a weight-based dynamic workload redistribution scheme to mitigate load imbalance. We combine these strategies in a system called DuMato, which enables efficient implementations of several GPM algorithms through a common set of GPU primitives. Our experiments show that DuMato's optimizations are effective and that, compared with state-of-the-art systems, it can explore larger subgraphs.
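The depth-first enumeration pattern that DFS-wide builds on can be sketched sequentially for a single pattern, k-cliques. This is a minimal CPU sketch under our own assumptions, not DuMato's API or its GPU primitives: it keeps an explicit stack of partial subgraphs, each paired with its candidate extension set, which is the state a depth-first enumerator carries between steps. Warp-centric execution, coalesced accesses, and workload redistribution are not modeled, and the function names are hypothetical.

```python
def build_adjacency(edges: list[tuple[int, int]]) -> dict[int, set[int]]:
    """Undirected adjacency sets from an edge list (hypothetical helper)."""
    adj: dict[int, set[int]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def enumerate_cliques(adj: dict[int, set[int]], k: int) -> list[tuple[int, ...]]:
    """Enumerate every k-clique exactly once, depth first.

    A partial clique is only extended with larger vertices adjacent to all of
    its members, so each clique is emitted once, in increasing vertex order.
    """
    # Explicit DFS stack of (partial clique, candidate extensions) frames;
    # vertices are pushed in reverse so the smallest is explored first.
    stack = [((v,), {u for u in adj[v] if u > v}) for v in sorted(adj, reverse=True)]
    results = []
    while stack:
        clique, candidates = stack.pop()
        if len(clique) == k:
            results.append(clique)
            continue
        if len(clique) + len(candidates) < k:
            continue  # prune: too few candidates left to reach size k
        for u in sorted(candidates, reverse=True):
            # Keep only candidates larger than u that are also adjacent to u.
            stack.append((clique + (u,), {w for w in candidates if w > u and w in adj[u]}))
    return results


# Usage: K4 on {0, 1, 2, 3} plus a pendant edge (3, 4).
graph = build_adjacency([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)])
print(enumerate_cliques(graph, 3))  # → [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
```

Because each stack frame is independent once created, frames are natural units of work to hand out or rebalance across parallel workers; how DuMato weights and redistributes such work is described in the paper, not here.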

Citations: 0
Ion-molecule collision cross-section calculations using trajectory parallelization in distributed systems
IF 3.8 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-04-22 DOI: 10.1016/j.jpdc.2024.104902
Samuel Cajahuaringa , Leandro N. Zanotto , Sandro Rigo , Hervé Yviquel , Munir S. Skaf , Guido Araujo

Ion Mobility coupled with Mass Spectrometry (IM-MS) is a powerful analytical method for the structural characterization of complex molecules. In IM-MS, the sample under investigation is ionized and propelled by an electric field through a drift tube, where the ions collide with a buffer gas. The separation of the gas-phase ions is then characterized by the differences in their rotationally averaged Collision Cross-Section (CCS) values. The usefulness of a measured CCS for structural characterization depends critically on validation against theoretical calculations. This validation relies on intensive molecular mechanics simulations, which can be computationally demanding, especially for large systems such as molecular assemblies and viruses. Reliable and fast CCS calculations are therefore needed to help interpret IM-MS experimental data. This work presents the MassCCS software, which considerably improves CCS simulation performance by implementing a linked-cell-based algorithm combined with High-Performance Computing (HPC) techniques. We performed extensive tests varying system size, system shape, and the number of CPU cores. In single-node execution, experimental results show speedups of up to three orders of magnitude over Collision Simulator for Ion Mobility Spectrometry (CoSIMS) and High-Performance Collision Cross Section (HPCCS), two optimized solutions for CCS simulations. In addition, we extended MassCCS to the inter-node level using OpenMP Cluster (OMPC), an innovative programming model designed for developing HPC applications. OMPC streamlines the development process and simplifies software maintenance by using only OpenMP directives, while delivering performance comparable to a pure MPI implementation.
This enhancement enabled expensive CCS calculations with nitrogen buffer gas for large systems, such as a human adenovirus with ∼11 million atoms, in just ∼4 min, making MassCCS, to the best of our knowledge, the highest-performing CCS software available today. MassCCS is available as free software for academic use at https://github.com/cces-cepid/massccs.
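The linked-cell idea is a standard spatial-indexing technique: space is binned into cubic cells at least as wide as the interaction cutoff, so a neighbor query only has to inspect the 27 cells around a point instead of every atom. The Python sketch below illustrates that generic technique under our own assumptions (one fixed cutoff, no periodic boundaries, hypothetical class and function names). It is not MassCCS's implementation and omits the trajectory integration and parallelization the abstract describes.

```python
import math
from collections import defaultdict
from itertools import product


class CellList:
    """Linked-cell (cell list) index for fixed-cutoff neighbour queries.

    Cells are cubes of side `cutoff`, so any atom within `cutoff` of a query
    point lies in the query's own cell or one of its 26 neighbouring cells.
    """

    def __init__(self, atoms: list[tuple[float, float, float]], cutoff: float):
        self.atoms = atoms
        self.cutoff = cutoff
        self.cells: dict[tuple[int, ...], list[int]] = defaultdict(list)
        for idx, pos in enumerate(atoms):
            self.cells[self._cell_of(pos)].append(idx)

    def _cell_of(self, pos: tuple[float, float, float]) -> tuple[int, ...]:
        return tuple(math.floor(c / self.cutoff) for c in pos)

    def neighbors(self, point: tuple[float, float, float]) -> list[int]:
        """Sorted indices of atoms within `cutoff` of `point` (27-cell scan)."""
        centre = self._cell_of(point)
        found = []
        for offset in product((-1, 0, 1), repeat=3):
            cell = tuple(c + o for c, o in zip(centre, offset))
            # .get avoids creating empty entries in the defaultdict.
            for idx in self.cells.get(cell, ()):
                if math.dist(point, self.atoms[idx]) <= self.cutoff:
                    found.append(idx)
        return sorted(found)


def brute_force_neighbors(atoms, point, cutoff):
    """Reference O(N) scan over every atom, used to check the cell list."""
    return [i for i, a in enumerate(atoms) if math.dist(point, a) <= cutoff]


# Usage: a small regular lattice of "atoms" and one probe position.
atoms = [(x * 0.7, y * 1.3, z * 0.9) for x in range(8) for y in range(5) for z in range(6)]
cells = CellList(atoms, cutoff=1.5)
print(cells.neighbors((2.1, 3.3, 1.7)))
```

The design trade-off is that building the index costs one pass over the atoms, after which each query costs roughly in proportion to the local atom density rather than the total atom count, which is what makes repeated per-step lookups affordable for very large systems.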

Citations: 0