
Journal of Parallel and Distributed Computing: Latest Articles

SSI-FL: Self-sovereign identity based privacy-preserving federated learning
IF 3.8, Zone 3 (Computer Science), Q1 Mathematics, Pub Date: 2024-05-07, DOI: 10.1016/j.jpdc.2024.104907
Rakib Ul Haque, A.S.M. Touhidul Hasan, Mohammed Ali Mohammed Al-Hababi, Yuqing Zhang, Dianxiang Xu

Traditional federated learning (FL) raises security and privacy concerns such as identity fraud, data poisoning attacks, membership inference attacks, and model inversion attacks. In conventional FL, any entity can falsify its identity and initiate data poisoning attacks. In addition, adversaries (AD) holding the updated global model parameters can recover the plaintext of the dataset by initiating membership inference and model inversion attacks. To the best of our knowledge, this is the first work to propose an FL framework based on self-sovereign identity (SSI) and differential privacy (DP), named SSI-FL, to address all of the above issues. The first step in the SSI-FL framework establishes a secure connection using blockchain-based SSI. This secure connection protects against unauthorized access by any AD and ensures the authenticity, integrity, and availability of the transmitted data. The second step applies DP to protect against model inversion and membership inference attacks. The third step establishes FL with a novel hybrid deep learning model to achieve better scores than conventional methods. The performance of SSI-FL is evaluated through security, formal, scalability, and score analyses. Moreover, the proposed method outperforms all the state-of-the-art techniques.
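The abstract does not say which DP mechanism is applied to the model updates. As a rough illustration only, the sketch below assumes the common pattern of clipping each client update to a fixed L2 norm and adding Gaussian noise before federated averaging. The function names, the clipping norm, and the noise multiplier are hypothetical and are not taken from the paper.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= clip_norm or norm == 0.0:
        return list(update)
    scale = clip_norm / norm
    return [x * scale for x in update]


def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise with std = noise_multiplier * clip_norm.

    Assumed mechanism for illustration; the paper's actual DP step may differ.
    """
    clipped = clip_update(update, clip_norm)
    sigma = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]


def federated_average(updates):
    """Coordinate-wise mean of equally weighted client updates."""
    n = len(updates)
    return [sum(column) / n for column in zip(*updates)]


# Hypothetical client updates (two parameters each).
rng = random.Random(0)
client_updates = [[3.0, 4.0], [0.3, -0.4], [-6.0, 8.0]]
noisy_updates = [
    privatize_update(u, clip_norm=1.0, noise_multiplier=0.5, rng=rng)
    for u in client_updates
]
global_update = federated_average(noisy_updates)
```

Clipping bounds how much any single client can move the aggregate, which is what lets the added noise mask an individual contribution. The privacy budget that results from a given noise multiplier is not computed here.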

Citations: 0
Novel multi-cluster workflow system to support real-time HPC-enabled epidemic science: Investigating the impact of vaccine acceptance on COVID-19 spread
IF 3.8, Zone 3 (Computer Science), Q1 Mathematics, Pub Date: 2024-05-04, DOI: 10.1016/j.jpdc.2024.104899
Parantapa Bhattacharya, Dustin Machi, Jiangzhuo Chen, Stefan Hoops, Bryan Lewis, Henning Mortveit, Srinivasan Venkatramanan, Mandy L. Wilson, Achla Marathe, Przemyslaw Porebski, Brian Klahn, Joseph Outten, Anil Vullikanti, Dawen Xie, Abhijin Adiga, Shawn Brown, Christopher Barrett, Madhav Marathe

We present MacKenzie, an HPC-driven multi-cluster workflow system that was used repeatedly to configure and execute fine-grained US national-scale epidemic simulation models during the COVID-19 pandemic. MacKenzie supported federal and Virginia policymakers in real time across a large number of “what-if” scenarios during the pandemic, and it continues to be used to answer related questions as COVID-19 transitions to the endemic stage. MacKenzie is a novel HPC meta-scheduler that can execute US-scale simulation models and associated workflows, which typically present significant big data challenges. The meta-scheduler optimizes the total execution time of the simulations in a workflow and helps improve overall human productivity.

As an exemplar of the kind of study that can be conducted using MacKenzie, we present a modeling study of the impact of vaccine acceptance on controlling the spread of COVID-19 in the US. We use a 288-million-node synthetic social contact network (digital twin) spanning all 50 US states plus Washington DC, comprising 3300 counties, with 12 billion daily interactions. The highly resolved agent-based model used for the epidemic simulations uses realistic information about disease progression, vaccine uptake, production schedules, acceptance trends, prevalence, and social distancing guidelines. Computational experiments show that, for the simulation workload discussed above, MacKenzie scales well up to 10K CPU cores.

Our modeling results show that, compared to faster and accelerating vaccinations, slower vaccination rates due to vaccine hesitancy cause averted infections to drop from 6.7M to 4.5M and averted total deaths to drop from 39.4K to 28.2K across the US. This occurs even though the final vaccine coverage is the same in both scenarios. We also find that if vaccine acceptance could be increased by 10% in all states, averted infections could be increased from 4.5M to 4.7M (a 4.4% improvement) and total averted deaths from 28.2K to 29.9K (a 6% improvement) nationwide.
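The percentages quoted above can be checked directly from the reported counts. The short sketch below recomputes them as relative changes; the helper name `relative_change` is introduced here for illustration only.

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Gain from a 10% increase in vaccine acceptance (figures from the abstract).
infections_gain = relative_change(4.5e6, 4.7e6)   # about 4.4
deaths_gain = relative_change(28.2e3, 29.9e3)     # about 6.0

# Loss caused by slower, hesitancy-limited vaccination (figures from the abstract).
infections_loss = relative_change(6.7e6, 4.5e6)   # about -32.8
deaths_loss = relative_change(39.4e3, 28.2e3)     # about -28.4

print(round(infections_gain, 1), round(deaths_gain, 1))
print(round(infections_loss, 1), round(deaths_loss, 1))
```

The first pair reproduces the 4.4% and 6% improvements stated in the abstract. The second pair is not quoted by the authors and follows only from the reported counts.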

Citations: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.8, Zone 3 (Computer Science), Q1 Mathematics, Pub Date: 2024-05-03, DOI: 10.1016/S0743-7315(24)00075-3
Citations: 0
Spatiotemporal dynamics analysis and parameter optimization of a network epidemic-like propagation model based on neural network method
IF 3.8, Zone 3 (Computer Science), Q1 Mathematics, Pub Date: 2024-04-30, DOI: 10.1016/j.jpdc.2024.104906
Shuling Shen, Xinlin Chen, Linhe Zhu

In this paper, a reaction-diffusion model is established to study the dynamic behavior of rumor propagation. First, we consider the existence of positive equilibrium points. We then perform a stability analysis to study the conditions under which Turing instability occurs. Second, we use multiscale analysis to derive the amplitude equation. The numerical simulations take realistic conditions into account. They show that controlling the spread rate of rumors and the number of new Internet users has a great effect on curbing the spread of online rumors. Furthermore, it is shown that the analysis of the amplitude equation plays a decisive role in the formation of Turing patterns. We also discuss the Turing patterns that arise when the network structure changes and verify the rationality of the model using the Monte Carlo method. Finally, we consider two methods, based on statistical principles and on a convolutional neural network respectively, to identify the parameters of the reaction-diffusion system with Turing instability from stable patterns. The statistical method offers superior accuracy, whereas the convolutional neural network approach significantly reduces recognition time.
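The abstract does not give the model equations, so the paper's own instability conditions cannot be reproduced here. As a generic illustration, the sketch below checks the textbook Turing conditions for a two-component reaction-diffusion system on a continuous domain, given the Jacobian entries at a positive equilibrium and the two diffusion coefficients. The function name and the numeric values are hypothetical. On a network the Laplacian spectrum is discrete, so these conditions are necessary rather than sufficient.

```python
def turing_unstable(a11, a12, a21, a22, d_u, d_v):
    """Check the classical Turing conditions for a two-component system.

    a11..a22 are the Jacobian entries of the reaction terms at the equilibrium;
    d_u and d_v are the diffusion coefficients of the two components.
    """
    trace = a11 + a22
    det = a11 * a22 - a12 * a21

    # The equilibrium must be stable when diffusion is switched off.
    stable_without_diffusion = trace < 0 and det > 0

    # Diffusion must be able to push some spatial mode into instability.
    h = d_v * a11 + d_u * a22
    diffusion_destabilizes = h > 0 and h * h > 4 * d_u * d_v * det

    return stable_without_diffusion and diffusion_destabilizes


# Hypothetical activator-inhibitor Jacobian, not taken from the paper.
print(turing_unstable(1.0, -2.0, 3.0, -4.0, d_u=1.0, d_v=20.0))
print(turing_unstable(1.0, -2.0, 3.0, -4.0, d_u=1.0, d_v=1.0))
```

With equal diffusion coefficients the second call reports no instability, which reflects the usual requirement that the two components diffuse at sufficiently different rates.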

Citations: 0
The Rabin numbers of enhanced hypercubes
IF 3.8, Zone 3 (Computer Science), Q1 Mathematics, Pub Date: 2024-04-24, DOI: 10.1016/j.jpdc.2024.104905
Chaoming Guo, Meijie Ma, Xiang-Jun Li, Guijuan Wang

The $\omega$-Rabin number $r_\omega(G)$ and the strong $\omega$-Rabin number $r_\omega^{*}(G)$ are two effective parameters for assessing the transmission latency and fault tolerance of an interconnection network $G$. As determining the Rabin number of a general graph is NP-complete, we consider the Rabin number of the enhanced hypercube $Q_{n,k}$, which is a variant of the hypercube $Q_n$. For $n \ge k \ge 5$, we prove that $r_\omega(Q_{n,k}) = r_\omega^{*}(Q_{n,k}) = d(Q_{n,k})$ for $1 \le \omega < n - \lfloor \frac{k}{2} \rfloor$, and $r_\omega(Q_{n,k}) = r_\omega^{*}(Q_{n,k}) = d(Q_{n,k}) + 1$ for $n - \lfloor \frac{k}{2} \rfloor \le \omega \le n + 1$, where $d(Q_{n,k})$ is the diameter of $Q_{n,k}$. In addition, we present algorithms to construct internally disjoint paths of length at most $r_\omega^{*}(Q_{n,k})$ from a source vertex to $\omega$ other ($1 \le \omega \le n + 1$) destination vertices (not necessarily distinct) in $Q_{n,k}$.
Citations: 0
DuMato: An efficient warp-centric subgraph enumeration system for GPU
IF 3.8, Zone 3 (Computer Science), Q1 Mathematics, Pub Date: 2024-04-22, DOI: 10.1016/j.jpdc.2024.104903
Samuel Ferraz, Vinicius Dias, Carlos H.C. Teixeira, Srinivasan Parthasarathy, George Teodoro, Wagner Meira Jr.

Subgraph enumeration is a computationally heavy procedure that lies at the core of Graph Pattern Mining (GPM) algorithms, whose goal is to extract subgraphs from larger graphs according to a given property. Scaling GPM algorithms on GPUs is challenging due to irregularity, high memory demand, and the non-trivial choice of enumeration paradigm. In this work we propose a depth-first-search subgraph exploration strategy (DFS-wide) to improve memory locality and access patterns across different enumeration paradigms. We design a warp-centric workflow for the problem that reduces divergence and ensures that accesses to graph data are coalesced. A weight-based dynamic workload redistribution is also proposed to mitigate load imbalance. We combine these strategies in a system called DuMato, which allows efficient implementations of several GPM algorithms via a common set of GPU primitives. Our experiments show that DuMato's optimizations are effective and that it enables the exploration of larger subgraphs than state-of-the-art systems.
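To make the idea of depth-first subgraph exploration concrete, the sketch below counts k-cliques, one typical GPM task, by extending partial cliques depth first on the CPU. It is a sequential illustration under assumed names (`count_k_cliques`, `adjacency`). It does not reproduce DuMato's DFS-wide strategy, its warp-centric execution, or its load balancing.

```python
def count_k_cliques(adjacency, k):
    """Count the k-cliques of an undirected graph by depth-first extension.

    adjacency maps each vertex to the set of its neighbours.
    """
    count = 0

    def extend(size, candidates):
        nonlocal count
        if size == k:
            count += 1
            return
        for v in sorted(candidates):
            # Keep only higher-numbered common neighbours so that each
            # clique is generated exactly once, in increasing vertex order.
            narrowed = {w for w in candidates if w > v and w in adjacency[v]}
            extend(size + 1, narrowed)

    extend(0, set(adjacency))
    return count


# A triangle (0, 1, 2) with one pendant edge (2, 3).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(count_k_cliques(graph, 3))  # 1 triangle
print(count_k_cliques(graph, 2))  # 4 edges
```

Each recursion level holds only the candidate set for the current partial subgraph, which is the memory advantage of depth-first over breadth-first enumeration that the abstract alludes to.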

Citations: 0
Ion-molecule collision cross-section calculations using trajectory parallelization in distributed systems
IF 3.8, Zone 3 (Computer Science), Q1 Mathematics, Pub Date: 2024-04-22, DOI: 10.1016/j.jpdc.2024.104902
Samuel Cajahuaringa, Leandro N. Zanotto, Sandro Rigo, Hervé Yviquel, Munir S. Skaf, Guido Araujo

Ion Mobility coupled with Mass Spectrometry (IM-MS) is a powerful analytical method for structurally characterizing complex molecules. In IM-MS, the sample under investigation is ionized and propelled by an electric field into a drift tube, where it collides with a buffer gas. The separation of the ions in the gas phase is then measured through differences in their rotationally averaged Collision Cross-Section (CCS) values. The usefulness of the measured CCS for structural characterization depends critically on validation against theoretical calculations. This validation relies on intensive molecular mechanics simulations, which can be computationally demanding, especially for large systems such as molecular assemblies and viruses. Reliable and fast CCS calculations are therefore needed to help interpret IM-MS experimental data. This work presents the MassCCS software, which considerably increases CCS simulation performance by implementing a linked-cell-based algorithm and incorporating High-Performance Computing (HPC) techniques. We performed extensive tests regarding system size, shape, and number of CPU cores. Experimental results reveal speedups of up to three orders of magnitude over Collision Simulator for Ion Mobility Spectrometry (CoSIMS) and High-Performance Collision Cross Section (HPCCS), both optimized solutions for CCS simulations, for single-node execution. In addition, we extended MassCCS to the inter-node level by employing OpenMP Cluster (OMPC), an innovative programming model designed for developing HPC applications. OMPC streamlines development and simplifies software maintenance by using only OpenMP directives. Notably, OMPC delivers performance comparable to a pure MPI implementation. This enhancement enabled expensive CCS calculations using nitrogen buffer gas for large systems such as human adenovirus, with ∼11 million atoms, in just ∼4 min, making MassCCS, to the best of our knowledge, the most performant software currently available. MassCCS is available as free software for academic use at https://github.com/cces-cepid/massccs.
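The abstract names a linked-cell-based algorithm without describing it. The sketch below shows the generic linked-cell idea under stated assumptions: atoms are binned into cubic cells whose side equals the interaction cutoff, so a gas particle only needs to test atoms in its own cell and the 26 adjacent ones. The function names and the sample coordinates are hypothetical and are not taken from MassCCS.

```python
import math
from collections import defaultdict
from itertools import product


def cell_of(point, cutoff):
    """Integer cell coordinates of a 3D point for cells of side `cutoff`."""
    return tuple(math.floor(c / cutoff) for c in point)


def build_cells(positions, cutoff):
    """Bin atom indices by the cell that contains each atom."""
    cells = defaultdict(list)
    for index, position in enumerate(positions):
        cells[cell_of(position, cutoff)].append(index)
    return cells


def atoms_within_cutoff(point, positions, cells, cutoff):
    """Indices of atoms within `cutoff` of `point`, scanning only 27 cells."""
    cx, cy, cz = cell_of(point, cutoff)
    found = []
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        for index in cells.get((cx + dx, cy + dy, cz + dz), ()):
            if math.dist(point, positions[index]) <= cutoff:
                found.append(index)
    return sorted(found)


# Hypothetical atom coordinates (arbitrary units).
atoms = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 3.0, 3.0), (-0.4, 0.2, 0.0)]
cells = build_cells(atoms, cutoff=1.0)
print(atoms_within_cutoff((0.1, 0.0, 0.0), atoms, cells, cutoff=1.0))
```

Because each query touches a bounded number of cells, the cost per gas-particle step no longer grows with the total atom count, which is why this kind of binning matters for systems with millions of atoms.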

Citations: 0
Efficient topology reconfiguration for NoC-based multiprocessors: A greedy-memetic algorithm
IF 3.8 Zone 3 Computer Science Q1 Mathematics Pub Date: 2024-04-21 DOI: 10.1016/j.jpdc.2024.104904
Junyan Qian, Chuanfang Zhang, Zheng Wu, Hao Ding, Long Li

In multi-core processor systems, the Network-on-Chip (NoC) serves as a vital communication infrastructure. To keep the chip reliable in the presence of faults, this paper proposes a two-level topology reconfiguration algorithm with core-level redundancy. First, a heuristic reconfiguration method based on a greedy strategy locally replaces faulty processing elements (PEs) and generates an initial logical topology with shorter interconnection paths between PEs. Then, an optimization method based on a memetic algorithm refines that initial topology for better communication performance. Experimental results show that, compared with the current state-of-the-art algorithm, the proposed algorithm achieves average improvements of 13.92% and 30.83% across topologies of various sizes in distance factor (DF) and congestion factor (CF), which represent communication delay and traffic balance, respectively. The proposed algorithm thus improves the communication performance of the target topology, mitigating communication latency and potential congestion.
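The greedy first level described above can be illustrated with a small sketch. This is not the paper's algorithm, whose details the abstract does not give. It only shows the general idea of local replacement, under the assumption that PEs are identified by `(row, column)` mesh coordinates and that each faulty PE is mapped to the nearest unused spare by Manhattan distance. The function name `greedy_replace` and the data layout are hypothetical.

```python
def greedy_replace(faulty, spares):
    """Map each faulty PE to the nearest still-unused spare PE.

    Coordinates are (row, column) tuples. Distance is Manhattan distance,
    an assumed stand-in for interconnection path length.
    """
    free = set(spares)
    mapping = {}
    for pe in sorted(faulty):  # fixed order keeps the result deterministic
        if not free:
            raise ValueError("not enough spare PEs")
        # Greedy choice: the closest free spare; ties go to the smallest coordinate.
        best = min(
            sorted(free),
            key=lambda s: abs(s[0] - pe[0]) + abs(s[1] - pe[1]),
        )
        mapping[pe] = best
        free.remove(best)
    return mapping


# Hypothetical 3x4 mesh (columns 0-3) with a spare column at index 4.
replacement = greedy_replace(
    faulty=[(0, 1), (2, 2)],
    spares=[(0, 4), (1, 4), (2, 4)],
)
```

A greedy pass like this is cheap but order-dependent, which is one reason a second optimization stage (a memetic algorithm in the paper) can still improve the resulting topology.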

Citations: 0
CUDA acceleration of MI-based feature selection methods
IF 3.8 Zone 3 Computer Science Q1 Mathematics Pub Date: 2024-04-18 DOI: 10.1016/j.jpdc.2024.104901
Bieito Beceiro, Jorge González-Domínguez, Laura Morán-Fernández, Verónica Bolón-Canedo, Juan Touriño

Feature selection algorithms are essential in machine learning because they remove irrelevant and redundant information, reducing the dimensionality of the data and improving the quality of subsequent analyses. Current feature selection approaches, however, are computationally expensive on large datasets. This work presents parallel implementations for Nvidia GPUs of three widely used feature selection methods based on the Mutual Information (MI) metric: mRMR, JMI and DISR. The publicly available code includes not only CUDA implementations of the general methods but also an adaptation that works with low-precision fixed-point arithmetic to further increase performance on GPUs. The experimental evaluation was carried out on two modern Nvidia GPUs (Turing T4 and Ampere A100), achieving speedups of up to 283x over state-of-the-art C implementations.
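To make the MI-based selection concrete, here is a minimal CPU sketch of mRMR in its common "relevance minus mean redundancy" form for discrete features. It is a plain-Python reference for the technique, not the authors' CUDA code. The function names `mutual_information` and `mrmr` and the list-of-columns data layout are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) of two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def mrmr(features, target, k):
    """Greedily pick k feature indices by relevance minus mean redundancy."""
    relevance = [mutual_information(f, target) for f in features]
    selected = []
    remaining = set(range(len(features)))

    def score(i):
        if not selected:
            return relevance[i]
        redundancy = sum(
            mutual_information(features[i], features[j]) for j in selected
        ) / len(selected)
        return relevance[i] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Feature 1 duplicates feature 0; feature 2 carries independent information.
chosen = mrmr(
    features=[[0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]],
    target=[0, 1, 2, 3],
    k=2,
)
```

In this toy input the duplicate feature is skipped in favour of the non-redundant one. The repeated pairwise MI evaluations dominate the cost, which is the kind of work a GPU implementation can parallelize.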

Citations: 0
Efficient and lightweight in-memory computing architecture for hardware security
IF 3.8 Zone 3 Computer Science Q1 Mathematics Pub Date: 2024-04-16 DOI: 10.1016/j.jpdc.2024.104898
Hala Ajmi, Fakhreddine Zayer, Amira Hadj Fredj, Hamdi Belgacem, Baker Mohammad, Naoufel Werghi, Jorge Dias

This paper introduces a solution for improving the efficiency and speed of cryptographic algorithms based on the Advanced Encryption Standard (AES). The approach leverages in-memory computing (IMC) and applies to a broad spectrum of IoT applications, including robotic autonomous vehicles. To this end, memristor (MR) designs are proposed to emulate the arithmetic operations required in the different phases of the AES algorithm, enabling efficient in-memory processing. The key contributions of this work are: 1) the development of a 4-bit MR state element for implementing different arithmetic operations in an AES hardware prototype; 2) the creation of a pipelined AES design for massive parallelism and MR integration compatibility; and 3) the hardware implementation of the AES-IMC based architecture using the MR emulator. The results show that AES-IMC outperforms existing architectures in throughput and energy efficiency. Compared to conventional AES hardware, AES-IMC achieves a 30% power enhancement with comparable throughput. Compared to state-of-the-art AES-based NVM engines, AES-IMC demonstrates comparable power dissipation and a 62% increase in throughput. The IMC architecture enables cost-effective real-time deployment of AES with high performance, making it a promising solution for applications in autonomous driving and robotics. Potential benefits include improved safety and security of unmanned devices, as well as better performance and cost-effectiveness in a variety of computing environments.
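For orientation, the arithmetic that an AES datapath must provide is mostly XOR plus multiplication in GF(2^8). The sketch below shows that standard arithmetic for the MixColumns step in plain Python. It illustrates the AES specification only and says nothing about how the paper's memristor elements realize it. The helper names `xtime`, `gf_mul` and `mix_column` are chosen here for illustration.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:      # overflow past 8 bits: reduce by the AES polynomial
        b ^= 0x11B
    return b & 0xFF


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a    # addition in GF(2^8) is XOR
        a = xtime(a)
        b >>= 1
    return result


def mix_column(col):
    """Apply the AES MixColumns transform to one 4-byte column."""
    return [
        gf_mul(col[i], 2)
        ^ gf_mul(col[(i + 1) % 4], 3)
        ^ col[(i + 2) % 4]
        ^ col[(i + 3) % 4]
        for i in range(4)
    ]


mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
```

Because every step reduces to shifts and XORs on small bit groups, this arithmetic is a natural fit for designs that compute directly inside memory arrays.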

Citations: 0