
Latest articles in the Journal of Parallel and Distributed Computing

The (t,k)-diagnosability of Cayley graph generated by 2-tree
IF 3.4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-03-21 | DOI: 10.1016/j.jpdc.2025.105068
Lulu Yang, Shuming Zhou, Eddie Cheng
Multiprocessor systems, which typically use interconnection networks (or graphs) as underlying topologies, are widely utilized for big data analysis in scientific computing due to advances in technologies such as cloud computing, the IoT, and social networks. With the dramatic expansion in the scale of multiprocessor systems, the pursuit and optimization of strategies for identifying faulty processors have become crucial to ensuring the normal operation of high-performance computing systems. System-level diagnosis is a process designed to distinguish faulty processors from fault-free processors in multiprocessor systems. The (t,k)-diagnosis, a generalization of sequential diagnosis, identifies at least k faulty processors and repairs them in each iteration, under the assumption that there are at most t faulty processors, whenever t ≥ k. We show that the Cayley graph generated by a 2-tree is (2^(n−3), 2n−4)-diagnosable under the PMC model for n ≥ 5, while it is ((2^(n−3)(2n−6))/(2n−4), 2n−4)-diagnosable under the MM* model for n ≥ 4. As an empirical case study, the (t,k)-diagnosabilities of the alternating group graph AG_n under the PMC model and the MM* model have also been determined.
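The quoted bounds are closed-form in n and can be tabulated directly. A minimal sketch in Python (used here only to evaluate the expressions stated in the abstract; the MM* value is printed as the raw fraction, without any rounding the paper may apply):

```python
def pmc_bounds(n: int) -> tuple[int, int]:
    """(t, k) under the PMC model, as stated in the abstract (n >= 5)."""
    return 2 ** (n - 3), 2 * n - 4

def mm_star_bounds(n: int) -> tuple[float, int]:
    """(t, k) under the MM* model, as stated in the abstract (n >= 4)."""
    return 2 ** (n - 3) * (2 * n - 6) / (2 * n - 4), 2 * n - 4

for n in (8, 10, 12):
    print(n, pmc_bounds(n), mm_star_bounds(n))
```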
Citations: 0
A knowledge-driven approach to multi-objective IoT task graph scheduling in fog-cloud computing
IF 3.4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-03-18 | DOI: 10.1016/j.jpdc.2025.105069
Hadi Gholami, Hongyang Sun
Despite the significant growth of the Internet of Things (IoT), this emerging technology has prominent limitations, such as limited processing power and storage. Along with the expansion of IoT networks, the fog-cloud computing paradigm has been developed to optimize the provision of services to IoT users by offloading computations to more powerful processing resources. In this paper, with the aim of optimizing the multiple objectives of makespan, energy consumption, and cost, we develop a novel automatic three-module algorithm to schedule multiple task graphs offloaded from IoT devices to the fog-cloud environment. Our algorithm combines the Genetic Algorithm (GA) and the Random Forest (RF) classifier, and we call it Hybrid GA-RF (HGARF). Each of the three modules has its own responsibility, and they are repeated sequentially to extract knowledge from the solution space in the form of IF-THEN rules. The first module is responsible for generating solutions for the training set using a GA. Here, we introduce a chromosome encoding method and a crossover operator to create diversity for multiple task graphs. By expressing a concept called bottleneck and two conditions, we also develop a mutation operator to identify and reduce the workload of certain processing centers. The second module aims at generating rules from the solutions of the training set and, to that end, employs an RF classifier. Here, in addition to proposing features to construct decision trees, we develop a format for extracting and recording IF-THEN rules. The third module checks the quality of the generated rules and refines them by predicting the processing resources as well as removing less important rules from the rule set. Finally, the developed HGARF algorithm automatically determines its termination condition based on the quality of the provided solutions. Experimental results demonstrate that our method improves the objective functions on large task graphs by up to 13.24% compared to some state-of-the-art methods.
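As an illustration of the second module's rule-extraction idea only (a toy, self-contained sketch, not the authors' HGARF code; the feature names and the synthetic labels standing in for GA-generated scheduling solutions are hypothetical), IF-THEN rules can be read off a fitted random-forest tree as follows:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the GA-generated training set: each row is a feature vector
# of a candidate schedule, and the label marks whether it was a good solution.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X[:, 0] + 0.5 * X[:, 2] > 0.8).astype(int)

rf = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0).fit(X, y)

def tree_to_rules(tree, feature_names):
    """Walk one decision tree and record IF-THEN rules for leaves predicting class 1."""
    t = tree.tree_
    rules = []
    def walk(node, conds):
        if t.children_left[node] == -1:  # leaf node
            if np.argmax(t.value[node][0]) == 1:
                rules.append("IF " + " AND ".join(conds) + " THEN good-schedule")
            return
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        walk(t.children_left[node], conds + [f"{name} <= {thr:.2f}"])
        walk(t.children_right[node], conds + [f"{name} > {thr:.2f}"])
    walk(0, [])
    return rules

for rule in tree_to_rules(rf.estimators_[0], ["f0", "f1", "f2", "f3"]):
    print(rule)
```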
Citations: 0
Data quality management in big data: Strategies, tools, and educational implications
IF 3.4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-03-13 | DOI: 10.1016/j.jpdc.2025.105067
Thu Nguyen, Hong-Tri Nguyen, Tu-Anh Nguyen-Hoang
This study addresses the critical need for effective Big Data Quality Management (BDQM) in education, a field where data quality has profound implications but remains underexplored. The work systematically progresses from requirement analysis and standard development to the deployment of tools for monitoring and enhancing data quality in big data workflows. The study's contributions are substantiated through five research questions that explore the impact of data quality on analytics, the establishment of evaluation standards, centralized management strategies, improvement techniques, and education-specific BDQM adaptations. By addressing these questions, the research advances both theoretical and practical frameworks, equipping stakeholders with the tools to enhance the reliability and efficiency of data-driven educational initiatives. Integrating Artificial Intelligence (AI) and distributed computing, this research introduces a novel multi-stage BDQM framework that emphasizes data quality assessment, centralized governance, and AI-enhanced improvement techniques. This work underscores the transformative potential of robust BDQM systems in supporting informed decision-making and achieving sustainable outcomes in educational projects. The survey findings highlight the potential for automated data management within big data architectures, suggesting that data quality frameworks can be significantly enhanced by leveraging AI and distributed computing. Additionally, the survey emphasizes emerging trends in big data quality management, specifically (i) automated data cleaning and cleansing and (ii) data enrichment and augmentation.
Citations: 0
IMI-GPU: Inverted multi-index for billion-scale approximate nearest neighbor search with GPUs
IF 3.4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-03-04 | DOI: 10.1016/j.jpdc.2025.105066
Alan Araujo, Willian Barreiros Jr., Jun Kong, Renato Ferreira, George Teodoro
Similarity search is utilized in specialized database systems designed to handle multimedia data, often represented by high-dimensional features. In this paper, we focus on speeding up the search process with GPUs. This problem has previously been approached by accelerating the Inverted File with Asymmetric Distance Computation algorithm on GPUs (IVFADC-GPU). However, the more recent CPU algorithm, the Inverted Multi-Index (IMI), had not been considered for parallelization, as it was deemed too challenging for efficient GPU deployment. Thus, we propose a novel and efficient version of IMI for GPUs called IMI-GPU. We propose a new design of the multi-sequence algorithm of IMI, enabling efficient GPU execution. We compared IMI-GPU with IVFADC-GPU on a billion-scale dataset, in which IMI-GPU achieved speedups of about 3.2× and 1.9× at Recall@1 and Recall@16, respectively. The algorithms have been compared in a variety of scenarios, and our novel IMI-GPU significantly outperforms IVFADC on GPUs in the majority of tested cases.
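For context, the heart of IMI is the multi-sequence traversal, which visits index cells (i, j) in order of increasing summed per-subquantizer distance; this is the procedure the paper redesigns for GPU execution. A CPU-side sketch (a simplified variant that uses a visited set instead of the original predecessor check; not the paper's GPU kernel):

```python
import heapq

def multi_sequence(d1, d2, top_t):
    """Visit IMI cells (i, j) in non-decreasing order of d1[i] + d2[j].

    d1 and d2 are per-subquantizer distance lists, each sorted ascending,
    as in the inverted multi-index; returns the first top_t cells visited.
    """
    heap = [(d1[0] + d2[0], 0, 0)]
    seen = {(0, 0)}
    visited = []
    while heap and len(visited) < top_t:
        dist, i, j = heapq.heappop(heap)
        visited.append((i, j, dist))
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(d1) and nj < len(d2) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (d1[ni] + d2[nj], ni, nj))
    return visited

print(multi_sequence([0.1, 0.4, 0.9], [0.2, 0.3, 0.8], top_t=5))
```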
Citations: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-03-01 | DOI: 10.1016/S0743-7315(25)00027-9
{"title":"Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)","authors":"","doi":"10.1016/S0743-7315(25)00027-9","DOIUrl":"10.1016/S0743-7315(25)00027-9","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"199 ","pages":"Article 105060"},"PeriodicalIF":3.4,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143527269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Massive parallel simulation of gas turbine combustion using a fully implicit unstructured solver on the heterogeneous Sunway Taihulight supercomputer
IF 3.4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-02-13 | DOI: 10.1016/j.jpdc.2025.105055
Fei Gao, Hu Ren, Zhuyin Ren, Ming Liu, Chengpeng Zhao, Guangwen Yang
Massively parallel simulations of a full annular aeroengine combustor chamber have been achieved on the on-chip heterogeneous Sunway Taihulight supercomputer. A billion-cell unstructured mesh is generated through grid replication and rotation, accompanied by the development of an efficient geometric matching algorithm to address the conformal interface issue. We developed graph-based and tree-based loop fusion approaches for the implicit solution procedure of the momentum equation, and found that the strategic use of data reuse and the separation of vector computation significantly enhance performance on the many-core processor. For the linear system, a finer-grained parallelization based on sparse matrix-vector multiplication and vector computation is validated. Massively parallel tests using 16K processes with 1M cores were successfully conducted to simulate turbulent non-premixed combustion in an aeroengine combustor with nearly one billion cells. Compared to the pre-optimization version, the fully accelerated code achieves an impressive 5.48× speedup in overall performance, with a parallel efficiency of up to 59%.
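The "finer-grained parallelization based on sparse matrix-vector multiplication" targets the SpMV kernel inside the linear solver. Purely as a reference point (a plain CSR product, not the Sunway implementation), the row loop below is the natural unit of work to split across many-core threads:

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a CSR-stored sparse matrix A."""
    y = np.zeros(len(row_ptr) - 1)
    for row in range(len(row_ptr) - 1):       # each row is independent work
        start, end = row_ptr[row], row_ptr[row + 1]
        y[row] = np.dot(values[start:end], x[col_idx[start:end]])
    return y

# 3x3 example: [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
values  = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
print(spmv_csr(values, col_idx, row_ptr, np.array([1.0, 2.0, 3.0])))  # [ 7.  6. 17.]
```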
Citations: 0
Distributed landmark labeling for social networks
IF 3.4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-02-13 | DOI: 10.1016/j.jpdc.2025.105057
Arda Şener, Hüsnü Yenigün, Kamer Kaya
Distance queries are a fundamental part of many network analysis applications. They can be used to infer the closeness of two users in social networks, the relation between two sites in a web graph, or the importance of the interaction between two proteins or molecules. Being able to answer these queries rapidly has many benefits in the area of network analysis. Pruned Landmark Labeling (PLL) is a technique used to generate an index for a given graph that allows shortest-path queries to be answered in a fraction of the time taken by a standard breadth-first or depth-first search-based algorithm. Parallel Shortest-distance Labeling (PSL) reorganizes the steps of PLL for the multithreaded setting and is designed particularly for social networks, for which the index sizes can be much larger than what a single server can store. Even for a medium-size graph with 5 million vertices, the index size can be more than 40 GB. This paper proposes a hybrid, shared- and distributed-memory algorithm, DPSL, which partitions the input graph via a vertex separator. The proposed method improves both the parallel execution time and the maximum memory consumption by distributing both the data and the work across multiple nodes of a cluster. For instance, on a graph with 5M vertices and 150M edges, using 4 nodes, DPSL reduces the execution time and maximum memory consumption by 2.13× and 1.87×, respectively, compared to our improved implementation of PSL.
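Once a 2-hop landmark label L(v), a set of (landmark, distance) pairs, has been built, answering a distance query only requires scanning the two labels for shared landmarks. A minimal sketch of that query step (index construction and DPSL's separator-based partitioning are omitted; the toy labels below are hypothetical):

```python
def query(label_u, label_v):
    """Shortest-path distance from 2-hop labels.

    label_u and label_v map landmark -> distance for the two query vertices;
    the answer is the minimum of d(u, l) + d(l, v) over shared landmarks l.
    """
    best = float("inf")
    for landmark, du in label_u.items():
        dv = label_v.get(landmark)
        if dv is not None:
            best = min(best, du + dv)
    return best

# Hand-made labels for two vertices sharing landmark "a".
print(query({"a": 1, "b": 3}, {"a": 2, "c": 4}))  # prints 3
```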
Citations: 0
Exploring data science workflows: A practice-oriented approach to teaching processing of massive datasets
IF 3.4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-02-12 | DOI: 10.1016/j.jpdc.2025.105043
Johannes Schoder, H. Martin Bücker
Massive datasets are typically processed by a sequence of different stages, comprising data acquisition and preparation, data processing, data analysis, result validation, and visualization. In conjunction, these stages form a data science workflow, a key element enabling the solution of data-intensive problems. The complexity and heterogeneity of these stages require a diverse set of techniques and skills. This article discusses a hands-on practice-oriented approach aiming to enable and motivate graduate students to engage with realistic data science workflows. A major goal of the approach is to bridge the gap between academia and industry by integrating programming assignments that implement different data workflows with real-world data. In consecutive assignments, students are exposed to the methodology of solving problems using big data frameworks and are required to implement different data workflows of varying complexity. This practice-oriented approach is well received by students, as confirmed by different surveys.
Citations: 0
Efficient GPU-accelerated parallel cross-correlation
IF 3.4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-02-12 | DOI: 10.1016/j.jpdc.2025.105054
Karel Maděra, Adam Šmelko, Martin Kruliš
Cross-correlation is a data analysis method widely employed in various signal processing and similarity-search applications. Our objective is to design a highly optimized GPU-accelerated implementation that speeds up these applications and also improves energy efficiency, since GPUs are more efficient than CPUs for data-parallel tasks. There are two rudimentary ways to compute cross-correlation: a definition-based algorithm that tries all possible overlaps, and an algorithm based on the Fourier transform, which is much more complex but has better asymptotic time complexity. We have focused mainly on the definition-based approach, which is better suited for smaller input data, and we have implemented multiple CUDA-enabled algorithms with multiple optimization options. The algorithms were evaluated on various scenarios, including the most typical types of multi-signal correlations, and we provide empirically verified optimal solutions for each of the studied scenarios.
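The two families of algorithms compared here can be stated compactly: the definition-based method evaluates every possible overlap directly, while the Fourier approach applies the correlation theorem. A small NumPy sketch of both for 1-D signals (an illustration of the math only, not the paper's CUDA kernels):

```python
import numpy as np

def xcorr_direct(a, b):
    """Definition-based cross-correlation: evaluate every possible overlap."""
    return np.correlate(a, b, mode="full")

def xcorr_fft(a, b):
    """FFT-based cross-correlation via the correlation theorem; the roll aligns
    the circular result with the lag ordering of mode='full'."""
    n = len(a) + len(b) - 1
    r = np.real(np.fft.ifft(np.fft.fft(a, n) * np.conj(np.fft.fft(b, n))))
    return np.roll(r, len(b) - 1)

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 0.5])
print(xcorr_direct(a, b))  # [0.5 2.  3.5 3.  0. ]
print(xcorr_fft(a, b))     # same values, up to floating-point error
```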
Citations: 0
DePoL: Assuring training integrity in collaborative learning via decentralized verification
IF 3.4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-02-12 | DOI: 10.1016/j.jpdc.2025.105056
Zhicheng Xu, Xiaoli Zhang, Xuanyu Yin, Hongbing Cheng
Collaborative learning enables multiple participants to jointly train complex models but is vulnerable to attacks such as model poisoning or backdoor attacks. Ensuring training integrity can prevent these threats by blocking any tampered contribution from affecting the model. However, traditional approaches often suffer from single points of bottleneck or failure in decentralized environments. To address these issues, we propose DePoL, a secure, scalable, and efficient decentralized verification framework based on duplicated execution. DePoL leverages blockchain to distribute the verification tasks across multiple participant-formed groups, eliminating single-point bottlenecks. Within each group, redundant verification and majority-based arbitration prevent single points of failure. To further enhance security, DePoL introduces a two-stage plagiarism-free commitment scheme to prevent untrusted verifiers from exploiting public on-chain data. Additionally, a hybrid verification method employs fuzzy matching to handle unpredictable reproduction errors, while a "slow path" ensures zero false positives for honest trainers. Our theoretical analysis demonstrates DePoL's security and termination properties. Extensive evaluations show that DePoL has overhead similar to common distributed machine learning algorithms while outperforming centralized verification schemes in scalability, reducing training latency by up to 46%. Additionally, DePoL effectively handles reproduction errors with zero false positives.
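The abstract does not spell out the commitment or arbitration mechanisms, so the following is only a generic illustration of the two building blocks it names, a hash-based commit/reveal and majority arbitration over verifier verdicts, and not DePoL's actual protocol:

```python
import hashlib
import secrets
from collections import Counter

def commit(result: bytes) -> tuple[str, bytes]:
    """Stage 1: publish only H(result || nonce); the result itself stays hidden."""
    nonce = secrets.token_bytes(16)
    return hashlib.sha256(result + nonce).hexdigest(), nonce

def reveal_ok(digest: str, result: bytes, nonce: bytes) -> bool:
    """Stage 2: anyone can check the opened value against the commitment."""
    return hashlib.sha256(result + nonce).hexdigest() == digest

def arbitrate(verdicts: list[bool]) -> bool:
    """Majority vote over a group's redundant verification verdicts."""
    return Counter(verdicts).most_common(1)[0][0]

update = b"trainer-contribution"          # stand-in for a training contribution
digest, nonce = commit(update)
assert reveal_ok(digest, update, nonce)
print(arbitrate([True, True, False]))     # True: the contribution is accepted
```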
Citations: 0