Parallel Computing: Latest Articles

Adaptively parallel runtime verification based on distributed network for temporal properties

IF 1.4 Zone 4 Computer Science Q2 COMPUTER SCIENCE, THEORY & METHODS

Parallel Computing

Pub Date : 2023-09-01 DOI: 10.1016/j.parco.2023.103034

Bin Yu, Xu Lu, Cong Tian, Meng Wang, Chu Chen, Ming Lei, Zhenhua Duan

Runtime verification is a lightweight verification technique that checks whether a monitored program execution satisfies a desired property. Online runtime verification faces challenges in efficiency and property expressiveness, which limit its widespread adoption. However, little research addresses both of these issues together. Building on a distributed network, we propose an adaptively parallel approach to verify full regular temporal properties of C programs in an online manner. During program execution, segments of the generated state sequence are verified concurrently by distributed machines, while each segment is also verified within its multi-core machine by an adaptive number of threads. Experimental results demonstrate that, while supporting more expressive properties, our approach achieves a speedup of 2.5x–5.0x compared with other runtime verification approaches.

Citations: 0

Using heterogeneous GPU nodes with a Cabana-based implementation of MPCD

IF 1.4 Zone 4 Computer Science Q2 COMPUTER SCIENCE, THEORY & METHODS

Parallel Computing

Pub Date : 2023-09-01 DOI: 10.1016/j.parco.2023.103033

R. Halver, Christoph Junghans, G. Sutmann

Citations: 0

Big data BPMN workflow resource optimization in the cloud

IF 1.4 Zone 4 Computer Science Q2 COMPUTER SCIENCE, THEORY & METHODS

Parallel Computing

Pub Date : 2023-09-01 DOI: 10.1016/j.parco.2023.103025

Srđan Daniel Simić, Nikola Tanković, Darko Etinger

Cloud computing is one of the critical technologies that meet the demand of various businesses for the high-capacity computational processing power needed to gain knowledge from their ever-growing business data. When utilizing cloud computing resources for Big Data processing, companies face the challenge of determining the optimal use of resources within their business processes. Miscalculating the necessary resources directly affects their budget and can cause delays in the cycle time of their key processes. This study investigates the simulation of cloud resource optimization for Big Data workflows modeled with the Business Process Modeling Notation (BPMN). To this end, a BPMN performance evaluation framework was developed. The framework's capabilities were presented using a real-world data science workflow and later evaluated on workflows consisting of 13, 52, and 104 tasks. The results show that the developed framework is adequate for estimating the overall run-time distribution and optimizing the cloud resource deployment, and that BPMN can be utilized for Big Data processing workflows. Therefore, this study benefits BPMN practitioners by providing a tool for applying BPMN to their Big Data workflows, and decision-makers by giving them critical insights into their key business processes. The framework source code is available at https://github.com/ntankovic/python-bpmn-engine.

Citations: 0

ESA: An efficient sequence alignment algorithm for biological database search on Sunway TaihuLight

IF 1.4 Zone 4 Computer Science Q2 COMPUTER SCIENCE, THEORY & METHODS

Parallel Computing

Pub Date : 2023-08-01 DOI: 10.1016/j.parco.2023.103043

H. Zhang, Zhiyi Huang, Yawen Chen, Jianguo Liang, Xiran Gao

Citations: 0

Finding inputs that trigger floating-point exceptions in heterogeneous computing via Bayesian optimization

IF 1.4 Zone 4 Computer Science Q2 COMPUTER SCIENCE, THEORY & METHODS

Parallel Computing

Pub Date : 2023-08-01 DOI: 10.1016/j.parco.2023.103042

I. Laguna, Anh Tran, G. Gopalakrishnan

Citations: 0

A flexible sparse matrix data format and parallel algorithms for the assembly of finite element matrices on shared memory systems

IF 1.4 Zone 4 Computer Science Q2 COMPUTER SCIENCE, THEORY & METHODS

Parallel Computing

Pub Date : 2023-07-01 DOI: 10.1016/j.parco.2023.103039

A. Sky, César Polindara, I. Muench, C. Birk

Citations: 0

Characterizing the performance of node-aware strategies for irregular point-to-point communication on heterogeneous architectures

IF 1.4 Zone 4 Computer Science Q2 COMPUTER SCIENCE, THEORY & METHODS

Parallel Computing

Pub Date : 2023-07-01 DOI: 10.1016/j.parco.2023.103021

Shelby Lockhart, Amanda Bienz, William D. Gropp, Luke N. Olson

Supercomputer architectures are trending toward higher computational throughput due to the inclusion of heterogeneous compute nodes. These multi-GPU nodes increase on-node computational efficiency, while also increasing the amount of data to be communicated and the number of potential data flow paths. In this work, we characterize the performance of irregular point-to-point communication with MPI on heterogeneous compute environments through performance modeling, demonstrating the limitations of standard communication strategies for both device-aware and staging-through-host communication techniques. The presented models suggest staging communicated data through host processes and then using node-aware communication strategies for high inter-node message counts. Notably, the models also predict that node-aware communication utilizing all available CPU cores to communicate inter-node data is the most performant strategy when communicating with a high number of nodes. Model validation is provided via a case study of irregular point-to-point communication patterns in distributed sparse matrix–vector products. Importantly, we include a discussion of the implications that the model predictions have for communication strategy design on emerging supercomputer architectures.

Citations: 0

Segment based power-efficient scheduling for real-time DAG tasks on edge devices

IF 1.4 Zone 4 Computer Science Q2 COMPUTER SCIENCE, THEORY & METHODS

Parallel Computing

Pub Date : 2023-07-01 DOI: 10.1016/j.parco.2023.103022

Lei Yu, Tianqi Zhong, Peng Bi, Lan Wang, Fei Teng

Smart Mobile Devices (SMDs) are crucial for the edge computing paradigm's real-world sensing. Real-time applications, which are computationally intensive and periodic with strict time constraints, can typically be used to replicate real-world sensing. Such applications call for increased processing speed, memory capacity, and battery life on SMDs, which are typically resource-constrained due to physical size restrictions. As a result, power-efficient scheduling of real-time applications on SMDs is crucial for the regular operation of edge computing platforms, and downstream decision-making tasks like computation offloading require the prediction of power consumption under power-saving approaches like DVFS. The main question is how to swiftly develop a better solution to the NP-hard power-efficient scheduling problem with DVFS. Thus, by segmenting the aligned tasks on an SMD, we present a segment-based analysis approach. Additionally, we offer a segment-based scheduling algorithm (SEDF) that draws inspiration from the segment-based analysis approach to achieve power-efficient scheduling for these real-time workloads. This segment-based approach yields a power consumption bound (PB), and a computation offloading use case is developed to demonstrate the application of PB in the subsequent decision-making processes. Both simulations and actual device tests are used to confirm the PB, SEDF, and the effectiveness of offloading decision-making. We demonstrate empirically that PB can be utilized to make approximately optimal decisions in decision-making problems involving computation offloading. SEDF is a straightforward and effective scheduling approach that can cut the power consumption of a multi-core SMD by roughly 30%.

Citations: 0

Efficient checkpoint/Restart of CUDA applications

IF 1.4 Zone 4 Computer Science Q2 COMPUTER SCIENCE, THEORY & METHODS

Parallel Computing

Pub Date : 2023-07-01 DOI: 10.1016/j.parco.2023.103018

Akira Nukada, Taichiro Suzuki, Satoshi Matsuoka

We present NVCR, which enables transparent checkpoint and restart of CUDA applications. NVCR works as an extension of major system-level checkpoint software such as BLCR and DMTCP. It employs a proxy process, and the application accesses GPU devices via that proxy process, which improves compatibility with the latest CUDA runtime software. To reduce the overhead of inter-process communication, NVCR efficiently uses SYSV IPC shared memory as CUDA pinned memory. Performance evaluations using micro-benchmarks and Amber as a real application show that NVCR's overhead is acceptably low.

Citations: 0

GPU acceleration of Levenshtein distance computation between long strings

IF 1.4 Zone 4 Computer Science Q2 COMPUTER SCIENCE, THEORY & METHODS

Parallel Computing

Pub Date : 2023-07-01 DOI: 10.1016/j.parco.2023.103019

David Castells-Rufas

Computing edit distance for very long strings has been hampered by quadratic time complexity with respect to string length. The WFA algorithm reduces the time complexity to a quadratic factor with respect to the edit distance between the strings. This work presents a GPU implementation of the WFA algorithm and a new optimization that can halve the elements to be computed, providing additional performance gains. The implementation makes it possible to compute the edit distance between strings having hundreds of millions of characters. The performance of the algorithm depends on the similarity between the strings. For strings longer than a million characters, the performance is the best ever reported, which is above TCUPS for strings with similarities greater than 70% and above one hundred TCUPS for 99.9% similarity.
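
For orientation, the quadratic cost mentioned in this abstract can be seen in the textbook dynamic-programming formulation of Levenshtein distance. The sketch below is only that classic baseline, not the WFA algorithm and not the paper's GPU implementation; the function name `levenshtein` and the two-row layout are illustrative choices made here.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: O(len(a) * len(b)) time, O(min(len)) memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory

    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]
```

Every cell of the len(a) by len(b) table is visited once, which is why this formulation becomes impractical for the very long strings the abstract targets.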

Citations: 0
