Parallel Process. Lett.: Latest Articles

A Note on the Steiner k-Diameter of Tensor Product Networks

Parallel Process. Lett.

Pub Date : 2019-06-01 DOI: 10.1142/S0129626419500087

Pranav Arunandhi, E. Cheng, Christopher Melekian

Given a graph $G$ and $S \subseteq V(G)$, the Steiner distance $d_G(S)$ is the minimum size among all connected subgraphs of $G$ whose vertex sets contain $S$. The Steiner $k$-diameter $\mathrm{sdiam}_k(G)$ is the maximum value of $d_G(S)$ among all sets $S$ of $k$ vertices. In this short note, we study the Steiner $k$-diameters of the tensor product of complete graphs.

Citations: 1
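The Steiner distance and Steiner k-diameter defined in the abstract above can be checked by brute force on very small graphs. The sketch below is illustrative only and is not taken from the paper: it assumes that "size" counts edges (so a minimal connected subgraph on t vertices has t - 1 edges), that the graph is connected, and it uses hypothetical helper names (`tensor_product_complete`, `steiner_distance`, `steiner_k_diameter`). The search is exponential and only suitable for toy instances.

```python
from itertools import combinations

def tensor_product_complete(m, n):
    """Adjacency sets of K_m x K_n: (a, b) ~ (c, d) iff a != c and b != d."""
    verts = [(a, b) for a in range(m) for b in range(n)]
    return {v: {w for w in verts if w[0] != v[0] and w[1] != v[1]} for v in verts}

def is_connected(adj, nodes):
    """Depth-first search restricted to the vertex set `nodes`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] & nodes:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes

def steiner_distance(adj, S):
    """Edge count of a smallest connected subgraph containing S (assumption: size = edges)."""
    S = set(S)
    others = [v for v in adj if v not in S]
    # Try adding 0, 1, 2, ... extra vertices until the induced subgraph is connected.
    for extra in range(len(others) + 1):
        for added in combinations(others, extra):
            if is_connected(adj, S | set(added)):
                return len(S) + extra - 1  # a spanning tree on t vertices has t - 1 edges
    return None  # S is spread over more than one component

def steiner_k_diameter(adj, k):
    """Maximum Steiner distance over all k-subsets of vertices (graph assumed connected)."""
    return max(steiner_distance(adj, S) for S in combinations(adj, k))
```

For example, with `adj = tensor_product_complete(3, 3)`, calling `steiner_k_diameter(adj, 2)` gives the ordinary diameter of K_3 x K_3.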

The Generalized Connectivity of Data Center Networks

Parallel Process. Lett.

Pub Date : 2019-06-01 DOI: 10.1142/S0129626419500075

Chen Hao, Weihua Yang

The generalized $k$-connectivity of a graph $G$ is a parameter that measures the reliability of a network $G$ in connecting any $k$ vertices of $G$; it generalizes traditional connectivity. Let $S \subseteq V(G)$ and let $\kappa_G(S)$ denote the maximum number $r$ of edge-disjoint trees $T_1, T_2, \ldots, T_r$ in $G$ such that $V(T_i) \cap V(T_j) = S$ for any $i, j \in \{1, 2, \ldots, r\}$ and $i \neq j$. For an integer $k$ with $2 \le k \le |V(G)|$, the generalized $k$-connectivity of a graph $G$ is defined as $\kappa_k(G) = \min\{\kappa_G(S) : S \subseteq V(G) \text{ and } |S| = k\}$. Data centers are essential to the business of companies such as Google, Amazon, Facebook and Microsoft. Based on data centers, the data center networks $D_{k,n}$, introduced by Guo et al. in 2008, have many desirable properties. In this paper, we study the generalized [Formula: see text]-connectivity of $D_{k,n}$ and show that [Formula: see text] for [Formula: see text] and [Formula: see text].

Citations: 5

Round Robin Thread Selection Optimization in Multithreaded Processors

Parallel Process. Lett.

Pub Date : 2019-05-10 DOI: 10.1142/S0129626419500038

Shane Carroll, Wei-Ming Lin

We propose a variation of round-robin ordering in a multi-threaded pipeline to increase system throughput and resource distribution fairness. We show that using round robin with a typical arbitrary ordering results in inefficient use of shared resources and subsequent thread starvation. To address this while still using a simple round-robin approach, we optimally and dynamically sort the order of the round robin periodically at runtime. We show that with 4-threaded workloads, throughput can be improved by over 9% and harmonic throughput by over 3% by sorting thread order at runtime. We experiment with multiple stages of the pipeline and show consistent results throughout several experiments using the SPEC CPU 2006 benchmarks. Furthermore, since the technique is still a simple round robin, the increased performance requires little overhead to implement.

Citations: 0
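The idea of a round robin whose order is re-sorted periodically at runtime can be sketched as follows. This is a minimal illustration, not the authors' design: the abstract does not say which quantity the threads are sorted by, so the per-thread `metric` (for instance, some measure of shared-resource occupancy), the class name `SortedRoundRobin`, and the `period` parameter are all assumptions.

```python
class SortedRoundRobin:
    """Round-robin selector whose visiting order is re-sorted every `period` selections."""

    def __init__(self, thread_ids, period):
        self.order = list(thread_ids)  # initial (arbitrary) order
        self.period = period           # hypothetical re-sort interval, in selections
        self.pos = 0
        self.count = 0

    def select(self, metric):
        """Return the next thread id.

        `metric` maps thread id -> number (assumed sort key); lower values go first.
        """
        if self.count > 0 and self.count % self.period == 0:
            # Periodic re-sort; between sorts this is a plain round robin.
            self.order.sort(key=lambda t: metric[t])
            self.pos = 0
        tid = self.order[self.pos]
        self.pos = (self.pos + 1) % len(self.order)
        self.count += 1
        return tid
```

Between re-sorts the selector behaves exactly like an ordinary round robin, which is consistent with the abstract's point that the extra cost is limited to the periodic sort.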

Efficient Algebraic Multigrid Preconditioners on Clusters of GPUs

Parallel Process. Lett.

Pub Date : 2019-05-10 DOI: 10.1142/S0129626419500014

A. A. Hassan, V. Cardellini, P. D'Ambra, D. Serafino, S. Filippone

Many scientific applications require the solution of large, sparse linear systems of equations using Krylov subspace methods; in this case, the choice of an effective preconditioner may be crucial for the convergence of the Krylov solver. Algebraic MultiGrid (AMG) methods are widely used as preconditioners because of their optimal computational cost and their algorithmic scalability. The wide availability of GPUs, now found in many of the fastest supercomputers, poses the problem of implementing these methods efficiently on high-throughput processors. In this work we focus on the application phase of AMG preconditioners, and in particular on the choice and implementation of smoothers and coarsest-level solvers capable of exploiting the computational power of clusters of GPUs. We consider block-Jacobi smoothers using sparse approximate inverses in the solve phase associated with the local blocks. The choice of approximate inverses instead of sparse matrix factorizations is driven by the large amount of parallelism exposed by the matrix-vector product as compared to the solution of large triangular systems on GPUs. The selected smoothers and solvers are implemented within the AMG preconditioning framework provided by the MLD2P4 library, using suitable sparse matrix data structures from the PSBLAS library. Their behaviour is illustrated in terms of execution speed and scalability on a test case concerning groundwater modelling, provided by the Jülich Supercomputing Center within the Horizon 2020 Project EoCoE.

Citations: 4
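To illustrate why explicit block inverses suit GPUs, the sketch below runs block-Jacobi sweeps in which each local solve is a small matrix-vector product rather than a triangular solve. It is a toy, pure-Python illustration under loud assumptions: it uses dense lists, fixed 2x2 diagonal blocks, and exact block inverses as a stand-in for the sparse approximate inverses mentioned in the abstract. It does not use or imitate the MLD2P4 or PSBLAS APIs, and all function names are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def inv2x2(B):
    """Exact inverse of a 2x2 block (stand-in for a sparse approximate inverse)."""
    (a, b), (c, d) = B
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def block_jacobi(A, b, sweeps):
    """Run `sweeps` block-Jacobi iterations x <- x + M (b - A x), M = block-diagonal inverse.

    Assumes len(b) is even so the diagonal splits into 2x2 blocks.
    """
    n = len(b)
    x = [0.0] * n
    # Setup phase: build the explicit inverse of every diagonal block once.
    inv_blocks = [
        inv2x2([[A[i][i], A[i][i + 1]], [A[i + 1][i], A[i + 1][i + 1]]])
        for i in range(0, n, 2)
    ]
    for _ in range(sweeps):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]  # residual of the current iterate
        for blk, M in enumerate(inv_blocks):
            i = 2 * blk
            dx = matvec(M, r[i:i + 2])  # local "solve" is just a mat-vec
            x[i] += dx[0]
            x[i + 1] += dx[1]
    return x
```

Each block update depends only on the residual computed at the start of the sweep, so all blocks could be processed in parallel; that independence is the property the abstract attributes to the matrix-vector formulation.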

Efficient Communication Induced Checkpointing Protocol for Broadcast Network-based Distributed Systems

Parallel Process. Lett.

Pub Date : 2019-05-10 DOI: 10.1142/S012962641950004X

Jinho Ahn

This paper proposes an enhanced Fully Informed Communication-Induced Checkpointing (FI-CIC) protocol. Compared with the original FI-CIC protocol, it exploits the broadcast network to substantially raise the likelihood of detecting Z-cycle-free patterns, without any extra control messages. Experimental results show that our protocol outperforms the original in terms of the number of forced checkpoints per process.

Citations: 2

Implementing ♢P with Bounded Messages on a Network of ADD Channels

Parallel Process. Lett.

Pub Date : 2019-05-10 DOI: 10.1142/S0129626419500026

Saptaparni Kumar, J. Welch

We present an implementation of the eventually perfect failure detector (♢P) from the original hierarchy of the Chandra-Toueg [3] oracles on an arbitrary partitionable network composed of unreliable channels that can lose and reorder messages. Prior implementations of ♢P have assumed different partially synchronous models ranging from bounded point-to-point message delay and reliable communication to unbounded message size and known network topologies. We implement ♢P under very weak assumptions on an arbitrary, partitionable network composed of Average Delayed/Dropped (ADD) channels [11] to model unreliable communication. Unlike older implementations, our failure detection algorithm uses bounded-sized messages to eventually detect all nodes that are unreachable (crashed or disconnected) from it.

Citations: 6
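For readers unfamiliar with eventually perfect failure detectors, the following sketch shows the classic heartbeat-with-adaptive-timeout idea. It is explicitly not the algorithm of the paper above, which targets ADD channels and bounded message sizes; it is the textbook scheme for a simpler partially synchronous setting, and the class name, method names, and the "+1" timeout increment are illustrative assumptions.

```python
class HeartbeatDetector:
    """Textbook adaptive-timeout detector (not the ADD-channel algorithm)."""

    def __init__(self, peers, initial_timeout):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_seen = {p: 0 for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer, now):
        """Record a heartbeat; a heartbeat from a suspected peer reveals a false suspicion."""
        self.last_seen[peer] = now
        if peer in self.suspected:
            self.suspected.discard(peer)
            self.timeout[peer] += 1  # be more patient with this peer from now on

    def tick(self, now):
        """Suspect every peer whose last heartbeat is older than its current timeout."""
        for peer, seen in self.last_seen.items():
            if now - seen > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)
```

Because every false suspicion lengthens the timeout, a correct and reachable peer is eventually never suspected once delays stabilize, while a crashed peer stops sending heartbeats and stays suspected.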

Optimizing Data Intensive Flows for Networks on Chips

Parallel Process. Lett.

Pub Date : 2018-12-18 DOI: 10.1142/S0129626421500134

Junwei Zhang, Yang Liu, Shi Li, T. Robertazzi

A novel framework is proposed to find efficient data-intensive flow distributions on Networks on Chip (NoC). Voronoi diagram techniques are used to divide a NoC array of homogeneous processors and links into clusters. A new mathematical tool, named the flow matrix, is proposed to find the optimal flow distribution for individual clusters. Individual flow distributions on clusters are reconciled to be more evenly distributed. This leads to an efficient makespan and significant savings in the number of cores actually used. The approach is described here in terms of a mesh interconnection but is suitable for other interconnection topologies.

Citations: 3
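The clustering step described in the abstract above can be pictured with a discrete Voronoi partition of a mesh. The sketch below is an assumption-laden illustration rather than the authors' procedure: it assumes each cluster is seeded by one core position, that distance is measured in mesh hops (Manhattan distance), and that ties go to the smaller seed coordinate. The function name `voronoi_clusters` is hypothetical.

```python
def voronoi_clusters(rows, cols, seeds):
    """Assign every core (r, c) of a rows x cols mesh to its nearest seed.

    Distance is Manhattan (hop count on a mesh, an assumption); ties are broken
    by choosing the lexicographically smaller seed.
    """
    clusters = {seed: [] for seed in seeds}
    for r in range(rows):
        for c in range(cols):
            nearest = min(seeds, key=lambda s: (abs(s[0] - r) + abs(s[1] - c), s))
            clusters[nearest].append((r, c))
    return clusters
```

For a 2 x 4 mesh with seeds at `(0, 0)` and `(0, 3)`, the mesh splits into a left and a right cluster of four cores each.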

Reconfigurable Hardware Generation of Multigrid Solvers with Conjugate Gradient Coarse-Grid Solution

Parallel Process. Lett.

Pub Date : 2018-12-01 DOI: 10.1142/S0129626418500160

Christian Schmitt, Moritz Schmid, S. Kuckuk, H. Köstler, Jürgen Teich, Frank Hannig

Field programmable gate arrays (FPGAs) are an increasingly popular accelerator technology, and not only in the field of high-performance computing (HPC). However, they use a completely different programming paradigm and tool set compared to central processing units (CPUs) or even graphics processing units (GPUs), adding extra development steps and requiring special knowledge, which hinders widespread use in scientific computing. To bridge this programmability gap, domain-specific languages (DSLs) are a popular choice for generating low-level implementations from an abstract algorithm description. In this work, we demonstrate our approach for generating numerical solver implementations based on the multigrid method for FPGAs from the same code base that is also used to generate code for CPUs using a hybrid parallelization of MPI and OpenMP. Our approach yields a hardware design that can compute up to 11 V-cycles per second with an input grid size of 4096 × 4096 and a solution on the coarsest grid using the conjugate gradient (CG) method on a mid-range FPGA, beating vectorized, multi-threaded execution on an Intel Xeon processor.

Citations: 3

Regular Connected Bipancyclic Spanning Subgraphs of Torus Networks

Parallel Process. Lett.

Pub Date : 2018-12-01 DOI: 10.1142/S0129626418500135

M. Lu, Shurong Zhang, Weihua Yang

It is well known that an $n$-dimensional torus $T(k_1, k_2, \ldots, k_n)$ is Hamiltonian. Then the torus $T(k_1, k_2, \ldots, k_n)$ contains a spanning subgraph which is 2-regular and 2-connected. In this paper, we explore a strong property of torus networks. We prove that for any even integer $k$ with [Formula: see text], the torus $T(k_1, k_2, \ldots, k_n)$ contains a spanning subgraph which is $k$-regular, $k$-connected and bipancyclic; and if $k$ is odd, the result holds when some $k_i$ is even.

Citations: 1

Fractional Matching Preclusion for (n, k)-Star Graphs

Parallel Process. Lett.

Pub Date : 2018-12-01 DOI: 10.1142/S0129626418500172

Tianlong Ma, Y. Mao, E. Cheng, Jinling Wang

The matching preclusion number of a graph is the minimum number of edges whose deletion results in a graph that has neither perfect matchings nor almost perfect matchings. As a generalization, Liu and Liu introduced the concept of fractional matching preclusion number in 2017. The Fractional Matching Preclusion Number (FMP number) of G is the minimum number of edges whose deletion leaves the resulting graph without a fractional perfect matching. The Fractional Strong Matching Preclusion Number (FSMP number) of G is the minimum number of vertices and/or edges whose deletion leaves the resulting graph without a fractional perfect matching. In this paper, we obtain the FMP number and the FSMP number for (n, k)-star graphs. In addition, all the optimal fractional strong matching preclusion sets of these graphs are categorized.

Citations: 7
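The (integral) matching preclusion number defined at the start of the abstract above can be computed by exhaustive search on tiny graphs. The sketch below covers only that classical definition, not the fractional variants the paper studies, and it assumes a simple graph with an even number of vertices, so that only perfect matchings need to be ruled out. The function names are hypothetical and the search is exponential.

```python
from itertools import combinations

def has_perfect_matching(vertices, edges):
    """Backtracking test: match the first unmatched vertex in every possible way."""
    vertices = sorted(vertices)
    if not vertices:
        return True
    v = vertices[0]
    for e in edges:
        if v in e:
            u = e[0] if e[1] == v else e[1]
            rest = [w for w in vertices if w != v and w != u]
            rest_edges = [f for f in edges if v not in f and u not in f]
            if has_perfect_matching(rest, rest_edges):
                return True
    return False

def matching_preclusion_number(vertices, edges):
    """Smallest number of edges whose deletion leaves no perfect matching.

    Assumes an even number of vertices; edges are 2-tuples of distinct vertices.
    """
    for r in range(len(edges) + 1):
        for removed in combinations(edges, r):
            kept = [e for e in edges if e not in removed]
            if not has_perfect_matching(vertices, kept):
                return r
    return None  # not reached: deleting every edge always destroys perfect matchings when vertices exist
```

On the complete graph K_4 the three perfect matchings are pairwise edge-disjoint, so one edge must be deleted from each of them and the search returns 3.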
