
Computing Systems in Engineering: Latest Articles

A block-circulant preconditioner for domain decomposition algorithm for the solution of the elliptic problems by second order finite elements
Pub Date : 1995-08-01 DOI: 10.1016/0956-0521(95)00039-9
B. Kiss, G. Molnárka

A preconditioned conjugate gradient domain decomposition method was given in Refs 1 and 2 for the solution of a system of linear equations arising in the finite element method applied to elliptic Dirichlet, Neumann and mixed boundary value problems. We have proved (Ref. 2) that the construction can be generalized to higher order finite element methods. Here we give a construction and theoretical investigation of preconditioners for second order finite elements. A method and the results of calculations are given. The presented numerical experiments show that this preconditioner works well.
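
The abstract does not spell out how the preconditioner is built, so the following is only a minimal sketch of the general property that makes circulant (and, block by block, block-circulant) preconditioners cheap to apply: a circulant matrix is diagonalized by the discrete Fourier transform, so a system with it can be solved by transforming, dividing, and transforming back. The function names and the 4-by-4 example data are illustrative assumptions, not taken from the paper.

```python
import cmath

def dft(v, inverse=False):
    """Naive O(n^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(v)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(v[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        out.append(s / n if inverse else s)
    return out

def circulant_matvec(c, x):
    """Multiply the circulant matrix whose first column is c by the vector x."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def circulant_solve(c, b):
    """Solve C x = b, where C is the circulant matrix with first column c."""
    eig = dft(c)                      # eigenvalues of C are the DFT of its first column
    b_hat = dft(b)
    x_hat = [bh / e for bh, e in zip(b_hat, eig)]
    return [z.real for z in dft(x_hat, inverse=True)]

c = [4.0, -1.0, 0.0, -1.0]            # hypothetical symmetric circulant stencil
b = [1.0, 2.0, 3.0, 4.0]              # hypothetical right-hand side
x = circulant_solve(c, b)
residual = max(abs(r - bi) for r, bi in zip(circulant_matvec(c, x), b))
```

In a preconditioned conjugate gradient loop, `circulant_solve` would play the role of the preconditioning step applied to each residual; the sketch assumes all eigenvalues of the circulant matrix are nonzero.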

Computing Systems in Engineering, vol. 6, no. 4, pp. 369-376
Citations: 1
Visualising parallel numerical software performance on a shared memory multiprocessor
Pub Date : 1995-08-01 DOI: 10.1016/0956-0521(95)00034-8
Pete Lee, Chris Phillips

We consider here the use of a software package which can be used to monitor and visualise the behaviour, with respect to data accesses, of parallel software in a multiprocessor environment supporting shared memory. The purpose of such monitoring is two-fold: to aid in the understanding of the behaviour of a given algorithm, and to support the debugging of that software. To illustrate the use of this package we analyse its facilities in connection with a parallel implementation of block Gaussian elimination to solve a system of equations which arises when a certain spectral method is employed to solve an elliptic partial differential equation in two dimensions. We outline the method, indicate the synchronisation mechanisms which are necessary to ensure that the correct sequence of operations takes place, and briefly describe the facilities provided by Encore Parallel Fortran which support these mechanisms. We then examine the facilities of the visualisation software and indicate how these were adapted to monitor accesses to a packed storage representation of a block sparse array. Finally we illustrate the use of the software in the context of the solution of a particular partial differential equation.

Computing Systems in Engineering, vol. 6, no. 4, pp. 351-356
Citations: 0
A parallel implementation of an interactive ray-tracing algorithm
Pub Date : 1995-08-01 DOI: 10.1016/0956-0521(95)00037-2
A.Augusto Sousa, F.Nunes Ferreira

One of the most-used rendering algorithms in Computer Graphics is Ray-Tracing. The “standard” (Whitted-like) Ray-Tracing is a good rendering algorithm but has a drawback: the time necessary to produce an image is too long (several hours of CPU time are necessary to make a good picture of a moderately sophisticated 3D scene) and the image is only ready to be observed at the end of processing. This kind of situation is difficult to accept in systems where interactivity is the first goal. “Increasing Realism” in Ray-Tracing tries to avoid the problem by supplying the user with a preview of the final image. This preview can be calculated in a considerably shorter time, yet it allows the user, with some margin of error, to imagine (and sometimes even see) some of the final effects. With more processing time the image quality continues improving without loss of previous results. The user can, at any time, interrupt the session if the image does not match what they want. Alongside this idea, it is necessary to accelerate image production. Parallelism is then justified by the need for more processing power. The aim of this text is to describe the Interactive Ray-Tracing Algorithm implementation, using a parallel architecture based on Transputers. An overview of the architecture used is presented and the main parallel processes and related problems are discussed.

Computing Systems in Engineering, vol. 6, no. 4, pp. 409-414
Citations: 1
The effects on communication of data representation of nested preconditionings for massively parallel architectures
Pub Date : 1995-08-01 DOI: 10.1016/0956-0521(95)00032-1
J.C. Díaz, F. Pradeau

The effect which the representation of the data (matrices and vectors) has on the communication patterns of preconditionings for exploitation of massively parallel architectures is discussed. Preconditioned iterative methods are used to solve the sparse linear systems generated by discretizations of partial differential equations in many areas of science and engineering. The preconditionings considered are based on nested incomplete factorization with approximate tridiagonal inverses using a two color line ordering of the discretization grid. These preconditionings can be described in terms of vector-vector to vector operations of dimension equal to half the total number of grid points.

Computing Systems in Engineering, vol. 6, no. 4, pp. 437-441
Citations: 0
Concurrent attribute evaluation
Pub Date : 1995-08-01 DOI: 10.1016/0956-0521(95)00028-3
João Saraiva, Pedro Henriques

This text presents an implementation of a concurrent attribute evaluator system. The main objective of the system is to allow several strategies of concurrent attribute evaluation to be implemented, rather than to build a faster compiler for a specific case. The system is implemented on a tightly-coupled machine. One realistic compiler was built and the first results are discussed.

Computing Systems in Engineering, vol. 6, no. 4, pp. 451-457
Citations: 3
Performance of a QR algorithm implementation on a multicluster of transputers
Pub Date : 1995-08-01 DOI: 10.1016/0956-0521(95)00025-9
Fernando José Ferreira , Paulo B. Vasconcelos , Filomena D. d'Almeida

Some results of an implementation of the QR factorization by Householder reflectors on a multicluster transputer system with distributed memory are presented. They show how important the communication time between processors is to the performance of the algorithm. The QR factorization was chosen as the test method because it is required for many real life applications, for instance in least squares problems. We use a version of the Householder transformation that is the basis for numerically stable QR factorization. The machine used was the MultiCluster 2 model of Parsytec, which is a distributed memory system with 16 Inmos T800 processors. The Helios operating system was chosen because it provides transparency in CPU management. However, it limits the set of connection topologies that can be used. The results are presented in terms of speedup and efficiency, showing the importance of the communication time in the total elapsed time.
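
As a reminder of what the factorization computes, here is a minimal sequential sketch of QR by Householder reflectors in plain Python. It is not the authors' transputer code and involves no communication at all; the function name and the 3-by-2 example matrix are assumptions made for illustration.

```python
import math

def householder_qr(a):
    """Return (q, r) with a = q r, q orthogonal (m x m), r upper triangular (m x n).

    a is a list of m rows with n entries each, with m >= n.
    """
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(min(m - 1, n)):
        # Reflector v maps column k (rows k..m-1) onto a multiple of the first unit vector.
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])   # sign choice avoids cancellation
        vtv = sum(t * t for t in v)
        # r <- H r with H = I - 2 v v^T / (v^T v), restricted to rows k..m-1.
        for j in range(k, n):
            f = 2.0 * sum(v[i] * r[k + i][j] for i in range(len(v))) / vtv
            for i in range(len(v)):
                r[k + i][j] -= f * v[i]
        # q <- q H (H is symmetric and H H = I), so the product q r stays equal to a.
        for i in range(m):
            f = 2.0 * sum(q[i][k + j] * v[j] for j in range(len(v))) / vtv
            for j in range(len(v)):
                q[i][k + j] -= f * v[j]
    return q, r

a = [[2.0, -1.0], [1.0, 3.0], [2.0, 1.0]]     # hypothetical least squares matrix
q, r = householder_qr(a)
```

In a distributed memory setting the columns or rows of `r` would be spread over processors, and each reflector application would then require exchanging `v` or partial dot products, which is where the communication cost discussed in the abstract comes from.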

Computing Systems in Engineering, vol. 6, no. 4, pp. 363-367
Citations: 0
Memory optimization for parallel functional programs
Pub Date : 1995-08-01 DOI: 10.1016/0956-0521(95)00030-5
Balaram Sinharoy , Boleslaw Szymanski

Parallel functional languages use single valued variables to avoid semantically irrelevant data dependence constraints. Programs containing iterations that redefine variables in a procedural language have the corresponding variables declared with additional dimensions in a single assignment language. This extra temporal dimension, unless optimized, requires an exorbitant amount of memory and in parallel programs imposes a large delay between the data producer and consumers. For certain loop arrangements, a window containing a few elements of the dimension can be created. Usually, there are many ways for defining a loop arrangement in an implementation of a functional program and a trade-off between the memory saving and the needed level of parallelism has to be taken into account when selecting the implementation. In this paper we prove that the problem of determining the best loop arrangement by partitioning the dependence graph is NP-hard. In addition, we describe a heuristic for solving this problem. Finally, we present examples of parallel functional programs in which the memory optimization results in reducing the local and shared memory requirements and communication delays.
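
The windowing idea can be illustrated with a toy recurrence. This is a sketch under assumptions, not the paper's method: the recurrence, the function names, and the window width of two are all hypothetical. In single assignment style every iteration writes a fresh element, so the variable gains a temporal dimension; because each element here depends only on the two preceding ones, a window of two elements is enough.

```python
WINDOW = 2  # number of past elements the recurrence actually reads

def full_history(n):
    """Single-assignment style: x[t] is written exactly once for every t (assumes n >= 1)."""
    x = [0] * (n + 1)
    x[0], x[1] = 0, 1
    for t in range(2, n + 1):
        x[t] = x[t - 1] + x[t - 2]
    return x[n], len(x)              # result and number of stored elements

def windowed(n):
    """Same recurrence, but element t is stored in slot t % WINDOW (assumes n >= 1)."""
    window = [0, 1]                  # slots for x[0] and x[1]
    for t in range(2, n + 1):
        window[t % WINDOW] = window[0] + window[1]
    return window[n % WINDOW], len(window)

value_full, cells_full = full_history(10)
value_win, cells_win = windowed(10)
```

Both versions compute the same value, but the first stores `n + 1` elements and the second stores two. The trade-off mentioned in the abstract shows up here as well: the windowed loop must run its iterations in order, since each slot is reused.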

Computing Systems in Engineering, vol. 6, no. 4, pp. 415-422
Citations: 3
Efficient solution of fluid flow using the generalised conjugate gradient algorithm on a transputer-based machine
Pub Date : 1995-08-01 DOI: 10.1016/0956-0521(95)00026-7
B.A. Tanyi, R.W. Thatcher

The discretisation of the equations governing fluid flow gives rise to coupled, quasi-linear and non-symmetric systems. The solution is usually obtained by iteration using a guess-and-correct procedure where each iteration aims to improve the solution of the previous step. Each step or outer iteration of the process involves the solution of nominally linear algebraic systems. These systems are normally solved using methods based on the Gauss-Seidel iteration—such as the TDMA. However, these methods generally converge very slowly and can be very time consuming for realistic applications. In this paper, these equations are solved using the Generalised Conjugate Gradient (GCG) algorithm with a simple-to-implement Gauss-Seidel-based preconditioner on a distributed memory message-passing machine. We take advantage of the fact that only tentative improvements to the flow-field are sought during each iteration and study the convergence behaviour of the parallel implementation on a multi-processor environment.
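
For reference, assuming TDMA here stands for the tridiagonal matrix algorithm (also known as the Thomas algorithm), the conventional inner solver can be sketched as below. This illustrates only the baseline the abstract compares against, not the GCG method itself; the function name, argument layout, and the 4-unknown example are assumptions.

```python
def tdma(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution.

    lower[i] multiplies x[i-1] (lower[0] is unused), diag[i] multiplies x[i],
    and upper[i] multiplies x[i+1] (upper[-1] is unused). No pivoting is done,
    so this sketch assumes a diagonally dominant matrix.
    """
    n = len(diag)
    c = [0.0] * n                    # modified super-diagonal
    d = [0.0] * n                    # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

# Hypothetical 1D stencil: -x[i-1] + 2 x[i] - x[i+1] = rhs[i]
lower = [0.0, -1.0, -1.0, -1.0]
diag = [2.0, 2.0, 2.0, 2.0]
upper = [-1.0, -1.0, -1.0, 0.0]
rhs = [0.0, 0.0, 0.0, 5.0]
solution = tdma(lower, diag, upper, rhs)
```

Both sweeps are inherently sequential along the line being solved, which is one reason such solvers are awkward to parallelise on a message-passing machine.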

Computing Systems in Engineering, vol. 6, no. 4, pp. 319-324
Citations: 0
Aurora vs. Muse: a portability study of two or-parallel Prolog systems
Pub Date : 1995-08-01 DOI: 10.1016/0956-0521(95)00042-9
Manuel Eduardo Correia , Fernando M.A. Silva, Vítor Santos Costa

Prolog programs have implicit parallelism, that is, parallelism which can be exploited by a machine with minimal user effort. Or-parallelism is one such form of parallelism, and is particularly useful in that it is present in the many Prolog applications where several alternatives need to be considered. Or-parallelism has been exploited successfully in several systems, and especially in the Aurora and Muse systems. In this paper we analyze the portability of these two parallel systems onto a commercial shared memory parallel computer, a Sun SPARCcenter 2000 with 8 processors, running the Solaris 2.2 Operating System. We also analyze both systems' performance for classical benchmark programs and for two large Prolog applications.

Computing Systems in Engineering, vol. 6, no. 4, pp. 345-349
Citations: 2
Columnwise block LU factorization using BLAS kernels on VAX 6520/2VP
Pub Date : 1995-08-01 DOI: 10.1016/0956-0521(95)00049-6
Paulo B. Vasconcelos , Filomena D. D'Almeida

The LU factorization of a matrix A is a widely used algorithm, for instance in the solution of linear systems Ax = b. The increasing capacities of high performance computers allow us to use direct methods for systems of large and dense matrices. To build portable and efficient LU codes for vector and parallel computers, this method is rewritten in block versions and BLAS (Basic Linear Algebra Subprograms) kernels are used to mask the architectural details and allow good performance of codes such as the LAPACK (Linear Algebra PACKage) library. In the references it was proved that this strategy leads to portability and efficiency of codes using tuned BLAS kernels. After a short description of the block versions we present some results obtained on the VAX 6520/2VP, comparing the block algorithm with the point algorithm, and vectorized versions with scalar versions. The three columnwise versions of the block algorithm showed similar performance for this computer and large matrix dimensions. The block size used is a crucial parameter for these algorithms and the results show that the best performance is obtained with block size 64 (for large matrices), which is the vector register size of the machine used.
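
To make the role of the block size concrete, here is a minimal sketch of a blocked LU factorization without pivoting in plain Python. It uses a right-looking ordering, which is one common block variant and not necessarily one of the three columnwise versions studied in the paper; the function name, the parameter `nb`, and the test matrix are assumptions. The three numbered steps correspond to the kinds of operations that tuned BLAS kernels would carry out in a real code.

```python
def block_lu(a, nb):
    """In-place blocked LU without pivoting; returns a with unit-lower L and U packed.

    Assumes a square matrix (list of rows) for which no pivoting is needed,
    for example a strictly diagonally dominant one.
    """
    n = len(a)
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # 1. Unblocked LU of the panel: columns k..e-1, all rows below the diagonal.
        for j in range(k, e):
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]
                for c in range(j + 1, e):
                    a[i][c] -= a[i][j] * a[j][c]
        # 2. Block row of U: forward substitution with the unit-lower diagonal block.
        for j in range(k, e):
            for i in range(j + 1, e):
                for c in range(e, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # 3. Trailing update A22 -= L21 U12, a matrix-matrix product.
        for i in range(e, n):
            for c in range(e, n):
                a[i][c] -= sum(a[i][j] * a[j][c] for j in range(k, e))
    return a

n = 6
original = [[(n if i == j else 0.0) + 1.0 / (1 + abs(i - j)) for j in range(n)]
            for i in range(n)]       # hypothetical diagonally dominant matrix
packed = block_lu([row[:] for row in original], nb=2)
```

With `nb=1` the routine reduces to the point algorithm, and larger `nb` shifts more of the work into step 3, the part that benefits most from vector hardware.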

Computing Systems in Engineering, vol. 6, no. 4, pp. 423-429
Citations: 0