Microprocessing and Microprogramming: Latest Articles

Deriving structured parallel implementations for numerical methods

Microprocessing and Microprogramming

Pub Date : 1996-04-01 DOI: 10.1016/0165-6074(96)00007-5

Thomas Rauber, Gudula Rünger

The numerical solution of differential equations is an important problem in the natural sciences and engineering, but the computational effort needed to find a solution with the desired accuracy is usually quite large. This suggests the use of powerful parallel machines, which often use a distributed memory organization. In this article, we present a parallel programming methodology to derive structured parallel implementations of numerical methods that exhibit two levels of potential parallelism: a coarse-grain method parallelism and a medium-grain parallelism on data or systems. The derivation process is subdivided into three stages: the first stage identifies the potential for parallelism in the numerical method, the second stage fixes the implementation decisions for a parallel program, and the third stage derives the parallel implementation for a specific parallel machine. The derivation process is supported by a group-SPMD computational model that allows the prediction of runtimes for a specific parallel machine. This enables the programmer to test different alternatives and to implement only the most promising one. We give several examples of the derivation of parallel implementations and of the performance prediction. Experiments on an Intel iPSC/860 confirm the accuracy of the runtime predictions. The parallel programming methodology separates the software issues from the architectural details, enables the design of well-structured, reusable and portable software, and supplies a formal basis for automatic support.

Volume 41, Issue 8, Pages 589-608

Citations: 23

Performance evaluation and optimization in low-cost cellular SIMD systems

Microprocessing and Microprogramming

Pub Date : 1996-04-01 DOI: 10.1016/0165-6074(96)00008-7

Alberto Broggi , Francesco Gregoretti

Low-cost massively parallel architectures are generally characterized by a number of processors that is often far lower than the size of the data set, and by a limited amount of memory owned by each Processing Element. As a consequence, low-cost mesh-connected architectures can utilize only a specific processor virtualization mechanism, which is based on the sequential scanning of the data set stored in an external memory. This virtualization mechanism in turn requires applications to be developed according to some precise criteria. This paper presents the optimization of some key parameters for the improvement of system performance. These optimizations are validated through an image processing case study.

Citations: 5

Scope: An extensible interactive environment for the performance evaluation of parallel systems

Microprocessing and Microprogramming

Pub Date : 1996-04-01 DOI: 10.1016/0165-6074(96)00003-8

Yves Arrouye

This paper presents Scope, an environment for the performance analysis of parallel systems based on the analysis of execution traces. Scope's design stresses scalability and easy extensibility. It also encourages interactive and non-linear exploration of the studied system's execution.

We first explain our motivation for developing yet another performance evaluation tool, and see what the strong points of our environment are; we then give a non-technical, high-level overview of the design of some of the most interesting features of Scope and the current realizations. This presentation ends with some perspectives on the developments and experiments that will be done in the immediate future.

Citations: 3

Calendar79

Microprocessing and Microprogramming

Pub Date : 1996-04-01 DOI: 10.1016/S0165-6074(96)90001-0

Citations: 0

Modeling of optimal load balancing strategy using queueing theory

Microprocessing and Microprogramming

Pub Date : 1996-04-01 DOI: 10.1016/0165-6074(95)00006-2

François Spies

The aim of this article is to present an original model of dynamic load balancing, using queueing theory and probabilities. After briefly presenting dynamic load balancing techniques, we model the optimal strategy. We verify the analytical results by using simulation techniques. This modeling method is applicable to other strategies, incorporating a greater number of variables. The analysis of the results obtained with the optimal model allows us to move on to devising other strategies that improve load balancing efficiency.
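
The abstract does not give the author's model, so the following is only a minimal sketch of the kind of queueing-theoretic comparison it describes. It assumes Poisson arrivals and exponential service times (assumptions of this sketch, not taken from the paper) and compares the mean queueing delay of isolated M/M/1 nodes with that of a single shared M/M/c queue, which can be read as an idealized, zero-cost load balancer. The function names are hypothetical.

```python
from math import factorial


def mm1_mean_wait(lam, mu):
    """Mean time spent waiting in queue (service excluded) for one M/M/1 node."""
    if lam >= mu:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return (lam / mu) / (mu - lam)


def erlang_c(c, a):
    """Probability that an arrival has to wait in an M/M/c queue.

    c is the number of servers, a = lam / mu is the offered load.
    """
    if a >= c:
        raise ValueError("unstable queue: offered load must be below c")
    rho = a / c
    top = a ** c / factorial(c) / (1 - rho)
    bottom = sum(a ** k / factorial(k) for k in range(c)) + top
    return top / bottom


def mmc_mean_wait(c, lam, mu):
    """Mean time in queue for M/M/c, where lam is the total arrival rate."""
    return erlang_c(c, lam / mu) / (c * mu - lam)


# Four nodes, each 80% utilized. Without balancing, every node is an
# independent M/M/1 queue; with ideal balancing they share one queue.
isolated = mm1_mean_wait(0.8, 1.0)
shared = mmc_mean_wait(4, 4 * 0.8, 1.0)
```

Under these assumptions the shared queue gives a much shorter mean wait than the isolated nodes, which is the gap that a dynamic load balancing strategy tries to close while paying for the transfers that this sketch ignores.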

Citations: 30

Designing parallel programs by the graphical language GRAPNEL

Microprocessing and Microprogramming

Pub Date : 1996-04-01 DOI: 10.1016/0165-6074(96)00005-1

Péter Kacsuk, Gábor Dózsa, Tibor Fadgyas

We propose a new visual programming language, called GRAPNEL (GRAphical Process's NEt Language), for designing distributed parallel programs based on the message passing programming paradigm. GRAPNEL graphically supports the Process Group abstraction and the automatic generation of several regular process topologies based on predefined topology templates. Dynamic process creation and destruction are possible but can be applied only in a well-structured manner.

GRAPNEL is a hybrid language, where the communication related parts of the program are described using graphical symbols but textual descriptions are applied where they are more appropriate. The first prototype of the GRAPNEL programming environment uses the PVM as the basis of the message passing mechanism. Textual program parts can be written in standard C. Other message passing libraries (e.g. MPI) and ordinary textual languages (e.g. FORTRAN) are to be supported in the future.

Citations: 56

Exploiting partial replication in unbalanced parallel loop scheduling on multicomputer

Microprocessing and Microprogramming

Pub Date : 1996-04-01 DOI: 10.1016/0165-6074(96)00002-6

Salvatore Orlando , Raffaele Perego

We consider the problem of scheduling parallel loops whose iterations operate on large array data structures and are characterized by highly varying execution times (unbalanced or non-uniform parallel loops). A general parallel loop implementation template for message-passing distributed-memory multiprocessors (multicomputers) is presented. Assuming that it is impossible to statically determine the distribution of the computational load on the data accessed, the template exploits a hybrid scheduling strategy. The data are partially replicated on the processors' local memories, and iterations are statically scheduled until the first load imbalances are detected. At this point an effective dynamic scheduling technique is adopted to move iterations among nodes holding the same data. Most of the communications needed to implement dynamic load balancing are overlapped with computations, as a very effective prefetching policy is adopted. The template scales very well, since knowing where data are replicated makes it possible to balance the load without introducing high overheads.

In the paper a formal characterization of load imbalance related to a generic problem instance is also proposed. This characterization is used to derive an analytical cost model for the template, and in particular, to tune those parameters of the template that depend on the costs related to the specific features of the target machine and the specific problem.

The template and the related cost model are validated by experiments conducted on a 128-node nCUBE 2, whose results are reported and discussed.
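
To make the hybrid strategy concrete, here is a small simulation sketch. It is not the authors' template: communication costs and prefetching are ignored, and the replication scheme is reduced to the assumption that workers are split into fixed groups of `group_size` that hold the same data, so an idle worker may only take iterations from a peer in its own group. The name `simulate` and its parameters are hypothetical.

```python
import heapq
from collections import deque


def simulate(costs, n_workers, group_size, dynamic=True):
    """Return the makespan of a loop whose iteration i takes costs[i] time units.

    Iterations are first block-distributed statically. If dynamic is True,
    a worker that runs out of work takes one iteration from the most loaded
    peer in its replication group (assumption: free, instantaneous transfer).
    """
    chunk = -(-len(costs) // n_workers)  # ceiling division
    queues = [deque(costs[w * chunk:(w + 1) * chunk]) for w in range(n_workers)]
    # Min-heap of (time at which the worker becomes free, worker id).
    free_at = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(free_at)
    makespan = 0.0
    while free_at:
        now, w = heapq.heappop(free_at)
        if not queues[w] and dynamic:
            first = (w // group_size) * group_size
            peers = range(first, min(first + group_size, n_workers))
            donor = max(peers, key=lambda p: len(queues[p]))
            if queues[donor]:
                queues[w].append(queues[donor].pop())  # take from the tail
        if queues[w]:
            heapq.heappush(free_at, (now + queues[w].popleft(), w))
        else:
            makespan = max(makespan, now)  # worker retires
    return makespan


# Unbalanced loop: the last quarter of the iterations is ten times as costly.
costs = [1] * 12 + [10] * 4
static_only = simulate(costs, 4, 4, dynamic=False)
full_replication = simulate(costs, 4, 4)
partial_replication = simulate(costs, 4, 2)
```

In this toy setting, wider replication groups give the dynamic phase more freedom to move iterations, at the price (not modeled here) of more memory and more communication, which is the trade-off an analytical cost model would have to tune.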

Volume 41, Issue 8, Pages 645-658

Citations: 5

Parallel systems engineering

Microprocessing and Microprogramming

Pub Date : 1996-04-01 DOI: 10.1016/S0165-6074(96)90000-9

Peter Milligan, Stephen Winter

Citations: 0

A two-level programming strategy for distributed systems

Microprocessing and Microprogramming

Pub Date : 1996-04-01 DOI: 10.1016/0165-6074(95)00032-1

D. Conde, R. Menéndez, M. González Harbour, J.A. Gregorio

In this paper we present a global approach for programming distributed multiprocessor systems. In this approach, applications are developed as a global parallel program that is independent of the particular hardware architecture, and is represented through an extended Petri net model. The building blocks for the global program are tasks that are implemented using standard programming languages. A highly automated tool is used to allocate the different tasks to processing nodes in a near-optimum way, minimizing message traffic in the interconnection network and balancing the execution workload in the different nodes. The combined use of this tool with analysis and simulation tools for Petri nets allows us to obtain information about the performance and behavior of the global program. The tool divides the original extended Petri net into several subnets that are distributed among the different nodes, and provides for the installation, execution, and monitoring of the program. An example is presented in which our programming strategy is compared to PVM, which is a widely used software tool for the distribution of programs in a network of computers.
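
As a rough illustration of representing a parallel program as a Petri net, the sketch below plays the token game on a plain place/transition net with a fork/join structure. It uses ordinary Petri net firing rules only; the paper's extensions, its allocation tool, and the names `NET`, `enabled`, `fire`, and `run` are not taken from the source and are assumptions of this example.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(p, 0) >= k for p, k in transition["inputs"].items())


def fire(marking, transition):
    """Return the marking obtained by firing an enabled transition."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new = dict(marking)
    for place, k in transition["inputs"].items():
        new[place] = new.get(place, 0) - k
    for place, k in transition["outputs"].items():
        new[place] = new.get(place, 0) + k
    return new


def run(marking, transitions, max_steps=100):
    """Fire the first enabled transition repeatedly until none is enabled."""
    trace = []
    for _ in range(max_steps):
        ready = [name for name, t in transitions.items() if enabled(marking, t)]
        if not ready:
            break
        marking = fire(marking, transitions[ready[0]])
        trace.append(ready[0])
    return marking, trace


# Hypothetical program: split the input, run two tasks, join the results.
NET = {
    "split": {"inputs": {"start": 1}, "outputs": {"a_in": 1, "b_in": 1}},
    "task_a": {"inputs": {"a_in": 1}, "outputs": {"a_out": 1}},
    "task_b": {"inputs": {"b_in": 1}, "outputs": {"b_out": 1}},
    "join": {"inputs": {"a_out": 1, "b_out": 1}, "outputs": {"done": 1}},
}

final_marking, trace = run({"start": 1}, NET)
```

After `split` fires, `task_a` and `task_b` are both enabled and independent of each other, which is the property an allocation tool could exploit when placing tasks on different nodes. The sequential loop here simply picks one of them first.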

Volume 41, Issue 8, Pages 541-554

Citations: 0

From transformations to methodology in parallel program development: A case study

Microprocessing and Microprogramming

Pub Date : 1996-04-01 DOI: 10.1016/0165-6074(96)00004-X

Sergei Gorlatch

The Bird-Meertens formalism (BMF) of higher-order functions over lists is a mathematical framework supporting formal derivation of algorithms from functional specifications. This paper reports results of a case study on the systematic use of BMF in the process of parallel program development. We develop a parallel program for polynomial multiplication, starting with a straightforward mathematical specification and arriving at the target processor topology together with a program for each of its processors. The development process is based on formal transformations; design decisions concerning data partitioning, processor interconnections, etc. are governed by formal type analysis and performance estimation rather than made ad hoc. The parallel target implementation is parameterized for an arbitrary number of processors; for a particular number of processors, the target program is both time- and cost-optimal. We compare our results with systolic solutions to polynomial multiplication.
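
The flavor of a specification built from higher-order functions over lists can be sketched as follows. This is not the derivation from the paper: it is an assumed map/reduce formulation of polynomial multiplication on coefficient lists (lowest degree first), plus a chunked variant that relies on the associativity of coefficient-wise addition to split the work among `p` hypothetical processors. All names are illustrative.

```python
from functools import reduce
from operator import add


def zip_add(xs, ys):
    """Coefficient-wise sum of two polynomials, padding the shorter with zeros."""
    n = max(len(xs), len(ys))
    pad = lambda v: list(v) + [0] * (n - len(v))
    return list(map(add, pad(xs), pad(ys)))


def poly_mul(a, b):
    """Specification: sum over i of a[i] * x^i * b, written as map then reduce."""
    partials = map(lambda ia: [0] * ia[0] + [ia[1] * y for y in b], enumerate(a))
    return reduce(zip_add, partials, [])


def poly_mul_chunked(a, b, p):
    """Same result, with a (assumed non-empty) split into at most p chunks.

    Each chunk's partial product could be computed on its own processor;
    here they are computed one after another and then combined.
    """
    size = -(-len(a) // p)  # ceiling division
    chunks = [(start, a[start:start + size]) for start in range(0, len(a), size)]
    partial = [[0] * start + poly_mul(chunk, b) for start, chunk in chunks]
    return reduce(zip_add, partial, [])


# (1 + 2x + 3x^2) * (4 + 5x)
direct = poly_mul([1, 2, 3], [4, 5])
chunked = poly_mul_chunked([1, 2, 3], [4, 5], 2)
```

Because `zip_add` is associative, the final `reduce` can be regrouped freely, which is the kind of algebraic property that transformation-based development relies on when moving from a specification to a parallel program.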

Citations: 5
